The contribution of repetitive elements to salmonid genome evolution
Abstract
Eukaryotic genomes typically contain a substantial proportion of repetitive DNA in the form of transposable elements (TEs) and satellite DNA. Studies of mammals, model species, and a few other well-studied lineages have made it clear that repetitive elements play roles in many important cellular processes and shape the evolution of genomes and organisms. For most eukaryotic species, however, little is known about the role of repetitive DNA in biology and genome evolution. Here we use a suite of omics data and genomic analyses to ask: What is the role of repetitive DNA in genome regulation and structural variation in the Atlantic salmon genome?
In Papers 1 and 2 we studied the link between the evolution of gene regulation and transposable elements in the context of the salmonid whole genome duplication (WGD). We found that duplicate gene copies that had evolved lower expression across most tissues had higher rates of TE insertion in their promoters. In addition, duplicate copies that had evolved a liver-specific increase in expression had gained transcription factor binding sites (TFBSs) for liver-specific transcription factors in their promoters, and some of these sites were located inside TEs. In-depth analyses of cis-regulatory elements (CREs) in Paper 2 showed that 15-20% of CREs lie within TEs (TE-CREs) and that fewer TE-CREs were active in brain tissue than in liver. Interestingly, a small, heterogeneous group of TE subfamilies (11%) had contributed ~45% of all TE-CREs, but this ‘superspreader’ activity did not appear to peak shortly after the WGD. CREs donated by ‘superspreaders’ were enriched for many different TFBSs; however, highly brain-specific TFBSs were extremely rare in TEs, indicating that strong purifying selection has shaped TE-CRE evolution.
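The two quantitative claims above (the share of CREs lying within TEs, and the share of TE-CREs contributed by a few subfamilies) can be made concrete with a small sketch. This is a minimal Python illustration under stated assumptions, not the pipeline used in Paper 2: intervals are hypothetical half-open `(chromosome, start, end)` tuples, a CRE is counted as a TE-CRE if it overlaps any TE by at least 1 bp (the thesis may use a different criterion), and the function names `fraction_in_tes` and `top_share` are invented for this example. The toy data are made up and carry no biological meaning.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate

# Hypothetical interval layout: (chromosome, start, end), 0-based, half-open.
Interval = tuple[str, int, int]


def fraction_in_tes(cres: list[Interval], tes: list[Interval]) -> float:
    """Fraction of CREs overlapping at least one TE by >= 1 bp (assumed criterion)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in tes:
        by_chrom[chrom].append((start, end))

    # Per chromosome: TE starts in sorted order plus a running maximum of TE ends.
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_ends)

    hits = 0
    for chrom, start, end in cres:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        i = bisect_left(starts, end)  # TEs [0, i) start before this CRE ends
        # An overlap exists if any of those TEs ends after the CRE starts.
        if i > 0 and max_ends[i - 1] > start:
            hits += 1
    return hits / len(cres) if cres else 0.0


def top_share(counts: dict[str, int], k: int) -> float:
    """Share of all TE-CREs contributed by the k subfamilies with the most TE-CREs."""
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Toy usage with made-up intervals and per-subfamily counts.
cres = [("chr1", 0, 100), ("chr1", 200, 300), ("chr1", 500, 600), ("chr2", 0, 50)]
tes = [("chr1", 50, 150), ("chr1", 550, 560), ("chr2", 100, 200)]
print(fraction_in_tes(cres, tes))
print(top_share({"A": 45, "B": 20, "C": 15, "D": 10, "E": 10}, 1))
```

Sorting TEs once per chromosome and keeping a running maximum of end coordinates lets each CRE be tested with a single binary search, instead of comparing every CRE against every TE.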
In Paper 3 we studied the role of repetitive DNA in the evolution of structural variants (SVs; >50 bp). Leveraging seven new long-read genome assemblies, we found a large number of previously unknown SVs and concluded that satellite DNA is strongly associated with indel variants. TEs, on the other hand, had contributed comparatively little to the SV landscape. We conclude that the enormous number of novel SVs found in our study is mostly due to satellite expansion and contraction.
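One simple way to ask which repeat class an SV is associated with is to keep only variants longer than 50 bp and assign each to the repeat class covering most of its reference span. The Python sketch below is a hypothetical illustration, not the method of Paper 3: the record layouts, the names `classify_svs` and `overlap_bp`, and the "largest covered bases" rule are all assumptions. It uses reference spans, which suits deletions; insertions would instead require annotating the inserted sequence itself. Overlapping repeat annotations of the same class are simply summed, so shared bases may be double-counted.

```python
from collections import Counter, defaultdict

MIN_SV_LEN = 50  # SVs are defined as variants longer than 50 bp

# Hypothetical record layouts (not the thesis' actual formats), half-open:
#   SV:     (chrom, start, end)                reference span of the variant
#   repeat: (chrom, start, end, repeat_class)  e.g. "satellite" or "TE"


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def classify_svs(svs, repeats) -> Counter:
    """Tally SVs > MIN_SV_LEN by the repeat class covering most of each SV's span."""
    repeats_by_chrom = defaultdict(list)
    for chrom, start, end, repeat_class in repeats:
        repeats_by_chrom[chrom].append((start, end, repeat_class))

    tally = Counter()
    for chrom, start, end in svs:
        if end - start <= MIN_SV_LEN:
            continue  # too short to count as an SV
        covered = defaultdict(int)  # repeat class -> overlapping bases
        for r_start, r_end, repeat_class in repeats_by_chrom[chrom]:
            covered[repeat_class] += overlap_bp(start, end, r_start, r_end)
        best = max(covered, key=covered.get, default=None)
        # SVs with no overlapping repeat bases are reported as "none".
        tally[best if best is not None and covered[best] > 0 else "none"] += 1
    return tally


# Toy usage with made-up coordinates.
repeats = [("chr1", 0, 1000, "satellite"), ("chr1", 2000, 2300, "TE"),
           ("chr1", 5000, 5100, "TE")]
svs = [("chr1", 100, 400), ("chr1", 1900, 2400), ("chr1", 3000, 3040),
       ("chr1", 4000, 4200)]
print(classify_svs(svs, repeats))
```

The pairwise scan over repeats is kept deliberately simple for readability; for whole-genome annotations an interval index (as in a sorted-start binary search) would be the practical choice.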
This thesis provides an advance in our understanding of the role of repetitive DNA in the evolution of salmonid genomes, and paves the way for future studies into the functional importance of this vast sea of repetitiveness.