Quick News Spot

Developmentally regulated genes drive phylogenomic splits in ovule evolution - Nature Communications


Developmentally regulated genes drive phylogenomic splits in ovule evolution - Nature Communications

Phylogenomic reconstruction identifies 22,429 ortholog groups informing seed plant evolution

To identify genes that influenced seed evolution across plant groups, we collected leaf and reproductive (ovule or fern sporangia) tissues for deep RNA-seq analysis from 20 plant species, selected to span all major seed plant groups and the ferns (Fig. 1B, C, Supplementary Data 1). The seed plants include 14 gymnosperms (two Araucariales, five Cupressales, two Pinales, three Cycadales, Gnetum gnemon, and Ginkgo biloba) and four angiosperms (one monocot, Iris pseudoacorus; one eudicot, Arabidopsis thaliana, and two magnoliids, Liriodendron tulipifera and Piper nigrum) (Supplementary Data 2). For simplicity, genus names will signify the member species used in this study, unless otherwise indicated.

The twenty deep transcriptomes obtained for these species are highly complete, with combined BUSCO complete and fragmented scores ranging from 86.5% to 95.5% (Fig. 1D). For gymnosperms with published genomes, our transcriptome assemblies were close to or exceeded the BUSCO scores of their genome assembly counterparts (Supplementary Fig. 1). Gene sets obtained for each species ranged from 20,000 to 40,000 genes (Fig. 1E). We also assembled an expanded transcriptome of Taxus baccata, which is our gymnosperm experimental test system, using RNA extracted from leaf, ovule, green aril, and pollen tissues.

Most gymnosperm species in our dataset do not have published genome assemblies or annotations. Therefore, we used our revised PhyloGeneious pipeline (Supplementary Fig. 2) (see "Methods") to identify the shared genes (orthologs) across these 20 species. Our analysis identified 43,248 ortholog groups in two or more species, with 1809 ortholog groups conserved across all 20 species, and 5934 groups shared between ferns and seed plants. Of the 37,314 seed plant-specific ortholog groups, 509 were conserved in all seed plants, 655 groups were specific to and conserved in all gymnosperms, and 2584 groups were specific to and conserved in all angiosperms (Supplementary Fig. 3). We also identified 1865 ortholog groups that were specific to the angiosperms and cycads, 470 of which were conserved in all angiosperms and 579 of which were only shared between cycads and Piper (Supplementary Fig. 4). Cycads had the most orthologs shared between angiosperms and any single gymnosperm order, followed by Gnetum which had 347 shared orthologs of which 169 were conserved in all angiosperms.

We then used ortholog groups present in four or more species to construct parsimony (Fig. 2A) and maximum likelihood (Supplementary Fig. 5) species trees for determining the genes driving phenotypic changes across plant groups. Ortholog groups consisted of one representative per species chosen from the species' paralogs. This resulted in 23,525 ortholog groups used for phylogeny estimation, 22,429 of which were parsimony informative (i.e., containing at least one position in the ortholog protein alignment where a minimum of two different amino acid types were conserved in two or more species). The species relationships in our trees agree with previous literature findings. The only conflict between our parsimony and likelihood trees is the placement of Gnetum as either basal to all gymnosperms or sister to the Pinales order, which also occurs in other studies.

Conserved or changing expression patterns can inform how an ortholog group functions across species. As an initial step to classify candidate ovule development genes, we used differential expression to identify genes that were more highly expressed in seed plant ovule (Ov) or fern sporangia (S) tissues (from here referred to as OvS) relative to leaves (L) in each species. This yielded 53,248 OvS DE genes across all species, representing 19,250 ortholog groups (Fig. 2B, C). 7910 of these ortholog groups had OvS differential expression in two or more species. L differential expression included 58,560 genes across all species, representing 18,084 ortholog groups (Figs. 1E and 2B).

To determine whether ovule development genes were a major factor driving seed plant divergence, we compared our ortholog groups containing DE genes with the ortholog groups that were informative for our parsimony phylogeny (Fig. 3). We found that ortholog groups DE in any tissue (i.e., at least one DE member gene) were significantly more likely to be informative than random chance, with this being even more significant for groups DE in two or more species (Supplementary Fig. 7, Fig. 3B). Conversely, ortholog groups that contained no DE genes were significantly less likely to be informative (Fig. 3B, Supplementary Data 3). We identified a third category of ortholog groups, "Mixed DE", where DE is found across multiple species but with regulatory divergence such that some species have L DE and other OvS DE. Ortholog groups DE in either tissue type (OvS, L, or Mixed) were also significantly more likely to have a higher number of informative positions and significantly more likely to have informative positions fall within predicted protein domains, suggesting functional relevance (Supplementary Fig. 8, Supplementary Data 3). Functional analysis of informative groups containing Arabidopsis orthologs found significant enrichment (p-adj < 0.05) for GO terms related to chloroplast and transcriptional regulation, while slightly less enriched terms (p-val < 0.01) included plant organ morphogenesis, flower and tissue development, and vesicle-mediated transport (Supplementary Data 4).

We next examined how individual ortholog groups influenced species divergence at our three most basal evolutionary splits: (1) the split between seed and spore plants, (2) the split between gymnosperms and angiosperms, and (3) the split between cycad&Ginkgo and conifer gymnosperms (Fig. 3A). We achieved this by performing partition branch support (PBS) analysis with our parsimony tree, which measures the amount of support in amino acid changes each data partition (ortholog group) provides to a given evolutionary clade in the tree calculation. We found that several of the ortholog groups with the strongest PBS values for each split corresponded to flowering and reproductive development genes (Supplementary Data 5). A broader look also revealed that the majority of the ortholog groups highly supporting these evolutionary splits had Mixed DE supporting evolutionary splits at a much higher proportion than random chance (p-val <<< 0.01) (Fig. 3C, Supplementary Data 3). The particularly high enrichment of mixed DE genes as informative to clade divergences suggests that both sequence and regulatory divergence underlie evolutionary splits.

To uncover the genes that have driven seed evolution across species, we integrated our phylogenomic and evolutionary findings with gene expression and compared our evolutionary informative ortholog groups against prior knowledge of genes involved in ovule development. Below, we describe three different strategies for integrating these two knowledge sources: (i) mapping differential expression patterns to orthology and phylogeny, (ii) utilizing known ovule genes to identify related orthologs, and (iii) characterizing ortholog groups with strong evolutionary influence and evidence of ovule-related function. We highlight some of our key discoveries obtained from each strategy.

To test whether combining phylogenomics and expression could identify genes of importance to seed evolution, we started looking for ortholog groups implicated in ovule-related function in a majority of species. Most ortholog groups with conserved OvS differential expression were only conserved in a few species, with only 10 groups containing OvS DE genes from ten or more species (Fig. 4A). Four of the most frequently ovule DE ortholog groups included Arabidopsis genes: ACOS5/4CLL1, CYP704B1, DRL1/TKPR1, and CYP703A2 (Fig. 4A). These four genes all have documented roles in angiosperms in the formation of exine, the outer layer of the pollen wall, and prior experimental evidence suggests these proteins interact. All four exine ortholog groups support the seed|spore evolutionary split (Fig. 4B) and slightly influence the split of cycad&Ginkgo in the conifers (Fig. 4B, C). The ACOS5 ortholog also strongly supports the separation of Gnetum from the other gymnosperms, with a PBS value of 11 (Supplementary Data 6). These pollen gene orthologs are highly DE in all fern sporangia and most gymnosperm ovules yet show little differential expression in angiosperms (with the exceptions of the Iris CYP704B1 ortholog in ovules and the Zamia CYP703A2 ortholog in leaves) (Fig. 4C). The Taxus orthologs of all four genes also show similar or higher expression levels in pollen relative to ovules in our expanded transcriptome. This gymnosperm-specific expression pattern matches prior morphological observations that the gymnosperm megaspore wall is similar to pollen exine, while angiosperm megaspore walls are absent or highly reduced (Fig. 4D).

The second ortholog group observed as ovule DE in thirteen species included Ginkgo GbMADS8, one of two gymnosperm-specific duplications of Arabidopsis AGL6 that is expressed throughout the developing ovule (Supplementary Fig. 9). GbMADS8 strongly supports the cycad&Ginkgo|conifer split. The PRX17 and RAD50 ortholog groups that were ovule DE in nine species are also known to have roles in reproductive development in Arabidopsis. Overall, these results demonstrate that integrating orthology with cross-species expression data yields key insights into conserved developmental pathways.

To assess our method's ability to recover established seed development genes, we searched our ortholog groups for 39 known ovule development genes and 480 validated embryo lethal ("seed") genes from Arabidopsis (Fig. 5A) (Supplementary Data 8). Our pipeline found orthologs in other species for 26 of the known ovule genes and 405 of the known seed genes (Supplementary Data 9).

Our analysis identified some unexpected orthology within the type II MADS-box gene family, whose members are frequently reported to play roles in reproductive development. SEPALLATA (SEP) is well-known as one of the major ovule-identity determinators in angiosperms, forming a complex with AGAMOUS, SHATTERPROOF 1&2, and SEEDSTICK. SEPALLATA has also been documented as exclusively found in angiosperms. While paralogs SEP1, 2, and 4 appear to be specific to Arabidopsis in our analysis, we found SEP3 orthologs in all four angiosperm species as well as in two gymnosperm cycads, Zamia and Stangeria (Supplementary Fig. 10). These cycad homologs are highly similar to the angiosperm genes in the sequence alignment, although the Stangeria homolog lacks a portion of the conserved MADS-box domain (Supplementary Fig. 11), and running BLAST searches of both cycad sequences in the NCBI nr database only yielded matches to SEP3 sequences from other plant species. The SEP ortholog group was informative to our species tree (though not a high supporter of the major evolutionary splits), and 10 of the 27 informative sites occurred within the conserved K-box domain. All angiosperm SEP3 orthologs were DE and highly expressed in ovules, while the cycad SEP3 orthologs were expressed at very low levels, with the Zamia ortholog observed exclusively in ovules. We did not observe a SEP3 ortholog in our Cycas transcriptome, and searching the Cycas panzhihuaensis genome also found no match. Among the other four genes that complex with SEP, most gymnosperm orthologs were distantly related, although Zamia had an additional ortholog that clustered directly with SEEDSTICK (Supplementary Fig. 12). There were also several clades of gymnosperm-specific MADS-box genes.

Lastly, we further interrogated the power of our gymnosperm data to enhance the annotation of ovule-related genes in the model Arabidopsis. We identified 5437 Arabidopsis genes that had orthologs in multiple gymnosperms (Supplementary Data 10), then combined ovule differential expression with existing Arabidopsis reproductive annotations (Fig. 5A). Both Arabidopsis and gymnosperms were able to uniquely annotate genes from other species. Our gymnosperm expression data annotated the most Arabidopsis genes as ovule-related, the vast majority of which (1224) were genes not identified by existing Arabidopsis reproductive annotations (Fig. 5B, Supplementary Data 11) from a curated set of GO-term annotations. For these orthologs with known expression patterns in Arabidopsis, 77.7% were expressed in flowers, and more than 60% were expressed in other reproductive tissues (Supplementary Data 12). In addition, 96.9% were expressed during petal differentiation and expansion (enrichment p-val < 0.05), 91.0% were expressed during the embryo cotyledonary stage (enrichment p-adj < 0.05), and 88.8% were expressed in mature plant embryos (enrichment p-adj < 0.05) (Supplementary Data 13). GO term analysis found two of the 1224 ortholog groups were annotated for embryo and post-embryonic development (enrichment p-adj < 0.05), and one group was annotated for meiotic cell cycle (enrichment p-val < 0.01). There was also significant GO enrichment (p-adj < 0.05) for chloroplast and mild enrichment (p-val < 0.01) for GO terms related to plastids, organelle envelope, phosphorylation, ubiquitination, positive regulation, mRNA binding, DNA metabolism, anatomical morphogenesis, organophosphate and thiamine metabolism, hydrolase activity, and response to water deprivation (Supplementary Data 14). PFAM enrichment analysis showed mild enrichment for PPR and leucine-rich repeats among individual domains and mild enrichment for WD proteins and S33 serine aminopeptidases when examining domain combinations. There was also a general prevalence of protein kinases, DEAD/DEAH box helicases, F-box proteins, TPR repeats, and RNA recognition motifs (Supplementary Data 15 and 16).

Our next goal was to leverage Arabidopsis prior knowledge in conjunction with our phylogenomic and expression data for the discovery of candidate ovule genes. We identified 15 ovule and 102 seed gene ortholog groups conserved in 15 or more species and used their differential expression pattern (log2 fold change) across species to search for additional ortholog groups with correlated expression (Fig. 5A) (Supplementary Fig. 18). This comparison approach also controls for potential differences in the ovule developmental stages collected across species. Our analysis identified 2396 highly correlated ortholog groups, with 42 groups correlated to known ovule and known seed genes, 29 groups correlated only to known ovule genes, and 2325 groups correlated only to known seed genes. In many cases, these correlated groups included genes with known ovule or reproductive functions; for example, the known ovule gene ADA2B/PRZ1, a histone acetyltransferase regulator required for proper integument development in angiosperms. This analysis also identified known reproductive development genes, including the mRNA processing gene RDM16, the meiotic DNA repair gene RAD50, and multiple embryo-lethal mutants as correlated ortholog groups (Supplementary Fig. 19, Supplementary Data 17).

We now sought to fully integrate our evolutionary findings with functional evidence to identify genes with the largest impact on ovule evolution. To accomplish this, we specifically focused on ortholog groups that had high PBS scores for one of the major splits, as well as support for ovule-related function from expression and/or prior knowledge evidence (Fig. 6A). These criteria identified 4076 candidate ortholog groups associated with ovule evolution, of which 2267 support one basal evolutionary split, 1211 support two evolutionary splits, and 598 give high support for all three splits (Fig. 6B). These candidate ortholog groups included three of the known Arabidopsis ovule genes (HLL, ADA2B, and SEUSS) and 188 of the known seed genes, as well as 397 other Arabidopsis genes annotated with reproductive roles (see "Methods"). Most of the candidate ortholog groups supported either (i) only the cycad&Ginkgo|conifer split, (ii) the seed|spore and gymnosperm|angiosperm splits, or (iii) only the gymnosperm|angiosperm split(Fig. 6B). Interestingly, very few orthologs only supported the seed|spore split, and none of those groups relate to known ovule genes (Fig. 6B). To further confirm the quality of our PBS scores for identifying valuable evolutionary candidates, we looked for ortholog groups with significant positive selection in seed plants relative to ferns and identified 557 orthologs, 315 of which were included in our candidate ortholog groups (Supplementary Data 18 and 19).

To learn the types of candidate genes that drove ovule evolution, we performed functional enrichment of PFAM (protein family) domains for the candidates supporting each combination of the three evolutionary splits. Enrichment was performed both for single domains (Supplementary Data 20) and for co-occurring domain groups (Supplementary Data 21). Candidate ortholog groups supporting the seed|spore and gymnosperm|angiosperm splits were highly enriched for WD proteins, DEAD/DEAH box helicases, cytochrome P450s, and Rubisco, indicating roles in the early divergence of the seed plants (Supplementary Data 22). DEAD/DEAH box genes were also enriched in candidates only supporting the gymnosperm|angiosperm and cycad&Ginkgo|conifer splits, while cytochromes were additionally enriched in candidates only supporting the cycad&Ginkgo|conifer split. PPR proteins were also frequently enriched among candidates supporting all three splits, solely the gymnosperm|angiosperm and cycad&Ginkgo|conifer splits, or only the cycad&Ginkgo|conifer split, possibly reflecting their expansion within the seed plants. Single domain enrichment further identified TPR and SET domains among seed|spore and gymnosperm|angiosperm split supporters, F-box domains among groups supporting all splits or only the cycad&Ginkgo|conifer split, and ankyrin repeats among groups supporting all splits or solely the gymnosperm|angiosperm and cycad&Ginkgo|conifer splits. These candidate orthologs (Fig. 6, Supplementary Data 23) are a treasure trove for discovering uncharacterized ovule genes and dissecting the evolutionary history of previously identified genes.

Having identified several genes as candidate drivers of ovule evolution, we next sought to confirm whether these genes' support for evolutionary splits corresponded to changing roles in ovule development in non-model species. To test this, we chose a gymnosperm-specific BELL ortholog group containing the Gnetum gene MELBEL1, which strongly supported our cycad&Ginkgo|conifer split (Fig. 6B: Split 3) (Supplementary Fig. 20). Gnetum MELBEL1 is expressed in ovules throughout the developing nucellus, while the Ginkgo homolog GibiBEL1-2 has expression more restricted to the megaspore mother cell and ovule base. However, MELBEL1 has not yet been characterized in conifers. Our MELBEL1 ortholog group was ovule DE in five of fourteen gymnosperms, all conifers (Supplementary Fig. 20), and leaf DE in Gnetum. We also examined a neighboring uncharacterized gymnosperm-specific BELL group, here dubbed GYMNOSPERM BELL 2 (GBEL2), which was ovule DE in seven of thirteen gymnosperms and not observed in Gnetum.

In order to validate the role of these gymnosperm-specific BELLs in conifer ovules, we performed in situ hybridization of MELBEL1 and GBEL2 orthologs in the conifer Taxus baccata in three stages of developing ovules and young vegetative tissue (Fig. 7). Ovules of Taxus species are notable for the fleshy fruit-like aril that surrounds the seed (Fig. 7B); fused arils are also observed in Cephalotaxus species while this structure is absent in all other gymnosperm groups (Fig. 7A).

Our in situ experiments showed that TbMELBEL1 was expressed throughout the shoot apical meristem and adjacent leaf primordia in young vegetative tissue (Fig. 7C). In Stage 1 ovules, TbMELBEL1 was strongly expressed throughout the nucellus (Fig. 7D). In Stage 3 ovules, TbMELBEL1 was expressed throughout the nucellus and in the interior of the integuments, as well as throughout the developing aril (Fig. 7E). TbMELBEL1 was also very highly expressed in the megaspore mother cell (Fig. 7F). In Stage 5 ovules, TbMELBEL1 was expressed throughout the developing megaspore and the cone bracts, but no longer observed in the arils (Fig. 7G). To confirm the tissue-specific expression of TbMELBEL1, we observed this gene's quantified expression in the expanded transcriptome of leaf, ovule, green aril, and pollen tissues. The expanded Taxus transcriptome confirmed that TbMELBEL1 is highly expressed in ovules and green arils, with lower expression in pollen and leaves.

Previous articleNext article

POPULAR CATEGORY

misc

6679

entertainment

7115

corporate

5944

research

3557

wellness

5886

athletics

7475