Shawn Burgess spoke about the importance of applying genome assembly methods to fish genomics.
Shawn Burgess, chief of the Office of Scientific Core Facilities in the Developmental Genomics Section of the National Human Genome Research Institute at the National Institutes of Health (NIH), gave a talk titled "Darwinian Genomics: Rapid Advances in Genome Assembly Can Make Any Fish a Model Organism" as part of the Biology Department's seminar series on Thursday, Nov. 14. Burgess is also co-deputy director of the Division of Intramural Research, senior investigator of the Translational and Functional Genomics Branch and head of the Developmental Genomics Section at the NIH.
Fish exhibit fascinating morphological and biological diversity, which presents abundant opportunities for researchers to study them as model organisms and understand the core of evolutionary biology. However, past research has been limited by scientists' ability to generate comprehensive genomic information on these animals. Burgess' research aims to understand how genetic information interacts with restorative processes and to develop efficient knock-out and phenotyping techniques for modeling genomes in zebrafish.
Burgess' decade-long commitment to his research in modeling genomes stems from the gaps in the genomics field, where genomics modeling and sequencing techniques are still being refined.
"The big problem became the repeat sequences. When your sequences are shorter than the length of an array, you get lost," he explained.
However, according to Burgess, genome sequencing has advanced significantly in the past decade, especially for human genomes, resulting in the first telomere-to-telomere, or gapless, sequence of a human genome just two years ago. This breakthrough opens exciting prospects for the study of genetic nuances and thus genetic diseases in humans.
With this, he transitioned to the concept of Darwinian genomics, a field utilizing modern genomic techniques to capture the diversity of biology. The goal of Darwinian geneticists is to capture all the variations of genome sequences in biological organisms, which has wide-ranging implications for many fields. Burgess explained all the evolutionary variations we can capture through expanding genomics beyond human sequencing and the potential benefits, including optimizing traditional laboratory animals, changing the genetics of animals for both functional and aesthetic purposes and studying non-model organisms that we can learn from.
"We should not re-sequence the human genome millions or a hundred million times, but we can actually use the amazing diversity of biology that our planet has as one of the largest experimental models available to us," he stated. "All these genomes out there and all these adaptations represent ways to understand biology that we can now capture by sequencing those genomes."
Next, Burgess introduced the concept of gene duplication -- what he and his colleagues view as the origin of evolutionary diversity. For animals to evolve and change in both morphology and functionality, genes must duplicate so that one copy's code can change. This can be accomplished through segmental duplication or whole genome duplication. After duplication, the new gene may become inactivated, take on some functions of the ancestral gene (subfunctionalize) or completely take on a new function (neofunctionalize).
This explains Burgess' interest in goldfish as a model organism to serve as a basis for studying zebrafish. Goldfish are phylogenetically close to zebrafish and have undergone a recent whole-genome duplication. Goldfish mutants resulting from this duplication are also more likely to survive, making them ideal candidates for the study of evolutionary developmental biology and morphological development over eight million years. The many varieties of goldfish created through human intervention and selective breeding attest to this.
Burgess explained the study's methodology, which used a technique called heat shock disruption of mitosis I that allowed his team to isolate one copy of the genome. Sifting through the genomic data, he explained that the peaks can represent the depth of sequence coverage, which unfortunately told his team that certain chromosomes didn't match.
Using fragments from the study's data and the genetic map from another team of scientists, Burgess was able to lay out the scaffold for the goldfish genome, the longest of which had 37 million base pairs. With the assembly and RNA-sequencing across goldfish tissues, Burgess and his team found that genomic structures are very stable from goldfish to zebrafish, allowing researchers to make predictions about how genes are being regulated.
"This was a good assembly for the time. Now we would be embarrassed about it, but at this time, this was a great assembly," he joked lightheartedly.
Having assembled genomic data for these fish relatives, Burgess proceeded to the main question for the evolutionary biology of zebrafish, a species he considers an anomaly due to its unusually large amount of genomic data.
"This is where we get to the meat of the evolution question. Now that we've got grass carp, gold fish, carp and zebrafish, you can use the zebrafish as sort of the base group," he added. "You can ask at every branching, how many genes are lost and how many enhancers are lost for each gene [...] so we can track essentially all of [this] neo- and subfunctionalization across the entire genome."
Gene loss was significantly higher in goldfish and carp due to duplication events, resulting in the loss of certain functionalities. This led Burgess to the question of how often the total expression of any given gene looks the same in both copies of the gene. According to Burgess, there is a hotspot where the two copies look extraordinarily different, which his team believes drives evolution. Essentially, differences in gene expression between "ohnologs" (genes that originate from a whole-genome duplication event) can be used to identify evolutionary divergence. Genes being expressed in tissues where they are not normally expressed allows for neofunctionalization.
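One simple way to picture the comparison of expression between the two copies is to correlate their expression levels across tissues. The sketch below is an illustration only: Pearson correlation is a common choice but is not stated to be the measure Burgess' team used, and the tissue names and expression values are invented.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical expression levels per tissue (brain, gill, heart, muscle).
# Pair 1: both copies keep the same tissue pattern, only the level differs.
conserved_a = [10.0, 2.0, 5.0, 1.0]
conserved_b = [20.0, 4.0, 10.0, 2.0]

# Pair 2: one copy is now expressed mostly where the other is not.
diverged_a = [10.0, 2.0, 5.0, 1.0]
diverged_b = [1.0, 5.0, 2.0, 10.0]

r_conserved = pearson(conserved_a, conserved_b)
r_diverged = pearson(diverged_a, diverged_b)

print(f"conserved pair r = {r_conserved:.2f}")
print(f"diverged pair r = {r_diverged:.2f}")
```

A pair with a low or negative correlation is the kind of candidate the paragraph above describes: a copy expressed in tissues where its partner is not.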
"The more different the cDNA (sequences) become-the more evolutionarily changed the two copies have become-the less likely they're expressed the same. This is where we think is capturing the neofunctionalization event," he said.
Pivoting to epaulette sharks, Burgess said he was interested in why cancers occur so rarely in sharks. Existing data on the epaulette shark genome is relatively complete, with an assembly size of 3.9 Gb, 54 chromosomes and nearly 2,000 scaffolds. To estimate the de novo mutation rate in this species, Burgess' team sequenced one offspring as well as its siblings to identify SNPs (single-nucleotide polymorphisms) unique to one offspring, which allows the mutation rate per base to be calculated.
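The arithmetic behind a per-base mutation rate can be sketched as follows. This is a simplified illustration, not the team's pipeline: the variant tuples, the count of callable bases and the helper names are all hypothetical, and the division by two assumes a diploid genome, where a new mutation can arise on either inherited copy.

```python
Variant = tuple[str, int, str]  # (chromosome, position, alternate base)


def unique_to_offspring(focal: set[Variant], others: list[set[Variant]]) -> set[Variant]:
    """Keep variants seen in the focal offspring and in none of the other samples."""
    seen_elsewhere: set[Variant] = set().union(*others)
    return focal - seen_elsewhere


def de_novo_rate(n_de_novo: int, callable_bases: int, ploidy: int = 2) -> float:
    """Mutations per base per generation, assuming `ploidy` genome copies."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_de_novo / (ploidy * callable_bases)


# Hypothetical variant calls, for illustration only.
focal = {("chr1", 1200, "T"), ("chr2", 530, "G"), ("chr5", 88, "A")}
sibling_1 = {("chr1", 1200, "T"), ("chr2", 530, "G")}
sibling_2 = {("chr1", 1200, "T")}

candidates = unique_to_offspring(focal, [sibling_1, sibling_2])

# Hypothetical number of bases where variants could be called reliably.
callable_bases = 100_000_000
rate = de_novo_rate(len(candidates), callable_bases)

print("candidate de novo variants:", candidates)
print(f"estimated rate: {rate:.1e} per base per generation")
```

In practice the candidate list would also need filtering for sequencing errors, which this sketch leaves out.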
"The epaulette shark genome is large with many chromosomes, some of them microchromosomes. De novo mutations in epaulette sharks are surprisingly low, with ramifications in both cancer rates and ecological bottlenecks," Burgess said.
Finally, Burgess spoke about his current work: updating the zebrafish genome to a telomere-to-telomere assembly. According to Burgess, the past zebrafish genome set contains roughly 2,000 missing genomes and a significantly smaller sequence than expected, revealing a critical gap in the genome of a common model organism. The most recent set is significantly improved but is still not a telomere-to-telomere assembly. Burgess' team successfully bred homozygous fish, cultured fibroblasts and isolated DNA from quality tissue that can undergo sequencing.
By comparing his team's data with existing data, Burgess was able to refine the genome and fill in gaps that past work could not cover. Another approach to the same problem revealed that zebrafish strains from different labs differ in their SNPs as much as humans differ from chimpanzees. There is enough variation across the strains that Burgess' team has millions of distinct SNPs, which will allow them to split the haplotypes (groups of DNA sequences located on the same chromosome that are passed down together through inheritance), tell when sequences came from different fish, map different regulatory variations and determine what is driving the differences in these fish.
The team's telomere-to-telomere assemblies improve accuracy and add novel genes and sequences to reference genomes. Burgess concluded the talk by stressing the importance of capturing the true variation in gene expression and genomic differences through pangenomes, and emphasized that geneticists should aim to provide a comprehensive collection of all the genetic variations that exist in an organism.
"You can see profound differences in which genes are being expressed and by how much across essentially the whole chromosome. Amazingly, they all still look like zebrafish on the other hand. We have enormous differences in gene expression, and yet, the output in the end is ultimately the same," he shared.