This past summer a technician in the lab rediscovered our carefully guarded stash of FFMM seeds and we decided it was time to increase them. While we did most of the increase in the greenhouse, the idea came up at the same time we were finalizing the plans for our summer nursery* so we decided to plant the line in the field as well.
With all these new third-generation sequencing technologies coming out in 2010, hopefully someone will sequence the pineapple genome. If not, maybe the cost of sequencing will drop enough while I’m in grad school that I can sequence the genome myself (a guy can dream).
I was a bit overly optimistic back in 2010 about how fast the cost of sequencing (and, critically, assembling) genomes would decline. Back then we were all talking about sequencing prices dropping 10x every 1-2 years. This turned out to be a quick burst of innovation brought about by second-generation sequencing technologies (primarily 454 at first, then Solexa, which later became Illumina). As with many technologies, there was a lot more low-hanging fruit for optimization early on, and the cost of sequencing essentially plateaued from 2011 to 2015.
Of course, now we’re finally starting to get those economically viable third-generation sequencing technologies I thought were right around the corner in 2010 (PacBio and Oxford Nanopore being the two most successful ones at the moment). And they still have so much headroom for optimization that maybe in another 6-7 years grad students really will be able to generate genome assemblies on a whim.
In the meantime, hey, we did get a pretty cool pineapple genome assembly a couple of years ago.
Editor’s note: Robert VanBuren, second author on the pineapple genome and first author on at least one of the dozen or so published grass genome sequences, now has his own research group out at MSU working on CAM photosynthesis and drought. Check it out!
Out of the ~12,000 known grass species, the genomes of fewer than one in one thousand have been sequenced. The “One in a Thousand” series focuses on these rare grass species.
Dichanthelium oligosanthes is a wild grass that grows in forest glades throughout the American Midwest. It is a small plant. It doesn’t grow particularly fast. Its flowers aren’t particularly striking. And it has enough issues with seed dormancy that growing it in captivity is a major pain. Dichanthelium is a one in one-thousand grass with a sequenced reference genome.*
The reason folks are interested in Dichanthelium isn’t because of what it is, but who it’s related to. Dichanthelium occupies a spot on the grass family tree between a tribe** of grasses that includes foxtail millet and switchgrass, each one in a thousand species themselves, and another tribe of grasses that includes corn and sorghum, two more one in a thousand species. The relationship looks something like this:
Functionless DNA changes more rapidly, functional DNA more slowly. This is one of the fundamental principles of comparative genomics. It’s why people look at the ratio of synonymous nucleotide changes to nonsynonymous nucleotide changes within the coding sequence of genes. It’s why the exons of two related genes will still have strikingly similar sequences after the sequences of the introns have diverged to the point where it’s impossible to even detect homology. It’s also a way to identify which parts of the noncoding sequence surrounding a set of exons are functionally constrained. The bits of noncoding sequence that determine where, and when, and how much a gene is expressed are, by definition, functional, and should diverge more slowly, even between related species, than the big soup of functionless noncoding sequence that the functional bits of a genome float in. These conserved, functional, noncoding sequences are called, unimaginatively, conserved noncoding sequences (CNS).*
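To make the synonymous versus nonsynonymous idea concrete, here is a minimal Python sketch that walks two aligned coding sequences codon by codon and sorts the differing codons into the two bins. The function name and the example sequences are made up for illustration. It also counts differing codons rather than individual nucleotide changes and does not normalize by the number of possible sites, so it is a toy, not a real dN/dS estimate.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Count differing codons between two aligned, gap-free coding sequences.

    Returns (synonymous, nonsynonymous). Hypothetical helper for illustration:
    it counts whole codons, not individual nucleotide substitutions.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codon, nothing to count
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # DNA changed, protein did not
        else:
            nonsynonymous += 1  # DNA change altered the amino acid
    return synonymous, nonsynonymous


# Made-up example: Met-Ala-Leu-Lys versus Met-Ala-Leu-Arg
print(count_codon_changes("ATGGCTCTGAAA", "ATGGCCTTGAGA"))
```

In a gene under functional constraint you would expect the first number to be large relative to the second, which is the signal the ratio is meant to capture.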
I’ve been playing with CNS since I first opened a command line window back as a first year grad student. The smallest CNS we’d consider “real” were 15 base pair exact matches between the same gene in two species. On the one hand, this seemed a bit too big, because I knew lots of transcription factors bind to motifs only 6-10 base pairs long. On the other hand, this seemed a bit too short, because I’d see 15 base pair exact matches that couldn’t be real a bit too often (for example, a match between a sequence in the intron of one gene and the sequence after the 3′ UTR of another).
15 bp represented a compromise between the two concerns pushing in opposite directions. Then, in the fall of 2014, a computer science PhD student walked into my office and asked if I had any interesting bioinformatics problems he could work on. The result was a new algorithm (STAG-CNS) which was both more stringent at identifying conserved noncoding sequences and able to identify shorter conserved sequences than was previously possible. It achieved both of these goals through the expedient of throwing genomes from more and more species at the problem.
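The trade-off between match length and false positives, and why adding species helps, can be sketched in a few lines of Python. To be clear, this is not the STAG-CNS algorithm, just a naive set intersection of k-mers. The function names are invented for illustration, and the chance-match estimate assumes random sequence with equal base frequencies and looks at one strand only.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_exact_matches(sequences, k=15):
    """Return the k-mers found in every sequence (naive set intersection)."""
    shared = kmers(sequences[0], k)
    for seq in sequences[1:]:
        shared &= kmers(seq, k)  # each extra species can only shrink the set
    return shared


def expected_chance_matches(len_a, len_b, k):
    """Expected number of k bp exact matches between two random sequences.

    Assumes equal base frequencies and independent positions, which real
    genomes do not satisfy, so treat the result as a rough lower bound.
    """
    return (len_a - k + 1) * (len_b - k + 1) / 4 ** k


# Two 10 kb regions: 15 bp matches are rare by chance, 8 bp matches are not.
print(expected_chance_matches(10_000, 10_000, 15))
print(expected_chance_matches(10_000, 10_000, 8))
```

Under those assumptions, two 10 kb regions are expected to share roughly 0.09 chance matches at 15 bp but more than a thousand at 8 bp. Requiring the same short sequence to turn up in a third, fourth, or fifth species is what brings the false positives back down, which is the intuition behind throwing more genomes at the problem.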
There are more differences in the genomes of two unrelated corn plants than between the genomes of a human and a chimpanzee (two species separated by roughly 6 million years of evolution).
On the other hand, two unrelated human beings, members of the same species, have more than four times as many genetic differences as two unrelated heirloom tomatoes.
Corn vs. Corn > Human vs. Chimpanzee >> Human vs. Human >> Heirloom Tomato vs. Heirloom Tomato
Now the fact that any two human beings are more closely related to each other than either is to a chimpanzee should be obvious to anyone who gives it a moment’s thought.
I plan to poll my sections tomorrow to see how many of them would put corn and heirloom tomatoes in the opposite positions, but many have figured out my feelings about corn, so they’ll probably guess it’s a trap.
When I was an undergraduate, there were exactly two sequenced plant genomes, rice and arabidopsis. And sure, maybe I didn’t have to walk “ten miles to school, barefoot, in the snow, uphill, both ways”* but the one way I did have to walk uphill (sometimes in the snow, but always with shoes) was very uphill. But where was I?
Oh yeah, plant genome sequences. Kids getting into plant genomics these days don’t realize how easy they’ve got it. By my count (which may be low but I’m getting to that) there are ten published plant genomes, with several more unpublished genomes that are available in various states of completion, and lots more on the way.
Colored aleurone1 and Purple plant1 are both genes with long histories in maize research and are involved in the regulation of anthocyanin biosynthesis. The mutant version of purple plant1 does exactly what it sounds like. In the proper genetic background, it has plants producing anthocyanin (a purple plant pigment) everywhere, resulting in purple plants. The mutant form of colored aleurone1 was identified from a mutant that changed the color of individual corn kernels. Guess which of these two classic maize mutants made it into the top 15 most published on genes in maize, and which fell barely short.
The two genes are also duplicates (homeologs) resulting from the maize whole genome duplication. From the picture below you can also see that both the two genes and the regions they are in match up to single regions in rice and sorghum, two grasses that haven’t gone through a whole genome duplication since the great radiation of grass species that took place an estimated 50 million years ago (well after dinosaurs stopped walking the earth).
I mention all this to explain why I was so excited to learn that her post this week sings the praises of a group of species near and dear to my heart, the grasses. The whole post is definitely worth a read. Even if you don’t learn something you didn’t already know, read it as a source of inspiration for telling OTHER people how cool grasses are. And the closing is truly excellent:
We usually talk of our domestication of grasses, and the ways in which we have evolved them: we have made plants with bigger, more nutritious seeds that don’t fall to the ground, for example. But their effect on us has been far more profound. Our domestication of grasses, 10,000 years ago or so, allowed the building of the first cities, and marks the start of civilization as we know it. Grasses thus enabled the flowering of a new kind of evolution, a kind not seen before in the history of life: the evolution of human culture.
Some of the comments are heartwarming to read as well, although a bunch of people have fallen prey to the maize/corn confusion. (Explained in detail here)
*Speaking of cool science that most of the general public doesn’t know about: We’ve known for more than four years that mutations of the gene talpid2 in chickens cause chicken embryos to develop teeth, something we thought birds had lost the ability to do 60-80 million years ago (around the same time grass was bursting onto the world stage). Don’t worry too much about getting bitten by a sabertoothed turkey; the toothed embryos have other problems that mean they don’t survive.
**There’s also a three-part video series based on the book that I can best describe as … odd.
Unique citations determined from papers linked from MaizeGDB gene locus pages. Images of c1 and y1 segregating ears by Gerald Neuffer, made available through MaizeGDB.
* = tied for number of citations
** = some mutant alleles have kernel phenotypes.
If you want to become one of the famous mutant corn genes, it helps if you have an effect that is visible in corn kernels instead of only from fully grown plants.
And here is why:
A geneticist could determine that the version of c1 that creates yellow kernels is recessive to the version that creates purple kernels just from looking at the ear of corn on the left.
Furthermore, they could tell you that the male parent (the plant that provided the pollen) and the female parent (the plant on which the ear of corn grew) were both heterozygous for the c1 gene (they each had one dominant version of the gene and one recessive version), and therefore the corn kernels the parent plants were grown from were both purple.
They would know with certainty that all of the yellow kernels contain two recessive versions of the c1 gene.
While they couldn’t predict with absolute certainty whether a specific purple corn kernel on that ear carried two dominant versions of the c1 gene or one dominant and one recessive version, they would know there was a 1/3 chance that the kernel had two dominant copies, and a 2/3 chance it had one dominant and one recessive copy.
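If the 1/3 and 2/3 figures seem to come out of nowhere, they fall straight out of a Punnett square. Here is a small Python sketch that enumerates the cross between two heterozygous parents and then keeps only the purple kernels. The allele labels "C" (dominant, purple) and "c" (recessive, yellow) and the function names are my own shorthand for illustration, and the model is the simple one-gene, two-copy picture described above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a, parent_b):
    """Return offspring genotype probabilities for a one-gene cross.

    Each parent is a two-character genotype such as "Cc"; every combination
    of one allele from each parent is treated as equally likely.
    """
    counts = Counter(
        "".join(sorted(pair)) for pair in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def genotype_given_purple(offspring):
    """Condition on the kernel being purple (at least one dominant "C")."""
    purple = {g: p for g, p in offspring.items() if "C" in g}
    total = sum(purple.values())
    return {g: p / total for g, p in purple.items()}


offspring = cross("Cc", "Cc")
print(offspring)                         # all kernels on the ear
print(genotype_given_purple(offspring))  # purple kernels only
```

A quarter of all kernels are CC and half are Cc, so once the yellow quarter is set aside, the purple kernels split 1/3 CC to 2/3 Cc.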
That geneticist could make all sorts of predictions about what ears would look like in future generations depending on what colors of corn kernels were planted and which plants were mated with each other.
All this from a single picture of an ear of corn. For a phenotype seen in corn plants but not in kernels (like Knotted1), a geneticist would have to plant a row or more of corn seeds from an ear and examine the growing plants to get the same quantity of information.
And that is why mutations with kernel phenotypes have been so popular over a century of maize genetics research.