I’ve now worked at four different scientific institutions in some capacity or another, and I’m always surprised how empty buildings are when I come in on Saturdays or Sundays. To be clear, I’m certainly not at work every weekend day myself, and I don’t expect the students or collaborators to work weekends.* I’m just realizing that, after 13 years of thinking “wow, people at University X really have a more relaxed approach to research than most places,” maybe my idea of how many hours it is normal for a researcher to log in a week is a tiny bit skewed.**
Although, to be fair, 9:15 AM on Easter Sunday might be the MOST representative time point. 😉
*I always say that my mentoring style is to focus on productivity, not hours worked in lab. I’m still working out what that means in practice. For an entertaining (as long as the person writing the e-mail isn’t your boss) glimpse of what the opposite sounds like, be sure to read this classic e-mail from 2002.
**Growing up, I thought every family had dinner around 8 pm once everyone got home from the office, and that once you got a real job, “weekend” actually meant “Sunday morning.”
With all these new third generation sequencing technologies coming out in 2010, hopefully someone will sequence the pineapple genome. If not, maybe the cost of sequencing will drop enough while I’m in grad school that I can sequence the genome myself (a guy can dream).
An incredibly overused graph, but the reason it’s so overused is that it really is a remarkably useful dataset. Source: https://www.genome.gov/sequencingcostsdata/
That said, I was a bit overly optimistic back in 2010 about how fast the cost of sequencing (and, critically, assembling) genomes would decline. Back then we were all talking about sequencing prices dropping 10x every 1-2 years. This turned out to be a quick burst of innovation brought about by second generation sequencing technologies (primarily 454 at first, then Solexa, which later became Illumina). Like many technologies, there was a lot more low-hanging fruit for optimization early on, and the cost of sequencing essentially plateaued from 2011 to 2015.
Of course, now we’re finally starting to get those economically viable third generation sequencing technologies I thought were right around the corner in 2010 (PacBio and Oxford Nanopore being the two most successful ones at the moment). And they still have so much headroom for optimization that maybe in another 6-7 years grad students really will be able to generate genome assemblies on a whim.
In the meantime, hey, we did get a pretty cool pineapple genome assembly a couple of years ago.
Ming R., VanBuren R., Wai C. M., Tang H., Schatz M. C., et al., 2015 The pineapple genome and the evolution of CAM photosynthesis. Nat Genet 47: 1435–1442.
Also, here’s a fun video of a 3D scan of the internal structure of a pineapple:
Evidence of my ongoing obsession with pineapples.
Science is fun.
Editor’s note: Robert VanBuren, second author on the pineapple genome and first author on at least one of the dozen or so published grass genome sequences, got his own research group out at MSU working on CAM photosynthesis and drought. Check it out!
Inflorescence of Dichanthelium oligosanthes. Accession “Kellogg 1175”
Out of the ~12,000 known grass species, the genomes of fewer than one in one thousand have been sequenced. The “One in a Thousand” series focuses on these rare grass species.
Dichanthelium oligosanthes is a wild grass that grows in forest glades throughout the American Midwest. It is a small plant. Doesn’t grow particularly fast. Its flowers aren’t particularly striking. And it has enough issues with seed dormancy that growing it in captivity is a major pain. Dichanthelium is a one in one-thousand grass with a sequenced reference genome.*
The reason folks are interested in Dichanthelium isn’t because of what it is, but who it’s related to. Dichanthelium occupies a spot on the grass family tree between a tribe** of grasses that includes foxtail millet and switchgrass, each one in a thousand species themselves, and another tribe of grasses that includes corn and sorghum, two more one in a thousand species. The relationship looks something like this:
Phylogenetic relationship of Dichanthelium oligosanthes to related grasses with sequenced genomes.
Functionless DNA changes more rapidly, functional DNA more slowly. This is one of the fundamental principles of comparative genomics. It’s why people look at the ratio of synonymous nucleotide changes to nonsynonymous nucleotide changes within the coding sequence of genes. It’s why the exons of two related genes will still have strikingly similar sequences after the sequences of the introns have diverged to the point where it’s impossible to even detect homology. It’s also a way to identify which parts of the noncoding sequence surrounding a set of exons are functionally constrained. The bits of noncoding sequence that determine where, and when, and how much a gene is expressed are, by definition, functional, and should diverge more slowly, even between related species, than the big soup of functionless noncoding sequence that the functional bits of a genome float in. These conserved, functional, noncoding sequences are called, unimaginatively, conserved noncoding sequences (CNS).*
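For anyone who wants to poke at the synonymous versus nonsynonymous idea, here is a minimal sketch in Python. It assumes two coding sequences that are already aligned codon by codon, with no gaps or ambiguous bases, and it uses the standard genetic code. The function name count_codon_changes is made up for this illustration. It only tallies raw counts of changed codons; a real dN/dS estimate also normalizes by the number of synonymous and nonsynonymous sites, which this does not attempt.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(cds_a, cds_b):
    """Count codons that differ between two pre-aligned coding sequences.

    Returns (synonymous, nonsynonymous): changed codons that keep the same
    amino acid versus changed codons that alter it. Hypothetical helper;
    assumes equal-length, gap-free input whose length is a multiple of 3.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# GCT -> GCC keeps alanine (synonymous); AAA -> AGA swaps lysine for arginine.
print(count_codon_changes("GCTAAA", "GCCAGA"))
```

In a gene under functional constraint you would expect the first number to outpace the second, for the same reason exons outlast introns.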
Comparison of a single syntenic orthologous gene pair in the genomes of peach and chocolate. Coding sequence marked in yellow, introns in gray, annotated UTRs in blue. Red boxes are regions of detectably similar sequence between the same genomic region in these two species. Taken from CoGePedia.
I’ve been playing with CNS since I first opened a command line window back as a first year grad student. The smallest CNS we’d consider “real” were 15 base pair exact matches between the same gene in two species. On the one hand, this seemed a bit too big, because I knew lots of transcription factors bind to motifs as short as 6-10 base pairs. On the other hand, this seemed a bit too short, because I’d see 15 base pair exact matches that couldn’t be real a bit too often (for example a match between a sequence in the intron of one gene, and the sequence after the 3′ UTR of another).
15 bp represented a compromise between the two concerns pushing in opposite directions. Then, in the fall of 2014, a computer science PhD student walked into my office and asked if I had any interesting bioinformatics problems he could work on. The result was a new algorithm (STAG-CNS) which was both more stringent at identifying conserved noncoding sequences and able to identify shorter conserved sequences than was previously possible. It achieved both of these goals through the expedient of throwing genomes from more and more species at the problem.
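To make the “15 base pair exact match” idea concrete, here is a rough sketch in Python. This is not the pipeline we actually used, just a toy k-mer lookup, and the function names (exact_matches, expected_chance_matches) are invented for the example. The second function is a back-of-the-envelope estimate of how many exact matches you would expect between two unrelated sequences purely by chance, assuming all four bases are equally likely and independent, which real genomes are not.

```python
from collections import defaultdict


def exact_matches(seq_a, seq_b, k=15):
    """Return sorted (pos_a, pos_b) pairs where the two sequences share an
    identical stretch of k bases. Hypothetical helper for illustration."""
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)
    hits = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            hits.append((i, j))
    return sorted(hits)


def expected_chance_matches(len_a, len_b, k=15):
    """Expected number of exact k-mer matches between two random sequences,
    assuming uniform, independent bases (a simplifying assumption)."""
    windows_a = max(len_a - k + 1, 0)
    windows_b = max(len_b - k + 1, 0)
    return windows_a * windows_b / 4 ** k


shared = "ACGTTGCAAGGCTTA"  # a made-up 15 bp "conserved" element
region_a = "TTTTT" + shared + "CCCCC"
region_b = "GGG" + shared + "AAAA"
print(exact_matches(region_a, region_b))

# Compare the chance expectation for 15-mers and 8-mers in two 10 kb regions.
print(expected_chance_matches(10_000, 10_000, k=15))
print(expected_chance_matches(10_000, 10_000, k=8))
```

Under that simple random model, two 10 kb regions are expected to share well under one 15-mer by chance, but thousands of 8-mers. Real sequence is full of repeats and low complexity stretches, so spurious 15 bp matches turn up more often than the arithmetic suggests.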
When doing anything even vaguely related to quantitative genetics I would choose more missing data over more genotyping errors any day of the week. There are lots of approaches to making missing data less of a pain. The most straightforward of these is called imputation. Imputation essentially means using the genetic markers where you do have information to guess what the most likely genotypes would be at the markers where you don’t have any direct information on what the genotype is. This is possible because of a phenomenon known as linkage disequilibrium or “LD.” Both imputation and LD deserve their own entire write-ups and they are on the list of potential topics for when I have another slow Sunday afternoon. For now the only thing you have to know about them is that, when information on a specific genetic marker is missing, it is often possible to guess with fairly high accuracy what that missing information SHOULD be. But when the information on a specific genetic marker is WRONG… well it’s usually a bit more of a mess (but I think the software solutions for this are getting better! Details at the end of the post.)
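Here is about the simplest version of imputation I can think of, sketched in Python. It is a toy, not what any real imputation package does: it assumes genotype calls for one line are written as a string of markers in chromosome order (A for one parent, B for the other, - for missing), and it fills a run of missing calls only when the nearest observed calls on both sides agree. The function name impute_flanking and the encoding are made up for this example.

```python
def impute_flanking(calls, missing="-"):
    """Fill runs of missing calls when the nearest observed calls on both
    sides agree; leave everything else untouched. Toy illustration only."""
    out = list(calls)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != missing:
            i += 1
            continue
        # Find the end of this run of missing calls.
        j = i
        while j < n and out[j] == missing:
            j += 1
        left = out[i - 1] if i > 0 else None
        right = out[j] if j < n else None
        if left is not None and left == right:
            out[i:j] = [left] * (j - i)
        i = j
    return "".join(out)


# Gaps inside a block of A or B get filled; the gap spanning the A-to-B
# crossover stays missing because the flanking markers disagree.
print(impute_flanking("AA-A--BB-B"))
```

Notice what this does with a wrong call: nothing. A lone B sitting in the middle of a long run of A calls looks exactly like real data to this function, and downstream it reads as a pair of very tightly spaced crossovers. That is the mess I mean.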
Figure 1: Genotype calls along chromosome 1 for six recombinant inbred lines (RILs).
Last night my major professor received the McClintock Prize in Maize Genetics.
His acceptance talk was really exciting and full of his newest ideas about the big problems of biology and evolution. However, looking back at his history, one of the amazing things about his career is that he’s reinvented himself entirely, switching from a research program focused on transposons and developmental biology to an entirely different career focused on bringing rigorous hypothesis development and hypothesis testing to the world of comparative plant genomics (and he started when there was exactly one sequenced plant genome, so being able to do comparative work at the time was quite something).
In many ways it makes me nostalgic for my time in the lab. In grad school you are essentially paid to think, while it often feels like as a faculty member you are paid mostly to attend meetings, fill out forms, and spend four hours a day answering e-mails. 😉
But this post isn’t about me. Congratulations Mike! He really is one of the fathers of modern maize genetics.
A partial sample of the 76 people who either received their PhDs or Postdoc’d in the Freeling Lab at UC Berkeley
Complete list of lab alumni here.
Corn is a weird plant in a lot of ways, but one we don’t think about very much (because it is so obvious) is that a corn plant has entirely separate male and female reproductive structures: tassels and ears respectively.* This isn’t unheard of in the plant kingdom, but in the particular group of grasses corn belongs to (the Andropogoneae) it’s quite remarkable. Tripsacums, the closest relatives of corn outside of corn’s own genus (Zea), have separate male and female flowers, but those flowers still share a common reproductive structure with the male flowers at the tip and the female flowers at the base. I’d like to have a photo of my own to show you, but I won’t until the Tripsacum plants growing in our greenhouse flower this summer, so in the meantime, go look at this great photo someone else took.
But I bring this up to point out that the segregation of male and female flowers into entirely different parts of the corn plant is still a relatively recent, and fragile, evolutionary development, and it doesn’t take a lot to disrupt it. There’s a series of tasselseed mutants.** Stresses can do it. Various infections can do it. And sometimes corn plants, particularly tillers, just decide to be confusing.
A maize (corn) tiller exhibiting the tasselseed phenotype which is often found in these secondary stalks
*And no, don’t call sorghum heads (or panicles, it depends on how formal you feel like being) tassels.
**Why aren’t there just as many anther ear mutants? It could have to do with the way corn flowers are wired. If female floral organs start developing, they actually cause the male floral organs to die prematurely. But anther ear phenotypes still happen.***
***QTL Controlling Masculinization of Ear Tips in a Maize (Zea mays L.) Intraspecific Cross
This will be on each poster from my group at the maize meeting
The Maize Genetics Cooperation Newsletter (MNL) dates all the way back to 1929. It was (and is) a way for members of the maize community to share interesting findings and preliminary data with their colleagues. Some of those results would ultimately turn into peer reviewed papers (a process that could take months or years) and others were just weird little pieces of data or observations which would otherwise have been lost as negative or ambiguous results. Here’s a good example of what an MNL note might look like.
That the maize genetics community has made the decision to be trusting and open with our hard earned data and analysis for almost 90 years, with nothing preventing others from taking advantage of this openness other than community norms, is a great example of the better angels of our collective nature. It’s a standard I strive to live up to.*
*Keeping in mind I probably don’t even qualify as a geneticist, let alone a maize geneticist.** But I am descended from maize geneticists, both genetically and academically.
**One of these days I really hope to clone my very own mutant.
My favorite figure.
The photo really says it all. In the first second your eye is immediately drawn to just how similar the two plants look. In the second, you start to wonder about the differences between the two (the sorghum plant is way more waxy, the corn plant has a purple auricle from anthocyanins).
I want to understand the conserved genomic features that make corn and sorghum so similar, and the subtle genetic changes that make them so different.