When the maize genome paper came out last November (see the summary of this blog’s maize day coverage) it included information on 32,690 genes within the maize genome. These were the genes which the researchers involved in sequencing the genome were very confident really were genes. And by themselves those 30,000+ genes put the maize genome way ahead of our own. Of course EVERY plant genome ever sequenced has contained more genes than we do, so you’d think by now this wouldn’t be news any more. We’re not the most genetically complex creatures on the planet, and we’ll just have to learn to live with that fact.
But where was I? Oh yeah, gene counts. 32,690 high confidence genes*. Of those, how many have been studied individually?
While I don’t know that anyone knows the precise answer to that question, one indicator is how many maize genes were named before the maize genome was sequenced. People have been naming maize genes since before the structure of DNA was even known, based on the effects mutant versions of a gene have on corn plants (for example: waxy1 or yellow stripe1). Later names might be based on the function of the gene (alcohol dehydrogenase1 or superoxide dismutase4), or anything else we know about the gene (wound induced protein1 or male flower specific18). The point being, if someone bothered to name a gene sometime during the last century of maize genetics, it was likely because they were studying it (to a greater or lesser extent). MaizeGDB keeps records of most of the named genes in maize and (excluding chloroplast and mitochondrial genes) I was able to find records of 1181 named genes in maize.
That’s less than 4% of the number of high confidence genes found within the maize genome, and at least a few of the named genes aren’t found within that group (see the first footnote for more details). Why is that number so low?
- Each of the genes that has been studied in any detail probably represents some grad student’s doctoral thesis. While the tools have gotten better, the expectations for what is involved in characterizing a gene have risen too. I don’t have any statistics on how many maize genetics students earn their PhDs every year (and many of them will have worked on other kinds of projects than characterizing some new mutant gene), but it’s certainly not the thousands that would be required to characterize every gene in the genome in a short period of time.
- Perhaps more importantly, the first genes to be studied are the ones with the best mutant phenotypes. To be a good mutant to study, breaking a gene should create something obviously different about the plant (it’s purple, or the tassel produces seeds like an ear instead of pollen, or the plants grow along the ground instead of standing upright), but the gene shouldn’t be so vital that embryos containing broken versions of it don’t develop at all. From a project that set out to knock out every gene in another plant, Arabidopsis thaliana, we know that many genes can be broken without any obvious effect on the plants that carry the broken copies.** That doesn’t mean there won’t still be interesting things wrong with the plants when they’re studied in more detail, but such mutants were less likely to be identified early on. As for genes whose mutations are usually lethal, they can be studied (a friend in a lab downstairs is working with just such a mutant), but it certainly adds a whole new layer of difficulty to any research project, so the genes had better be involved in something interesting enough to justify the extra pain and suffering involved.
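The “less than 4%” figure above is simple arithmetic, and for anyone who wants to check it, here is a quick Python sketch. The two counts come straight from this post; the constant and function names are just mine for illustration, and nothing here is pulled from MaizeGDB or maizesequence.org directly.

```python
# Counts quoted in this post (typed in by hand, not queried from any database).
NAMED_GENES = 1181             # named nuclear genes I found records of in MaizeGDB
HIGH_CONFIDENCE_GENES = 32690  # high confidence (filtered) genes from the genome paper


def percent_named(named: int, total: int) -> float:
    """Return the named genes as a percentage of the total gene count."""
    return 100.0 * named / total


share = percent_named(NAMED_GENES, HIGH_CONFIDENCE_GENES)
print(f"{share:.1f}% of the high confidence genes have names")
```

Treat the result as a slight overestimate: as mentioned above, at least a few of the named genes aren’t part of the high confidence set at all.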
Now the situation isn’t nearly as grim as it might sound. Nature re-uses related genes over and over again both between and within species, so any time a researcher studies a new gene in detail, that information doesn’t just inform our knowledge of one particular gene in one particular species. Like a candle in a dark room, the information created by the study of a single gene will illuminate, to a greater or lesser extent, nearby genes (genes that have similar sequences to the gene being studied directly). So even for a gene that’s never been studied in maize, we can make guesses about its function based on any related genes that have benefited from detailed study (either other genes in maize, in other grasses like barley or rice, other plants like Arabidopsis or snapdragon, or even in animals or bacteria). While any geneticist worth their pollinating apron would need experimental data before being CERTAIN of a gene’s function, knowing something about the functions of related genes is an excellent starting point.
I just finished some “free time” science looking at the classical genes of maize genetics (which displaced the time I normally spend writing for this site), so expect a couple more posts on related topics later this week.
*The good folks at maizesequence.org also produced a set of all the sequences they thought MIGHT be genes. In addition to the filtered genes, that set includes ~70,000 more sequences that might or might not be genes. Many of these potential genes are computationally predicted by programs that look at the underlying characteristics of the DNA sequence itself (how they work is outside my expertise and above my pay grade). I can personally vouch for the fact that at least some of those “possible” maize genes are the real thing, so the true number of genes contained within the maize genome is at least somewhat greater than the 32,690 reported with high confidence. This fact isn’t in any way a criticism of the people involved in sequencing and annotating the maize genome. The vast majority of the high confidence genes (called the filtered gene set) are real, and most of the other ~70,000 sequences (those found only in the working gene set, which also includes everything in the filtered gene set) are probably figments of a computer program’s imagination. Anywhere they chose to draw the line between the two groups was going to put some genes in the wrong category, and they did everything they could to minimize those miscategorizations.
**This doesn’t mean that the genes don’t have important jobs. You can imagine, for example, that broken copies of genes involved in a plant’s ability to survive disease, water shortages, cold stress or heat stress won’t create any obvious problems for plants grown in the relatively pampered conditions we biologists try to provide for our research subjects when we aren’t actively studying what happens when we stress plants.