When doing anything even vaguely related to quantitative genetics I would choose more missing data over more genotyping errors any day of the week. There are lots of approaches to making missing data less of a pain. The most straightforward of these is called imputation. Imputation essentially means using the genetic markers where you do have information to guess the most likely genotypes at the markers where you don’t. This is possible because of a phenomenon known as linkage disequilibrium or “LD.” Both imputation and LD deserve their own entire write-ups and they are on the list of potential topics for when I have another slow Sunday afternoon. For now the only thing you have to know about them is that, when information on a specific genetic marker is missing, it is often possible to guess with fairly high accuracy what that missing information SHOULD be. But when the information on a specific genetic marker is WRONG… well it’s usually a bit more of a mess (but I think the software solutions for this are getting better! Details at the end of the post.)
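For the programmers in the audience, here is a minimal sketch of the idea behind imputation. It is a toy, not how real imputation software works: the reference panel, the marker alleles, and the function name impute are all invented for illustration. The only point is that when nearby markers travel together (LD), the markers you can see narrow down what the missing one probably is.

```python
from collections import Counter

# Hypothetical reference panel: each string is one fully genotyped haplotype
# across five linked markers. These sequences are invented for illustration.
REFERENCE_HAPLOTYPES = [
    "AACGT",
    "AACGT",
    "AACGT",
    "GGTAC",
    "GGTAC",
    "GGCAC",
]

MISSING = "?"


def impute(haplotype, reference=REFERENCE_HAPLOTYPES):
    """Fill each missing marker with the allele most common among the
    reference haplotypes that agree with every observed marker."""
    observed = [(i, a) for i, a in enumerate(haplotype) if a != MISSING]
    matches = [
        ref for ref in reference
        if all(ref[i] == a for i, a in observed)
    ]
    # If no reference haplotype agrees with the observed markers,
    # fall back to the whole panel rather than giving up.
    pool = matches or list(reference)
    filled = []
    for i, allele in enumerate(haplotype):
        if allele != MISSING:
            filled.append(allele)
        else:
            counts = Counter(ref[i] for ref in pool)
            filled.append(counts.most_common(1)[0][0])
    return "".join(filled)


print(impute("AA?GT"))
print(impute("GG?AC"))
```

Notice what happens when an observed marker is wrong instead of missing: the matching step either finds nothing or finds the wrong haplotypes, and the bad call quietly pulls the guess in the wrong direction. That is a cartoon version of why I would rather have a gap than an error.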
Thinking about defining the number of genes present in the maize genome reminded me of an old* story about the trouble of defining what truly represents a gene and how really awesome ideas can sometimes come years before the data needed to support them.
The year is 2002. The first complete version of the human genome is still a year away. The genomes of two plant species have already been published (rice and Arabidopsis) but in terms of sheer genome size, both species are a drop in the bucket compared to the human genome, or other plant genomes like corn or wheat. But none of this is particularly important except to set the stage.
Two researchers at Rutgers University were sequencing a tiny piece of the maize genome (~0.01%) that surrounded a single gene called bronze1 (the fifth most studied gene in maize) when they found something unexpected.
They had previously identified 10 genes in a single 32-kb stretch of the maize genome. (A similar gene density throughout the remainder of the maize genome would have resulted in a maize genome containing more than 700,000 genes!) However, it was already known that the maize genome was split between small gene-rich islands and vast desolate expanses of transposons (referred to as transposon nests**), and in fact the same study identified a couple of these nests of transposons on either side of their gene-rich island (see part A of the second picture in this post).
Their initial sequencing used DNA from a breed of corn called McC, which I must admit I’ve only ever read about in this particular paper. However, when they decided to sequence the same region from the genome of B73*** they made three discoveries which I’ve listed in increasing order of strangeness: (more…)
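The back-of-the-envelope arithmetic behind that 700,000 figure can be sketched as below. The 10 genes and 32 kb come from the study described above; the genome size of about 2.3 billion base pairs is my assumption for the sake of the calculation, and the function name is made up.

```python
# Figures from the text: 10 genes found in a 32-kb stretch.
GENES_IN_REGION = 10
REGION_SIZE_BP = 32_000

# Assumption (not stated in the post): a maize genome of roughly
# 2.3 billion base pairs.
ASSUMED_GENOME_SIZE_BP = 2_300_000_000


def extrapolate_gene_count(genes, region_bp, genome_bp):
    """Scale a local gene density up to a whole genome."""
    return genes * genome_bp / region_bp


naive_estimate = extrapolate_gene_count(
    GENES_IN_REGION, REGION_SIZE_BP, ASSUMED_GENOME_SIZE_BP
)
print(f"{naive_estimate:,.0f} genes")  # prints "718,750 genes"
```

The estimate is only as good as the assumption that the whole genome looks like this one gene-rich island, which is exactly the assumption the transposon nests break.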
I’m just about wrapped up with the big project I’ve been working on recently. Hope to be able to say more about it in the not-too-distant future. Having to be secretive in science sucks.
But there’s a lot to be happy about! I’m done teaching for a long time. As much as I enjoyed working with the kids in my class, the other responsibilities of teaching (grading, sitting through lectures without the chance to break in for the discussions and arguments that make academia so fun, grading, designing assignments, grading) were really starting to wear me down.
And I’m only three weeks (June 22nd) from either passing my qualifying exam or becoming a beaten and broken shell of a man. For three hours four professors will question me on everything I’ve learned (or should have learned but didn’t) in my education up to this point, and everything I propose to spend the next few years of my life doing. This may not sound like a good thing, but it is. Because my qualifying exam has been hanging over my head all semester,
The lab has a new paper in press, having run the sequential gauntlets of Peer Review, Editorial Evaluation, and finally (and perhaps most dreaded) Your-Figures-Aren’t-High-Resolution-Enough e-mails from the journal’s publication department. But more on the details of that whenever the paper actually shows up.
But what was the point of this entry again? Oh yeah. Transposons. I have a soft spot for transposons (I’m guessing most people who work with maize genetics do). Today we may know that transposons are found in practically every genome under the sun, but they were discovered first in maize using old school genetics (breeding plants together and counting traits in the offspring), before DNA sequencing was a gleam in its inventor’s eye.
And on top of that, some delightfully high-copy number transposons are in the middle of proving a major scientific point for me, so I figured the least I could do was devote a week to them here on the site.
If you’re not a geneticist, should you still care about transposons? Absolutely! Transposons are one of the best arguments, not for why genetic engineering is safe, but for why anyone worried about hypothetical unintended consequences of genetic engineering should be worried about any food with DNA in it (and as far as I know, that’s all food). To paraphrase a Seinfeld character: “No food for you!”
The week’s schedule: (more…)
Superman had the yellow sun of Earth, Spider-Man had a radioactive spider bite, but what about superweeds? Where does their superpower (surviving application of Roundup/glyphosate) come from?
To understand how superweeds survive, we first have to understand why normal weeds (the Jimmy Olsens and Lois Lanes of the plant world) die. <– last superhero reference of this post I promise. (more…)
There are more differences in the genomes of two unrelated corn plants than between the genomes of a human and a chimpanzee (two species separated by roughly six million years of evolution).
On the other hand, two unrelated human beings, members of the same species, have more than four times as many genetic differences as two unrelated heirloom tomatoes.
Corn vs. Corn > Human vs. Chimpanzee >> Human vs. Human >> Heirloom Tomato vs. Heirloom Tomato
Now the fact that any two human beings are more closely related to each other than either is to a chimpanzee should be obvious to anyone who gives it a moment’s thought.
I plan to poll my sections tomorrow to see how many of them would put corn and heirloom tomatoes in the opposite positions, but many have figured out my feelings about corn, so they’ll probably guess it’s a trap.
One of the things that has made annotating genes in the maize genome so difficult (there are currently two sets of gene models: one with only 32,000 genes, which is a low estimate, and another with 100,000, which is far too many) is the presence of large numbers of gene fragments that have been captured and duplicated by a class of transposon called helitrons (yes I know that sounds like a character from Transformers).
The helitron-captured fragments are copied from real genes (often multiple pieces are captured from different genes), which is why many gene annotation programs (trained to recognize the difference between genes and non-coding DNA) will identify the fragments as genes themselves.
What if some of those fragments actually are genes? By combining pieces from completely different genes, helitrons could be a whole new source of crazy new genes that natural selection could act upon.
That is the question the authors of this poster are trying to get at, by identifying more helitron fragments and checking to see if those fragments were actually expressed in the genome.
Allison Barbaglia et al. “Accessing the transcriptional activity of Helitron-captured genes of maize” Poster #243 2010 Maize Meeting
Edit: stripped out all the numbers as they clearly applied to an earlier version of the data and I don’t know if the new ones are intended for public release yet.
Last November when the maize genome was published, one of the companion papers looked at genes where a different number of copies were found in different breeds of maize (this is called Copy Number Variation) and genes found in B73 (the variety of maize that was sequenced) but completely missing from the genomes of other varieties. There’s a great post on that paper written up by Mary at OpenHelix.
A few months later, it sounds like this dataset has grown substantially. Over XXXX B73 genes (that’s X% of the filtered B73 gene set!) appear to be lost (or have sequences so different they no longer register) in at least some varieties of maize. And because the new dataset incorporates data from XX different maize breeds and XX different teosinte* lines, they’re able to identify some of the losses as older because they’re found in multiple comparisons, while others appear to be lost in only a single breed and might represent more recent losses.
As you can imagine I’d love to get my hands on this dataset myself, but the next best thing will be to take furious notes when Nathan Springer talks about the project on Friday morning**, and to be sure to swing by Steven Eichten’s poster to soak in the awesomeness.
Ruth A. Swanson-Wagner et al. “Combined Analysis of genomic structural variation and gene expression variation between maize and teosinte populations” Talk #1 2010 Maize Meeting (Presented by Nathan Springer)
Steven R. Eichten et al. “Extensive Copy Number Variation Among Maize Lines” Poster #139 2010 Maize Meeting
*Teosinte is the wild species from which maize/corn was domesticated.
**And he’s talking at 8:30 AM on a day when I still plan on being heavily jet lagged.
There is a piece of DNA that is sometimes found on the end of the tenth maize chromosome. In plants that possess this extra chromosome segment, chromosome knobs* (including one that’s a part of the extra segment included in abnormal chromosome 10) start to act like centromeres**. But this story graduates from odd to downright weird when I tell you that possessing this extra centromere-like activity gives a chromosome an unfair advantage in being passed on to the next generation.
Plants, like animals, possess two complete genome copies, one from each parent. They’ll only pass on one copy (mixtures of pieces from each parent) to their offspring. Any given sequence has a 50% chance of being passed on, which seems fair given the plant is passing on 50% of its total genetic material. But abnormal chromosome 10 cheats (using those extra centromere-like sequences I mentioned earlier). It has up to an 83% chance of being passed on.
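To get a feel for how much of an edge 83% is compared with a fair 50%, here is a deliberately oversimplified sketch. The assumptions are mine, not the poster’s: random mating, the cheating chromosome gets its advantage through both parents, and carrying it costs the plant nothing. Under those assumptions the frequency of the cheater in the next generation is p² + 2p(1 − p)k, where p is its current frequency and k is the chance a plant with one copy passes it on.

```python
def next_generation_frequency(p, k):
    """Frequency of the driving chromosome after one round of random mating.

    p: current frequency of the driving chromosome
    k: chance a plant with one copy passes it on (0.5 would be fair)
    """
    two_copies = p * p             # these plants always pass it on
    one_copy = 2 * p * (1 - p)     # these plants pass it on with chance k
    return two_copies + one_copy * k


def track(p0, k, generations):
    """Return the frequency in every generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_generation_frequency(freqs[-1], k))
    return freqs


fair = track(0.01, 0.5, 20)      # fair transmission: frequency stays put
cheating = track(0.01, 0.83, 20)  # the "up to 83%" figure from the text
print(f"fair: {fair[-1]:.3f}  cheating: {cheating[-1]:.3f}")
```

With fair transmission a rare chromosome stays rare. With an 83% chance of transmission, this toy model has it taking over within a couple dozen generations, so the interesting question becomes what keeps that from happening in real populations.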
Since the breed of corn (B73) the maize genome was based on has the normal version of chromosome 10, we know very little about the extra DNA found in abnormal chromosome 10. The authors of this poster are going to correct that oversight, by sequencing the region, figuring out how (and how long ago) abnormal chromosome 10 came into being, and hopefully identifying the genes within the region that make chromosome-knobs act like centromeres.
**Centromeres are the part of the chromosomes that bind together during cell division (the center of the X in the traditional drawing of a chromosome). They’re also the place where the molecular machinery that pulls chromosomes apart at the end of cell division attaches.
Lisa Kanizay and Kelly R. Dawe “Uncovering the sequence and structure of maize abnormal chromosome 10” Poster #165 2010 Maize Meeting.
Who could have predicted maize geneticists would be so interested in maize genes? The entry I posted last night on Purple plant1 and Colored aleurone1 easily received more traffic in its first day on the site than any entry since the heady days of the maize genome release back in November (though it’s still got a long way to go before it catches long-term readership attractors like water chestnuts and the NIPGR tomatoes).
And this morning the dataset I drew that example from went out to the maize genetics community (thank you MaizeGDB!). It covers 464 classical maize genes mapped onto the maize genome assembly, plus syntenic orthologs in up to four grass species: sorghum, rice, Brachypodium, and the other region of the maize genome created by the maize whole genome duplication (technically syntenic homeologs since we started in maize to begin with, but the principle is the same).
A postdoc in our lab tells me more people have visited CoGe today than any day on record (and we hit that mark before noon!).
Anyway, thank you guys, it’s great to feel appreciated!
Colored aleurone1 and Purple plant1 are both genes with long histories in maize research and are involved in the regulation of anthocyanin biosynthesis. The mutant version of purple plant1 does exactly what it sounds like. (In the proper genetic background) it has plants producing anthocyanin (a purple plant pigment) everywhere, resulting in purple plants. The mutant form of colored aleurone1 was identified from a mutant that changed the color of individual corn kernels. Guess which of these two classic maize mutants made it into the top 15 most published on genes in maize, and which fell barely short.
The two genes are also duplicates (homeologs) resulting from the maize whole genome duplication. From the picture below you can also see that both the two genes and the regions they sit in match up to single regions in rice and sorghum, two grasses that haven’t gone through a whole genome duplication since the great radiation of grass species that took place an estimated 50 million years ago (well after dinosaurs stopped walking the earth). (more…)