Edit: If you found this post searching for a link to download the apple genome, check out the more recent entry:
I’ve had a chance to read over the apple genome paper. The genome was sequenced to 16.9-fold coverage* using a mix of Sanger and 454 sequencing. Sanger is old school sequencing: if assembling a genome is like putting together a puzzle, Sanger uses the biggest puzzle pieces. 454 is one of the oldest “2nd generation” technologies. It’s the most expensive per million base pairs sequenced, but gives the longest reads (largest puzzle pieces) of any technology other than Sanger.
The group sequencing the apple genome was able to put together ~600 megabases of sequence spread over the 17 chromosomes of apple. The remaining ~20% of the genome (the total genome size is estimated to be 742.3 megabases) is a mixture of smaller assembled pieces that couldn’t be assigned to specific chromosomes, and repetitive sequences (either simple sequences or transposons where there are so many duplicate copies it’s impossible to say which sequence data belongs with which transposon copy).
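As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic. The two inputs come straight from the figures quoted above; the variable names are just mine.

```python
# Figures quoted in the post (both in megabases).
ANCHORED_MB = 600.0       # sequence placed on the 17 chromosomes (approximate)
GENOME_SIZE_MB = 742.3    # estimated total genome size

# Fraction of the genome assigned to chromosomes, and what is left over.
anchored_fraction = ANCHORED_MB / GENOME_SIZE_MB
unplaced_fraction = 1.0 - anchored_fraction
unplaced_mb = GENOME_SIZE_MB - ANCHORED_MB

print(f"anchored to chromosomes: {anchored_fraction:.1%}")
print(f"unplaced or repetitive:  {unplaced_fraction:.1%} ({unplaced_mb:.1f} Mb)")
```

Since the ~600 megabase figure is itself rounded, the leftover share is only good to about a percentage point, which is why "roughly 20%" is the right level of precision.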
They were able to identify a whole genome duplication (which they call a genome wide duplication) that occurred in the ancestor of apples 30-65 million years ago. What’s weird is that apple has experienced only one chromosome fusion (starting with an ancestral set of 9 chromosomes, doubling to 18, and then fusing two chromosomes into one, resulting in the 17 chromosomes seen in apple today) and 3 major rearrangements in all those millions of years. By comparison, my favorite species, maize (corn), has reduced its chromosome number from 20 to 10 in only the 5-12 million years since its most recent whole genome duplication.
I still can’t figure out where to download the genomic sequence data for apple (although I was able to dig up a genome browser here), but now that I’ve had a chance to look over the paper describing the genome, I’m officially adding Apple to our list of sequenced plant genomes.
As always, my thanks and appreciation go out to all the people involved in the sequencing, assembly, annotation, and analysis of this new genome.
If you’re interested, the paper itself can be found at Nature Genetics:
- Riccardo Velasco et al., “The genome of the domesticated apple (Malus × domestica Borkh.),” Nat Genet advance online publication (online 2010), http://dx.doi.org/10.1038/ng.654.
Random Apple Trivia:
- Apples are self incompatible, which means the flowers of one apple tree can only be fertilized by pollen from a different variety. For apple growers that means growing different varieties of apples next to each other. For breeders it means working with apples is much more complicated, since selfing (pollinating the flower of a plant with pollen from the same plant or even the same flower) is normally a powerful and useful technique for plant breeding.
- Many successful apple varieties are actually triploids (they have three copies of each chromosome instead of the usual two) and don’t produce fertile pollen. Growing these varieties is even more complicated because they need to be grown with at least two different diploid varieties of apple: pollen from the diploids stimulates the triploid apple tree to produce fruit, and each diploid variety needs a fertile, unrelated variety of apple tree to pollinate its own flowers.
- Apples are the second most consumed fruit in the US (after bananas) at 16.4 pounds per person per year, although watermelons are sneaking up on second place.
- Developing a new apple variety can easily take 15-20 years (although hopefully the release of this genome will make it possible to speed up the process with marker assisted breeding).
- If you develop a new variety of apple, you can patent it (example). I only bring this up because I run into so many people who think genetic engineering is the only way crop plants get patented.
*The people who sequenced the apple genome generated almost 17 times as much sequence data as is contained within the genome. That is important for two reasons. First, DNA sequencing has a significant error rate, and determining the identity of each individual A, T, C or G multiple times lets us catch most of the mistakes that are made. Second, the DNA fragments that are sequenced are a random sample of the genome, so only sequencing 1 or 2 times as much DNA as the genome contains would mean many parts were sequenced three or four (or more) times while other parts weren’t sequenced at all, creating gaps where interesting genes could escape detection.
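To put rough numbers on that second point, here is a small sketch. It assumes reads land uniformly at random, so the number of times any one base gets sequenced follows a Poisson distribution with mean equal to the fold coverage. That is a textbook simplification and my own assumption, not something taken from the paper; it ignores read length, repeats, and cloning bias. The genome size and coverage are the figures quoted in this post, and the function names are just illustrative.

```python
import math

GENOME_SIZE_MB = 742.3   # estimated apple genome size, from the post
COVERAGE = 16.9          # fold coverage reported for the apple genome

def poisson_pmf(k, mean):
    """Probability that a base is sequenced exactly k times (Poisson model)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def fraction_uncovered(coverage):
    """Expected fraction of bases never sequenced at the given fold coverage."""
    return poisson_pmf(0, coverage)

def fraction_covered_at_least(k, coverage):
    """Expected fraction of bases sequenced k or more times."""
    return 1.0 - sum(poisson_pmf(i, coverage) for i in range(k))

# Total raw sequence implied by the coverage figure.
total_sequence_mb = GENOME_SIZE_MB * COVERAGE
print(f"raw sequence generated: about {total_sequence_mb:,.0f} Mb")

for c in (1, 2, COVERAGE):
    missed = fraction_uncovered(c)
    three_plus = fraction_covered_at_least(3, c)
    print(f"{c:>5}x coverage: {missed:.2e} of bases missed, "
          f"{three_plus:.1%} sequenced 3+ times")
```

Under this simplified model, 1x coverage leaves more than a third of the genome unsequenced (the missed fraction is e^-1), while at 16.9x the expected missed fraction is vanishingly small. Real assemblies still end up with gaps, because repeats and hard-to-sequence regions break the random sampling assumption.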