James and the Giant Corn Genetics: Studying the Source Code of Nature

October 19, 2010

So you want to download the apple genome

Filed under: Uncategorized — James @ 3:35 pm

Well, depending on what sort of dataset you need for your research, today is your lucky day (actually, late last week was your lucky day, but I’m buried in work and way behind).

Download links for lots of apple genome related data are now available from GDR: The Genome Database for Rosaceae.

One thing you’ll be surprised not to find is a link to download the pseudomolecules. (For those reading this who don’t have to work with genomic data, pseudomolecules are continuous DNA sequences of entire chromosomes. Each chromosome is technically a single, incredibly complex molecule, not unlike water or carbon dioxide.) From GDR:

Because of the heterozygous nature of the Malus x domestica genome, pseudomolecules corresponding to the chromosomes are not presented. To preserve haplotype information, this assembly consists of sets of assembled contigs organized into metacontigs (clusters) and anchored to the chromosomes using 1,643 genetic markers. There is no consensus sequence from assembled contigs. Please see the whole genome publication in Nature Genetics for further details.

Now there is a GFF file that lists the locations and orientations of all the genes on these (non-existent) pseudomolecules, as well as the locations and orientations of the sequence contigs (smaller pieces of DNA which are available for download). So in theory it should be possible to reassemble the smaller contig-sized DNA sequences into pseudomolecules….

…I don’t recommend trying it. I’ve spent a number of hours over the last few days trying to do exactly that, and the sad fact of the matter is that overlapping contigs have significant differences in their sequences, so there is no simple way to squash them together, even if you don’t care about heterozygosity data. The best I’ve managed was a homemade set of pseudomolecules where 24% of the annotated apple genes had correct start and stop codons. (The coding portion of a gene starts with the three-letter codon “ATG” and ends with one of three codons: “TGA”, “TAA” or “TAG”. When more than three-quarters of the genes annotated on the genome aren’t following that rule, it means there is something very wrong with your annotations, your sequence, or both.)
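For anyone who wants to run the same sanity check on their own attempt, here is a minimal Python sketch of the start/stop codon test described above. It is not the script I used; the function names (`extract_cds`, `has_valid_start_stop`, `fraction_valid`) are made up for illustration, and it assumes each coding sequence is a single unspliced stretch given by 1-based, inclusive GFF-style coordinates plus a strand. Real gene models with several exons would need their CDS pieces joined first.

```python
# Illustrative sketch only: helper names are hypothetical, not from any library.
STOP_CODONS = {"TGA", "TAA", "TAG"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(sequence, start, end, strand):
    """Cut a feature out of a sequence.

    start and end are 1-based and inclusive, as in GFF.
    Features on the "-" strand are reverse-complemented.
    """
    piece = sequence[start - 1:end]
    return reverse_complement(piece) if strand == "-" else piece


def has_valid_start_stop(cds):
    """True if the sequence starts with ATG and ends with a stop codon."""
    cds = cds.upper()
    return len(cds) >= 6 and cds[:3] == "ATG" and cds[-3:] in STOP_CODONS


def fraction_valid(cds_list):
    """Fraction of sequences that pass the start/stop check."""
    if not cds_list:
        return 0.0
    return sum(has_valid_start_stop(c) for c in cds_list) / len(cds_list)


if __name__ == "__main__":
    toy = "CCATGAAATAGGG"  # made-up sequence with a tiny gene at positions 3..11
    cds = extract_cds(toy, 3, 11, "+")
    print(cds, has_valid_start_stop(cds))
```

Running `fraction_valid` over every annotated gene gives the kind of percentage quoted above. A low number doesn't tell you whether the sequence or the annotation is at fault, only that the two disagree.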

So have fun with the apple genome, but don’t expect the lack of pseudomolecules to be easily overcome. But if, on the other hand, you DO manage to overcome it, drop me a line using the new “Contact me” tab at the top of the website, and let me know how you managed it!

Thanks to Aaron for pointing this new sequence data out to me.


  1. Well, thanks for pointing this out…even if it is a completely useless “assembly” for my purposes….
    I’m surprised the heterozygosity would be of such magnitude to make pseudomolecules impossible. It seems odd that apple can propagate itself, but then again I’m a computer guy, not a plant biologist.

    Comment by William Nelson — October 23, 2010 @ 11:40 am

  2. I think the way the apple genome project went about sequencing, assembling and annotating the genome was targeted at capturing heterozygosity. It’d be interesting to go back to the raw sequencing reads and try assembling the genome from scratch while allowing more mismatches to squash together heterozygous regions, but that’d be a much bigger project than what I attempted and, even if it worked, would require re-annotating new gene models afterwards.

    Comment by James — October 23, 2010 @ 2:32 pm

  3. Well on second thought, it’s probably fine for my purposes. I’m just aligning it to the peach genome, so the heterozygosity doesn’t really matter, and their gff file does show where all the contigs go. If it works I’ll post the link.

    Comment by William Nelson — November 23, 2010 @ 1:23 pm

  4. Are you essentially trying to layer the copies of the apple genome from its ancient tetraploidy on top of each other on peach? I’ll be interested to hear how it works out. Good luck!

    Comment by James — November 23, 2010 @ 7:21 pm

  5. Posted, under the “Rosaceae” section of
    http://www.symapdb.org/ .
    It seems to have come out pretty well…

    Comment by William Nelson — November 24, 2010 @ 2:42 pm

  6. Very cool graphics! Do you guys have an introductory video or something I could point people to?

    Comment by James — November 27, 2010 @ 9:24 pm

  7. A video, hm…we’ve talked about that, but never gotten around to making one. I guess a web page user manual is a bit old school these days….

    Comment by William Nelson — November 29, 2010 @ 7:59 am

  8. For figuring out how to actually use a system, nothing beats written instructions. The use I like videos for is showing off how cool and useful a tool is, so people will take the time to read through the manual and learn how to do the same sorts of things themselves.

    Either way, it’s a really cool tool I hadn’t run across before, thanks for pointing it out!

    Comment by James — November 29, 2010 @ 8:14 am
