September 15th, 2010:

The Cacao Genome

Cacao seed pod. Source: H. Zell, Wikimedia

Today’s addition to the list of available plant genome sequences has been on my mental list of missing-in-action genome projects for quite some time: Theobroma cacao, the tree whose seeds provide the world with chocolate. The sequencing of the cacao genome was announced with great fanfare back in 2008. I distinctly remember discussing the announcement with several grad students while on a tour to see Iowa State’s recently acquired Illumina sequencer. (That’s right, I hadn’t even started grad school out here on the coast when this genome project was announced.)

The Genome:

The cacao tree has ten chromosomes. The genome project was able to assemble a little more than 92% of the genome into pseudomolecules representing those ten chromosomes.* The remainder of the genome was assembled into smaller pieces of sequence that have not yet been accurately assigned to a chromosome. The genome already has a preliminary set of genes annotated onto it (~35,000 gene models). The entire genome is estimated to be ~400 megabases, which is quite reasonably sized as genomes go. The press release mentions that the genome has been sequenced to 200-fold coverage (can this be right? It seems absurdly high) using a mixture of 454 and Illumina sequencing. At the opposite extreme from technical stats like that, David Kuhn, one of the people involved in the genome project, was quoted in the Washington Post describing it as “a very well-behaved genome,” and I’m intrigued to find out what he means by that.
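To put the 200-fold figure in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes the usual definition of coverage (total bases sequenced divided by genome size) and uses only the rough numbers quoted above; the function names are my own, made up for illustration, and nothing here comes from the genome project itself.

```python
# Back-of-the-envelope arithmetic using the approximate figures quoted in this post.
# Assumption: fold coverage = total bases sequenced / genome size.

GENOME_SIZE_BP = 400_000_000   # ~400 megabases (estimated genome size)
FOLD_COVERAGE = 200            # coverage figure mentioned in the press release
ANCHORED_FRACTION = 0.92       # "a little more than 92%" placed on pseudomolecules


def raw_bases_needed(genome_size_bp, fold_coverage):
    """Total sequenced bases implied by a given fold coverage."""
    return genome_size_bp * fold_coverage


def anchored_bases(genome_size_bp, fraction):
    """Bases assigned to chromosome-scale pseudomolecules."""
    return genome_size_bp * fraction


if __name__ == "__main__":
    total = raw_bases_needed(GENOME_SIZE_BP, FOLD_COVERAGE)
    anchored = anchored_bases(GENOME_SIZE_BP, ANCHORED_FRACTION)
    print(f"Implied raw sequence: {total / 1e9:.0f} gigabases")
    print(f"Anchored to chromosomes: about {anchored / 1e6:.0f} megabases")
    print(f"Not yet placed: about {(GENOME_SIZE_BP - anchored) / 1e6:.0f} megabases")
```

If the quoted numbers are right, 200-fold coverage of a ~400 megabase genome works out to roughly 80 gigabases of raw sequence, and about 32 megabases of the assembly are still waiting to be assigned to a chromosome.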

The genome is already released, and can be downloaded from this website. There appear to be no Fort Lauderdale restrictions, which means people can begin asking questions of the genome and publishing the answers they discover today!

