James and the Giant Corn Genetics: Studying the Source Code of Nature

September 15, 2010

The Cacao Genome

Filed under: Uncategorized — James @ 3:42 pm

Cacao seed pod. Source: H. Zell, Wikimedia

Today’s addition to the list of available plant genome sequences has been on my mental list of missing-in-action genome projects for quite some time: Theobroma cacao, the tree whose seeds provide the world with chocolate. The sequencing of the cacao genome was announced with great fanfare back in 2008. I distinctly remember discussing the announcement with several grad students while on a tour to see Iowa State’s recently acquired Illumina sequencer. (That’s right, I hadn’t even started grad school out here on the coast when this genome project was announced.)

The Genome:

The cacao tree has ten chromosomes. The genome project was able to assemble a little more than 92% of the genome into pseudomolecules representing those ten chromosomes.* The remainder of the genome was assembled into smaller pieces of sequence that have not yet been accurately assigned to a chromosome. The genome already has a preliminary set of genes annotated onto it (~35,000 gene models). The entire genome is estimated at ~400 megabases, which is quite reasonably sized as genomes go, and the press release mentions that the genome has been sequenced to 200-fold coverage (can this be right? It seems absurdly high) using a mixture of 454 and Illumina sequencing. At the opposite extreme from technical stats like that, David Kuhn, one of the people involved in the genome project, was quoted in the Washington Post describing it as “a very well-behaved genome,” and I’m intrigued to find out what he means by that.
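To put that 200-fold figure in perspective: fold coverage is usually defined as the total number of bases sequenced divided by the genome size. The short Python sketch below takes the ~400 megabase estimate and the press release’s 200x at face value (it does not use any actual read counts from the project) and shows how much raw sequence that would imply. The function names are just illustrative.

```python
# Fold coverage = total bases sequenced / genome size.
# Assumes the ~400 Mb genome size estimate and the reported 200x coverage.
GENOME_SIZE_BP = 400_000_000  # ~400 megabases


def bases_needed(coverage: float, genome_size_bp: int) -> float:
    """Total sequenced bases implied by a given fold coverage."""
    return coverage * genome_size_bp


def fold_coverage(total_bases_bp: float, genome_size_bp: int) -> float:
    """Average number of times each base in the genome was read."""
    return total_bases_bp / genome_size_bp


total = bases_needed(200, GENOME_SIZE_BP)
print(f"{total / 1e9:.0f} gigabases of raw sequence")  # 80 gigabases of raw sequence
print(fold_coverage(total, GENOME_SIZE_BP))  # 200.0
```

In other words, 200x on a 400 Mb genome works out to roughly 80 billion sequenced bases, which is why the number looks so high.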

The genome is already released and can be downloaded from this website. There appear to be no Fort Lauderdale restrictions, which means people can begin asking questions of the genome and publishing the answers they discover today!

The Plant:

Cacao seeds. Photo: EverJean, Flickr

Cacao was originally domesticated in South America, but after production there was drastically reduced by Moniliophthora perniciosa, a fungus that causes a disease known as witches’ broom, the centers of production shifted to western Africa. Around 40% of the world’s supply is grown in Côte d’Ivoire alone, with perhaps another 20% coming from neighboring Ghana. The production process is long and complicated, the farms are threatened by disease, and many of the cacao trees are quite old, all of which has contributed to recent stories claiming “THE WORLD IS RUNNING OUT OF CHOCOLATE.”

Random plant relationships trivia: cacao is in the same family of plants as cotton, kola nuts and okra.

The Rest:

The genome of the cacao tree was sequenced in a collaboration between Mars**, IBM, the USDA and a number of universities. Mars’s interest in the chocolate genome is obvious, since an expensive and unpredictable supply of chocolate is a problem for a company that makes its money selling candy. IBM got to show off its supercomputer hardware by assembling all the tiny pieces of sequence data generated with 454 and Illumina sequencing technology into pseudomolecules millions of letters long. In some ways I’m most surprised at the involvement of the US Department of Agriculture, since cacao production within the US is effectively zero, but given how much the US spends buying chocolate from the countries where it is grown, I suppose it makes sense that we’d have an interest in ensuring a plentiful and reliable supply.

It’s notable that there was apparently another genome sequencing project being worked on by Pennsylvania State University and CIRAD in France with funding from (you guessed it) Hershey. According to several articles, they may already have a scientific paper under review for publication (something the Mars-IBM-USDA group hasn’t done yet). Publishing first is generally the standard for who gets credit for a scientific discovery, but in this case, with the Mars data already available to researchers around the world, it may end up not mattering whose paper appears first. That would be really unfortunate for the researchers involved, but if competing genome projects mean data starts getting released to the rest of the scientific community faster, I am all for it. Genome projects with normal government funding generally don’t have any direct competition,*** which may be why some of them tend to keep a tight hold on their data for what feels like forever.****

If you’d like more details on the cacao genome project, written by professional reporters rather than a grad student stealing time from Python programming, seminars, a server migration, and revising the paper we just got back from peer review:

The Washington Post: With DNA of chocolate nearly decoded by scientists, could sweeter treats await?

The New York Times: DNA of Cacao Bean Tree Sequenced by Mars and Hershey (requires a login)

And of course, there’s the Cacao Genome project’s website itself.

Normally the payoff when a company sequences a genome is an advantage over its competition, but Mars has released this dataset to the public, including its competitors (although obviously one of its biggest competitors, Hershey, was already going to have its own version of the genome one way or another). Normally the payoff when public sector scientists sequence a genome comes in the form of publications,***** but given the competing genome version, it’s not clear that will happen in this case either. So to everyone involved in the cacao genome project, thank you.

*If you’re downloading the sequence data, these are labeled as super_contigs 1-10.

**Mars apparently pledged to spend $10 million sequencing the chocolate genome over 5 years. Since the genome was released after only two years, I’m not sure how much of that money was actually spent.

***Although that could be changing. For example, the JGI cucumber genome project appears to have been scooped by China’s BGI, which published its version of the genome first. Edit (February 2012): This isn’t actually what happened. I had the chance to talk with someone from JGI, who explained that their cucumber genome assembly was carried out using donated data and was never intended for publication.

****The genome projects out there that constantly miss promised release dates for their data know who they are….

*****And with publications come wealth, fame, and adoration by members of the opposite sex… I wish.


  1. […] dissects the latest genome announcement: cacao. Ignore the press release, just read this. […]

    Pingback by Nibbles: Yams, Agrobiodiversity, Melons, Cacao — September 16, 2010 @ 12:23 am

  2. There are actually restrictions similar to Ft. Lauderdale: when accessing the data, you are agreeing to their “Reserved Analysis” terms.

    Comment by Haibao Tang — September 16, 2010 @ 9:37 am

  3. Sometime I’d like to talk to those guys working on the cacao thing here at Penn State. I’ve seen a chocolate tree here on campus in the fancy new bio/chem building. I assume it was chocolate anyway: it was cauliflorous, the leaves looked right, and it was beside the entrance to the fancy rooftop greenhouses (what marvels could be growing in there, I wondered). They had coffee too, I think (I wonder how many people walk past their potted trees and think, ‘Cool, this is what chocolate & coffee come from.’ Probably not many). The biotech class I’m hoping to take next semester is taught by someone who works on the cacao project, I think, so I might finally get to hear about all that they’re up to.

    Comment by Party Cactus — September 20, 2010 @ 7:54 pm

  4. Give me a call or email
    check out news page on my website
    see you next semester
    and yes that is a cacao tree
    our pet

    Comment by Mark Guiltinan — October 2, 2010 @ 7:38 am

  5. Wow, hello professor! I almost fell off my chair when I saw your name here! My name’s Greg Hoover, and I’m already signed up for 459, even reading the book that was required last year, so far still reading the part with the charts & numbers about the environmental benefits of the HT and IR crops, hoping to get a jump start on the class. Even though Hort students like me don’t need it I wanted to do 460 too, to get more exposure to the subject, but something required for my major came up…oh well, I’ll get it next time around. Can’t wait to finally dive into an actual class on the subject! Such fascinating stuff! This semester’s been so crummy, it’s the light at the end of the tunnel. I look forward to meeting you sometime!

    And I have a small collection of pets myself. Tropical, exotic, and/or otherwise underutilized crops, particularly pomological crops, are my thing. If you like tropical crops, maybe I can tell you about them someday.

    Comment by Party Cactus — October 5, 2010 @ 8:07 pm
