More of the story behind the “wheat genome sequenced!” articles

Turns out I’m not the only one who was underwhelmed by the data behind the recent stories about the sequencing of the wheat genome. The International Wheat Genome Sequencing Consortium has put out a press release in which they state:

The International Wheat Genome Sequence Consortium, an international consortium of wheat growers, public and private breeders and scientists, strongly disagrees with implications that the sequence reads made available by a UK team, led by Professor Neil Hall, represent in any way the sequence of the wheat genome or that this work is comparable to genome sequences for rice, maize, or soybean. [emphasis mine]

This is starting to look more and more like what happened to the woodland strawberry genome all over again. The Wheat Genome folks' press release also points out that the original press release from the United Kingdom's Biotechnology and Biological Sciences Research Council (BBSRC), which accompanied last week's data release, was very clear about what was being released:

The release is a step towards a fully annotated genome and makes a significant contribution to efforts to support global food security and to increase the competitiveness of UK farming. … The genome data released are in a ‘raw’ format, comprising sequence reads of the wheat genome in the form of letters representing the genetic ‘code’. A complete copy of the genome requires further read-throughs, significant work on annotation and the assembly of the data into chromosomes.

Reading the original BBSRC statement, it's clear they were not claiming to have sequenced the wheat genome. In fact, they were showing exactly the kind of openness about unpublished and incomplete data that I'd like to see more of in the genomics community.* Somewhere between the BBSRC statement and the popular press stories floating around the web, that original message was lost.

See also my earlier posts on the sequence data being called a wheat genome, and on the problems with the AP coverage of the story (even setting aside the issue of the "genome" being a bunch of unassembled reads).

Thanks to Moreno for pointing me to the wheat website. I'd also like to point out @mary_carmichael on Twitter, who referenced my previous post and came up with the shortest accurate description I've read yet of what was wrong with calling last week's data the wheat genome:

pieces, not the puzzle

*As a researcher not involved in a genome sequencing project of my own, I depend on the public release of data from many kinds of large-scale sequencing projects, from complete genomes to RNA-seq data (data on the expression levels of genes in particular parts of a plant under particular conditions). I understand that the people generating that data want to make absolutely sure they get the first papers out of it. Still, the trend I see today toward holding onto data longer and longer seems very unfortunate, because one of the main selling points of these large-scale projects is often that the information they generate will enable the wider scientific community to research a diverse array of questions.

