James and the Giant Corn

August 27th, 2010:

Tearing down a typical “genome sequenced” news story.

What’s wrong with this Associated Press story on the wheat genome? Everything. And that’s not even getting into the fact they don’t mention that the genome isn’t even assembled yet. But I have to let them off the hook on that one, since none of the coverage seems to be mentioning that fact.

Examples:

University of Liverpool scientist Neil Hall, whose team cracked the code…

Cracked the code? What code? The genetic code? That’s been cracked for decades. You can look it up on Wikipedia. Sequencing a genome isn’t cracking a code, it’s photocopying the instruction manual (admittedly an indescribably vast and confusing instruction manual).

Sequencing an organism’s genome, gives unparalleled insight into how it is formed, develops and dies. [emphasis mine]

How does sequencing a genome give any insight into how a living thing dies? Our genomes don’t have, like, death genes inserted into them. The whole point of a genome is that it contains the best methods billions of years of evolution have developed to keep a living thing alive and reproducing as long as possible. This is biology, so there are always exceptions to any rule, but in general genomes are about staying alive and passing on parts of that genome to the next generation, while avoiding death whenever possible.

One reason for the outsize genome is that strains such as the Chinese spring wheat analyzed by Hall’s team carry six copies of the same gene (most creatures carry two.) Another is that wheat has a tangled ancestry, tracing its descent from three different species of wild grass.

Wheat Genome Draft Sequence

The wheat genome is the Mt. Everest of plant genomics. Remember this chart I used to show how small the peach genome was?

Amount of sequence (DNA) found in a selection of sequenced plant genomes.

Corn is the single biggest value on that graph. Now let’s add on wheat.

Plus wheat

On top of the total size of the wheat genome, further difficulties come from the fact that the wheat genome is actually made up of three overlapping and similar genomes (the A, B and D genomes of wheat), making it very difficult to piece together which bits of DNA sequence belong to which genome. As I said back when the Brachypodium genome was published:

Wheat stands alone as a genome so complex the very thought of trying to assemble it makes grown bioinformaticians cry (I’m obviously taking some dramatic license here).

Yet I woke up this morning to read the headline “British researchers announce a draft of the complex wheat genome.” I was half-way through a mad dash out my door to race into lab and start the standard awesome genome analyses I get to run whenever a new genome comes out. But then, a frightening thought occurred to me. “What if this is another woodland strawberry genome?*”

Time to dig into the real scientific information behind the flurry of news stories this morning. The website currently serving up the wheat genome is located here. And unlike the press coverage, they’re completely open about the data they’re making available: 5-fold coverage of the wheat genome using 454 sequencing reads. That translates into ~64 gigabases of DNA (or a 28 gigabyte compressed file), all in short pieces with no idea where the individual pieces go.
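For anyone who wants to see where a number like that comes from: fold coverage is just the total number of sequenced bases divided by the size of the genome. Here is a minimal Python sketch using only the two figures quoted above (5-fold coverage, ~64 gigabases of reads). The function names are my own invention for illustration, and the genome size it prints is simply what those two numbers imply, not a measured value.

```python
GIGABASE = 1e9  # one billion bases


def fold_coverage(total_read_bases, genome_size):
    """Average number of times each base in the genome was sequenced."""
    return total_read_bases / genome_size


def implied_genome_size(total_read_bases, coverage):
    """Genome size that would give the stated coverage for this many read bases."""
    return total_read_bases / coverage


# Figures taken from the data release as described in this post.
total_reads = 64 * GIGABASE
stated_coverage = 5

size = implied_genome_size(total_reads, stated_coverage)
print(f"Implied genome size: {size / GIGABASE:.1f} gigabases")
print(f"Check: {fold_coverage(total_reads, size):.1f}-fold coverage")
```

Note that coverage is an average over the whole genome. It says nothing about where any individual read belongs, which is the hard part.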

Wheat grains. Public domain image from the USDA via Wikipedia.

Piecing a genome together is hard. By releasing the raw sequencing reads, rather than an assembled sequence, the groups behind the wheat genome accomplished two big things. They were able to release a “draft genome” much faster and, at the same time, they’ve ensured no one else can do the sort of whole-genome analysis that genome groups often seem worried will lead to their own findings getting scooped. The downside is that they’ve also ensured people like me have no way to assess how useful the sequence will be for our own research.**

Five-fold coverage with 454 reads seems pretty low for a genome as complex and difficult to assemble as the wheat genome, but I can’t judge how much it will impact their ability to assemble a complete genome sequence until I see the assembly itself. The clickthrough page to download the sequencing reads says the research group currently plans to complete their analysis by next April, with publication coming sometime before April 2012. So sometime in the next year and a half, I hope to be back with something real to say about the wheat genome sequence.
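To put a rough number on what five-fold coverage means, here is a sketch of the standard idealized (Poisson) model of shotgun sequencing, in which reads land uniformly at random and the expected fraction of bases hit by at least one read is 1 - e^(-coverage). This is a textbook approximation I'm bringing in for illustration, not anything from the data release, and it assumes away exactly the things that make wheat hard: repeats and three similar genomes.

```python
import math


def fraction_covered(coverage):
    """Expected fraction of bases sampled at least once under a Poisson model.

    Assumes reads fall uniformly at random across the genome, which ignores
    repeats, sequencing bias, and similar copies of the same sequence.
    """
    return 1.0 - math.exp(-coverage)


for c in (1, 5, 10):
    print(f"{c}x coverage: {fraction_covered(c):.4f} of bases seen at least once")
```

Even under this optimistic model, seeing a base at least once is not the same as knowing where it goes. The model counts bases sampled, not reads correctly placed, so it can't tell us how well an assembly will turn out.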

UPDATE: Greg over at Pie-ence has coverage of the wheat genome story too.

UPDATE 2: Read how the International Wheat Genome Sequencing Consortium and the Biotechnology and Biological Sciences Research Council in the UK both describe this release of data more accurately.

*For those of you who don’t know about the woodland strawberry genome: at the last PAG meeting, a researcher gave a talk about the in-progress sequencing of the strawberry genome. This was picked up by a reporter as a story announcing the release of a complete strawberry genome sequence, and the story flashed around the web (including to my own site) before it could be corrected.

**Which I’m sure is not a big downside to the people sequencing the genome, but is a big one for me, and enough to make me change my mind about rushing into work early this morning.