The wheat genome is the Mt. Everest of plant genomics. Remember this chart I used to show how small the peach genome was?
Corn is the single biggest value on that graph. Now let’s add on wheat.
On top of the total size of the wheat genome, further difficulties come from the fact that the wheat genome is actually made up of three overlapping and similar genomes (the A, B and D genomes of wheat), making it very difficult to piece together which bits of DNA sequence belong to which genome. As I said back when the Brachypodium genome was published:
Wheat stands alone as a genome so complex the very thought of trying to assemble it makes grown bioinformaticians cry (I’m obviously taking some dramatic license here).
Yet I woke up this morning to read the headline “British researchers announce a draft of the complex wheat genome.” I was half-way through a mad dash out my door to race into lab and start the standard awesome genome analyses I get to run whenever a new genome comes out. But then, a frightening thought occurred to me. “What if this is another woodland strawberry genome?*”
Time to dig into the real scientific information behind the flurry of news stories this morning. The website currently serving up the wheat genome is located here. And unlike the press coverage, they’re completely open about the data they’re making available. 5-fold coverage of the wheat genome using 454 sequencing reads. That translates into ~64 gigabases of DNA (or a 28 gigabyte compressed file), all in short pieces with no idea where the individual pieces go.
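For anyone who wants to check the arithmetic behind "5-fold coverage" and "~64 gigabases", here is a quick back-of-the-envelope sketch in Python. Fold coverage is just total sequenced bases divided by genome size. The function names are made up for illustration, and the 400 bp mean read length is purely an assumption (the release page figures quoted above don't give one), so treat the read count as a rough guess.

```python
# Back-of-the-envelope coverage arithmetic.
# Coverage (fold) = total sequenced bases / genome size.

def total_sequence_gb(genome_size_gb, fold_coverage):
    """Total gigabases of reads needed to hit a given fold coverage."""
    return genome_size_gb * fold_coverage

def implied_genome_size_gb(total_gb, fold_coverage):
    """Genome size implied by a data set's total size and stated coverage."""
    return total_gb / fold_coverage

def approx_read_count(total_gb, mean_read_length_bp):
    """Rough number of reads, given an ASSUMED mean read length in bp."""
    return total_gb * 1e9 / mean_read_length_bp

if __name__ == "__main__":
    total_gb = 64   # "~64 gigabases" from the post
    coverage = 5    # "5-fold coverage" from the post
    print("Implied genome size (Gb):", implied_genome_size_gb(total_gb, coverage))
    # 400 bp is a hypothetical mean read length, not a figure from the release.
    print("Approx. read count:", approx_read_count(total_gb, 400))
```

Since the "~64" is itself approximate, the implied genome size is only a ballpark figure.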
Piecing a genome together is hard. By releasing the raw sequencing reads, rather than an assembled sequence, the groups behind the wheat genome accomplished two big things. They were able to release a “draft genome” much faster and, at the same time, they’ve ensured no one else can do the sort of whole-genome analysis that genome groups often seem to worry will get their own results scooped. The downside is that they’ve also ensured people like me have no way to assess how useful the sequence will be for our own research.**
Five-fold coverage with 454 reads seems pretty low for a genome as complex and difficult to assemble as the wheat genome, but I can’t judge how much it will impact their ability to assemble a complete genome sequence until I see the assembly itself. The clickthrough page to download the sequencing reads says the research group currently plans to complete their analysis by next April, with publication coming sometime before April 2012. So sometime in the next year and a half, I hope to be back with something real to say about the wheat genome sequence.
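To put a rough number on what a given fold coverage buys you, here is a small sketch using the idealized Lander-Waterman model, in which reads land uniformly at random and the chance that a given base is missed by every read at coverage c is about e^-c. This is an optimistic simplification and my own choice of illustration, not anything the sequencing groups have published: it ignores repeats and the three similar A, B and D genomes, which are exactly what make wheat hard. The function names are hypothetical.

```python
import math

def fraction_uncovered(fold_coverage):
    """Expected fraction of bases hit by zero reads (Poisson approximation)."""
    return math.exp(-fold_coverage)

def fraction_covered(fold_coverage):
    """Expected fraction of bases hit by at least one read."""
    return 1.0 - fraction_uncovered(fold_coverage)

if __name__ == "__main__":
    # Compare a few coverage depths under the idealized model.
    for c in (1, 5, 10, 30):
        print(f"{c:>2}x: ~{fraction_covered(c):.4%} of bases covered at least once")
```

Even under these generous assumptions, 5-fold coverage leaves a bit under 1% of bases unsampled, and being sampled once is a long way from being assembled into the right genome.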
UPDATE: Greg over at Pie-ence has coverage of the wheat genome story too.
UPDATE 2: Read how the International Wheat Genome Sequencing Consortium and the Biotechnology and Biological Sciences Research Council in the UK both describe this release of data more accurately.
*For those of you who don’t know about the woodland strawberry genome, at the last PAG meeting, a researcher gave a talk about the in-progress sequencing of the strawberry genome. This was picked up by a reporter as a story announcing the release of a complete strawberry genome sequence, and the story flashed around the web (including to my own site) before it could be corrected.
**Which I’m sure is not a big downside to the people sequencing the genome, but is a big one for me, and enough to make me change my mind about rushing into work early this morning.