James and the Giant Corn

August, 2010:

Apple Genome (part 2)

USDA Public Domain Apples

Edit: If you found this post searching for a link to download the apple genome, check out the more recent entry:

So you want to download the apple genome.

I’ve had a chance to read over the apple genome paper. The genome was sequenced to 16.9 fold coverage* using a mix of Sanger and 454 sequencing. Sanger is old school sequencing: if assembling a genome is like putting together a puzzle, Sanger uses the biggest puzzle pieces. 454 is one of the oldest “2nd generation” technologies. It’s the most expensive per million base pairs sequenced, but it gives the longest reads (largest puzzle pieces) of any technology other than Sanger.

The group sequencing the apple genome was able to put together ~600 megabases of sequence spread over the 17 chromosomes of apple. The remaining 20% of the genome (the total genome size is estimated to be 742.3 megabases) is a mixture of smaller assembled pieces that couldn’t be assigned to specific chromosomes and repetitive sequences (either simple sequences or transposons, where there are so many duplicate copies it’s impossible to say which sequence data belongs with which transposon copy).
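For anyone who wants to check the arithmetic behind those numbers, here is a minimal sketch. It assumes that "fold coverage" means total bases sequenced divided by the estimated genome size, and that the ~600 megabases and 742.3 megabases figures can be compared directly. The function and variable names are my own, not anything from the paper.

```python
# Figures quoted in the post. How they combine below is an assumption,
# not a calculation taken from the apple genome paper.
GENOME_SIZE_MB = 742.3   # estimated total genome size, in megabases
ANCHORED_MB = 600.0      # approximate sequence placed on the 17 chromosomes
FOLD_COVERAGE = 16.9     # reported sequencing depth


def fraction_anchored(anchored_mb: float, genome_mb: float) -> float:
    """Share of the estimated genome assigned to chromosomes."""
    return anchored_mb / genome_mb


def total_sequence_mb(coverage: float, genome_mb: float) -> float:
    """Raw sequence implied by a fold coverage, assuming
    coverage = total bases sequenced / genome size."""
    return coverage * genome_mb


anchored = fraction_anchored(ANCHORED_MB, GENOME_SIZE_MB)
raw_mb = total_sequence_mb(FOLD_COVERAGE, GENOME_SIZE_MB)

print(f"anchored to chromosomes: {anchored:.1%}")
print(f"left over:               {1 - anchored:.1%}")
print(f"raw sequence generated:  {raw_mb / 1000:.1f} gigabases")
```

Under those assumptions, a bit over 80% of the genome is anchored and a bit under 20% is left over, which matches the "remaining 20%" above.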

They were able to identify a whole genome duplication (which they call a genome wide duplication) that occurred in the ancestor of apples 30-65 million years ago. What’s weird is that apple has experienced only one chromosome fusion (starting with an ancestral chromosome set of 9 chromosomes, doubling to 18, and then fusing two chromosomes into one, resulting in the 17 chromosomes seen in apple today) and 3 major rearrangements in all those millions of years. By comparison, my favorite species, maize (corn), has reduced its chromosome number from 20 to 10 in only 5-12 million years since its most recent whole genome duplication.

I still can’t figure out where to download the genomic sequence data for apple (although I was able to dig up a genome browser here), but now that I’ve had a chance to look over the paper describing the genome, I’m officially adding Apple to our list of sequenced plant genomes.

As always, my thanks and appreciation go out to all the people involved in the sequencing, assembly, annotation, and analysis of this new genome.

If you’re interested, the paper itself can be found at Nature Genetics:

  1. Riccardo Velasco et al., “The genome of the domesticated apple (Malus × domestica Borkh.),” Nat Genet advance online publication (online 2010), http://dx.doi.org/10.1038/ng.654.

Random Apple Trivia: (more…)

More of the story behind the “wheat genome sequenced!” articles

Turns out I’m not the only one who was underwhelmed by the data behind the recent stories about the sequencing of the wheat genome. The International Wheat Genome Sequencing Consortium has put out a press release in which they state:

The International Wheat Genome Sequence Consortium, an international consortium of wheat growers, public and private breeders and scientists, strongly disagrees with implications that the sequence reads made available by a UK team, led by Professor Neil Hall, represent in any way the sequence of the wheat genome or that this work is comparable to genome sequences for rice, maize, or soybean. [emphasis mine]

This is starting to look more and more like what happened to the woodland strawberry genome all over again. The press release from the Wheat Genome folks also points out that the original press release (from the United Kingdom’s Biotechnology and Biological Sciences Research Council) that accompanied the genome sequence data released last week was very clear about what was being released: (more…)

Apple Genome

Once more, science by press release. Word of the apple (Malus domestica) genome comes first from popular press stories around the web, although there is a mention of a Nature Genetics article, so hopefully more details will be forthcoming.

-The popular press stories do mention a tetraploidy in Apple ~65 million years ago, but that’s been the one useful bit of information I’ve come across so far.

Apples are rosids, the group of plants that contains the majority of publicly available plant genomes. That’s both a good and a bad thing. The apple genome won’t provide insight into a whole new branch of the plant family tree, but on the other hand, it’s closely enough related to many other sequenced species to enable more detailed comparative genomics.

If you know where to download a copy of the apple genome, or come across the paper describing it, drop me a line and I’ll add the information to this post. Otherwise I’ll revisit this story in a few days, and see if enough information has become available to let me add apple to our list of sequenced plant genomes.

h/t to @mem_somerville

Tearing down a typical “genome sequenced” news story.

What’s wrong with this Associated Press story on the wheat genome? Everything. And that’s not even getting into the fact they don’t mention that the genome isn’t even assembled yet. But I have to let them off the hook on that one, since none of the coverage seems to be mentioning that fact.


University of Liverpool scientist Neil Hall, whose team cracked the code…

Cracked the code? What code? The genetic code? That’s been cracked for decades. You can look it up on Wikipedia. Sequencing a genome isn’t cracking a code, it’s photocopying the instruction manual (admittedly an indescribably vast and confusing instruction manual).

Sequencing an organism’s genome, gives unparalleled insight into how it is formed, develops and dies. [emphasis mine]

How does sequencing a genome give any insight into how a living thing dies? Our genomes don’t have, like, death genes inserted into them. The whole point of a genome is that it contains the best methods billions of years of evolution have developed to keep a living thing alive and reproducing as long as possible. This is biology, so there are always exceptions to any rule, but in general genomes are about staying alive and passing on parts of that genome to the next generation, while avoiding death whenever possible.

One reason for the outsize genome is that strains such as the Chinese spring wheat analyzed by Hall’s team carry six copies of the same gene (most creatures carry two.) Another is that wheat has a tangled ancestry, tracing its descent from three different species of wild grass. (more…)

Wheat Genome Draft Sequence

The wheat genome is the Mt. Everest of plant genomics. Remember this chart I used to show how small the peach genome was?

Amount of sequence (DNA) found in a selection of sequenced plant genomes.

Corn is the single biggest value on that graph. Now let’s add on wheat.

Plus wheat

On top of the total size of the wheat genome, further difficulties come from the fact that the wheat genome is actually made up of three overlapping and similar genomes (the A, B and D genomes of wheat), making it very difficult to piece together which bits of DNA sequence belong to which genome. As I said back when the brachypodium genome was published:

Wheat stands alone as a genome so complex the very thought of trying to assemble it makes grown bioinformaticians cry (I’m obviously taking some dramatic license here).

Yet I woke up this morning to read the headline “British researchers announce a draft of the complex wheat genome.” I was half-way through a mad dash out my door to race into lab and start the standard awesome genome analyses I get to run whenever a new genome comes out. But then, a frightening thought occurred to me. “What if this is another woodland strawberry genome?*”

Time to dig into the real scientific information behind the flurry of news stories this morning. The website currently serving up the wheat genome is located here. And unlike the press coverage, they’re completely open about the data they’re making available: 5-fold coverage of the wheat genome using 454 sequencing reads. That translates into ~64 gigabases of DNA (or a 28 gigabyte compressed file), all in short pieces with no idea where the individual pieces go.

Wheat grains. Public domain image from the USDA via wikipedia.

Piecing a genome together is hard. By releasing the raw sequencing reads, rather than an assembled sequence, the groups behind the wheat genome accomplished two big things. They were able to release a “draft genome” much faster and, at the same time, they’ve ensured no one else can do the sort of whole genome analysis that genome groups often seem worried will result in their own results getting scooped. The downside is that they’ve also ensured people like me have no way to assess how useful the sequence will be for our own research.**

Five-fold coverage with 454 reads seems pretty low for a genome as complex and difficult to assemble as the wheat genome, but I can’t judge how much it will impact their ability to assemble a complete genome sequence until I see the assembly itself. The clickthrough page to download the sequencing reads says the research group currently plans to complete their analysis by next April, with publication coming sometime before April 2012. So sometime in the next year and a half, I hope to be back with something real to say about the wheat genome sequence.

UPDATE: Greg over at Pie-ence has coverage of the wheat genome story too.

UPDATE 2: Read how the International Wheat Genome Sequencing Consortium and the Biotechnology and Biological Sciences Research Council in the UK both describe this release of data more accurately.

*For those of you who don’t know about the woodland strawberry genome, at the last PAG meeting, a researcher gave a talk about the in-progress sequencing of the strawberry genome. This was picked up by a reporter as a story announcing the release of a complete strawberry genome sequence, and the story flashed around the web (including to my own site), before it could be corrected.

**Which I’m sure is not a big downside to the people sequencing the genome, but is a big one for me, and enough to make me change my mind about rushing into work early this morning.

Genome size changes and hybrid vigor?

Festuca pallens. Photo Petr Filippov, wikimedia commons. Click to see photo in original context

A group of researchers at Masaryk University in the Czech Republic, who study how different members of the same grass species (Festuca pallens) have different total amounts of DNA per cell, have a new paper out in New Phytologist. They found that plants with the most unusual genome sizes (really big or really small) are less likely to survive to adulthood than plants that have roughly average amounts of DNA per cell. In other words, plants with abnormally high or low genome sizes have lower fitness than their averagely genome endowed siblings.

One point the authors don’t get into, is whether what they’re really seeing is evidence of heterosis (hybrid vigor). Which is not to say they haven’t thought of it. I can’t think of any way to address the question in a non-model system like Festuca, so any mention of it would have been the sort of pure speculation some peer reviewers don’t care for. But I, writing a totally un-peer reviewed blog, am free to speculate to my heart’s content: (more…)

The (Exaggerated) Fall of Wheat

From a nytimes article on rising prices of wheat in response to the Russian ban on exports:

…wheat, the historic amber-waved grain of the American bread-basket, is out of fashion — a beleaguered has-been crop on many farms, supplanted by the modern cash-cow of farming: corn, used for everything from ethanol fuel to food additives to animal feed.

Fewer acres of wheat were planted nationally last year than any year since 1971, according to federal figures. Kansas, the biggest wheat-producing state, had fewer acres in cultivation this year than any since 1957.

I’ve talked about the troubles of wheat before. In short: it’s very difficult to produce hybrid wheat, which keeps yields down, and farmers and companies worried about consumer fears of genetic engineering have kept genetically engineered wheat from being developed. Neither of those is a knock against wheat as a crop or a species, but the end result is there are more and more parts of the country where farmers can make more money planting corn than they could if they grew wheat.

I have some problems with the idea the nytimes seems to be pushing that corn has recently displaced wheat as the crop of choice for American farmers. America has been a nation of corn for a long time. These days corn has more of a PR problem than it used to, but it’s not some crop that came out of nowhere. The factors I mentioned have tipped the scales slightly further against wheat, but wheat hasn’t been displaced as the first crop of America, simply because it never was to begin with.

With the exception of a couple of years, more farmland has been planted with corn than wheat consistently throughout the last century. And while the amount of corn an acre of land can produce is going up faster than the amount of wheat that can be grown on that same land, corn has consistently produced more food per acre over that entire timespan. Sound like a silly dispute? Absolutely. But it was an excuse to dig into the USDA’s historical data to generate this graph:

Notice that total production of wheat has not dropped along with the total amount of farmland devoted to producing wheat. Instead, increasing yield per acre has allowed farmers to continue to produce the same amount of wheat while planting less total land.

They do things differently in Italy

Giorgio Fidenato has made a habit of carrying a raw ear of yellow corn and taking a hearty bite whenever a camera is in sight.

It’s a provocation. The Italian farmer’s corn is genetically modified, grown surreptitiously in fields in the northeast not far from the Austrian and Slovene borders.

Word spread about the crop, and on Aug. 9 about 70 anti-GMO activists wearing chemical protection suits trampled nearly an acre of corn to the ground.

Read the whole story here.  Makes me want to revive my plan to live-blog eating a papaya genetically engineered to resist the papaya ringspot virus.

The Platypus of Flowering Plants

The elusive and evolutionarily awesome platypus. Photo: Stefan Kraft, wikipedia. Click to see photo in its original context.

Mammals can be divided into two major groups, the placental mammals and the marsupials. Marsupials (animals like kangaroos, possums, and koalas) give birth to their young earlier and then raise them in an external pouch until they’re big enough to handle the real world, while placental mammals (humans, mice, cows and everything in between) don’t give birth until their offspring are more fully developed. The first fossil evidence for both groups of mammals dates back ~120 million years. The two groups account for almost all the mammal species found in the world today. There are, however, 5 exceptions: four species of spiny ant-eater (or echidna*), and the platypus. These species have been evolving for just as long as other mammals, but their ancestors split away from the ancestors of all other mammals long ago. So comparing placental mammals and marsupials to echidnas and platypodes** gives insight into what features of mammals and their genomes evolved early on in the animals that would go on to become the ancestors of all living mammals, from the platypus to ourselves, and what characteristics only evolved after our two lines split apart. This was one of the reasons it was so exciting to read about the publication of the platypus genome two years ago.

The flowers of Amborella trichopoda. Photo: scott.zone, flickr. Click to see photo in its original context.

The family tree of flowering plants is in many ways similar to that of mammals. It is dominated by three large groups of species that split apart ~140 million years ago: the eudicots (like placental mammals, this group is so large and diverse it’s easier to talk about the species that are NOT eudicots than the ones that are), the monocots (species like bananas, pineapples, grasses, and orchids), and the magnoliids (source of many exciting spices, like cinnamon, black pepper, and nutmeg, although the magnoliid it’s easiest to picture from the grocery store is the avocado). And like mammals, there are a few species that don’t fall into any of these giant groups of species, but split off onto their own evolutionary paths early in the development of flowering plants. The most ancient of these is a species called Amborella trichopoda. Amborella grows only in New Caledonia, a group of islands east of Australia, smaller than New Jersey with a population the size of Lincoln, Nebraska. And people complain that the platypus is elusive and hard to study!

Why am I telling you about this today? Because in a profile of the Soltises, a couple of famous and prolific plant biologists at the University of Florida who study tetraploidy in plants (although they study much, much younger tetraploids than I do***), just published in Science, the author mentions, almost in passing:

Next, they plan to sequence Amborella, which should show more clearly whether genome duplication occurred 130 million years ago in the common ancestor to all living angiosperms.

To which I say: Awesome! Can’t wait to read the paper and play with the genome! And… please hurry?

*Not to be confused with echinacea (ie purple cone flower). Even when I try to google mammalian species, my fingers insist I really wanted to type in the name of a plant instead. 😉

**I looked it up and platypodes is one of the accepted plurals of platypus.

***The tetraploidy in maize that I study is 5-12 million years old. The tetraploidy the Soltises study in Tragopogon is ~80 years old.

On the perils of automated annotation

One of the common questions for a biologist to ask when given a list of genes is: “What kinds of genes are over represented on this list?” Are these genes involved in protecting the plant from disease? Genes involved in photosynthesis? Or what?

To answer that question, it’s important to be able to make guesses about the function of all the genes in the genome of an organism. As you can imagine, if half the genes in the whole genome of a plant are involved in photosynthesis (not true), having a list of 20 genes where half are involved in photosynthesis isn’t very interesting. If only 1% of all the genes in the genome are involved in photosynthesis (also not true), the fact that half the genes on your list are involved in photosynthesis becomes a lot more exciting.
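To put rough numbers on that intuition, here is a minimal sketch of one common way to score over representation: a one-sided hypergeometric test. This is not necessarily the method the tools I used rely on. The genome size of 30,000 genes and the function name are assumptions for illustration, and the "half" and "1%" scenarios are the same made-up ones as in the paragraph above.

```python
from math import comb


def enrichment_p_value(genome_genes: int, genome_hits: int,
                       list_genes: int, list_hits: int) -> float:
    """One-sided hypergeometric test: the probability of seeing at least
    `list_hits` annotated genes in a random list of `list_genes` genes,
    drawn without replacement from a genome of `genome_genes` genes,
    of which `genome_hits` carry the annotation."""
    total = comb(genome_genes, list_genes)
    upper = min(genome_hits, list_genes)
    tail = sum(
        comb(genome_hits, k) * comb(genome_genes - genome_hits, list_genes - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


GENOME = 30_000  # hypothetical genome size, chosen only for illustration

# Scenario 1: half the genome is "photosynthesis" genes (not true).
p_common = enrichment_p_value(GENOME, GENOME // 2, 20, 10)

# Scenario 2: only 1% of the genome is "photosynthesis" genes (also not true).
p_rare = enrichment_p_value(GENOME, GENOME // 100, 20, 10)

print(f"half the genome annotated: p = {p_common:.3f}")
print(f"1% of the genome annotated: p = {p_rare:.3g}")
```

In the first scenario the p-value is large, so 10 out of 20 is exactly what chance would give you. In the second it is vanishingly small, which is what makes the same list exciting.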

So the ability to analyze lists in this fashion requires making guesses about what every gene in the genome is doing. There are ways to do just that using comparisons to genes in other species whose functions are known, and that’s exactly what I did this week with the maize genome. It’s important to keep in mind though that these guesses are just that. Guesses. Exhibit A:

The list of genes I’m working with is highly enriched in genes involved in “neurological system process”. That’s right, my list of plant genes, from corn, has many more genes involved in brain activity than might be expected given the overall representation of those genes in the maize genome. The list also contains unexpectedly high numbers of genes involved specifically in “cognition”.

There’s no substitute for a sanity check from a real live human being*, but unfortunately, it’s hard to scare up enough trained scientists to manually annotate the possible jobs of 30,000+ genes, so until someone gets around to doing that for corn, automated prediction programs such as this one will have to suffice.

*Or maybe a sanity check from a highly trained corn plant? Now that we have genetic evidence corn plants are thinking creatures… 😉