James and the Giant Corn

March, 2010:

Missing Genes on a Massive Scale

Edit: stripped out all the numbers as they clearly applied to an earlier version of the data and I don’t know if the new ones are intended for public release yet.

Last November, when the maize genome was published, one of the companion papers looked at genes where a different number of copies were found in different breeds of maize (this is called Copy Number Variation) and genes found in B73 (the variety of maize that was sequenced) but completely missing from the genomes of other varieties. There’s a great post on that paper written up by Mary at OpenHelix.

A few months later, it sounds like this dataset has grown substantially. Over XXXX B73 genes (that’s X% of the filtered B73 gene set!) appear to be lost (or have sequences so different they no longer register) in at least some varieties of maize. And because the new dataset incorporates data from XX different maize breeds and XX different teosinte* lines, they’re able to identify some of the losses as older because they’re found in multiple comparisons, while some appear to be lost in only a single breed and might represent more recent losses.

Sit back and think about that for a second. At least X% of the genes in corn sometimes go missing. This could have implications for everything from inbreeding depression and hybrid vigor to the kind of basic research I’m actually working on myself.

As you can imagine, I’d love to get my hands on this dataset myself, but the next best thing will be to take furious notes when Nathan Springer talks about the project on Friday morning**, and to be sure to swing by Steven Eichten’s poster to soak in the awesomeness.

Ruth A. Swanson-Wagner et al. “Combined Analysis of genomic structural variation and gene expression variation between maize and teosinte populations” Talk #1 2010 Maize Meeting (Presented by Nathan Springer)

Steven R. Eichten et al. “Extensive Copy Number Variation Among Maize Lines” Poster #139 2010 Maize Meeting

*Teosinte is the wild species from which maize/corn was domesticated.

**And he’s talking at 8:30 AM on a day when I still plan on being heavily jet lagged.

Abnormal Chromosome 10

There is a piece of DNA that is sometimes found on the end of the tenth maize chromosome. In plants that possess this extra chromosome segment, chromosome knobs* (including one that’s a part of the extra segment included in abnormal chromosome 10) start to act like centromeres**. But this story graduates from odd to downright weird when I tell you that possessing this extra centromere-like activity gives a chromosome an unfair advantage in being passed on to the next generation.

Plants, like animals, possess two complete genome copies, one from each parent. They’ll only pass on one copy (mixtures of pieces from each parent) to their offspring. Any given sequence has a 50% chance of being passed on, which seems fair given that the plant is passing on 50% of its total genetic material. But abnormal chromosome ten cheats (using those extra centromere-like sequences I mentioned earlier). It has up to an 83% chance of being passed on.
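
To put those two percentages side by side, here is a minimal sketch in Python. It assumes a parent carrying one abnormal and one normal copy of chromosome 10, and the function name is my own invention for illustration, not anything from the poster.

```python
def transmission_odds(p_transmit: float) -> float:
    """Odds that one particular version of a chromosome is passed on
    rather than the other version.

    Assumes a parent with two different versions, so whatever is not
    transmitted as one version is transmitted as the other.
    """
    if not 0.0 < p_transmit < 1.0:
        raise ValueError("p_transmit must be strictly between 0 and 1")
    return p_transmit / (1.0 - p_transmit)


fair_odds = transmission_odds(0.50)    # ordinary inheritance
driven_odds = transmission_odds(0.83)  # upper figure quoted for abnormal chromosome 10

print(f"normal inheritance: {fair_odds:.2f} to 1")
print(f"abnormal chromosome 10: {driven_odds:.2f} to 1")
```

At the 83% upper bound, the abnormal version would be expected in nearly five gametes for every one carrying the normal version, compared with one for one under fair inheritance.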

Since the breed of corn (B73) the maize genome was based on has the normal version of chromosome 10, we know very little about the extra DNA found in abnormal chromosome 10. The authors of this poster are going to correct that oversight by sequencing the region, figuring out how (and how long ago) abnormal chromosome 10 came into being, and hopefully identifying the genes within the region that make chromosome knobs act like centromeres.

*Knobs are dense segments of DNA that scientists have been able to spot visually within chromosomes since before we knew for sure that chromosomes carried genetic information.

**Centromeres are the parts of chromosomes that bind together during cell division (the center of the X in the traditional drawing of a chromosome). They’re also the place where the molecular machinery that pulls chromosomes apart at the end of cell division attaches.

Lisa Kanizay and Kelly R. Dawe “Uncovering the sequence and structure of maize abnormal chromosome 10” Poster #165 2010 Maize Meeting.

Leaving for the maize meeting

The maize meeting is a once a year chance for people nearly as into studying corn as I am to gather in one place, talk science, drink beer, and talk science.

I’ve got a few pre-scheduled updates (fewer than I was planning) based on interesting posters and talks I want to check out at the meeting, but aside from that you probably won’t be hearing much from me for the rest of the week.

Hopefully I’ll see at least a few of you there!

The long genome drought

Grad students graduating with PhDs right now probably entered grad school at point A, or earlier. People at the end of their first post-doc (and potentially in a position to apply for faculty positions and start training new graduate students themselves) would have entered grad school at point B or earlier.

Today there are a mere 10 published plant genomes out of more than a quarter million named plant species in the world. But even ten genomes is a huge amount of data to deal with for a plant genomics community that largely came of age during the long genome drought of 2002-2006. What is the genome drought? The rice genome was published in April 2002. It was only the second plant genome to be sequenced, and the last plant genome to be published until the poplar genome came out in September of 2006, a gap of more than four years. Two genomes, especially of species as distantly related as arabidopsis and rice, don’t make for a lot of compelling comparative genomics (although there was certainly some really cool stuff being discovered in this time period).

Does that matter? Probably not, but it’s important to remember that the people earning their PhDs today probably entered grad school (and chose a lab and field of study) during that two-plant-genome era (see point A), and less opportunity for exciting research means less grant money and less ability to attract grad students. The youngest people applying for faculty positions today (assuming they only did one, quite successful, post-doc) also entered grad school in the two-genome era (see point B), if not the previous single-genome era.

I’m talking about this mostly to make the point that I think comparative genomics as a field of study is getting a lot more exciting as more genomes become available, which is likely to attract more graduate students in that key first year when they join a lab and begin to specialize. Which means that as we move farther away from the time of the long genome drought, we will hopefully* start to see a lot more well-trained people doing plant genomics.

Which is a good thing because the other point this graph should make (not that I think many people need to be reminded of it), is that the pace of sequencing plant genomes is accelerating, and SOMEONE needs to analyze the huge quantities of data that are already starting to flow through the plant biology community.

This should be the last plant genome themed post for a while, but please continue to let me know if you know/hear about more plant genome projects. jcs98 (@) jamesandthegiantcorn.com

*If the hypothetical end to the also-hypothetical-and-possibly-the-result-of-wishful-thinking-on-my-part shortage of plant comparative genomicists could hold off long enough for me to be really in demand when/if I finish grad school, that would be great. 😉

And the list gets better!

Based on e-mails and responses to my previous post I’ve made the following additions to the sequenced plant genomes page:

  • Added an entry on Columbine, a member of an early diverging group of eudicots. As far as I can tell this sequence is currently unreleased, but from the JGI website it looks like the initial assembly is already complete, so if you know of a way for people to get ahold of that, let me know.
  • Added an entry on the Castor Bean. The sequencing group has released a 4x coverage genome assembly. (The castor bean is the source of the deadly toxin ricin, and is not grown in the US; we import our castor oil from other countries.)
  • Split the entry on Arabidopsis into “Arabidopsis species and allies”. This gives Arabidopsis lyrata its own heading, and will be important since there are another 7 species from the Arabidopsis genus and its close relatives in the JGI sequencing pipeline.
  • Added an entry on v2 of the date palm genome generated by Weill Cornell Medical College in Qatar. This definitely should still be considered an “in progress” genome, but at least until the banana genome comes out it’s the best non-grass monocot genome available.
  • Added an entry on the genome of Physcomitrella patens, which, as a moss, is the descendant of an evolutionary lineage that split from all the other genomes I’ve listed on the page around 450 million years ago.
  • Added the recently announced sunflower genome project to the list of planned, in-progress, and private genome efforts. (Apparently the genome of the cultivated sunflower is more than 3 gigabases. Bigger than corn!) That’ll be a cool genome to see when it comes out.
  • Added information on the woodland strawberry genome project, which aims to have an assembled genome of Fragaria vesca by sometime this year. You may remember the woodland strawberry genome from the mix-up back in January.
  • Added the various groups that have announced they have private sequences of the oil palm genome to the same section.

Particular thanks to Greg, Jeff, and Eric, whose suggestions were behind most of these additions.

Completely unrelated, you may have noticed I switched the RSS feed back to full length entries. I recently tried out Google Reader (I’m way behind the times I know), and it is SO MUCH nicer to see the full entries there than have to click through from a brief summary. The downside, as I know from previous experiences, is that when I send out the full entries by RSS I get a lot less traffic.

I don’t earn any income from traffic to this site, but it is a nice feeling to know people are reading and enjoying something I wrote, and I have no way of tracking how many people (if any) read an article from the RSS feed.

Anyway, I’ll keep sending out full entries for at least the next week. (I expect I’ll be too busy to worry about ego-stroking traffic statistics until at least a week from Tuesday.)

Sequenced Plant Genomes

Libe slope in Ithaca, NY. Behind you are student dorms. At the top of the hill, campus starts. Photo: foreverdigital, flickr

When I was an undergraduate, there were exactly two sequenced plant genomes, rice and arabidopsis. And sure, maybe I didn’t have to walk “ten miles to school, barefoot, in the snow, uphill, both ways”*, but the one way I did have to walk uphill (sometimes in the snow, but always with shoes) was very uphill. But where was I?

Oh yeah, plant genome sequences. Kids getting into plant genomics these days don’t realize how easy they’ve got it. By my count (which may be low but I’m getting to that) there are ten published plant genomes, with several more unpublished genomes that are available in various states of completion, and lots more on the way.

Which brings me to what I was doing yesterday instead of writing an update for this website: trying to document the published plant genomes, the unpublished genomes that are available, and which new genomes we can expect to see published in the near future.

Please, if you find mistakes or know of additional flowering plant genomes I should mention, let me know! jcs98 (@) jamesandthegiantcorn.com.

If you don’t work in biology, it might be interesting to see which plants have sequenced genomes and how they’re related to each other.

*An explanation of this phrase.

Chromosomes and Ploidy at PATSP

Mr Subjunctive, who writes a really fun site for plant lovers and amateur to professional horticulturalists (who I’m pretty sure are also plant lovers or they would have gone into another field), set out to write a post on Phalaenopsis (a kind of orchid*).

I started this off with really good intentions, but quickly wound up on weird tangents, and then some of the tangents had tangents, and then at some point I looked up and saw that I’d written 2000 words without ever getting to how you’re supposed to take care of them. So if you’re here to find out how to actually grow Phalaenopsis, you’ll want to skip on ahead to Part II (which will post next Wednesday). Otherwise, read on.

Briefly, he covers chromosomes, the sex chromosomes of humans, weird changes in chromosome numbers from changes in ploidy (having one or more extra copies of each chromosome) and aneuploidy (having more or fewer copies of individual chromosomes than normal), and some of the advantages and disadvantages of working with such plants. All while staying easily readable (something I can never seem to do once I get into talking about science).

Anyway, the point of this entry can be summed up in two words:

Go. Read.

And if you like it, check out his List: House Plants You’ll Be Growing During the Zombie Apocalypse of 2014

*Orchids are monocots (the group of species that grasses also belong to, along with pineapples, bananas and all sorts of other cool plants), so I start out biased in their favor.

Yield of Arabidopsis

Arabidopsis thaliana.

Arabidopsis thaliana is a plant which, through its small size, possession of the first sequenced plant genome (released in 2000, four years before the complete human genome), and short generation time*, has become a common sight in plant biology labs around the world. From an applied standpoint, the main problem with Arabidopsis research is that, as with any model organism, sometimes biologists discover things that are broadly applicable to the plant world (including important crop species, from apricots to zucchinis) and sometimes their discoveries turn out to be specific to arabidopsis and some of its close relatives.

Since Arabidopsis is probably now the most studied plant on the planet, could we cut out the middle-man and simply grow fields of arabidopsis? You’re about to become one of the few people who knows the answer to that question.

Food Nostalgia

James McWilliams, writing at the nytimes, makes the point that idealizing the diets of generations past has been going on for at least 150 years. Michael Pollan’s rule: “Don’t eat anything your great-grandmother wouldn’t recognize as food” loses some of its effectiveness when you picture all four* of your great-grandmothers (who were probably alive in the era of the world wars) idealizing the diets of Civil War era Americans, and so on.

h/t to Greed, Green, and Grains.

GG&G is well worth checking out in its own right. On this particular subject Michael Roberts, the blog’s author, makes the point that just because nostalgia for the foods of the past isn’t a new development, doesn’t mean there aren’t real problems with the food we eat today.

There are real problems with the ways we produce and consume food in this country. (And a whole separate set of problems in other parts of the world.) But by over-idealizing the food our great-grandparents ate we’re looking for the answers to today’s problems in the past, when the real answers to the problems we face can be found (you guessed it) in the future.

Which is not to say we can’t learn from the mistakes and successes of the past. I just don’t think it’d be a good trade to exchange my diet for that of my great-grandmother (any of the four).

*Four great-grandmothers (eight great-grandparents!), but consider that if we go back ten generations (perhaps an average of 250 years) we’re each descended from over 1000 people. (1024 assuming your family tree was completely free of inter-marriage).
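
That doubling is easy to check. Here is a minimal sketch in Python, assuming (as in the footnote) no inter-marriage anywhere in the tree; the function name is made up purely for illustration.

```python
def ancestors_at_generation(generations_back: int) -> int:
    """Number of ancestors exactly this many generations back,
    assuming every ancestor is a distinct person (no inter-marriage)."""
    if generations_back < 0:
        raise ValueError("generations_back must be zero or greater")
    # Each person has two parents, so the count doubles every generation.
    return 2 ** generations_back


great_grandparents = ancestors_at_generation(3)   # parents, grandparents, great-grandparents
great_grandmothers = great_grandparents // 2      # half of them are women
tenth_generation = ancestors_at_generation(10)

print(great_grandmothers, great_grandparents, tenth_generation)
```

Three generations back gives the eight great-grandparents (four of them great-grandmothers), and ten generations back gives the 1024 mentioned above.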

Wow!

Who could have predicted maize geneticists would be so interested in maize genes? The entry I posted last night on Purple plant1 and Colored aleurone1 easily received more traffic in its first day on the site (it’s still got a long way to go before it catches long term readership attractors like water chestnuts and the NIPGR tomatoes), than any entry since the heady days of the maize genome release back in November.

The relationships of the four grass species with sequenced genomes. The branches are NOT to scale with how long ago the species split apart. Green stars represent whole genome duplications. The most important one to notice is the one in the ancestry of maize/corn. That duplication means that every region in sorghum, rice, or brachypodium is equivalent to two different places in the maize genome, one descended from each of the two copies of the genome that existed after the duplication.

And this morning the dataset I drew that example from, 464 classical maize genes mapped onto the maize genome assembly plus syntenic orthologs in up to four grass species: sorghum, rice, brachypodium, and the other region of the maize genome created by the maize whole genome duplication (technically syntenic homeologs since we started in maize to begin with, but the principle is the same), went out to the maize genetics community (thank you MaizeGDB!).

A postdoc in our lab tells me more people have visited CoGe today than any day on record (and we hit that mark before noon!).

Anyway, thank you guys, it’s great to feel appreciated!