James and the Giant Corn


Hybrid vigor and missing genes

Thinking about defining the number of genes present in the maize genome reminded me of an old* story about the trouble of defining what truly represents a gene and how really awesome ideas can sometimes come years before the data needed to support them.

The year is 2002. The first complete version of the human genome is still a year away. The genomes of two plant species have already been published (rice and arabidopsis), but in terms of sheer genome size, both species are a drop in the bucket compared to the human genome or to other plant genomes like corn or wheat. None of this is particularly important except to set the stage.

Two researchers at Rutgers University were sequencing a tiny piece of the maize genome (~0.01%) surrounding a single gene called bronze1 (the fifth most studied gene in maize) when they found something unexpected.

They had previously identified 10 genes in a single 32-kb stretch of the maize genome. (A similar gene density throughout the remainder of the maize genome would have resulted in a maize genome containing more than 700,000 genes!) However, it was already known that the maize genome was split between small gene-rich islands and vast, desolate expanses of transposons (referred to as transposon nests**), and in fact the same study identified a couple of these transposon nests on either side of their gene-rich island (see part A of the second picture in this post).
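The arithmetic behind that 700,000 figure is just scaling a local gene density up to the whole genome. Here is a minimal sketch, assuming a maize genome of roughly 2.3 gigabases (a figure the post itself doesn't give), to show how quickly a naive uniform-density assumption blows up:

```python
# Naive extrapolation of gene density from one gene-rich island to the whole genome.
# ASSUMPTION: maize genome size of ~2.3 gigabases; not stated in the original post.

GENES_IN_WINDOW = 10             # genes found in the sequenced stretch
WINDOW_BP = 32_000               # 32 kb
MAIZE_GENOME_BP = 2_300_000_000  # assumed ~2.3 Gb


def extrapolate_gene_count(genes: int, window_bp: int, genome_bp: int) -> int:
    """Scale a local gene count to the full genome, assuming (wrongly, for maize)
    that genes are spread uniformly. Integer math avoids float rounding."""
    return genes * genome_bp // window_bp


if __name__ == "__main__":
    estimate = extrapolate_gene_count(GENES_IN_WINDOW, WINDOW_BP, MAIZE_GENOME_BP)
    print(f"Naive estimate: {estimate:,} genes")
```

The estimate is wildly too high precisely because maize genes cluster in small islands separated by huge transposon nests, so one gene-rich window says very little about the genome as a whole.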

Below I'll use cartoons, but here's a real, to-scale example of a gene-rich island I picked at random from maize chromosome 3. Genes and intergenic spaces are to scale. Base image generated with GenomeViewer, part of the CoGe toolkit. http://www.genomevolution.org/CoGe/

Their initial sequencing used DNA from a breed of corn called McC, which I must admit I’ve only ever read about in this particular paper. However, when they decided to sequence the same region from the genome of B73*** they made three discoveries, which I’ve listed in increasing order of strangeness:


Transposon Mutagenesis

In yesterday’s Transposon Week post, I discussed how transposons can spread through a species without providing any benefit to the animals, plants, fungi, or microorganisms that host them.

Adding a little extra useless DNA doesn’t help an organism survive, but it also doesn’t cause serious harm. But in yesterday’s post I completely avoided one serious question:

When new copies of a transposon get inserted across the genome, what happens to the DNA they land in? For that matter, what kind of DNA do transposons land in in the first place?

The answer to the second question is that different kinds of transposons each have their favorite places to land in the genome. Some transposons like to land in centromeres. Some transposons like to land in other transposons. Some transposons like to land near genes.

Then there are transposons like Mutator. Mutator is a maize/corn transposon that really likes to insert itself into genes. Transposons that usually land in other parts of the genome are also sometimes found in genes.

When a transposon lands in a gene, whether because that’s where it likes to insert or simply by accident, the gene will often stop working. Depending on which gene has the misfortune to be interrupted by a transposon, the effects can range from so-subtle-we-can’t-even-detect-them to so lethal the organism dies before we get a chance to study it. In between is a whole range of effects, from severe developmental mutants, to gorgeous and apparently random streaks of color in flowers, to the spotted corn kernels which were my first introduction to the world of transposons.* (**)
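Whether a new insertion breaks a gene comes down, at its simplest, to whether it lands inside a gene's coordinates. A minimal sketch of that lookup follows; the gene names and positions are made up for illustration (not real maize gene models), and treating every hit as a broken gene is a simplification, since where in the gene the insertion lands also matters.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


# HYPOTHETICAL gene models on one chromosome, sorted by start and non-overlapping.
GENES = [
    Gene("geneA", 1_000, 4_500),
    Gene("geneB", 9_000, 12_000),
    Gene("geneC", 20_000, 26_500),
]
_STARTS = [g.start for g in GENES]


def gene_hit(insertion_pos: int) -> Optional[Gene]:
    """Return the gene an insertion lands in, or None if it lands between genes."""
    # Index of the last gene whose start is at or before the insertion position.
    i = bisect_right(_STARTS, insertion_pos) - 1
    if i >= 0 and GENES[i].start <= insertion_pos <= GENES[i].end:
        return GENES[i]
    return None


if __name__ == "__main__":
    for pos in (3_200, 15_000, 20_000):
        hit = gene_hit(pos)
        print(pos, "->", hit.name if hit else "intergenic")
```

Binary search over sorted start positions keeps each lookup fast even with tens of thousands of genes on a chromosome.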

Transposons are always breaking genes. The deadliest mutations disappear from the population as quickly as they appear. More subtle mutations can linger on for generations, giving rise to all sorts of genetic disorders. And keep in mind that many genes can be broken with no visible effect at all. Anything you eat, from asparagus to zucchini, has the potential to contain genes broken by transposons. And depending on the gene, you’d probably never even know it.

Sorry to put this post up so late (it’s technically already Thursday) and in such poor shape. I had some craziness in lab today and was waiting (unfortunately without any luck) to hear back about some more interesting stories I could tell about the Mutator transposon.

*To be fair, the last two are actually caused by transposons jumping OUT of genes, allowing them to resume their normal function. The original mutations, caused by transposons inserting into genes, broke the biochemical pathways used by dahlias to make red pigment in their petals and by corn to produce purple pigment (anthocyanin) in its kernels.

**I really wish there was a good source of freely usable pictures of things like transposon sectors in flowers and corn kernels. I can usually find pictures of normal plants on Flickr with Creative Commons licenses. But I really want to be able to show you guys the cool mutants that make genetics so exciting.


Transposons: The Difference Between Junk DNA and Selfish DNA

Transposons are one of those really cool features of genomes that never really seem to make the jump into the public eye. Most people at least have some conception of what a gene is: a piece of DNA that contains the instructions for making a protein that plays some role in the cell. A lot of people can also recall hearing an off-hand statistic that only some tiny fraction of the human genome is made up of genes, with the rest being “junk DNA”. The question of why most of our genomes have no apparent function is behind the slow trickle of scientific research that gets picked up in the popular press as “scientists discover junk DNA not junk after all!”.

But the reason most genetics and genomics people aren’t in a huge rush to discover the hidden function behind most of this “junk DNA” is that we KNOW what most of it does and where it comes from. It’s not junk, it’s selfish DNA. (Although there’s certainly lots of cool stuff remaining to be discovered in the much smaller fractions of genomes we can’t classify at all.)


Welcome to transposon week here at James and the Giant Corn!

I’m just about wrapped up with the big project I’ve been working on recently. Hope to be able to say more about it in the not-too-distant future. Having to be secretive in science sucks.

But there’s a lot to be happy about! I’m done teaching for a long time. As much as I enjoyed working with the kids in my class, the other responsibilities of teaching (grading, sitting through lectures without the chance to break in for the discussions and arguments that make academia so fun, grading, designing assignments, grading) were really starting to wear me down.

And I’m only three weeks (June 22nd) from either passing my qualifying exam or becoming a beaten and broken shell of a man. For three hours four professors will question me on everything I’ve learned (or should have learned but didn’t) in my education up to this point, and everything I propose to spend the next few years of my life doing. This may not sound like a good thing, but it is. Because my qualifying exam has been hanging over my head all semester,

The lab has a new paper in press, having run the sequential gauntlets of Peer Review, Editorial Evaluation, and finally (and perhaps most dreaded) Your-Figures-Aren’t-High-Resolution-Enough e-mails from the journal’s publication department. But more on the details of that whenever the paper actually shows up.

But what was the point of this entry again? Oh yeah. Transposons. I have a soft spot for transposons (I’m guessing most people who work with maize genetics do). Today we may know that transposons are found in practically every genome under the sun, but they were discovered first in maize using old-school genetics (breeding plants together and counting traits in the offspring), before DNA sequencing was a gleam in its inventor’s eye.

And on top of that, some delightfully high-copy number transposons are in the middle of proving a major scientific point for me, so I figured the least I could do was devote a week to them here on the site.

If you’re not a geneticist, should you still care about transposons? Absolutely! Transposons are one of the best arguments, not for why genetic engineering is safe, but for why anyone worried about hypothetical unintended consequences of genetic engineering should be just as worried about any food with DNA in it (and as far as I know, that’s all food). To paraphrase a Seinfeld character: “No food for you!”

The week’s schedule:


The Peach Genome Is Out

A 1.1-pound peach from the Berkeley Farmers' Market.

Here. I had no idea anyone was even considering sequencing the peach genome until I heard a single off-hand comment at the maize meeting last month, and all of a sudden here it is. And it’s in better shape in its first release than some genomes are even after they’re published.

This is a pre-publication release, so the Fort Lauderdale Convention is still in effect,* but the peach genome looks really great from the quick and dirty analysis I have already run. They’ve already got the genome assembled into pseudomolecules (chromosomes), unlike some genomes I could mention that have already been published, and they’ve marked the locations and structures of genes in the genome. (There was a weird period last summer when there were pre-release versions of the maize genome organized into chromosomes, and pre-release versions with the genes marked, but none that had both.)

*In short, you or I can download the peach genome, play around and study it to our hearts’ content, but we can’t publish anything on it until the people who actually sequenced the peach genome publish a paper describing their work.


The two genomes of maize

I recently got back from the maize meeting. I mentioned before that a big part of the reason to do poster presentations is to get comfortable discussing one’s research with people who haven’t specialized in the exact same subject. In my case, my poster got a fair bit of interest, which was great. (Although I was surprised which parts people were most interested in.) But there were also a couple of concepts I had a lot of trouble getting across.

It’s too late to do me any good at the maize meeting, but I have created the figure I think I needed to explain those ideas. Maybe I can squeeze it into my qualifying exam proposal, or use it the next time I get a chance to give a talk on campus. Let’s just not get into how much of my morning I spent putting this together, and pretend it was a good investment of my time, ok?


Missing Genes on a Massive Scale

Edit: stripped out all the numbers as they clearly applied to an earlier version of the data and I don’t know if the new ones are intended for public release yet.

Last November, when the maize genome was published, one of the companion papers looked at genes where a different number of copies were found in different breeds of maize (this is called Copy Number Variation) and genes found in B73 (the variety of maize that was sequenced) but completely missing from the genomes of other varieties. There’s a great post on that paper written up by Mary at OpenHelix.

A few months later, it sounds like this dataset has grown substantially: over XXXX B73 genes (that’s X% of the filtered B73 gene set!) appear to be lost (or have sequences so different they no longer register) in at least some varieties of maize. And because the new dataset incorporates data from XX different maize breeds and XX different teosinte* lines, they’re able to identify some of the losses as older because they’re found in multiple comparisons, while some appear to be lost in only a single breed and might represent more recent losses.
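The older-versus-recent reasoning above boils down to counting how many lines share each missing gene. Here is a minimal sketch of that counting step, using made-up line names and gene IDs (all hypothetical; this is not the actual pipeline or data from the talk or poster):

```python
from collections import defaultdict

# HYPOTHETICAL presence/absence calls: for each line compared against B73,
# the set of B73 genes that appear to be missing from that line.
missing_by_line = {
    "line1": {"gene01", "gene02", "gene05"},
    "line2": {"gene02", "gene05"},
    "teosinte1": {"gene02", "gene07"},
}


def classify_losses(missing: dict[str, set[str]]) -> dict[str, list[str]]:
    """Split missing genes into those lost in several lines (candidate older losses)
    and those lost in only one line (candidate more recent losses)."""
    lines_per_gene: dict[str, set[str]] = defaultdict(set)
    for line, genes in missing.items():
        for gene in genes:
            lines_per_gene[gene].add(line)

    result: dict[str, list[str]] = {"shared": [], "single_line": []}
    for gene, lines in sorted(lines_per_gene.items()):
        key = "shared" if len(lines) > 1 else "single_line"
        result[key].append(gene)
    return result


if __name__ == "__main__":
    print(classify_losses(missing_by_line))
```

A simple count like this is only a first pass; deciding how old a loss really is would also mean accounting for how the lines are related to each other.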

Sit back and think about that for a second. At least X% of the genes in corn sometimes go missing. This could have implications for everything from inbreeding depression and hybrid vigor to the kind of basic research I’m actually working on myself.

As you can imagine, I’d love to get my hands on this dataset myself, but the next best thing will be to take furious notes when Nathan Springer talks about the project on Friday morning** and to be sure to swing by Steven Eichten’s poster to soak in the awesomeness.

Ruth A. Swanson-Wagner et al. “Combined Analysis of genomic structural variation and gene expression variation between maize and teosinte populations.” Talk #1, 2010 Maize Meeting (presented by Nathan Springer).

Steven R. Eichten et al. “Extensive Copy Number Variation Among Maize Lines.” Poster #139, 2010 Maize Meeting.

*Teosinte is the wild species from which maize/corn was domesticated.

**And he’s talking at 8:30 AM on a day when I still plan on being heavily jet lagged.


The long genome drought

Grad students graduating with PhDs right now probably entered grad school at point A, or earlier. People at the end of their first post-doc (and potentially in a position to apply for faculty positions and start training new graduate students themselves) would have entered grad school at point B or earlier.

Today there are a mere 10 published plant genomes out of the more than a quarter million named plant species in the world. But even ten genomes is a huge amount of data to deal with for a plant genomics community that largely came of age during the long genome drought of 2002-2006. What is the genome drought? The rice genome was published in April 2002. It was only the second plant genome to be sequenced, and the last plant genome to be published until the poplar genome came out in September of 2006, a gap of more than four years. Two genomes, especially of species as distantly related as arabidopsis and rice, don’t make for a lot of compelling comparative genomics (although there was certainly some really cool stuff being discovered in this time period).
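For anyone who wants to check the length of the drought, the gap between the rice genome (April 2002) and the poplar genome (September 2006) works out as follows, counting whole months only since the post gives months rather than exact days:

```python
def months_between(start: tuple[int, int], end: tuple[int, int]) -> int:
    """Whole months between two (year, month) pairs; day of month is ignored."""
    (y1, m1), (y2, m2) = start, end
    return (y2 - y1) * 12 + (m2 - m1)


RICE = (2002, 4)    # rice genome published April 2002
POPLAR = (2006, 9)  # poplar genome published September 2006

if __name__ == "__main__":
    gap = months_between(RICE, POPLAR)
    years, months = divmod(gap, 12)
    print(f"{gap} months, i.e. {years} years and {months} months")
```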

Does that matter? Probably not, but it’s important to remember that the people earning their PhDs today probably entered grad school (and chose a lab and field of study) during that two-plant-genome era (see point A), and less opportunity for exciting research means less grant money and less ability to attract grad students. The youngest people applying for faculty positions today (assuming they only did one, quite successful, post-doc) also entered grad school in the two-genome era (see point B), if not the previous single-genome era.

I’m talking about this mostly to make the point that I think comparative genomics as a field of study is getting a lot more exciting as more genomes become available, which is likely to attract more graduate students in that key first year when they join a lab and begin to specialize. That means that as we move farther away from the long genome drought, we will hopefully* start to see a lot more well-trained people doing plant genomics.

Which is a good thing because the other point this graph should make (not that I think many people need to be reminded of it), is that the pace of sequencing plant genomes is accelerating, and SOMEONE needs to analyze the huge quantities of data that are already starting to flow through the plant biology community.

This should be the last plant genome themed post for a while, but please continue to let me know if you know/hear about more plant genome projects. jcs98 (@) jamesandthegiantcorn.com

*If the hypothetical end to the also-hypothetical-and-possibly-the-result-of-wishful-thinking-on-my-part shortage of plant comparative genomicists could hold off long enough for me to be really in demand when/if I finish grad school, that would be great. ;)


And the list gets better!

Based on e-mails and responses to my previous post I’ve made the following additions to the sequenced plant genomes page:

  • Added an entry on Columbine, a member of an early diverging group of eudicots. As far as I can tell this sequence is currently unreleased, but from the JGI website it looks like the initial assembly is already complete, so if you know of a way for people to get ahold of that let me know.
  • Added an entry on the castor bean. The sequencing group has released a 4x coverage genome assembly. (The castor bean is the source of the deadly toxin ricin and is not grown in the US; we import our castor oil from other countries.)
  • Split the entry on Arabidopsis into “Arabidopsis species and allies.” This gives Arabidopsis lyrata its own heading, and will be important since there are another 7 species from the Arabidopsis genus and its close relatives in the JGI sequencing pipeline.
  • Added an entry on v2 of the date palm genome generated by Weill Cornell Medical College in Qatar. This definitely should still be considered an “in progress” genome, but at least until the banana genome comes out it’s the best non-grass monocot genome available.
  • Added an entry on the genome of Physcomitrella patens, which, as a moss, is the descendant of an evolutionary lineage that split from all the other genomes I’ve listed on the page around 450 million years ago.
  • Added the recently announced sunflower genome project to the list of planned, in-progress, and private genome efforts. (Apparently the genome of the cultivated sunflower is more than 3 gigabases. Bigger than corn!) That’ll be a cool genome to see when it comes out.
  • Added information on the woodland strawberry genome project, which aims to have an assembled genome of Fragaria vesca by sometime this year. You may remember the woodland strawberry genome from the mix up back in January.
  • Added the various groups that have announced they have private genome sequences of the oil palm genome to the same section.

Particular thanks to Greg, Jeff, and Eric, whose suggestions were behind most of these additions.

Completely unrelated, you may have noticed I switched the RSS feed back to full length entries. I recently tried out Google Reader (I’m way behind the times I know), and it is SO MUCH nicer to see the full entries there than have to click through from a brief summary. The downside, as I know from previous experiences, is that when I send out the full entries by RSS I get a lot less traffic.

I don’t earn any income from traffic to this site, but it is a nice feeling to know people are reading and enjoying something I wrote, and I have no way of tracking how many people (if any) read an article from the RSS feed.

Anyway, I’ll keep sending out full entries for at least the next week (I expect I’ll be too busy to worry about ego-stroking traffic statistics until at least a week from Tuesday).


Sequenced Plant Genomes

Libe Slope in Ithaca, NY. Behind you are student dorms. At the top of the hill, campus starts. Photo: foreverdigital, flickr

When I was an undergraduate, there were exactly two sequenced plant genomes, rice and arabidopsis. And sure, maybe I didn’t have to walk “ten miles to school, barefoot, in the snow, uphill, both ways,”* but the one way I did have to walk uphill (sometimes in the snow, but always with shoes) was very uphill. But where was I?

Oh yeah, plant genome sequences. Kids getting into plant genomics these days don’t realize how easy they’ve got it. By my count (which may be low but I’m getting to that) there are ten published plant genomes, with several more unpublished genomes that are available in various states of completion, and lots more on the way.

Which brings me to what I was doing yesterday instead of writing an update for this website: trying to document the published plant genomes, the unpublished genomes that are available, and which new genomes we can expect to see published in the near future.

Please, if you find mistakes or know of additional flowering plant genomes I should mention, let me know! jcs98 (@) jamesandthegiantcorn.com.

If you don’t work in biology, it might be interesting to see which plants have sequenced genomes and how they’re related to each other.

*An explanation of this phrase.
