James and the Giant Corn

qTeller part 2: Eye candy!

qTeller isn’t just for generating spreadsheets full of data on genes within a genomic region. It can also visualize published expression data for a single gene. For example, here is the expression pattern of golden plant2, a gene involved in regulating photosynthetic development in maize that was first described all the way back in 1926 in an article in The American Naturalist:

As you can see, golden plant2 is expressed at high levels in photosynthetic tissues and not expressed at all in tissues like roots, endosperm, and pollen. Do you know how long it would have taken me to profile the plant-wide expression pattern of a gene this comprehensively by isolating RNA from different tissues and running qPCR? WEEKS! Do you know how long it took for me to get the same level of insight with qTeller? 90 seconds!

Do you know how long it’ll take you to regenerate this same analysis? 30 seconds. Just click this link. There have been so many awesome RNA-seq papers coming out recently for maize. I know when I arrive in Portland on Thursday for the Maize Genetics Conference I’m going to see a whole lot more even bigger/better RNA-seq datasets which people haven’t finished writing up yet. Some of these datasets have been on posters since the very first maize meeting I went to back in 2009 when I was a wide-eyed first year and may _never_ get published.* But others will be published weeks or months from now, making this visualization all the more powerful.

But for now, MORE EYE CANDY:

Anther ear1

Expression of anther ear1, a gene in the gibberellic acid biosynthetic pathway

Link to regenerate analysis


Link to regenerate analysis. 

Glossy1 mutants change the type of wax produced on the leaves of developing maize seedlings, so it makes sense that the gene shows high expression in both maize seedlings and mature leaves. I can even sort of explain away the high expression in developing seeds and embryos, since the primordia which will eventually become the first leaves of the next generation of corn plants are beginning to form. But why in the world does glossy1 show such high levels of expression in anthers?

*Here is the relevant excerpt from my previous rant on data analysis:

I recently did the math on a PLoS Genetics paper published in late 2009 based on a single in-depth analysis of RNA-seq comparisons of mutant and non-mutant siblings. Today we could generate the same dataset, with twice the depth of sequencing, for less than $1,000 (INCLUDING reagent costs). The takeaway lesson here: just because your dataset was expensive to generate doesn’t mean you don’t have to worry about the competition stealing the glory if you take more than a year to publish.

qTeller: an easier way to find candidate genes

Hunting for good candidate genes is something biologists spend a lot of their time doing. Here are a couple of hypothetical examples:

A) Suzzy the grad student is mapping a recessive mutant which makes the pollen of corn plants shrivel up and die. By examining a bunch of known genetic markers in plants with dead pollen and in their siblings that produce normal pollen, she has narrowed the location of the gene responsible for her trait down to a region of only a couple of megabases on the fifth chromosome of maize. Since the whole maize genome contains over 2,300 megabases of sequence, that means she’s already ruled out 99.9% of the genome. But her region still contains, say, a dozen genes, and she needs to know which one she should check first to see if a mutation in it is responsible for her mutant phenotype.

B) Johnny is another grad student. He wants to understand how corn plants genetically regulate how wide their leaves will grow to be. By measuring a lot of plants descended from two parents, each with known genotypes, he can identify regions of the genome where inheriting information from one parent or the other seems to be correlated with either wider or narrower leaves. He calls these regions quantitative trait loci (or QTLs). Now he has picked the genetic region that seems to have the biggest effect, and he wants to know what gene within the region is actually responsible for the effect.

There are a number of ways for both Johnny and Suzzy to narrow down their lists to the genes most likely responsible for the changes they are each observing in corn plants: (more…)

Open Access and Ecologists

Jonathan Eisen has a new post up on his blog “The Tree of Life” tearing apart a letter from the Ecological Society of America (link is to a pdf). In the letter, the ESA comes out against the government requiring that papers funded by federal research dollars must be made available for free online after a set amount of time has passed since publication. Jonathan Eisen, one of the strongest voices in the open access movement, naturally finds this position disturbing. But why would a scientific society come out against the open dissemination of information? Here is my — entirely speculative — explanation. (more…)

Bad Blood On Pigeonpea

First of all, apparently I’ve been spelling Pigeonpea wrong. The random “d” I was inserting comes from “pidgin.” But fortunately that’s not what the scientific feud is about.

Instead, it turns out that there were actually two independent assemblies of the pigeonpea genome published in separate journals within a couple of weeks of each other.

One, which I’ve been talking about, was published in Nature Biotechnology (impact factor 31) on November 6th. This pigeonpea project was run by ICRISAT (one of the CGIAR centers) with much of the actual sequencing and informatics contracted out to BGI (the Beijing Genomics Institute).

But what I didn’t know was that a second pigeonpea assembly was published back on October 25th in the “Journal of Plant Biochemistry and Biotechnology”, a journal whose title far fewer scientists will recognize (impact factor 0.41). This genome was put together by a group at the Indian Council of Agricultural Research (ICAR).

Confused by all the acronyms yet?

Alright, I’ll drop all of them for the remainder of this post:

One group:

  • Has a longer history of working on pigeonpea genomics
  • Published first (if just barely)

The other:

  • Published in a much higher impact journal
  • Has more total assembled sequence*

Seems simple enough. And just the sort of thing that is bound to happen more and more as the cost of genome sequencing continues to drop. Unfortunately there seems to be a lot of bad blood between the research groups, and all sorts of stuff (like who is more of a real Indian) is getting dragged in. Check out the comments section on this article.

I really only have two thoughts on the subject:

  1. The research community really will benefit if these two research groups can make peace with each other and merge their data into a combined assembly for version 2 of the pigeonpea genome. Neither assembly is all that great yet, and the two genomes were sequenced using different technologies: 454 sequencing (fewer longer reads) and Illumina sequencing (more shorter reads). A merged assembly could capture more of the total genome and (more importantly in my opinion) maybe get a larger fraction of the current sequence placed and orientated within the pseudomolecules. Right now less than 250 megabases of the 600-odd megabases of assembled sequence are placed within chromosomes. The other 350 megabases are present as over 100,000 floating contigs, many as small as 103 bp of sequence.
  2. These situations are terrible for everyone directly involved, all of whom are probably afraid they will not get the credit they expected for the incredibly hard work of sequencing a genome. But the same mess is good for both the broader scientific community and the world as a whole. Incidents like the current one with pigeonpea or the competing chocolate genome projects and the three cucumber genomes drive home one message. Genome projects can be scooped. You can’t afford to sit on your data for months or years… you need to write up your findings and make your data available fast. As a result science moves faster (good for us scientists trying to publish) and in the long term important discoveries with the potential to help people are made faster (good for the whole world).

*In the article linked above the author states: “The major difference between the two sequencing projects is that while the ICRISAT-led team has assembled 605.78 Mb out of the 833.07 Mb (about 72.5%) of the genome, the ICAR team has captured 511 Mb (about 61%).” I haven’t had a chance to look at the ICAR assembly yet, but to get to 605 Mb the ICRISAT assembly is including contigs as small as 103 base pairs, which is a uselessly small piece of DNA from a genomics or genetics perspective. So comparing raw “sequence assembled” numbers without asking “how many pieces? (and how big were they?)” clearly will not give the whole picture.
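To make the "how many pieces, and how big?" question concrete, here is a minimal Python sketch. It is not taken from either pigeonpea paper: the contig lengths are invented, and N50 (the contig length at which half of the assembled bases sit in contigs that size or larger) is a commonly used summary I am swapping in for illustration. The function name `assembly_summary` is hypothetical.

```python
def assembly_summary(contig_lengths):
    """Return (number of pieces, total assembled bp, N50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    # Walk from the largest contig down until half the assembled bases are covered.
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return len(lengths), total, n50


# Two made-up assemblies with the same total amount of sequence (assumption: toy data).
few_large = [400_000, 300_000, 200_000, 100_000]
many_small = [100_000] + [1_000] * 900

for name, contigs in [("few_large", few_large), ("many_small", many_small)]:
    pieces, total, n50 = assembly_summary(contigs)
    print(f"{name}: {pieces} pieces, {total} bp total, N50 = {n50} bp")
```

Both toy assemblies report the same total, yet one is four large pieces and the other is mostly 1 kb fragments, which is the difference a raw "sequence assembled" number hides.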

Look! It’s another reason to despair ;-)

Not that grad school isn’t loaded with plenty to begin with, but here’s one I hadn’t considered before:

Yes, for many people, a college town is a rather idyllic place. There is a specific subpopulation in these college towns, however, for whom the experience becomes utterly hopeless. This subpopulation: those who move to college towns, are not college-aged, and arrive without a significant other. Meet those requirements, and you’re basically hosed until you escape. It is the bog of eternal singlehood.

Well, at least as a consolation, you will find great friends, for whom your sad, lonely, single self will serve as a reminder of why they need to stay committed to their own relationships.

Yikes! I would say another big part of the puzzle is that the grad student lifestyle (regardless of where you live) isn’t friendly to attempts to get out and meet new people. Take the fact that I’m sitting in the office updating my blog while waiting for one last genome to finish loading into CoGe (pigeonpea!) at seven-thirty on a Friday night.*

And that means even if you did happen to come into grad school with a significant other, things can get messy (as explained by the Genomic Repairman):

So when work builds up, I tend to act less human and more like a robot and just grind away. And unfortunately I take on a sort of tunnel vision when I’m grinding. … If its not directly related to whats happening now it gets place on the backburner, which is fine if its mundane paperwork or BS emails that need to be sent out. Its not good if its your relationship and its going to cause tension.

I think its hard for a significant other who doesn’t do science to appreciate what we do. We can’t check out of work at 5pm and not worry about it to the other day. The stakes are too high in the game we play and you must be invested in your work. I am. I wake up at night with ideas, fears, and concerns. Did I do the transfection right? Am I being scooped?

Emphasis mine.

*Just to be clear, no one MADE me work late on the Friday after Thanksgiving. It is just really easy to get engrossed in science (at least when your research is going well), lose track of time, and ignore the rest of your life.

Using Undergrads to Improve Science On Wikipedia

I came across a fascinating article on Twitter while I was procrastinating this morning. It describes a class at Davidson College where, rather than having to write research papers for final projects, students instead wrote or updated Wikipedia articles on topics within the field of the class (psychology in this case). Now there are many great things about Wikipedia, but many of the entries on modern scientific subjects (stuff that you wouldn’t be able to find in high school/101 level textbooks) are woefully out of date and/or badly written.

Example: here is Wikipedia’s list of published plant genome sequences. It lists 16 published plant genomes. But there are actually 25 published plant genome sequences at the moment.*

Despite that, when I can’t figure out how to track down a piece of information using literature searches, Wikipedia is usually one of my first fallback solutions to at least get a broad overview of some subject I know very little about. The idea of having undergrads write up the information they’ve been learning in class to make it available to the broader public really grabbed my imagination. It sounds like students really like it too:

Students have been excited from the very first day I described the project. Many, many students have told me they particularly appreciate that their work will be read by more than “just the professor.”

I’m just a lowly grad student and can’t motivate large numbers of undergrads with the threat of having to write a traditional research paper, a necessary prerequisite for starting a project like this. But it’d be a lot of fun to help out with such a project in plant biology… ::mentally runs through the list of professors he might be able to pitch the idea to::

Yeah, I’ve got nothing. But I do know this is the first time I’ve actually been sad that grad students in my department don’t get the chance to teach independent courses.

*The missing ones are: Thellungiella, Brassica rapa, papaya, castor bean, cannabis, strawberry, pigeonpea, lotus, and Medicago.

Yes you’re exceptional, but so is everyone else at this level

Another perspective on why people continue to line up for grad school despite the extremely long odds against success:

Graduate students are, almost by definition, atypical students as undergraduates. In most cases, the types of people who enroll in graduate work were exceptionally bright, hardworking undergraduate students. As exceptional undergraduates, the people who eventually go on to graduate studies probably get very good at disregarding warnings. When, as an undergraduate, an instructor issued routine warnings to the class, the grad-school-bound student might have gotten very used to ignoring the sorts of admonitions that pervade the undergraduate experience: “My bibliographies are always perfect, and I turn everything in on time, so this warning to make sure that my APA formatting is correct and to have my paper turned in by Monday is nothing to worry about.”

By the time they arrive at graduate school, and even if they are years removed from their undergraduate education, most grad students have been conditioned to see themselves as an exception, and as exceptional. So, when they begin to hear warnings about the realities of the job market in graduate school, the old conditioning kicks in, and the old thinking, so trustworthy before, also kicks in: “This doesn’t apply to me. My intelligence and hard work will see me through, just as they always have.” The problem, of course, is that not everyone can actually be the exception. People will be disappointed, their studies abandoned, their dreams unfulfilled, their future paths unclear.

What is so impossible for many graduate students to understand is that everybody in their cohort is just as smart and hardworking as they themselves are. At the graduate level, the smarts and diligence that once set students apart from their undergraduate peers will no longer set them apart, but merely allow them to keep up. It is almost impossible for many beginning graduate students to grasp that having above average intelligence and an unimpeachable work ethic will mean only that they are average graduate students. That’s quite a shock to some people.

The whole article is definitely worth a read. I can only speak for myself (and I’ve never been good about getting papers handed in on time), but aside from that I could definitely relate to the mindset described here.

Dropout Rates in Academia (In Perspective)

A few weeks ago I was reading an article which claimed that, before the recession, seven times as many PhDs were awarded in the biological sciences as there were openings for tenure-track positions. Of course, in between finishing grad school and landing a faculty job come years of postdoc work, but in the end PhDs in must equal PhDs out.

So assuming every PhD graduate wants to be a professor (probably not true) that means even after making it past admissions committees and qualifying exams and thesis defenses, these newly minted PhDs face an 86% washout rate in their quest for a faculty position.
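The arithmetic behind that figure is short enough to check directly. This sketch only restates the seven-to-one ratio quoted from the article and keeps the same simplifying assumption, that every graduate competes for a faculty position; the variable names are mine.

```python
# Ratio quoted from the article: seven PhDs awarded per tenure-track opening.
phds_per_opening = 7

# Assumption carried over from the text: every PhD graduate wants a faculty job,
# so at most one in seven can get one.
success_rate = 1 / phds_per_opening
washout_rate = 1 - success_rate

print(f"success: {success_rate:.1%}, washout: {washout_rate:.1%}")
# prints "success: 14.3%, washout: 85.7%"
```

Rounded to a whole number, six out of seven is 86%.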

Eighty-six percent. Let’s put that in context. These are the numbers I turned up with some quick googling:

  • Roughly 10% of Marine recruits drop out during basic training
  • Roughly 55% of people going through the training to become Army Rangers drop out
  • In an average year 70% of the people who start training to be Navy SEALs (the folks they sent in when they finally found Osama bin Laden) don’t make it to the end.
  • To actually find a training regime with a higher dropout rate than the road from PhD to Professor I had to go to the Wikipedia page of the Pararescue Jumpers, the guys who jump out of the rescue helicopters into enemy territory to rescue the wounded. Their washout rate is 90%.

Now there are all sorts of reasons these numbers aren’t comparable, but I think they do a good job of driving home just how long the odds against success are in academia. And this is all based on numbers from before the recession.

So that’s why I’m lying awake after midnight tonight. How about you?

Greg Has Moved

It’s hard work keeping up with a blog while being a grad student, but some people find the time to manage it.

Here’s a new site: ProSeed with Science written by Greg, the same guy who used to write Pie-ence. First few posts look interesting (lots of sunflower stuff, a crop with a genome even bigger than that of maize!), and he’s already made it past the three-post line before which most new blogs die a quick death of neglect. Check it out!

What is a QTL?

QTL stands for quantitative trait locus. Which raises even more questions. What is a quantitative trait? What is a locus?

  • A locus is simply a region within a genome. Anything from a part of a single gene to a large hunk of a chromosome.
  • A quantitative trait is one where different individuals vary continuously (like height or weight) rather than falling into discrete categories (like whether a person has blue or brown eyes*).

A QTL is simply a part of the genome that has been shown (using complicated statistical tests) to influence a quantitative trait like height. For example, people with one particular version of a region of chromosome 8 tend to be slightly thinner than people with other versions.** Now a lot of qualities we’re very interested in as a society turn out to be quantitative traits. I’m not even going to touch the implications for human genetics, but within plant biology lots of the things people are really interested in changing, from flowering time, to drought or disease resistance, to the big kahuna of them all, YIELD, are quantitative traits.

How are new QTLs discovered? It’s not as simple as classical genetics, where you can simply run a mutant screen, pull out individuals that look weird in a way that seems interesting, and identify the gene which was mutated to create the change you observed.*** Instead a researcher has to measure their specific trait in a bunch of individuals (easily done for something like height, less easily done for something like number of root hairs per centimeter of root or trichomes per leaf****) and then compare those measurements to a bunch of information about the genomes of each of those individuals. If the average height of all the individuals with version A of some part of the genome is higher than the average height for all individuals which have version B of that same part of the genome, and that difference is significant after a whole bunch of statistical tests, then that region is a QTL.
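The "version A versus version B" comparison can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any real QTL package works: it looks at a single marker, the heights and genotypes are invented, and a permutation test stands in for the "whole bunch of statistical tests" (real studies scan many markers and correct for testing all of them). The function name `marker_permutation_test` is hypothetical.

```python
import random
from statistics import mean


def marker_permutation_test(heights, genotypes, n_permutations=10_000, seed=0):
    """Compare mean trait values between genotypes 'A' and 'B' at one marker.

    Returns (observed mean difference A - B, two-sided permutation p-value).
    """
    def mean_difference(labels):
        a = [h for h, g in zip(heights, labels) if g == "A"]
        b = [h for h, g in zip(heights, labels) if g == "B"]
        return mean(a) - mean(b)

    observed = mean_difference(genotypes)
    rng = random.Random(seed)
    shuffled = list(genotypes)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling genotype labels breaks any real link between marker and trait.
        rng.shuffle(shuffled)
        if abs(mean_difference(shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Invented heights (cm) for 16 individuals, 8 with each version of the region.
heights = [182, 179, 185, 181, 184, 180, 183, 186,
           171, 169, 174, 170, 172, 168, 173, 175]
genotypes = ["A"] * 8 + ["B"] * 8

difference, p_value = marker_permutation_test(heights, genotypes)
print(f"mean difference (A - B): {difference:.1f} cm, p = {p_value:.4f}")
```

A small p-value here only says the two groups differ more than shuffled labels usually do; it says nothing yet about which gene inside the region is responsible.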

Do all that and congratulations. You’re done now. You can go publish a paper describing your discovery of a QTL controlling whatever trait you just measured! Depending on the species, the trait, and how many QTLs you found (and how small they are), that paper could be anywhere from a major finding to something buried in a never-heard-of-the-name-before journal. QTLs are one of those weird cases (like cell phones) where smaller is better.

Why? Because the logical next step, after identifying a QTL, is to figure out what it is about that region which influences the measured trait. If the QTL in question is too large, that could mean trying to take a list of dozens or hundreds of genes and, somehow, devising a test to prove: It’s this one! Gene AT1G15210 helps regulate height (or root hairs, or trichomes, or whatever it is being studied).

If you’d like to check out an example of what an actual QTL paper looks like, I really enjoyed a recent one in G3 (G3 is open access, so everyone should be able to access this) that measured the development of tassel-like outgrowths on the end of maize ears. I ran across this a few times in the field back when I did actual maize genetics and always wondered what was going on genetically to create such weird looking plants. I still don’t know for sure, but now I know there are real geneticists out there working to discover the answer:

Holland JB, Coles ND. 2011. QTL Controlling Masculinization of Ear Tips in a Maize (Zea mays L.) Intraspecific Cross. G3: Genes, Genomes, Genetics 1: 337-341.

My only complaint is I wish they’d included a figure showing the actual phenomenon of masculinized ear tips so all the non-maize people could see how cool it looks.

* Yes, if you’ve spent any length of time staring into a number of women’s eyes (or men’s for that matter) you’ll know there’s a great deal of variation within those categories, but the point is there ARE obvious categories for eye color, while any attempt to group people by weight or height would rely on essentially arbitrary cut offs.

**This statement is used as an example. I know absolutely NOTHING about human genetics. You have been warned!

***And to be honest, in practice there is nothing simple about classical genetics. I’ve been forcefully reminded of this in an ongoing e-mail discussion that has gotten into the long term pedigrees of individual maize seeds.

**** I’ve often wondered if grad students assigned to such QTL projects have significantly higher than average dropout rates.