Jonathan Eisen has a new post up on his blog “The Tree of Life” tearing apart a letter from the Ecological Society of America (link is to a pdf). In the letter, the ESA comes out against the government requiring that papers funded by federal research dollars be made available for free online after a set amount of time has passed since publication. Jonathan Eisen, one of the strongest voices in the open access movement, naturally finds this position disturbing. But why would a scientific society come out against the open dissemination of information? Here is my (entirely speculative) explanation.
First of all, apparently I’ve been spelling Pigeonpea wrong. The random “d” I was inserting comes from “pidgin.” But fortunately that’s not what the scientific feud is about.
Instead, it turns out that there were actually two independent assemblies of the pigeonpea genome published in separate journals within a couple of weeks of each other.
One, which I’ve been talking about, was published in Nature Biotechnology (impact factor 31) on November 6th. This pigeonpea project was run by ICRISAT (one of the CGIAR centers) with much of the actual sequencing and informatics contracted out to BGI (the Beijing Genomics Institute).
But what I didn’t know was that a second pigeonpea assembly was published back on October 25th in the Journal of Plant Biochemistry and Biotechnology, a journal whose title far fewer scientists will recognize (impact factor 0.41). This genome was put together by a group at the Indian Council of Agricultural Research (ICAR).
Confused by all the acronyms yet?
Alright, I’ll drop all of them for the remainder of this post:
- Has a longer history of working on pigeonpea genomics
- Published first (if just barely)
- Published in a much higher impact journal
- Has more total assembled sequence*
Seems simple enough. And just the sort of thing that is bound to happen more and more as the cost of genome sequencing continues to drop. Unfortunately there seems to be a lot of bad blood between the research groups, and all sorts of stuff (like who is more of a real Indian) is getting dragged in. Check out the comments section on this article.
I really only have two thoughts on the subject:
- The research community really will benefit if these two research groups can make peace with each other and merge their data into a combined assembly for version 2 of the pigeonpea genome. Neither assembly is all that great yet, and the two genomes were sequenced using different technologies: 454 sequencing (fewer, longer reads) and Illumina sequencing (more, shorter reads). A merged assembly could capture more of the total genome and, more importantly in my opinion, might get a larger fraction of the current sequence placed and oriented within the pseudomolecules. Right now less than 250 megabases of the 600-odd megabase genome are placed within chromosomes. The other 350 megabases are present as over 100,000 floating contigs, some as small as 103 bp.
- These situations are terrible for everyone directly involved, all of whom are probably afraid they will not get the credit they expected for the incredibly hard work of sequencing a genome. But the same mess is good for both the broader scientific community and the world as a whole. Incidents like the current one with pigeonpea or the competing chocolate genome projects and the three cucumber genomes drive home one message. Genome projects can be scooped. You can’t afford to sit on your data for months or years… you need to write up your findings and make your data available fast. As a result science moves faster (good for us scientists trying to publish) and in the long term important discoveries with the potential to help people are made faster (good for the whole world).
*In the article linked above the author states: “The major difference between the two sequencing projects is that while the ICRISAT-led team has assembled 605.78 Mb out of the 833.07 Mb (about 72.5%) of the genome, the ICAR team has captured 511 Mb (about 61%).” I haven’t had a chance to look at the ICAR assembly yet, but to get to 605 Mb the ICRISAT assembly includes contigs as small as 103 base pairs, which is a uselessly small piece of DNA from a genomics or genetics perspective. So comparing raw “sequence assembled” numbers without asking “How many pieces? (And how big were they?)” clearly will not give the whole picture.
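The usual way to ask “how many pieces, and how big?” is to report the contig count and the N50 alongside total length. Here’s a minimal Python sketch of that, run on two made-up toy assemblies (not the real pigeonpea data) that have exactly the same total length but very different fragmentation. The 1,000 bp “useful” cutoff is an arbitrary assumption of mine, not a community standard.

```python
# Minimal sketch: summarize an assembly by piece count and piece size,
# not just total assembled length. All contig lengths here are toy values.

def assembly_stats(contig_lengths, min_useful=1000):
    """Return total length, contig count, N50, and bases in contigs >= min_useful.

    min_useful is an arbitrary, hypothetical cutoff for a "useful" contig.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    # N50: the contig length L such that contigs of length >= L
    # together cover at least half of the total assembled sequence.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    useful = sum(l for l in lengths if l >= min_useful)
    return {"total": total, "contigs": len(lengths), "n50": n50, "useful": useful}


# Two toy assemblies, each totaling 1,000,000 bp.
few_big = [500_000, 300_000, 200_000]
many_small = [103] * 9_000 + [73_000]

print(assembly_stats(few_big))
print(assembly_stats(many_small))
```

Both toy assemblies report 1 Mb of “assembled sequence,” but the first has an N50 of 500 kb spread over 3 contigs, while the second has an N50 of 103 bp spread over 9,001 contigs, with only 73 kb in pieces of 1 kb or more.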
Not that grad school isn’t loaded with plenty of challenges to begin with, but here’s one I hadn’t considered before:
Yes, for many people, a college town is a rather idyllic place. There is a specific subpopulation in these college towns, however, for whom the experience becomes utterly hopeless. This subpopulation: those who move to college towns, are not college-aged, and arrive without a significant other. Meet those requirements, and you’re basically hosed until you escape. It is the bog of eternal singlehood.
Well, at least as a consolation, you will find great friends, for whom your sad, lonely, single self will serve as a reminder of why they need to stay committed to their own relationships.
Yikes! I would say another big part of the puzzle is that the grad student lifestyle (regardless of where you live) isn’t friendly to attempts to get out and meet new people. Take the fact that I’m sitting in the office updating my blog while waiting for one last genome to finish loading into CoGe (pigeonpea!) at seven-thirty on a Friday night.*
And that means even if you did happen to come into grad school with a significant other, things can get messy (as explained by the Genomic Repairman):
So when work builds up, I tend to act less human and more like a robot and just grind away. And unfortunately I take on a sort of tunnel vision when I’m grinding. … If it’s not directly related to what’s happening now it gets placed on the backburner, which is fine if it’s mundane paperwork or BS emails that need to be sent out. It’s not good if it’s your relationship and it’s going to cause tension.
I think it’s hard for a significant other who doesn’t do science to appreciate what we do. We can’t check out of work at 5pm and not worry about it until the next day. The stakes are too high in the game we play and you must be invested in your work. I am. I wake up at night with ideas, fears, and concerns. Did I do the transfection right? Am I being scooped?
*Just to be clear, no one MADE me work late on the Friday after Thanksgiving. It is just really easy to get engrossed in science (at least when your research is going well), lose track of time, and ignore the rest of your life.
I came across a fascinating article on Twitter while I was procrastinating this morning. It describes a class at Davidson College where, rather than having to write research papers for final projects, students instead wrote or updated Wikipedia articles on topics within the field of the class (psychology in this case). Now there are many great things about Wikipedia, but many of the entries on modern scientific subjects (stuff that you wouldn’t be able to find in high school/101 level textbooks) are woefully out of date and/or badly written.
Example: here is Wikipedia’s list of published plant genome sequences. It lists 16 published plant genomes. But there are actually 25 published plant genome sequences at the moment.*
Despite that, when I can’t figure out how to track down a piece of information using literature searches, Wikipedia is usually one of my first fallback solutions to at least get a broad overview of some subject I know very little about. The idea of having undergrads write up the information they’ve been learning in class to make it available to the broader public really grabbed my imagination. It sounds like students really like it too:
Students have been excited from the very first day I described the project. Many, many students have told me they particularly appreciate that their work will be read by more than “just the professor.”
I’m just a lowly grad student and can’t motivate large numbers of undergrads with the threat of having to write a traditional research paper, a necessary prerequisite for starting a project like this. But it’d be a lot of fun to help out with such a project in plant biology… ::mentally runs through the list of professors he might be able to pitch the idea to::
Yeah, I’ve got nothing. But I do know this is the first time I’ve actually been sad that grad students in my department don’t get the chance to teach independent courses.
*The missing ones are: Thellungiella, Brassica rapa, papaya, castor bean, cannabis, strawberry, pigeonpea, Lotus, and Medicago.
Graduate students are, almost by definition, atypical students as undergraduates. In most cases, the types of people who enroll in graduate work were exceptionally bright, hardworking undergraduate students. As exceptional undergraduates, the people who eventually go on to graduate studies probably get very good at disregarding warnings. When, as an undergraduate, an instructor issued routine warnings to the class, the grad-school-bound student might have gotten very used to ignoring the sorts of admonitions that pervade the undergraduate experience: “My bibliographies are always perfect, and I turn everything in on time, so this warning to make sure that my APA formatting is correct and to have my paper turned in by Monday is nothing to worry about.”
By the time they arrive at graduate school, and even if they are years removed from their undergraduate education, most grad students have been conditioned to see themselves as an exception, and as exceptional. So, when they begin to hear warnings about the realities of the job market in graduate school, the old conditioning kicks in, and the old thinking, so trustworthy before, also kicks in: “This doesn’t apply to me. My intelligence and hard work will see me through, just as they always have.” The problem, of course, is that not everyone can actually be the exception. People will be disappointed, their studies abandoned, their dreams unfulfilled, their future paths unclear.
What is so impossible for many graduate students to understand is that everybody in their cohort is just as smart and hardworking as they themselves are. At the graduate level, the smarts and diligence that once set students apart from their undergraduate peers will no longer set them apart, but merely allow them to keep up. It is almost impossible for many beginning graduate students to grasp that having above average intelligence and an unimpeachable work ethic will mean only that they are average graduate students. That’s quite a shock to some people.
The whole article is definitely worth a read. I can only speak for myself (and I’ve never been good about getting papers handed in on time), but aside from that I could definitely relate to the mindset described here.
A few weeks ago I was reading an article which claimed that, before the recession, seven times as many PhDs were awarded in the biological sciences as there were tenure-track openings. Of course, years of postdoc work come between finishing grad school and landing a faculty job, but in the end PhDs in must equal PhDs out.
So assuming every PhD graduate wants to be a professor (probably not true) that means even after making it past admissions committees and qualifying exams and thesis defenses, these newly minted PhDs face an 86% washout rate in their quest for a faculty position.
Eighty-six percent. Let’s put that in context. These are the numbers I turned up with some quick googling:
- Roughly 10% of Marine recruits drop out during basic training
- Roughly 55% of people going through the training to become Army Rangers drop out
- In an average year 70% of the people who start training to be Navy SEALs (the folks they sent in when they finally found Osama bin Laden) don’t make it to the end.
- To actually find a training regime with a higher dropout rate than the road from PhD to Professor I had to go to the Wikipedia page of the Pararescue Jumpers, the guys who jump out of the rescue helicopters into enemy territory to rescue the wounded. Their washout rate is 90%.
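For anyone who wants to check my arithmetic: with seven PhDs for every tenure-track opening, only one in seven lands a faculty job, so the washout rate is 1 - 1/7, or about 86%. Here’s a quick Python sketch that ranks it against the rough figures above. It makes the same assumptions as the post (every PhD wants a professorship, and the 7:1 ratio holds), and the dropout figures are just the approximate numbers from my googling.

```python
# Washout arithmetic for the PhD-to-professor pipeline, under the post's
# assumptions: every PhD wants a tenure-track job, and there are 7 PhDs
# awarded per tenure-track opening.

phds_per_opening = 7
phd_washout = 1 - 1 / phds_per_opening  # fraction who never get a faculty job

# Rough dropout rates quoted in the list above.
dropout_rates = {
    "Marine basic training": 0.10,
    "Army Ranger training": 0.55,
    "Navy SEAL training": 0.70,
    "PhD to professor": phd_washout,
    "Pararescue Jumper training": 0.90,
}

# Print from lowest to highest dropout rate.
for name, rate in sorted(dropout_rates.items(), key=lambda item: item[1]):
    print(f"{name:28s} {rate:.0%}")
```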
It’s hard work keeping up with a blog while being a grad student, but some people find the time to manage it.
Here’s a new site: ProSeed with Science written by Greg, the same guy who used to write Pie-ence. First few posts look interesting (lots of sunflower stuff, a crop with a genome even bigger than that of maize!), and he’s already made it past the three-post line before which most new blogs die a quick death of neglect. Check it out!
QTL stands for quantitative trait locus. Which raises even more questions. What is a quantitative trait? What is a locus?
- A locus is simply a region within a genome. Anything from a part of a single gene to a large hunk of a chromosome.
- A quantitative trait is one where different individuals vary continuously (like height or weight) rather than falling into discrete categories (like whether a person has blue or brown eyes*).
A QTL is simply a part of the genome that has been shown (using complicated statistical tests) to influence a quantitative trait like height. For example, people with one particular version of a region on chromosome 8 tend to be slightly thinner than people with other versions.** Now, a lot of qualities we’re very interested in as a society turn out to be quantitative traits. I’m not even going to touch the implications for human genetics, but within plant biology lots of the traits people are most interested in changing, from flowering time, to drought or disease resistance, to the big kahuna of them all, YIELD, are quantitative traits.
How are new QTLs discovered? It’s not as simple as classical genetics, where you can simply run a mutant screen, pull out individuals that look weird in a way that seems interesting, and identify the gene that was mutated to create the change you observed.*** Instead, a researcher has to measure their specific trait in a bunch of individuals (easily done for something like height, less easily done for something like number of root hairs per centimeter of root or trichomes per leaf****) and then compare those measurements to a bunch of information about the genomes of each of those individuals. If the average height of all the individuals with version A of some part of the genome is higher than the average height of all the individuals with version B of that same part of the genome, and that difference is significant after a whole bunch of statistical tests, then that region is a QTL.
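To make the version A vs. version B comparison concrete, here’s a minimal Python sketch for a single region of the genome. The heights and genotypes are invented, and a simple permutation test stands in for “a whole bunch of statistical tests”: shuffle the A/B labels many times and see how often chance alone produces a difference in averages as big as the real one. Real QTL studies test many markers at once with more elaborate statistics, so treat this as an illustration of the idea rather than how any particular paper does it.

```python
import random
import statistics


def mean_difference(trait, genotypes):
    """Average trait value of version-A individuals minus that of version-B individuals."""
    a = [t for t, g in zip(trait, genotypes) if g == "A"]
    b = [t for t, g in zip(trait, genotypes) if g == "B"]
    return statistics.fmean(a) - statistics.fmean(b)


def permutation_p_value(trait, genotypes, n_perm=10_000, seed=1):
    """Two-sided permutation p-value for the A vs. B difference at one region.

    Counts how often randomly shuffled genotype labels give a difference at
    least as large (in absolute value) as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean_difference(trait, genotypes))
    shuffled = list(genotypes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_difference(trait, shuffled)) >= observed:
            extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (extreme + 1) / (n_perm + 1)


# Invented plant heights (cm) for 12 individuals, and their genotypes at two regions.
heights = [182, 190, 185, 188, 179, 186, 170, 168, 175, 172, 169, 174]
region_1 = ["A"] * 6 + ["B"] * 6  # version A plants are consistently taller
region_2 = ["A", "B"] * 6         # no real relationship with height

for name, genotypes in [("region_1", region_1), ("region_2", region_2)]:
    diff = mean_difference(heights, genotypes)
    p = permutation_p_value(heights, genotypes)
    print(f"{name}: A - B = {diff:.1f} cm, p = {p:.4f}")
```

With these made-up numbers region_1 gets a very small p-value and would be called a QTL, while region_2 (a 3 cm difference that shuffled labels produce all the time) would not.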
Do all that and congratulations: you’re done. You can go publish a paper describing your discovery of QTL controlling whatever trait you just measured! Depending on the species, the trait, and how many QTL you found (and how small they are), that paper could be anything from a major finding to something buried in a never-heard-of-the-name-before journal. QTLs are one of those weird cases (like cell phones) where smaller is better.
Why? Because the logical next step, after identifying a QTL, is to figure out what it is about that region which influences the measured trait. If the QTL in question is too large, that could mean trying to take a list of dozens or hundreds of genes and, somehow, devising a test to prove: It’s this one! Gene AT1G15210 helps regulate height (or root hairs, or trichomes, or whatever it is being studied).
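A toy illustration of why smaller is better: the Python sketch below counts how many candidate genes fall inside a wide QTL versus a narrow one. The chromosome is invented, with one hypothetical gene every 5 kb; that spacing is an assumption for illustration, not real gene density data for any species.

```python
# Hypothetical chromosome: gene name -> position (bp), one gene every 5 kb.
# Names and positions are invented purely for illustration.

def genes_in_interval(gene_positions, start, end):
    """Return the names of genes whose position lies within [start, end]."""
    return sorted(name for name, pos in gene_positions.items() if start <= pos <= end)


genes = {f"gene_{i:04d}": i * 5_000 for i in range(1_000)}

wide = genes_in_interval(genes, 1_000_000, 2_000_000)    # a 1 Mb QTL
narrow = genes_in_interval(genes, 1_500_000, 1_520_000)  # a 20 kb QTL
print(len(wide), len(narrow))
```

On this toy chromosome the 1 Mb QTL contains 201 candidate genes to sort through, while the 20 kb QTL contains only 5.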
If you’d like to check out an example of what an actual QTL paper looks like, I really enjoyed a recent one in G3 (G3 is open access, so everyone should be able to read it) that measured the development of tassel-like outgrowths on the ends of maize ears. I ran across this a few times in the field back when I did actual maize genetics and always wondered what was going on genetically to create such weird-looking plants. I still don’t know for sure, but now I know there are real geneticists out there working to discover the answer.
* Yes, if you’ve spent any length of time staring into a number of women’s eyes (or men’s for that matter) you’ll know there’s a great deal of variation within those categories, but the point is there ARE obvious categories for eye color, while any attempt to group people by weight or height would rely on essentially arbitrary cutoffs.
**This statement is used as an example. I know absolutely NOTHING about human genetics. You have been warned!
***And to be honest, in practice there is nothing simple about classical genetics. I’ve been forcefully reminded of this in an ongoing e-mail discussion that has gotten into the long term pedigrees of individual maize seeds.
**** I’ve often wondered if grad students assigned to such QTL projects have significantly higher than average dropout rates.
Read the whole list here, but I want to highlight #8 in particular:
Evolution has a requirement that things work, not that it’s an elegant engineering solution. Expect jury rigged systems which can be bewildering in their complexity.
This was just on Slashdot, so I imagine many will have already read about it, but for those who haven’t, here’s a wonderful metaphor for understanding the difference between how scientists (biologists, anyway) code and how professional computer people (some of whom are also scientists) do:
Scientists see their software as a kind of exoskeleton, an extension of themselves. … The software may do heavy lifting, but the scientists remain actively involved in its use. The software is a tool, not a self-contained product.
Programmers see their software as something they will hand over to someone else, more like building a robot than an exoskeleton. Programmers believe it’s their job to encapsulate intelligence in software. If users have to depend on programmers after the software is written, the programmers didn’t finish their job.