James and the Giant Corn

Bad Blood On Pigeonpea

First of all, apparently I’ve been spelling Pigeonpea wrong. The random “d” I was inserting comes from “pidgin.” But fortunately that’s not what the scientific feud is about.

Instead, it turns out that there were actually two independent assemblies of the pigeonpea genome published in separate journals within a couple of weeks of each other.

One, which I’ve been talking about, was published in Nature Biotechnology (impact factor 31) on November 6th. This pigeonpea project was run by ICRISAT (one of the CGIAR centers) with much of the actual sequencing and informatics contracted out to BGI (the Beijing Genomics Institute).

But what I didn’t know was that a second pigeonpea assembly was published back on October 25th in the Journal of Plant Biochemistry and Biotechnology (impact factor 0.41), a journal whose title far fewer scientists will recognize. This genome was put together by a group at the Indian Council of Agricultural Research (ICAR).

Confused by all the acronyms yet?

Alright, I’ll drop all of them for the remainder of this post:

One group:

  • Has a longer history of working on pigeonpea genomics
  • Published first (if just barely)

The other:

  • Published in a much higher impact journal
  • Has more total assembled sequence*

Seems simple enough. And it's just the sort of thing that is bound to happen more and more as the cost of genome sequencing continues to drop. Unfortunately there seems to be a lot of bad blood between the research groups, and all sorts of stuff (like who is more of a real Indian) is getting dragged in. Check out the comments section on this article.

I really only have two thoughts on the subject:

  1. The research community really will benefit if these two research groups can make peace with each other and merge their data into a combined assembly for version 2 of the pigeonpea genome. Neither assembly is all that great yet, and the two genomes were sequenced using different technologies: 454 sequencing (fewer, longer reads) and Illumina sequencing (more, shorter reads). A merged assembly could capture more of the total genome and, more importantly in my opinion, might get a larger fraction of the current sequence placed and oriented within the pseudomolecules. Right now less than 250 megabases of the 600-odd megabase genome are placed within chromosomes. The other 350 megabases are present as over 100,000 floating contigs, many as small as 103 bp of sequence.
  2. These situations are terrible for everyone directly involved, all of whom are probably afraid they will not get the credit they expected for the incredibly hard work of sequencing a genome. But the same mess is good for both the broader scientific community and the world as a whole. Incidents like the current one with pigeonpea, the competing chocolate genome projects, and the three cucumber genomes drive home one message: genome projects can be scooped. You can’t afford to sit on your data for months or years… you need to write up your findings and make your data available fast. As a result science moves faster (good for us scientists trying to publish) and in the long term important discoveries with the potential to help people are made faster (good for the whole world).

*In the article linked above the author states: “The major difference between the two sequencing projects is that while the ICRISAT-led team has assembled 605.78 Mb out of the 833.07 Mb (about 72.5%) of the genome, the ICAR team has captured 511 Mb (about 61%).” I haven’t had a chance to look at the ICAR assembly yet, but to get to 605 Mb the ICRISAT assembly is including contigs as small as 103 base pairs, which is a uselessly small piece of DNA from a genomics or genetics perspective. So comparing raw “sequence assembled” numbers without asking “how many pieces? (and how big were they?)” clearly will not give the whole picture.
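To make the "how many pieces, and how big?" question concrete, here is a minimal Python sketch of the kind of summary I have in mind. Everything in it is an assumption for illustration: the function name `assembly_summary`, the toy contig lengths, and the toy genome size are all made up and do not come from either pigeonpea assembly. It reports the number of pieces, the total assembled bases, the fraction of the genome covered, and N50 (the contig length at which a running total, counted from the longest contig down, first reaches half of the assembled bases).

```python
def assembly_summary(contig_lengths, genome_size, min_length=0):
    """Summarize an assembly, counting only contigs of at least min_length bp.

    Hypothetical helper for illustration; genome_size must be positive.
    """
    kept = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    total = sum(kept)

    # N50: walk from the longest contig down and stop at the contig where
    # the running total first reaches half of the assembled bases.
    n50 = 0
    running = 0
    for length in kept:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "pieces": len(kept),
        "total_bp": total,
        "fraction_of_genome": total / genome_size,
        "n50": n50,
    }


# Toy data (made up): three large contigs plus 200 tiny 103 bp contigs,
# measured against a made-up 200 kb genome.
toy_contigs = [50_000, 20_000, 10_000] + [103] * 200
toy_genome_size = 200_000

everything = assembly_summary(toy_contigs, toy_genome_size)
large_only = assembly_summary(toy_contigs, toy_genome_size, min_length=10_000)

print(everything)
print(large_only)
```

In this toy case the tiny contigs push the "fraction assembled" from 40% to about 50% while adding 200 pieces that are too short to be of much use, which is why a raw total on its own can flatter an assembly.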


  1. Good points, James, to which I would add another question: At what point should scientists be “allowed” to claim that they have “sequenced” a genome? Would it be useful to have arbitrary limits: X% of the DNA sequenced, and Y% of that located on chromosomes?

    That might stop both the hype and the feuding.

    1. James says:

      Hi Jeremy,

      I think it would be great if there were standards for what did and didn’t qualify as sequencing a genome. Things like “At least 90% of genes with expression evidence should be ordered along chromosomes” and “Only assembled scaffolds at least 10 kilobases long can be counted towards the total percent of the genome that is covered by the assembly.” This would instantly shrink the number of published plant genomes, but it would give people more incentive to continue working on their assemblies and annotations until they had a dataset that would be more useful for everyone from comparative genomicists to breeders working in the field.

      The problem is that I can’t think of any person or organization which could develop and impose standards which journal editors would actually listen to. And if you can already publish in a journal like Nature Genetics or Nature Biotechnology saying you’ve sequenced the genome of species X, there is much less incentive to put in the work to create a better and more useful version of the genome.

  2. […] pigeonpea genome sequencers? Who knew. Well spotted, […]

  3. maddox says:

    hmmm, painful reading. Someone once told me that “It’s amazing how much you can get done if you don’t care who gets the credit.” I don’t remember who that was, but the collaborative working approach it fosters sure gets the work done.
