James and the Giant Corn

April, 2012:

Pretend Grant Deadlines

No chance of getting actual funding, just a silly course I signed up for this semester before I realized how crazy everything was going to be between TAing, trying to teach myself how to make RNA-seq libraries, and at least half a dozen collaborations (all of them urgent). I’ve been writing and analyzing and figure making for the past two days straight and turned in my final grant proposal at 10:50 tonight with a good 70 minutes to spare.

And all I can say is….

what a rush! This is why I love what I do for a living. Two days of improvising and lit-searching and throwing different approaches against the wall to see what would stick. And in the last 24 hours I finally managed to turn my proposal into a project I would actually enjoy carrying out.

The only problem is that now I kind of want to spend next weekend doing the same thing. Ideally with a shot at actually getting some cash if I successfully sold people on the value of my research. It’s been a couple of months but I’ve finally been re-bitten by the science bug! Speaking of which, I should wrap this up. My alarm is set for 7 AM tomorrow so I can get to lab in time to squeeze in an RNA extraction before class. I’m taking yet another shot at building a proper sequencing library. Wish me luck!

In which I apologize to R

R, you may be a confusing and hard-to-understand language where every package comes with its own set of quirks and foibles. You may make me feel less like a programmer and more like a not-very-well-trained magician fumbling around for the right incantation to make magic happen.

But when you work, you do awesome things.

Sex-specific splicing of a gene of unknown function that is syntenically conserved in all grass species.

With only four days’ work I was able to go from a giant pile of reads (from the still not properly appreciated Davidson et al. 2011 paper in The Plant Genome) to figures like the one above.

So what is the figure above showing you? One of a large number of genes which show a different pattern of splicing in male and female reproductive organs in maize.* The region “E8” is usually treated as exonic in female reproductive tissues but is spliced out like an intron in male reproductive tissues. What does it mean (if anything)? I have no idea yet! But it would have been a real pain to try to re-invent the wheel for identifying these differentially spliced genes in Python. In R, once I figured out the right incantation, it’s practically plug and play for any gene you could possibly be interested in. That includes the software for the (actually quite useful) visualization shown above.

So thank you R. What you do — once I can figure out how to make you do it — you do incredibly well.

*Maize makes it easy for us by separating female and male flowers into two entirely different organs (the ear and tassel respectively).
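
As a rough illustration of what “differential exon usage” means, here is a minimal Python sketch. It is not how DEXSeq works (DEXSeq fits a statistical model to replicated count data); it only compares the share of a gene’s reads that land in one exon between two tissues. The function names and the read counts are made up for the example and do not come from the Davidson data.

```python
import math


def exon_usage(exon_count, gene_count):
    """Fraction of a gene's reads that fall within one exon."""
    if gene_count == 0:
        raise ValueError("gene has no reads")
    return exon_count / gene_count


def log2_usage_ratio(exon_a, gene_a, exon_b, gene_b, pseudocount=1.0):
    """log2 ratio of exon usage in sample A versus sample B.

    A pseudocount is added to every count so that an exon with zero
    reads in one sample does not cause a division by zero.
    """
    usage_a = (exon_a + pseudocount) / (gene_a + pseudocount)
    usage_b = (exon_b + pseudocount) / (gene_b + pseudocount)
    return math.log2(usage_a / usage_b)


# Hypothetical read counts for one exon ("E8") and its whole gene.
ear = {"E8": 120, "gene": 1000}     # female tissue (invented numbers)
tassel = {"E8": 3, "gene": 1100}    # male tissue (invented numbers)

ratio = log2_usage_ratio(ear["E8"], ear["gene"], tassel["E8"], tassel["gene"])
print(f"E8 usage in ear:    {exon_usage(ear['E8'], ear['gene']):.3f}")
print(f"E8 usage in tassel: {exon_usage(tassel['E8'], tassel['gene']):.3f}")
print(f"log2(ear / tassel) usage ratio: {ratio:.2f}")
```

A large positive ratio would flag an exon that is used far more in the ear than in the tassel, which is the kind of pattern described for E8. Deciding whether such a difference is statistically meaningful needs replicates and a proper model, which is the part the R package handles.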

Data from:

Davidson R. M., Hansey C. N., Gowda M., Childs K. L., Lin H., Vaillancourt B., Sekhon R. S., Leon N. de, Kaeppler S. M., Jiang N., Buell C. R., 2011 Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome 4: 191–203. doi:10.3835/plantgenome2011.05.0015.

Analyzed using the R package DEXSeq:

Anders S, Reyes A, Huber W. 2012 Detecting differential usage of exons from RNA-Seq Data. Unpublished. (Link is to a PDF)

I/O Limited: Assorted Updates

I doubt this will be of interest to that many people, but here’s the list of what I’m working on this Sunday (each item is a separate project/collaboration):

  • Downloading, decompressing and quality/adapter trimming more than 800 million RNA-seq reads (four full HiSeq 2000 lanes).
  • Attempting to make my very own transcriptome assembly for a species where the genome is available but doesn’t look to be published anytime soon.
  • Figuring out how to look at differential use of exons in maize between male and female floral structures. (Later on this will involve using some R packages. I’m not looking forward to that part. R always makes me feel like I’m coding with one hand tied behind my back.)
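
For anyone wondering what “quality trimming” in the first item involves, here is a minimal sketch in Python. It assumes Phred+33 quality encoding and simply chops low-quality bases off the 3' end of a read. The function name and the threshold of 20 are assumptions for the example; dedicated trimming tools use more careful rules and also remove adapter sequence.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose Phred quality is below min_q.

    seq    -- the read's bases, e.g. "ACGT"
    qual   -- the matching FASTQ quality string (assumed Phred+33)
    Returns the trimmed (seq, qual) pair.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end until a base meets the quality threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes quality 40 and '#' encodes quality 2 under Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII###")
print(trimmed_seq, trimmed_qual)
```

The work per read is tiny, which fits with the observation that a job like this ends up limited by how fast reads can be pulled off disk rather than by processors.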

The surprising part is that I’m not being held up by a lack of processors to throw at the problem (the usual problem in computational work), nor a limited supply of RAM (probably the biggest problem in bioinformatics specifically). Instead I’m hitting the limit of how fast all these various programs can read data off of hard drives and write results back. Right now I am waiting for a little surplus capacity to free up.

It’s hard to believe that eight months from now this will all be over. I started my education back in 1990. If they kept numbering years in school after high school I’d be a 20th grader right now. But my adviser has informed me that I need to have graduated by this December, so that’s what I have to make happen. Next week is my last as a graduate student instructor. This summer and part of the fall will be a mad sprint to finish up various projects and collaborations and get them written up for publication. After that, thesis writing, signing, and submitting are all that stand between me and (hopefully) the last degree I’ll ever need to earn.