James and the Giant Corn

Headed to PAG

This will be my third year attending the Plant and Animal Genome conference in sunny San Diego. I’ve been fortunate enough to get to experience the conference in a bunch of different roles.

  • My first year I was an overwhelmed young grad student with a poster and the silly idea that I could pack my schedule full of sessions all day every day without suffering melting of the brain. (You really need to pick and choose at PAG. It’s like an all-you-can-eat buffet of science, and it is all too easy to go overboard.)
  • My second year I returned to PAG as an actual presenter, giving two talks to packed sessions (which isn’t an endorsement of my own science; both times I was sandwiched between successful scientists who also happened to be gifted speakers).
  • And now in my third year I’ll get to see PAG through the eyes of an exhibitor. No, this doesn’t represent my post-PhD career path. This year PAG happened to fall in the break between filing my dissertation and the start of my next “real” job.

Anyway, my plane is about to board so I should wrap this up. To all the rest of you who are coming to the conference, hope you have a great conference, don’t push yourselves too hard, and drop me a line if you’d like me to hook you up with a free t-shirt. ;-)


Changes in Perspective

An old PhDComics explains the change in perspective which comes with graduating:

My transformation obviously isn’t complete yet, though. Lab meetings with pizza sound like a wonderful idea.



Over the last couple of years my posts here have really dropped off. It hasn’t been because I ran out of material or lost interest in blogging but simply because more and more of my time and energy have been consumed by a single goal… graduating.

So it gives me great pleasure to report that, as of December 14th (last Friday), I have reached that goal.


Behold! The lollipop handed to every newly minted Berkeley PhD when their thesis is accepted.

What was my thesis about, you ask? Well, I still don’t have a good elevator speech, so let me simply say that the first part of my thesis has to do with how plant genomes change over time, and the second part demonstrated a new method for learning the function of pieces of DNA which don’t code for proteins but instead determine where and when neighboring genes will be turned on or off.

So what’s next? This whole site traces its origins back to travel posts I put up to let friends and family know how I was doing as I interviewed at various graduate schools. So I suppose there would be a fair bit of symmetry to shutting it down as I leave grad school, but I don’t want to do that. Now that I’m finished with my PhD, I’m looking forward to rediscovering the things I used to do for fun, and I remember writing updates here used to be a lot of fun.

On a more practical level, what comes next for me is a 2000 mile drive from California to the midwest (with all my worldly possessions packed into the back of my car) to visit family for the holidays. I am suddenly very conscious of the fact I haven’t driven on snow in more than four years. After that it’ll be onward to a post-doc.

If you’ve left an unanswered comment in the last six months or so and are still interested in me getting back to you, let me know.

For now… it is good to be back.



Guide to Reconstructing The Maize Subgenomes.

Because I get so many questions about this step in one of my published papers. (Well, more accurately, my PI gets questions about this step and he sometimes forwards them on to me for an answer.) The paper referred to in this guide is this one.

There are two completely different steps to reconstructing maize subgenomes: 1) putting together ancestral chromosome pairs and 2) grouping one copy of each ancestral chromosome together into subgenome 1 and the other copy of each ancestral chromosome into subgenome 2.

Ancestral chromosome pair reconstruction: (more…)


Success in Grad School

Success in grad school doesn’t come from working incredibly hard.

It comes from setting unrealistically fast deadlines for yourself. And then meeting them.

Sometimes that means working early mornings, late nights, and weekends. Sometimes it means coming up with a new approach, getting the results in three hours, and sneaking out of lab at 3:30. But the point is the results are what matter. If you can find ways to be unexpectedly productive you’re much less likely to burn out entirely than if you can only ever meet your own deadlines by burning the midnight oil at both ends (mixed metaphor intended).

Working hard for the sake of appearing to work hard (either to others or to yourself) is the surest road to burnout and lack of results.

P.S. Productivity goes up at least 5-fold when not also teaching. :-D

P.P.S. If the reagents you are working with are as old as you are, you need to worry. ;-) (That falls into the working hard but not getting results category.)


Pretend Grant Deadlines

No chance of getting actual funding, just a silly course I signed up for this semester before I realized how crazy everything was going to be between TAing, trying to teach myself how to make RNA-seq libraries, and at least half a dozen collaborations (all of them urgent). I’ve been writing and analyzing and figure making for the past two days straight and turned in my final grant proposal at 10:50 tonight with a good 70 minutes to spare.

And all I can say is….

what a rush! This is why I love what I do for a living. Two days of improvising and lit-searching and throwing different approaches against the wall to see what would stick. And in the last 24 hours I finally managed to turn my proposal into a project I would actually enjoy carrying out.

The only problem is that now I kind of want to spend next weekend doing the same thing. Ideally with a shot at actually getting some cash if I successfully sold people on the value of my research. It’s been a couple of months but I’ve finally been re-bitten by the science bug! Speaking of which, I should wrap this up. My alarm is set for 7 AM tomorrow so I can get to lab in time to squeeze in an RNA extraction before class. I’m taking yet another shot at building a proper sequencing library. Wish me luck!


In which I apologize to R

R, you may be a confusing and hard to understand language where every package comes with its own set of quirks and foibles. You may make me feel less like a programmer and more like a not-very-well trained magician fumbling around for the right incantation to make magic happen.

But when you work, you do awesome things.

Sex-specific splicing of a gene of unknown function that is syntenically conserved in all grass species.

With only four days’ work I was able to go from a giant pile of reads (from the still not properly appreciated Davidson 2011 The Plant Genome) to figures like the one above.

So what is the figure above showing you? One of a large number of genes which show a different pattern of splicing in male and female reproductive organs in maize.* The region “E8” is usually treated as exonic in female reproductive tissues but is spliced out like an intron in male reproductive tissues. What does it mean (if anything)? I have no idea yet! But it would have been a real pain to try to re-invent the wheel for identifying these differentially spliced genes in Python. In R, once I figured out the right incantation, it’s practically plug and play for any gene you could possibly be interested in. Including the software for the (actually quite useful) visualization shown above.

So thank you R. What you do — once I can figure out how to make you do it — you do incredibly well.

*Maize makes it easy for us by separating female and male flowers into two entirely different organs (the ear and tassel respectively).
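
For anyone wondering what a “different pattern of splicing” boils down to numerically, here is a toy sketch in Python. It is not how DEXSeq works (DEXSeq fits a statistical model across replicates); it only compares each exon’s share of a gene’s reads between two samples. The read counts, and the exon names other than E8, are made up purely for illustration.

```python
# Toy read counts per exon for one gene. Every number here is invented
# for illustration; only the label "E8" comes from the figure discussed above.
female_counts = {"E7": 120, "E8": 95, "E9": 130}
male_counts = {"E7": 110, "E8": 4, "E9": 125}


def exon_usage(counts):
    """Return each exon's share of all reads mapped to the gene."""
    total = sum(counts.values())
    return {exon: n / total for exon, n in counts.items()}


def usage_shift(counts_a, counts_b):
    """Usage in sample a minus usage in sample b, for exons found in both."""
    usage_a = exon_usage(counts_a)
    usage_b = exon_usage(counts_b)
    return {exon: usage_a[exon] - usage_b[exon]
            for exon in usage_a if exon in usage_b}


shifts = usage_shift(female_counts, male_counts)

# The exon whose share of reads changes most between the two samples.
most_shifted = max(shifts, key=lambda exon: abs(shifts[exon]))
print(most_shifted, round(shifts[most_shifted], 3))
```

Comparing shares rather than raw counts means a gene that is simply expressed more strongly in one tissue does not get flagged; only a change in how the reads are distributed across its exons stands out. A real analysis also needs replicates and a proper test, which is exactly the part the R package takes care of.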

Data from:

Davidson R. M., Hansey C. N., Gowda M., Childs K. L., Lin H., Vaillancourt B., Sekhon R. S., Leon N. de, Kaeppler S. M., Jiang N., Buell C. R., 2011  Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome 4: 191–203. doi:10.3835/plantgenome2011.05.0015.

Analyzed using the R package DEXSeq:

Anders S, Reyes A, Huber W. 2012 Detecting differential usage of exons from RNA-Seq Data. Unpublished. (Link is to a PDF)


I/O Limited: Assorted Updates

I doubt this will be of interest to that many people but here’s the list of what I’m working on this Sunday (each item is a separate project/collaboration):

  • Downloading, decompressing and quality/adapter trimming more than 800 million RNA-seq reads (four full HiSeq 2000 lanes).
  • Attempting to make my very own transcriptome assembly for a species where the genome is available but doesn’t look to be published anytime soon.
  • Figuring out how to look at differential use of exons in maize between male and female floral structures.  (Later on this will involve using some R packages. I’m not looking forward to that part. R always makes me feel like I’m coding with one hand tied behind my back).

The surprising part is that I’m not being held up by a lack of processors to throw at the problem (the usual problem in computational work), nor a limited supply of RAM (probably the biggest problem in bioinformatics specifically). Instead I’m hitting the limit of how fast all these various programs can read data off of hard drives and write results back. Right now I am waiting for a little surplus capacity to free up.

It’s hard to believe that eight months from now this will all be over.  I started my education back in 1990. If they kept numbering years in school after high school I’d be a 20th grader right now. But my adviser has informed me that I need to have graduated by this December, so that’s what I have to make happen. Next week is my last as a graduate student instructor. This summer and part of the fall will be a mad sprint to finish up various projects and collaborations and get them written up for publications, then thesis writing, signing, and submitting are all that stand between me and (hopefully) the last degree I’ll ever need to earn.


qTeller part 2: Eye candy!

qTeller isn’t just for generating spreadsheets full of data on genes within a genomic region. It can also visualize published expression data for a single gene. For example, here is the expression pattern of a gene called golden plant2, involved in regulating photosynthetic development in maize, which was first described all the way back in 1926 in an article in The American Naturalist:

As you can see, golden plant2 is expressed at high levels in photosynthetic tissues and not expressed at all in tissues like roots, endosperm, and pollen. Do you know how long it would have taken me to profile the plant-wide expression pattern of a gene this comprehensively by isolating RNA from different tissues and running qPCR? WEEKS! Do you know how long it took for me to get the same level of insight with qTeller? 90 seconds!

Do you know how long it’ll take you to regenerate this same analysis? 30 seconds. Just click this link. There have been so many awesome RNA-seq papers coming out recently for maize. I know when I arrive in Portland on Thursday for the Maize Genetics Conference I’m going to see a whole lot more even bigger/better RNA-seq datasets which people haven’t finished writing up yet. Some of these datasets have been on posters since the very first maize meeting I went to back in 2009 when I was a wide-eyed first year and may _never_ get published.* But others will be published weeks or months from now, making this visualization all the more powerful.

But for now, MORE EYE CANDY:

Anther ear1

Expression of anther ear1, a mutant in the gibberellic acid biosynthetic pathway

Link to regenerate analysis


Link to regenerate analysis. 

Glossy1 mutants change the type of wax produced on the leaves of developing maize seedlings, so it makes sense that the gene shows high expression in both maize seedlings and mature leaves. I can even sort of explain away the high expression in developing seeds and embryos, since the primordia which will eventually become the first leaves of the next generation of corn plants are beginning to form. But why in the world does glossy1 show such high levels of expression in anthers?

*Here is the relevant excerpt from my previous rant on data analysis:

I recently did the math on a PLoS Genetics paper published in late 2009 based on a single in-depth analysis of RNA-seq comparisons of mutant and non-mutant siblings. Today we could generate the same dataset, with twice the depth of sequencing, for less than $1,000 (INCLUDING reagent costs). The takeaway lesson here: just because your dataset was expensive to generate doesn’t mean you don’t have to worry about the competition stealing the glory if you take more than a year to publish.


qTeller: an easier way to find candidate genes

Hunting for good candidate genes is something biologists spend a lot of their time doing. Here are a couple of hypothetical examples:

A) Suzzy the grad student is mapping a recessive mutant which makes the pollen of corn plants shrivel up and die. By examining a bunch of known genetic markers in plants with dead pollen and in their siblings that produce normal pollen, she has narrowed the location of the gene responsible for her trait down to a region of only a couple of megabases on the fifth chromosome of maize. Since the whole maize genome contains over 2,300 megabases of sequence, that means she’s already ruled out 99.9% of the genome. But her region still contains, say, a dozen genes and she needs to know which one she should check first to see if a mutation in it is responsible for her mutant phenotype.

B) Johnny is another grad student. He wants to understand how corn plants genetically regulate how wide their leaves will grow to be. By measuring a lot of plants descended from two parents, each with known genotypes, he can identify regions of the genome where inheriting information from one parent or the other seems to be correlated with either wider or narrower leaves. He calls these regions quantitative trait loci (or QTLs). Now he has picked the genetic region that seems to have the biggest effect, and he wants to know what gene within the region is actually responsible for the effect.
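
The 99.9% figure in Suzzy’s example (A) can be checked with a few lines of Python. This is a minimal sketch, assuming “a couple of megabases” means roughly 2 Mb; the variable and function names are just for illustration.

```python
# Sizes in megabases. The genome size comes from example A above;
# the 2 Mb region size is an assumed reading of "a couple of megabases".
GENOME_SIZE_MB = 2300
REGION_SIZE_MB = 2


def fraction_ruled_out(region_mb, genome_mb):
    """Fraction of the genome that lies outside the mapped region."""
    return 1 - region_mb / genome_mb


ruled_out = fraction_ruled_out(REGION_SIZE_MB, GENOME_SIZE_MB)
print(f"{ruled_out:.1%} of the genome ruled out")
```

Even if the region were twice as large (4 Mb), more than 99.8% of the genome would still be excluded, so the hard part is no longer the mapping but choosing among the handful of genes left in the interval.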

There are a number of ways for both Johnny and Suzzy to narrow down their lists to the genes most likely responsible for the changes they are each observing in corn plants: (more…)