Author’s note: found in my “Unpublished Drafts” folder from April 12th 2012. Published May 12th 2015 without edits so as to accurately reflect my mindset at the time. Reflections of a much older (and, if possible, even balder) scientist forthcoming in a separate post.
I would much rather graduate with three papers cited twenty times each than twenty papers cited three times each.*
That fact drives how I think about publishing my results:
If I wanted to publish the maximum number of papers per dataset, I’d be worried about including too much data in any given paper because, once it was published, other researchers might take that data and do the same analyses I was planning to do in a follow-up paper.
If I want my papers to be cited as much as possible, though, the opposite is true. I WANT my data to be as useful and accessible as possible because that will increase the number of other groups who use the data and cite my work when they publish their next paper.
It also changes the dynamics of when to publish. If I were trying to maximize my own publications, I would want to make sure I published before anyone else could scoop me, but I also wouldn’t want to publish any earlier than absolutely necessary to avoid being scooped. The longer I can go without publishing the data and analysis of paper #1, the larger the head start I have on paper #2, which builds upon those data and analyses.
Since I want to be cited as much as possible, I want to publish as soon as possible. Full stop. Every month I don’t publish, people go ahead with research projects without whatever small additional benefit my data and analysis could provide, and that means fewer final citations for my papers.
*I don’t expect to achieve either goal in the time remaining to me (well, I might hit the first if I count the giant genome paper where I was one of more than 100 authors and go off the much more rapidly updated citation counts of Google Scholar).
James’s travels Dec 2012-May 2015. Some trips not shown to increase legibility. Click to zoom in.
Figure generated in R starting from this tutorial at FlowingData. Frankly, I still prefer writing Python code that produces R code algorithmically to programming in R directly, but I tend to be stubborn like that.
This will be my third year attending the Plant and Animal Genome conference in sunny San Diego. I’ve been fortunate enough to get to experience the conference in a bunch of different roles.
- My first year I was an overwhelmed young grad student with a poster and the silly idea that I could pack my schedule full of sessions all day every day without suffering melting of the brain. (You really need to pick and choose at PAG. It’s like an all-you-can-eat buffet of science; it is all too easy to go overboard.)
- My second year I returned to PAG as an actual presenter, giving two talks to packed sessions (which isn’t an endorsement of my own science: both times I was sandwiched between successful scientists who also happened to be gifted speakers).
- And now in my third year I’ll get to see PAG through the eyes of an exhibitor. No, this doesn’t represent my post-PhD career path. This year PAG happened to fall in the break between filing my dissertation and the start of my next “real” job.
Anyway, my plane is about to board so I should wrap this up. To all the rest of you who are coming to the conference, hope you have a great conference, don’t push yourselves too hard, and drop me a line if you’d like me to hook you up with a free t-shirt. 😉
An old PhD Comics strip explains the change in perspective which comes with graduating:
My transformation obviously isn’t complete yet, though. Lab meetings with pizza sound like a wonderful idea.
Over the last couple of years my posts here have really dropped off. It hasn’t been because I ran out of material or lost interest in blogging but simply because more and more of my time and energy have been consumed by a single goal… graduating.
So it gives me great pleasure to report that, as of December 14th (last Friday), I have reached that goal.
Behold! The lollipop handed to every newly minted Berkeley PhD when their thesis is accepted.
What was my thesis about, you ask? Well, I still don’t have a good elevator speech, so let me simply say that the first part of my thesis has to do with how plant genomes change over time, and the second part demonstrated a new method for learning the function of pieces of DNA which don’t code for proteins but instead determine where and when neighboring genes will be turned on or off.
So what’s next? This whole site traces its origins back to travel posts I put up to let friends and family know how I was doing as I interviewed at various graduate schools. So I suppose there would be a fair bit of symmetry to shutting it down as I leave grad school, but I don’t want to do that. Now that I’m finished with my PhD, I’m looking forward to rediscovering the things I used to do for fun, and I remember writing updates here used to be a lot of fun.
On a more practical level, what comes next for me is a 2,000-mile drive from California to the Midwest (with all my worldly possessions packed into the back of my car) to visit family for the holidays. I am suddenly very conscious of the fact that I haven’t driven on snow in more than four years. After that it’ll be onward to a post-doc.
If you’ve left an unanswered comment in the last six months or so and are still interested in me getting back to you, let me know.
For now… it is good to be back.
I’m writing this up because I get so many questions about this step in one of my published papers (well, more accurately, my PI gets questions about this step and he sometimes forwards them on to me for an answer). The paper referred to in this guide is this one.
There are two completely different steps to reconstructing maize subgenomes: 1) putting together ancestral chromosome pairs, and 2) grouping one copy of each ancestral chromosome into subgenome 1 and the other copy of each ancestral chromosome into subgenome 2.
Ancestral chromosome pair reconstruction: (more…)
Success in grad school doesn’t come from working incredibly hard.
It comes from setting unrealistically fast deadlines for yourself. And then meeting them.
Sometimes that means working early mornings, late nights, and weekends. Sometimes it means coming up with a new approach, getting the results in three hours, and sneaking out of lab at 3:30. But the point is the results are what matter. If you can find ways to be unexpectedly productive you’re much less likely to burn out entirely than if you can only ever meet your own deadlines by burning the midnight oil at both ends (mixed metaphor intended).
Working hard for the sake of appearing to work hard (either to others or to yourself) is the surest road to burnout and lack of results.
P.S. Productivity goes up at least 5-fold when not also teaching. 😀
P.P.S. If the reagents you are working with are as old as you are, you need to worry. 😉 (That falls into the working hard but not getting results category.)
No chance of getting actual funding, just a silly course I signed up for this semester before I realized how crazy everything was going to be between TAing, trying to teach myself how to make RNA-seq libraries, and at least half a dozen collaborations (all of them urgent). I’ve been writing and analyzing and figure making for the past two days straight and turned in my final grant proposal at 10:50 tonight with a good 70 minutes to spare.
And all I can say is….
what a rush! This is why I love what I do for a living. Two days of improvising and lit-searching and throwing different approaches against the wall to see what would stick. And in the last 24 hours I finally managed to turn my proposal into a project I would actually enjoy carrying out.
The only problem is that now I kind of want to spend next weekend doing the same thing. Ideally with a shot at actually getting some cash if I successfully sold people on the value of my research. It’s been a couple of months but I’ve finally been re-bitten by the science bug! Speaking of which, I should wrap this up. My alarm is set for 7 AM tomorrow so I can get to lab in time to squeeze in an RNA extraction before class. I’m taking yet another shot at building a proper sequencing library. Wish me luck!
R, you may be a confusing and hard-to-understand language where every package comes with its own set of quirks and foibles. You may make me feel less like a programmer and more like a not-very-well-trained magician fumbling around for the right incantation to make magic happen.
But when you work, you do awesome things.
Sex-specific splicing of a gene of unknown function that is syntenically conserved in all grass species.
With only four days’ work I was able to go from a giant pile of reads (from the still not properly appreciated Davidson 2011 The Plant Genome) to figures like the one above.
So what is the figure above showing you? One of a large number of genes which show a different pattern of splicing in male and female reproductive organs in maize.* The region “E8” is usually treated as exonic in female reproductive tissues but is spliced out like an intron in male reproductive tissues. What does it mean (if anything)? I have no idea yet! But it would have been a real pain to try to re-invent the wheel for identifying these differentially spliced genes in Python. In R, once I figured out the right incantation, it’s practically plug and play for any gene you could possibly be interested in. Including the software for the (actually quite useful) visualization shown above.
So thank you R. What you do — once I can figure out how to make you do it — you do incredibly well.
*Maize makes it easy for us by separating female and male flowers into two entirely different organs (the ear and tassel respectively).
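For anyone wondering what "a different pattern of splicing" looks like in numbers, here is a minimal Python sketch. To be clear, this is not what DEXSeq does (DEXSeq runs proper statistical tests across replicates). It only compares what share of a gene's reads lands in each exon in two tissues. The function names, the threshold, and the read counts are all made up for illustration.

```python
from typing import Dict


def exon_usage(counts: Dict[str, int]) -> Dict[str, float]:
    """Return the fraction of a gene's reads that fall in each exon bin."""
    total = sum(counts.values())
    if total == 0:
        # No reads at all for this gene: report zero usage everywhere.
        return {exon: 0.0 for exon in counts}
    return {exon: n / total for exon, n in counts.items()}


def usage_shifts(female: Dict[str, int],
                 male: Dict[str, int],
                 min_shift: float = 0.2) -> Dict[str, float]:
    """Return exons whose usage differs by at least min_shift.

    Positive values mean the exon takes a larger share of reads in the
    female sample, negative values a larger share in the male sample.
    """
    f_usage = exon_usage(female)
    m_usage = exon_usage(male)
    shifts = {exon: f_usage[exon] - m_usage.get(exon, 0.0) for exon in f_usage}
    return {exon: d for exon, d in shifts.items() if abs(d) >= min_shift}


# Hypothetical read counts for three exon bins of one gene.
ear_counts = {"E7": 100, "E8": 80, "E9": 120}
tassel_counts = {"E7": 150, "E8": 0, "E9": 150}

# With these invented numbers only E8 clears the 0.2 threshold.
print(usage_shifts(ear_counts, tassel_counts))
```

Note that when one exon drops out, the shares of all the others go up a little, which is one reason a naive cutoff like this is no substitute for a real statistical model.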
Davidson R. M., Hansey C. N., Gowda M., Childs K. L., Lin H., Vaillancourt B., Sekhon R. S., de Leon N., Kaeppler S. M., Jiang N., Buell C. R., 2011 Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome 4: 191–203. doi:10.3835/plantgenome2011.05.0015
Analyzed using the R package DEXSeq:
Anders S, Reyes A, Huber W. 2012 Detecting differential usage of exons from RNA-Seq Data. Unpublished. (Link is to a PDF)
I doubt this will be of interest to that many people, but here’s the list of what I’m working on this Sunday (each item is a separate project/collaboration):
- Downloading, decompressing and quality/adapter trimming more than 800 million RNA-seq reads (four full HiSeq 2000 lanes).
- Attempting to make my very own transcriptome assembly for a species where the genome is available but doesn’t look to be published anytime soon.
- Figuring out how to look at differential use of exons in maize between male and female floral structures. (Later on this will involve using some R packages. I’m not looking forward to that part. R always makes me feel like I’m coding with one hand tied behind my back.)
The surprising part is that I’m not being held up by a lack of processors to throw at the problem (the usual problem in computational work), nor by a limited supply of RAM (probably the biggest problem in bioinformatics specifically). Instead I’m hitting the limit of how fast all these various programs can read data off of hard drives and write results back. Right now I am waiting for a little surplus capacity to free up.
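For the curious, the quality/adapter trimming in the first item of the list boils down to something like the Python sketch below. This is a toy illustration, not the tool I actually run on the reads. The function names and default cutoffs are invented for the example, the adapter string is just an example value, and it assumes Phred+33 quality encoding (the `offset` parameter is there because that depends on the pipeline version).

```python
from typing import Optional, Tuple

Read = Tuple[str, str]  # (bases, quality string of the same length)


def trim_adapter(seq: str, qual: str, adapter: str, min_overlap: int = 5) -> Read:
    """Clip the read at an exact adapter match, or at a partial adapter
    hanging off the 3' end if at least min_overlap bases match."""
    idx = seq.find(adapter)
    if idx == -1:
        # Look for the longest adapter prefix that ends the read.
        longest = min(len(adapter) - 1, len(seq))
        for overlap in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:overlap]):
                idx = len(seq) - overlap
                break
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def trim_quality(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Read:
    """Strip bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_read(seq: str, qual: str, adapter: str,
              min_q: int = 20, min_len: int = 25) -> Optional[Read]:
    """Adapter-trim, then quality-trim; drop reads that end up too short."""
    seq, qual = trim_adapter(seq, qual, adapter)
    seq, qual = trim_quality(seq, qual, min_q)
    return (seq, qual) if len(seq) >= min_len else None


# Example values only: a short fake read followed by an adapter.
ADAPTER = "AGATCGGAAGAGC"
read_seq = "ACGTACGTAC" + ADAPTER
read_qual = "I" * len(read_seq)
print(trim_read(read_seq, read_qual, ADAPTER, min_len=5))
```

Each read is handled independently and the work per read is tiny, which fits with the disks, not the processors, being what slows everything down.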
It’s hard to believe that eight months from now this will all be over. I started my education back in 1990. If they kept numbering years in school after high school I’d be a 20th grader right now. But my adviser has informed me that I need to have graduated by this December, so that’s what I have to make happen. Next week is my last as a graduate student instructor. This summer and part of the fall will be a mad sprint to finish up various projects and collaborations and get them written up for publication; then thesis writing, signing, and submitting are all that stand between me and (hopefully) the last degree I’ll ever need to earn.