Under absolutely no circumstances should you take your hard drive full of data, walk into the lab, drop it on the desk of some new grad student who decided to go to grad school because he loves plants (or whatever your favorite model organism is) and was a whiz at PCRs in his undergrad lab, and tell him he’s now in charge of figuring out how to turn it into a paper.
Crunching gigabases of sequence data doesn’t require a lot of scientific acumen, but it does require a set of skills and abilities that very few incoming genetics or biology grad students have. Not only that, the traditional method of grad student learning (pestering post-docs, technicians, older grad students, even undergrads who have been in the lab for a while) is generally completely closed off. After all, if anyone in the lab already knew how to do the sorts of analyses you’re asking for, you’d have given the project to them instead.
- A few grad students are going to take to computational biology like a fish to water. Before you know it you’ll have new analyses and paper drafts flooding your inbox, along with requests for lots more expensive new datasets to follow up on the hints of cool things they’re finding in this one. However, these are rare exceptions, and it is difficult if not impossible to spot in advance which grad students will react this way.
- A larger number will eventually drop out. Staring at a command line and struggling through introductory books on scripting languages (or worse yet, the documentation files for command-line sequence analysis tools) isn’t how they pictured spending their time in grad school. Now, a fair number of grad students don’t make it through under any circumstances, but a lot of the people lost to sequence analysis had the potential to become good or even amazing scientists. They just weren’t computer people.
- But by far the most common fate will be a grad student who struggles along, getting advice on the exact parameters to use for specific programs or the meaning of specific error messages. Because his project isn’t very productive or exciting, he’ll spend lots of time on side projects that let him handle actual plants and use his mad pipetting skills. And a few years down the road you’ll have a not-very-imaginative analysis of gene expression (on a dataset anyone could replicate for a fraction of the price) and will have spent a lot of money (grad student stipends aren’t generous, but over the years they add up) training an indifferent bioinformatician with little or no experience at the fundamental business of science: thinking up new ideas and creative ways to see if they’re right.
So what’s the alternative?
- Collaborate: Yes, this will mean sharing credit. But more worryingly, picking a good collaborator is hard. Lots of pure computational labs are more interested in programming creative new tools than running existing ones. Sure, they could get your entire analysis done in less than a week, but you’ll often spend month after month pestering them to actually get around to it. The key is to find a person or group interested in the answers to the same biological questions you’re looking into, so they’ll be in a hurry to finish the analysis and see what the answers are.
- Hire someone who is already a computer person: this could be either a freshly minted post-doc from a computational lab or a professional programmer. This is the most expensive option, but it also gives you the most freedom. Once your programmer/post-doc is set up (which could include buying a pricey computer server or two), they’ll be able to crank through analyses in days or weeks that might take a biology grad student months or years, if they could finish them at all. But that means you’re going to need to keep new data coming in or watch your expensive computational support spin their wheels and get into all sorts of trouble.
- If none of those options are appealing or feasible: hire a company to do the analysis for you! Yes, it cuts against the grain to spend money on analysis, but it’s still cheaper than paying a grad student’s tuition and stipend for months, and you’ll get the results fast enough to publish them before you’re scooped by someone who waited a year and paid 1/10th the cost to generate an identical dataset with whatever the NEXT next-generation sequencing technology turns out to be.*