The future of biological research is now coupled with the ability to create and deploy computational tools. In both the genomic and the postgenomic eras, the capacity to manage, analyze, and visualize the flood of genomic data that is being generated will influence the rate at which new biological laws and generalizations can be established. However, there is a serious shortage of people with the training required to successfully embark on this kind of research.

To ease this deficit, several significant steps have been taken at the University of Pennsylvania. First, degree programs have been established at the undergraduate, master's, and Ph.D. levels. Postdoctoral training is also available. Second, the Penn Center for Bioinformatics (PCBI) has been established to provide space and facilities for researchers and students in bioinformatics and computational biology. Third, a seminar series, a yearly retreat, and several journal clubs have been created to expose program participants to the broader world of bioinformatics.

In this article, I'll briefly outline the scope of the degree programs, summarize our experiences to date, and describe industry's response to our efforts. I'll conclude by placing the Penn programs within the bigger picture of bioinformatics education in the United States, in the hope of giving you practical advice you can use to get a foothold in the bioinformatics field.

Degree Programs

We designed the Penn programs around courses in biology, statistics, and computer science, with concepts from these disciplines pulled together in a two-course sequence of "capstone courses" in computational biology and bioinformatics. The programs were designed following the award of a research training grant from the National Science Foundation in 1994, which was succeeded in 1999 by a training grant in computational genomics from the National Institutes of Health. The program was originally opened to both Ph.D. and postdoctoral students with solid education in molecular, cellular, organismal, and evolutionary biology AND a strong foundation in mathematics, statistics, chemistry, and computer science.

We found, however, that most of the initial applicants were interested in the postdoctoral program. Relatively few of the Ph.D. applicants--most of whom had a B.S. in biology or biochemistry--appeared to have a rigorous enough mathematical background to prepare them for graduate studies in bioinformatics and computational biology. Because of this initial shortage of doctoral students, we put together undergraduate concentrations in computational biology within the Departments of Biology and of Computer and Information Science. A parallel arrangement for mathematical biology now exists between the Departments of Biology and Mathematics.

Enrollment in these concentrations is increasing, especially within biology, but computer science undergraduates appear to be less interested. (This is not surprising; computer science majors can expect a wide variety of job options when they graduate.) Moreover, the demands of this highly interdisciplinary program require a commitment and focus beginning in the freshman year that few students possess, and in any case, many undergraduates are not interested in pursuing a research career.

But 4-year undergraduate and 5-year Ph.D. programs could not address the immediate needs of the biotech industry. So in consultation with an external advisory board whose members come from about a dozen pharmaceutical and biotech companies, we established a very successful three-semester (1.5-year) Master of Biotechnology degree program at Penn, with a track in bioinformatics and computational biology. In addition to taking carefully chosen coursework tailored to complement their backgrounds, students gain practical experience through summer internships with the advising pharmaceutical and biotech companies or through rotations in the labs of PCBI faculty.

Through all these programs--bachelor's, master's, and doctoral--we have gained useful experience in how to bring students up to speed in bioinformatics and computational biology as quickly as possible. The key insight is to carefully construct a sequence of undergraduate courses in introductory programming, data structures, discrete mathematics, and statistics to bring students with a background in biology to a point where they can take more advanced courses in computer science as well as the capstone courses within 6 months to 1 year.

Expectations and Experiences

Clearly, there are several factors that make it challenging to establish bioinformatics programs! First, the topic is highly interdisciplinary, requiring proficiency in three distinct fields. Until recently (and we sincerely hope this is changing), undergraduates in biology in the United States were required to take only a relatively weak version of calculus and not the more rigorous courses required of engineers and physicists. Similarly, undergraduates in computer science are typically required to take courses in physics, but not chemistry, biology, or statistics. We've found that the resulting deficiencies in their mathematical or biological preparation can make it difficult for individuals to switch to computational biology at a later point in their education or career.

Second, there are cultural differences between computer science-mathematics and biology. For example, a master's degree in computer science is a perfectly acceptable degree for those who want practical training leading to a job in industry. However, in biology the master's is often perceived as a "dropout" degree, a perception that--although it is changing--is still an obstacle.

How to Prepare, Part One

For those of you who are interested in pursuing formal training in bioinformatics and computational biology, there are a number of take-home messages. If you have proficiency in biology but not computer science, prepare ahead of time by taking courses in programming (e.g., C or Java--but emphatically NOT a course on software packages), discrete mathematics (algebra and logic), and statistics. If, on the other hand, you are a disquieted computer type who has been bitten by the thrill of the genomics revolution, take introductory courses in genetics, biochemistry, and statistics. Get comfortable saying lots of multisyllabic words and looking like you understand what they mean. Many programs will want to see some sort of rounded "track record" before admitting you, and the introductory courses mentioned above are taught almost anywhere. Some can even be found on the Web.

A third factor that has made developing the PCBI programs challenging is the need to provide hands-on, research-based experiences to the students. Computational biology is an experimental subject, and so our students must learn not only the theory behind techniques, tools, and algorithms but also how to use them in practice. Tools developed by the students--database models or new data visualization tools, for example--must be practically useful. And the algorithms they construct must balance provable optimality with efficiency. This implies not only that coursework should include a practical lab component, but also that training internships in industry or research rotations in the labs of biology researchers should be established and required of all students. This is especially important at the bachelor's and master's levels.

Finally, although federal money can be found for student support in doctoral and postdoctoral programs, the options for funding master's students are limited. Many of the students in our master's program are therefore studying part-time and are supported by their employers (typically local pharmaceutical or biotech companies). Others put themselves through the program at considerable personal expense. Clearly this is a field that would benefit greatly from expanded scholarship funding from government and industry.

Industry Response

Although we have not had time to observe the effectiveness of our undergraduate programs, industry response to the graduate programs has been good. All of our graduates (about 20 so far) have found jobs immediately upon graduation; most have had a variety of job options to choose from.

How to Prepare, Part Two

For those of you who would rather move toward a job in bioinformatics and computational biology without enrolling in a formal degree program, the same advice applies (especially for biologists). Taking courses in the evening or on a part-time basis is an excellent way of gaining skills that can be leveraged to switch out of the research wet lab and into the dry space of computers. However, the best advice is to do whatever it takes (a pay cut, or even no pay at all) to work in a lab that uses computational biology and bioinformatics. You'll be forced to learn the concepts and skills that you need on the fly (or be motivated to take those esoteric courses full of mathematical symbols or multisyllabic terms) at the same time as you experience the rush of working in a field that is still forming and full of promise.

The only criticism of the master's program that we have received is that students do not learn enough computer science. That is, there remains a shortage of people who receive the equivalent of a master's degree in computer science with adequate training in computational biology. Given the economics and the demand for bioinformaticians, this is unlikely to change in the near future, because few people will be willing to spend more than 2 years on a master's degree. However, we do encourage students in the master's program who show strong ability in computer science to take additional courses to complete a master of computer science degree.

Conclusion

We are continuing to refine the Penn bioinformatics programs on the basis of feedback from students, faculty, and industry. Clearly, though, developing a successful educational program in bioinformatics and computational biology requires a community effort among faculty across several academic departments and researchers in industry. In bioinformatics, academia cannot be expected to serve as the sole training ground; industry must be a full player, providing student internships and fellowship opportunities for master's degree students.