NEXT WAVE STAFF

UNITED STATES

To follow up on our recent bioinformatics feature, Next Wave invited Stanford University researcher David Botstein to chat with Next Wave readers live online.

The question and answer session, which was held on Tuesday, September 12, was a huge success. In fact, we understand that many who tried to reach the Next Wave site during the chat were unable to do so. Moreover, Botstein did not have time to answer many of the questions that were posed.

Next Wave apologizes profusely for any inconvenience (or angst!) that the overwhelming interest in this chat session may have provoked. And to back up that sentiment with action, we have posted here the entire transcript of the conversation with Botstein. Our hope is that by doing so we will enable you to relive vicariously the excitement of the chat itself and--more importantly--share in the information that was exchanged.

NEXT WAVE: Welcome to Next Wave's chat event with bioinformatics guru David Botstein. David is here to talk about bioinformatics: funding, education, jobs, and the future.

DR. BOTSTEIN: Hello. I am happy and honored to be here. I will do what I can to answer the questions. I would like to stick to the subject of bioinformatics if at all possible. So let's get started!

QUESTION: Could you define bioinformatics for us? Is it biologists learning computers or geeks learning biology? Why does Russ Altman say there isn't universal agreement on the definition?

DR. BOTSTEIN: It is both: biologists learning computers and computer-literate folks learning biology. If you want to know what Russ thinks you should ask him. But it is true. Bioinformatics is a sort of code word for biologists having to deal with very large amounts of information. This information could be genomic in nature or it could be images or it could be diagnostic tests or even clinical trials. One big area, besides genomics per se, is structural biology. And, in fact, a very large fraction of what is now called bioinformatics has that origin.

QUESTION: How does one transition from traditional fields of engineering, biomedical engineering and computer science to bioinformatics? How long does it take to be actively conducting research in bioinformatics fields?

DR. BOTSTEIN: There is no standard track at this point to becoming a professional in the field. As I have said before, many come from the field of structural biology. They have the advantage that they are already familiar with biochemistry and the language of biology, generally. However, I should say that there are well-known examples of biologists, biochemists, or molecular biologists who have learned computing and have become leaders in the field. Somewhat surprisingly, there are fewer examples of pure computer scientists who have done the same. It now seems to me that everyone involved underestimates how much effort it takes to learn the language of biology. Unix, PERL, and C all seem to be easier to learn than biology.

QUESTION: I am currently in an M.D./Ph.D. program and I intend to be finished in 3 years. What kind of position might be available for someone like me, and what postgraduate education would be necessary (e.g. residency, post-doc)? What is the best way to find such a position?

DR. BOTSTEIN: My first thought is to say that you are already overeducated! But joking aside, you should take the opportunity now to take courses in programming and in probability and statistics. These things do not get easier to learn with advancing age. Once you have comfort in programming, I think it will become clear to you and your potential employers that this is where you want to go. I would say there is no point in doing clinical training or experimental post-doctorate work. You should find a post-doctoral position within the area of bioinformatics.

QUESTION: I am a postdoctoral Fellow in bioinformatics at CSHL; I have a Ph.D. in statistics; I am developing tools/algorithms for general use by molecular biologists. I would like to work more directly on problems in biology rather than working on tools those biologists would use. I would greatly appreciate your valuable advice regarding this transition...

DR. BOTSTEIN: I urge all people in your position to do post-doctoral work in a laboratory situation. Find a place where people are generating huge amounts of data themselves and become, at least temporarily, one of the generators of the data. I think that the future of biology, especially for young people, is in the area of being bilingual in the computational and mathematical world as well as the biological world.

QUESTION: What are some of the standardization issues in bioinformatics and how are they being overcome?

DR. BOTSTEIN: There are standardization issues in all computational fields, all the time. The main standardizer, and the only effective one, seems to be the mass market. Top-down efforts at standardization have uniformly failed. So in our field now, the browser is king because everyone does HTML or the progeny of HTML. That way it doesn't matter what operating system you have, etc. The only other major factor, I think, is power for the producers and convenience for the users. That is the reason that PERL has become so popular.

QUESTION: The UNIX operating system is the workhorse of molecular biology. Will it continue to be?

DR. BOTSTEIN: UNIX is likely to be here for the foreseeable future. First of all, it is getting, in the form of LINUX, increasing market share for servers of all kinds. The reasons are stability and that UNIX is already a standard. Indeed, all the other operating systems are, to a greater or lesser degree, derived from UNIX. So being really facile with UNIX is a very good thing for anybody who does anything with computers.

QUESTION: What kinds of software tools must still be developed?

DR. BOTSTEIN: Virtually everything that we now do is being done with software that is inadequate in some way or another. Almost none of the software in use in bioinformatics is robust in the sense of commercial software. It is fragile and full of quirks and complications in the best case. For many applications, even applications that one might think are really simple, like comparing images, there is not even a clue as to how to proceed.

QUESTION: I am the M.D./Ph.D. student who asked the earlier question. I have over 15 years of programming experience as well as extensive sequence analysis experience. Where, specifically, should I look to find an appropriate post-doc? Where do I get the information?

DR. BOTSTEIN: Read the literature! Find the kind of work that appeals to you and ask around. Both academic and commercial groups are always on the lookout for real talent in this area.

QUESTION: I am an undergraduate (senior) looking at grad school programs in bioinformatics and computational biology. Since such programs seem to just be getting started at many schools, what should I be looking for?

DR. BOTSTEIN: At the current moment, I would look for strong research groups in bioinformatics. As above, reading the literature is a good way to get an idea. Any place that has strong bioinformatics research has probably worked out with its university programs that can encompass the diversity of things that you will have to do in the way of course work. Remember that the most important component of graduate school is the research experience and learning to be a professional researcher.

QUESTION: Is there funding for biologists to study computer science in preparation for entering bioinformatics?

DR. BOTSTEIN: Yes. One of the recommendations of the BISTI report that the NIH has aggressively taken up, in many of its institutes, is funding specifically aimed at training in this area. I think that most universities capable of mounting a serious program in this area either have or have applied for training grants, etc.

QUESTION: Is there a special ethical paradigm that must be adopted before undertaking a career in bioinformatics?

DR. BOTSTEIN: I don't think so. I think that the producers of the information, the analysts of the information, and the conveyers to the public and to individuals all have essentially comparable ethical responsibilities. Their tasks may be different, the ethical impact may be different, but the responsibility is shared by all of us.

QUESTION: What are some of the intellectual property issues with bioinformatics and how are you dealing with them?

DR. BOTSTEIN: The whole gamut of issues around software as intellectual property is not unique to bioinformatics software. They apply as well to MS Word as they do to GCG. Pirating software that is commercial in origin is still piracy if it is in bioinformatics. On the other hand, there is considerable controversy over the intellectual property status of databases. I am not an expert on that. Data that are generally available in the public may nevertheless become proprietary if transformed into a database that is copyrighted. Breaking into copyrighted databases is not OK simply because the data are available publicly in another form. One has to get the public data from the public sources. There are, of course, disagreements over whether any genomic data should be private in any form and I don't know what can be done about that, except to discuss the issue.

QUESTION: Do you foresee new government regulations being created based on the mapping of genes?

DR. BOTSTEIN: My record of prophecy about what the government will do is too poor to make a sensible answer to this question. Much of the concern about genetic information has its origin in societal issues like insurance, employment, and abortion that are way beyond the scope of bioinformatics.

QUESTION: What diseases do you think will be the first to be significantly reduced as a result of genetic mapping?

DR. BOTSTEIN: That is a good question! I don't really know the answer. It certainly was believed at one time that the ability to predict in utero some diseases would result in reproductive choices that would impact severely on the incidence of certain inherited diseases. That is true for some and not others. I am not up on the statistics to give you a proper answer. In terms of therapy, the best direct gene therapy results were the recent French study on an immune deficiency, which got a lot of publicity six months ago. Those results are in very strong contrast with previous experience.

QUESTION: As a result of this mapping, how do you see the training of physicians changing in the future?

DR. BOTSTEIN: The university is one of the most conservative institutions that humans have devised. Whatever changes will occur will occur very slowly. In this case, I think that is probably a good thing. Seriously, however, a change I would like to see, but don't expect any time soon, is much more attention being paid to the ability of doctors to handle mathematical and statistical issues.

QUESTION: You recently attended a Congressional caucus on bioinformatics. What was your reception like? Do you think Congress fully understands the implications of your mapping breakthrough?

DR. BOTSTEIN: I had the privilege to speak to the caucus about some of the issues we are addressing here today. I gave examples, many from our own work, of the challenges posed by very large data sets. The six Congressmen and hundred aides who were present seemed very interested and asked a number of good questions. But one must recall that the word "caucus" means that I was preaching to the choir.

QUESTION: I am a postdoc of 7 years, with strengths in biology and array applications, but a weakness in writing code. Am I much less employable than if I had the strength in writing code?

DR. BOTSTEIN: I think so. What I tell the people here at Stanford is that your mental idea should be to be able to set up whatever experimental and analytical systems that you plan to use, from scratch, yourself. That means being able to boot up your own computer, being able to bring it back up if it crashes, installing your own software, and, crucially, being able to write the scripts or real programs you need to make your own work go forward. I, myself, don't do these things, but I did the equivalent that was possible when I was your age. That experience is important not only to my sense of what is the right thing to do but also to my credibility with others.

QUESTION: What does the future hold for the bioinformatics field once most of the known genomes are sequenced?

DR. BOTSTEIN: The crucial premise of this question is that there is a small number of known genomes! I don't think so. Here at Stanford we are looking forward to having sequences from a couple of dozen fungi in the next few years. And we are only one small corner! It is important in this context to realize that multiple sequence comparisons, to take the most straightforward example, scale miserably with the number of things to be compared. There is a lot of serious mathematical and computational work to be done just to appreciate diversity in a comprehensive way. At a different level, the trend here at Stanford is for space to be taken out of use for wet labs and be put into screens, servers, and computational devices because the experimentalists, through parallelization, are becoming so productive. Where once one spent a year doing an experiment and a week thinking about the results, we are coming to a point where the ratio is just the other way around.

QUESTION: Which do you think is more lucrative, conceptualizing new research or the software for new research?

DR. BOTSTEIN: If I knew the answer to this one, I'd be really well off! In my own career, the big steps happened when I stumbled into ways to answer old, well-known, but still unsolved problems. The most prominent of these, of course, was mapping disease genes in the human genome. Only the approach was new.

QUESTION: What about proteomics? Is this the next "big thing"?

DR. BOTSTEIN: When I was a young man everything trendy had to end with the suffix "on" as in codon or muton or proton. Today everything trendy ends in "omics" as in proteomics. Seriously, however, it is proteins that do the work in the cell. Ultimately we want to know about the proteins. However, whereas nucleic acid biochemistry lends itself to highly parallel methods, proteins are much less tractable. The onset of affordable, high-throughput mass spectrometry is promising but, I think, still has a way to go.

QUESTION: My sense is that most scientists in bioinformatics are men. Are there any programs to encourage women in the field?

DR. BOTSTEIN: Well, actually in our group, we have a majority of women. So I am suspicious of the premise. At this point, anybody who shows up with any talent is welcome. The field is dying for talent. Like all of biology, the ratio of genders among students is already about 50-50. Of course, all the usual programs that seek to help women specifically apply to this field as well.

QUESTION: What are the most interesting scientific questions in bioinformatics right now?

DR. BOTSTEIN: I think this is really a matter of taste. It also harks back to the question of what is bioinformatics. My personal interests derive from what I did before. I think it is useful to remind everyone that the usefulness of all imaging methods ranging from CAT scans to magnetic resonance to atomic force microscopy has been limited not by the physics, which, in this case, has been known for many decades, but by the availability of affordable computational cycles. And progress has conformed almost perfectly with Moore's Law.

QUESTION: Are evolutionary studies of the genomes now being sequenced a potentially well-funded subfield? Or will studies directly applicable to human disease get all the funding?

DR. BOTSTEIN: The Congress provides funding almost entirely in the hope of impacts on human health. Nothing we have done changes the motivation, which, I think, is a faithful reflection of the people who pay the taxes. That said, the realization that all the organisms on the planet use essentially the same proteins to do the same operations has resulted in what I like to call a Grand Unification of Biology. Part of this lumping trend is the insight that evolutionary studies are one of the best ways to infer the function of genes and proteins. This applies with particular force to genes about which the only thing we know is that they contribute to disease. By this argument there has been a manyfold increase in funding for evolutionary studies at the protein level.

QUESTION: Do you recommend that a bioinformatics scientist work in a biotechnology start-up company as a first job after academic training? And can you recommend the best way to decide whether the company has potential?

DR. BOTSTEIN: Here is another question that would greatly improve my personal situation if I knew the answer! I have no way to predict which companies are going to succeed in the market. However, you should be able, if you have suitable academic training, to perceive whether the people who want to hire you are up to standard. If they are, then working with them is probably going to be a good experience, even if the company doesn't make you rich.

QUESTION: What is the most important idea you'd like us to take away from this chat?

DR. BOTSTEIN: The most important idea is that this is an open field, laden with opportunity, for which the only requirement is that you be willing to take seriously both computation and mathematics on one hand and biology on the other. For the former you need real training and experience and for the latter you need real training and experience.