Just 20 years ago, there was no such thing as "bioinformatics." Margaret Dayhoff's famous protein sequence collection was available only as an atlas in book form, and any study of its content meant long hours of letter counting, dot plotting, and even Needleman-Wunsch alignment on countless sheets of paper.

I remember one incident from the early '80s. Professor Grantham of Claude Bernard University in Lyon, France, had published a paper announcing a collection of mRNA sequences, from which he had extracted statistics on nucleotide and codon usage. On a visit to Berlin he generously presented us with a floppy disk (literally floppy in those days) containing his sequences. We fed it into the eight-bit PC we had at the time, fascinated by this access to the "code of life." Every new sequence had to be typed by hand from the printed pages into the computer, several times over, so that the copies could be compared against one another to weed out typing errors. Later, the sequence collection of the European Molecular Biology Laboratory became available; it still exists today in Hinxton, near Cambridge, U.K. In those days the data were distributed in parcels of disks.

A crucial event of the '80s was, at least in my perception, Russell Doolittle's 1983 Science paper (221, 275-277), which demonstrated the homology between platelet-derived growth factor and the simian sarcoma virus onc gene on the basis of sequence comparison alone. Then, in the '90s, came the much-depicted exponential growth: first in gene segments, then in mRNA sequences, then in whole genome sequences. And then back to segments: the mass accumulation of expressed sequence tags, themselves generated from computer-processed automated readouts.

So much for the sequence story. In parallel, protein structure coordinates were being assembled in the Brookhaven Protein Data Bank, again with exponentially increasing throughput over time.

Along with this avalanche of primary information, we witnessed the rise of the annotation and genome statistics business, and of software that links all this information together via the Internet and joins it with MEDLINE and related information services.

And then, suddenly, the notion of "bioinformatics" emerged. I don't even know who coined the term or who pushed it into common use. In a sense it is a misnomer, since its realm is, more narrowly, genomic information and biomolecular structure.

The latest development, with the advent of functional genomics, is the revival of old disciplines in order to make sense of the mass of molecular information: physiological data, the biochemistry of cell metabolism, evolutionary taxonomy, and good old developmental biology.

Today we may state that bioinformatics, with its subdisciplines, has achieved a strategic position in modern biomedical research. This is due to two factors: 1) the sheer mass of molecular "text" and "structure" information can be handled only with high-performance computers running appropriate software, that is, with the help of bioinformatics; and 2) the immense data flood creates urgent pressure to integrate apparently unrelated genomic information. This again requires the computer as an organizing tool, but also new methods for connecting the numerous islands of subdisciplinary, specialized knowledge. In other words, bioinformatics is expanding into the foundation of theoretical biology, at least of its molecular and cell biological branch.

For a long time, Germany was Sleeping Beauty's castle as far as bioinformatics was concerned, and many dismissed the field as "nitpicking." The human genome project was perceived as mere cataloging, without the prospect of fundamental knowledge. This has turned out to be too narrow a view. Today it is obvious that biocomputing and database processing are methods that contribute fundamentally to our knowledge of cell biology and to evolutionary theory.

There is an enormous demand for combined biology-informatics experts in both academic and corporate biomedicine. This is flattering, and an excellent opportunity for young students. But it is also a problem: it has proved extremely difficult to recruit staff for the often crucial bioinformatics link between basic and applied research.

How long will this bioinformatics boom last? I cannot say. For some time I was skeptical: I thought that the tools would soon reach satisfactory performance and that the mere accumulation of data would prove to be a redundancy generator (e.g., only 1000 protein superfamilies facing an endlessly growing flood of ever-the-same sequences). But this was certainly mistaken. Sequence analysis is now opening up a second principal dimension beyond the databases of species-specific prototype DNA sequences ("the" human genome sequence vs. "the" mouse genome sequence, etc.): in addition to what all members of a species have in common, we are tackling the question of what is essential in the genetic diversity among the members of a population.

My vision for the next 5 or 10 years encompasses more than an ever-growing flood of new data processed on computers of ever-increasing speed and storage capacity. I think we will carry all this information back into the domain of classical and modern biology: physiology, biochemistry, genetics, evolutionary theory. Bioinformatics will become part of theoretical biology in a very broad sense. So far, theoretical biology cannot compete with, say, theoretical physics in its ability to provide a conceptual framework for its discipline. The doctrine of modern physics is written in the language of its theoretical foundation. Theoretical biology, and for that matter biomathematics and biostatistics, was at best an ancillary discipline, an illustrative sideline. The recent development of genomic and genetic biology has transformed it into a conceptual basis for biology. Like theoretical physics, modern theoretical biology explains and integrates, associates and generalizes, but empirical work, whether observation of nature or experiment, remains the ultimate source of knowledge.

There is plenty of work to be done in this borderline area between biomedicine and informatics. Many intriguing puzzles are waiting for ingenious minds to solve them.


Bioinformatics Research Group at Max Delbrück Center Berlin-Buch