Most biomedical research laboratories make up their own private language to describe their particular techniques, materials, and measurements. Even medical practitioners have more than a hundred ways to describe a simple fact such as a patient's blood glucose. Talking to other scientists about data is like having a conversation without agreeing which words to use, or what they mean. With no lingua franca, how can biomedical researchers make the most of the vast amounts of data out there?
Over the past decade, a bioinformatics specialization called biomedical ontology has grown up around this question. Most biomedical researchers are familiar with standard terminologies, such as Medical Subject Headings. These are sometimes called ontologies, but true ontologies are more than just controlled terms. They capture, in a logical, systematic way, what scientists regard as the basic truths about a topic. Like equations in physics or axioms in mathematics, they can even be the basis for computational models. When connected to databases, scientific papers, and software applications, ontologies "help cope with the ever-growing, chaotic accumulation of text and facts" in biomedical and translational research. They do this by making data sharing, retrieval, and validation easier, says Stefan Schulz, a professor at the Medical University of Graz in Austria.
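To make the distinction between a controlled term list and an ontology concrete, here is a minimal sketch in Python. It assumes a toy model in which every term, synonym, and relation is invented for illustration; none of it comes from Medical Subject Headings or any real ontology. The synonym table plays the role of a standard terminology, while the is_a links add the kind of logical structure that lets software retrieve records by a broader category.

```python
# Hypothetical toy ontology: all identifiers and labels are invented for illustration.
IS_A = {
    "blood_glucose_measurement": "blood_chemistry_measurement",
    "blood_chemistry_measurement": "laboratory_measurement",
    "body_temperature_measurement": "vital_sign_measurement",
}

# Hypothetical synonym table: several free-text labels map to one shared term.
SYNONYMS = {
    "blood glucose": "blood_glucose_measurement",
    "blood sugar": "blood_glucose_measurement",
    "serum glucose": "blood_glucose_measurement",
    "body temp": "body_temperature_measurement",
}


def ancestors(term):
    """Return every class reachable from `term` by following is_a links."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def normalize(label):
    """Map a free-text label to its ontology term, or None if it is unknown."""
    return SYNONYMS.get(label.strip().lower())


def records_about(records, query_term):
    """Select records whose term is `query_term` or one of its descendants."""
    hits = []
    for label, value in records:
        term = normalize(label)
        if term is not None and (term == query_term or query_term in ancestors(term)):
            hits.append((label, value))
    return hits


records = [("Blood Sugar", 5.4), ("serum glucose", 6.1), ("body temp", 37.0)]
print(records_about(records, "laboratory_measurement"))
```

Here a query for the broad class "laboratory_measurement" finds both glucose records even though neither label mentions it, which a flat list of preferred terms could not do on its own.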
The field is "growing hugely," says Barry Smith, director of the National Center for Ontological Research at the University at Buffalo in New York. Biomedical software and device companies are starting to use the strengths of ontologies to improve their products. A volunteer-based collaboration is engineering a suite of standard ontologies intended to help the biomedical research community share data. Academics specializing in biomedical ontology are pushing the boundaries of what ontologies can enable, mining ever more massive data sets in artificially intelligent ways. "It's getting to be impossible to do work in bioinformatics without knowledge of biomedical ontology," writes Mark Musen, head of the National Center for Biomedical Ontology (NCBO) and the Stanford Center for Biomedical Informatics Research (SCBIR) in California, by e-mail.
This week, Science, Science Careers, Science Translational Medicine, and Science Signaling have joined forces to take a broad look at the challenges and opportunities researchers face in dealing with data.
This article is one of three in Science Careers on the topic.
See the entire list of articles in all the Science publications at www.sciencemag.org/special/data/.
A new bioinformatics tool
Biomedical ontologies are applications of the mathematics-based branch of philosophy called ontology. So, says Jobst Landgrebe, an enterprise architect at the Swiss International Institute for the Safety of Medicines (ii4sm), a biomedical ontology developer needs "three types of thinking: mathematical thinking, philosophical thinking, and domain-specific thinking." Landgrebe started learning ontology design 2 years ago when he was designing drug-safety software. His software needed to list drugs based on their indications and give warnings for contraindications, allergies, side effects, and drug-drug interactions. No good drug ontology was available, so he designed one -- the medicinal product ontology -- from scratch. "Things like aspirin and paracetamol are … described there in great detail," he says.
Landgrebe, who is based in Cologne, Germany, studied philosophy and mathematics as an undergraduate. After that, he did an M.D.-Ph.D. in medicine and biochemistry at the University of Göttingen, finishing in 1998. After working in academia for 8 years, he moved to the private sector and ultimately to ii4sm. When he realized that his software required a new ontology, he read the literature and started a correspondence with Smith, a philosopher. Before long, Landgrebe was building ontologies.
Ontology design has been important to his work, Landgrebe says, but it's just one of the many bioinformatics skills he needs, which is why he does not foresee a strong job market in industry for narrow ontology specialists. Albert Goldfain, a half-time postdoc under Smith who spends the other half of his time as a researcher at the medical device company Blue Highway in Syracuse, New York, agrees. At Blue Highway, Goldfain is building ontology-based models that will help monitoring devices interpret patients' vital signs. "A well placed ontology in a data-intensive application fills an important niche and need," he writes in an e-mail interview with Science Careers.
While some ontologies, like Landgrebe's, serve a specific purpose in a particular application, others are meant as standards for whole scientific communities. "For many decades, experimental data in various domains for different organisms have been recorded and stored in various formats following different standards if any at all," Larisa Soldatova, a Research Councils UK academic fellow who works in the computer science department at the University of Wales, Aberystwyth, writes by e-mail. The resulting chaos slows down translational research in particular, which involves comparing data gathered using different organisms and different techniques and sharing large amounts of clinical data. If all researchers agreed to use the same ontologies to document their work, it would "ensure clarity of the results" and reproducibility, and "enhance knowledge sharing and reusability," Soldatova writes.
One attempt to create standards for the biomedical and translational research communities is the OBO (Open Biological and Biomedical Ontologies) Foundry project. OBO Foundry ontologies are public domain and built by volunteers, often scientists who aren't ontologists. Because few people are both philosophers and domain experts, "people with these different roles work together" to design ontologies, says Schulz, who works informally with the OBO Foundry group.
Building ontologies in this way can be slow because it requires consensus, says Susanna-Assunta Sansone, a team leader at the University of Oxford's Oxford e-Research Centre. Since 2004, she has worked with an international consortium to design the OBO Foundry–affiliated Ontology for Biomedical Investigations (OBI). OBI is an ontology that documents elements of clinical and life-science experiments such as sample characteristics and instrument parameters. "You have no idea how much time we have spent on fighting over the meaning of a single word," she says. As an example, she notes that the OBI team had a hard time agreeing on the word "investigations" in the ontology's name. "Experiment," "project," and "study" were other candidates.
Italian-born Sansone earned a Ph.D. in molecular biology at Imperial College London in 2000 and went on to do vaccine research at a private company. "While working on the genetic characterization of a vaccine strain, I started using bioinformatics tools," she writes by e-mail. "That was the turning point." Within a year, she was working in data management, which she calls "the less sexy side" of bioinformatics.
Creating standard ontologies does not ensure their use. That's why Sansone also develops software tools designed to make adopting publicly available ontologies as easy as driving a car. "Unfortunately, we are currently in the stage of the Model T Ford ontology, where the engine breaks down every few minutes, and so anyone who wants to drive an ontology needs also to understand what ontology technology brings and how it is supposed to do this," Smith, who is a co-founder of the OBO Foundry, writes by e-mail.
Sansone also plays an outreach role. She is an industry liaison for the OBO Foundry and co-chairs the Bio-Ontology Special Interest Group of the International Society for Computational Biology. As cofounder of the Web site BioSharing, she is collecting a library of ontologies and other standards and implementing an online forum for journals and funders, such as the U.S. National Institutes of Health, which are starting to include standard ontologies in their data-sharing policies. They "will enable the uptake of the ontologies," she says, "because they have the stick."
Making people change the words they use to describe their research is "like being told that you have to speak Russian for the rest of your life," Smith says. "It's an effort of persuasion."
Showing what's possible
If the OBO Foundry and others do their job right, "scientists will use ontologies without having to dedicate much thought to where the ontologies have come from or how they were constructed," NCBO's Musen writes by e-mail.
When that happens, scientists will have the ability to conduct research in whole new ways, says Michel Dumontier, an associate professor of bioinformatics at Carleton University in Ottawa, Canada. Inclined to ponder arguments such as whether unicorns, the Higgs boson, or structures of hypothetical molecules are "real" enough to belong in ontologies, Dumontier focuses his research on figuring out how to make the best possible ontologies. "I'm prepared to test multiple different approaches and try to get a sense of what is the better way of doing it," he says. Many of his approaches explore the potential for automated reasoning, a type of artificial intelligence. In one experiment, he created part of an ontology with machine-understandable descriptions and then let the computer fill in the rest, potentially showing the way to a less labor-intensive method of building ontologies. In another, he designed a program that could identify logical errors in OBO Foundry ontologies.
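As a rough illustration of what "identifying logical errors" can mean, the sketch below checks a toy class hierarchy for one common kind of contradiction: a class that inherits from two classes declared to be disjoint. This is not Dumontier's program, and the class names and axioms are hypothetical, invented for this example. Real automated reasoners handle far richer logic than this.

```python
from itertools import combinations

# Hypothetical toy axioms, invented for illustration only.
PARENTS = {
    "enzyme": {"protein"},
    "protein": {"molecule"},
    "ribozyme": {"enzyme", "rna"},  # deliberately contradictory in this toy model
    "rna": {"molecule"},
}

# Pairs of classes asserted to share no members.
DISJOINT = {frozenset({"protein", "rna"})}


def superclasses(cls):
    """Return cls plus every class it inherits from, directly or indirectly."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes():
    """List (class, a, b) where the class inherits from disjoint classes a and b."""
    errors = []
    for cls in sorted(PARENTS):
        for a, b in combinations(sorted(superclasses(cls)), 2):
            if frozenset({a, b}) in DISJOINT:
                errors.append((cls, a, b))
    return errors


print(unsatisfiable_classes())
```

In this toy model the check flags "ribozyme": it is declared both an enzyme (and therefore a protein) and an RNA, while proteins and RNAs are declared disjoint. The fix belongs in the axioms, for example by no longer asserting that every enzyme is a protein.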
Ultimately, Dumontier's goal is to build ontologies that "support better science." Working in a collaboration that included Nigam Shah, an assistant professor of medicine at SCBIR, Dumontier showed that it's possible to test scientific hypotheses using formal ontologies with a large database and literature-curated data. That sort of large-scale data mining, especially on the so-called Semantic Web, will become possible when a lot of biomedical data is organized on the "backbone of ontologies" and put online, Shah says. Imagine "a program-crawling-the-Web kind of thing."
Both Dumontier and Shah say that they stumbled into ontology work, neither having learned computer programming before graduate school. In fact, Dumontier didn't start working on ontologies until after he had started as a professor at Carleton. He says he attended a workshop and "saw immediately how incredibly useful it could be, and at that point I stopped doing everything I had planned and I wrote a new research plan."
It wasn't clear then that it was possible to build a successful academic career in ontology. In fact, in 2005, when Musen was recruiting Shah -- who holds a degree in medicine from India as well as a Ph.D. in molecular medicine from Pennsylvania State University -- for a postdoc position, there were no faculty jobs at all in ontology. At the time, Musen predicted that many universities would soon start looking for ontology experts. Smith has made a similar prediction for medical schools and research hospitals.
About 5 years later, Shah went looking for a tenure-track position and ended up with five faculty job offers. (He decided to stay at Stanford.) "Mark's prediction was right," Shah says. "There were tons of jobs out there."
Training opportunities in biomedical ontology
"There is a serious demand for specialists in Semantic Web and ontologies, but there is still no steady supply of specialists," Soldatova writes. Smith's graduate program in ontology at Buffalo plans to launch a certificate program in biomedical ontology in 2012, and many bioinformatics and computational biology programs are starting to offer courses. But for now, workshops and conferences are the best way to get training in biomedical ontology. Here are some upcoming events, put together for Science Careers by Dumontier:
23–25 February 2011
Fredericton, New Brunswick, Canada
19–20 May 2011
29 May–2 June 2011
5–9 June 2011
co-located with SemTech 2011
San Francisco, California
5–6 June 2011
Tutorial: Using formalized ontologies for verification and integration of biomedical data
Presenters: Michel Dumontier & Robert Hoehndorf
17 July 2011
Co-located with Intelligent Systems in Molecular Biology
15–16 July 2011
University at Buffalo, New York
26–30 July 2011
23–27 October 2011