Tuesday, October 2, 2007

So you think you know ontologies?

The need to share biological data has led to the development of several high-profile and oft-referenced ontologies in the life sciences domain. Soldatova and King have pointed out limitations in many biological ontologies that threaten to undermine their long-term usefulness in the life sciences. Based on my experience in ontology development over the last five years, and my interactions with other ontology developers and with organizations involved in (or looking to invest in) ontology development, the findings of their paper do not come as a surprise.

Today, the word "ontology" is used in a variety of contexts. Very often, it is used to refer to vocabularies and taxonomies. While an ontology can serve as both a vocabulary and a taxonomy, the converse is not true: neither a vocabulary nor a taxonomy is, by itself, an ontology. Throwing together a subsumption hierarchy is not ontology development. Nor is the curation of a carefully controlled vocabulary of concepts pertinent to a knowledge domain. Many biologists (and other professionals) dabbling in ontology development are blithely unaware of the mathematical underpinnings of ontologies.
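To make the distinction concrete, here is a minimal sketch in Python. The class names and the single disjointness axiom are hypothetical, invented for illustration and not taken from any real ontology. The first function answers the only question a bare subsumption hierarchy can answer ("is X a kind of Y?"); the second uses an axiom to detect a contradiction, which a taxonomy alone cannot express.

```python
# Hypothetical toy hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "Retrovirus": "RNAVirus",
    "RNAVirus": "Virus",
    "DNAVirus": "Virus",
}

# One axiom a bare taxonomy cannot state: these two classes share no members.
DISJOINT = {frozenset({"RNAVirus", "DNAVirus"})}


def ancestors(cls):
    """Return cls together with every superclass reachable via SUBCLASS_OF."""
    seen = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        seen.add(cls)
    return seen


def is_a(cls, other):
    """Taxonomy-level question: is cls subsumed by other?"""
    return other in ancestors(cls)


def is_consistent(asserted_types):
    """Axiom-level question: do the asserted types respect every disjointness axiom?"""
    inferred = set()
    for t in asserted_types:
        inferred |= ancestors(t)
    return not any(pair <= inferred for pair in DISJOINT)
```

Under these assumptions, is_a("Retrovirus", "Virus") holds from the hierarchy alone, while is_consistent({"Retrovirus", "DNAVirus"}) fails only because of the disjointness axiom. A real reasoner does far more than this, but the sketch shows where a hierarchy ends and an ontology begins.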

Concepts and relations that are defined as part of an ontology need to be grounded in mathematical axioms. Ontology development toolkits such as Protégé and Altova insulate ontology developers, or biologists in this case, from this reality. For all their usefulness in enabling the adoption of ontologies, ontology development tools that conveniently generate OWL syntax obscure the fact that every construct in OWL (at least, in the decidable species of OWL) has its semantic underpinnings in a rigorous and formal logical framework.
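As a rough illustration of what those underpinnings look like, the block below pairs three common OWL constructs with their usual description logic notation and the equivalent first-order reading. The symbols C, D and R are placeholders for arbitrary classes and a property, not names from any particular ontology.

```latex
% rdfs:subClassOf -- every C is a D
C \sqsubseteq D
  \;\equiv\; \forall x \,\bigl( C(x) \rightarrow D(x) \bigr)

% owl:someValuesFrom -- every C is related by R to at least one D
C \sqsubseteq \exists R.D
  \;\equiv\; \forall x \,\bigl( C(x) \rightarrow \exists y \,( R(x,y) \wedge D(y) ) \bigr)

% owl:disjointWith -- nothing is both a C and a D
C \sqcap D \sqsubseteq \bot
  \;\equiv\; \forall x \,\neg \bigl( C(x) \wedge D(x) \bigr)
```

A checkbox or a drag-and-drop in an editor produces exactly one of these statements, whether or not the person clicking knows it.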

Ontology developers need to understand data: how it is used, accessed, and most importantly, modeled. A familiarity with the philosophy of the Entity-Relationship (ER) model or with the Object-Oriented (OO) philosophy is a necessary prerequisite to ontology development. A second prerequisite is an understanding of mathematical logic, first-order logic at the very least.

Ontology development by specialists or domain experts amounts to a waste of their skills, if not a serious threat to the quality of the developed ontology. While ontology engineers need not be experts in a specific knowledge domain, their skills are what it takes to distill expertise from any domain into a representational framework such as OWL. I would not want an expert virologist to develop an ontology pertinent to viruses, any more than I would want a machinist or hangar technician to design and develop a database of airplane spare parts. The hangar technician would be best employed working with spares, not describing them.

On a positive note, these deficiencies are symptomatic of any new technology, particularly in the information technology area. In the mid to late 1990s, programmers accustomed to the procedural syntax of languages such as C were slow to adopt and master the object-oriented philosophy behind languages such as Smalltalk and the then newly introduced Java. Ontologies are in the same phase of adoption today: a newfangled technology that promises to change the world as we know it, with its attendant evangelists (such as yours truly) and skeptics. Believe!!