Friday, February 26, 2010

Modeling the Linnaean taxonomy in OWL

Following up on the Phenoscape beta release in July, I've worked primarily on warehousing the phenotype data and refactoring the data services for faster performance on the Phenoscape web interface. I'm also collaborating with Chris Mungall at Lawrence Berkeley National Laboratory on a manuscript outlining the principles of OBD and its application to the Phenoscape knowledgebase. I hope to finish the first draft in the next couple of weeks.

I've been pondering ways to represent phenotype annotations as RDF triples using OWL concepts, instances, and object properties. A phenotype annotation is a subject-predicate-object triple that relates an evolutionary taxon from a Linnaean taxonomy to an exhibited phenotype. In the Phenoscape project, phenotype annotations relate species of fish (and sometimes higher taxa from the Linnaean taxonomy) to exhibited phenotypes.

To relate these two entities, we have defined a new binary relation, exhibits. The exhibits relation has been defined in the OBO framework, where only a simple ID and label are required, along with a text description of the intended semantics. I have been thinking about a more formal treatment for this important relation, specifically in a Semantic Web framework. How do I create an object property definition of the exhibits relation? What concepts do I define as its domain and range?

In layman's terms, the exhibits relation relates a taxon (a node) in a Linnaean taxonomy to a phenotype. The taxonomy rank ontology specifies partonomy relationships between the various ranks of a Linnaean taxonomy: each taxon at a given rank is part of a taxon at the next higher rank. The taxon concept in the taxonomy rank ontology is defined as a subconcept of the Continuant concept of the Basic Formal Ontology (BFO).

Genus, species, family, order, and class are subconcepts of taxon. Species such as Ictalurus furcatus, Oryza sativa and Esox americanus are instances of the species concept. The corresponding genera Ictalurus, Oryza and Esox are instances of the genus concept.

I have not addressed the relationship between actual living organisms and Linnaean taxa. Is my dog an instance of Canis familiaris, for example, or is this a different kind of relationship altogether? What about the fossils still being discovered in various corners of the Earth, such as the fascinating Tiktaalik roseae? What about the preserved soft tissue specimens in life science museums? Are these instances of specific taxa? This is the subject of a very old debate among evolutionary biologists and systematists, who very often cannot decide which part of the Tree of Life a newly discovered specimen belongs to. I shall defer a discussion of this relationship to a later post.

Now let us consider phenotypes. A phenotype is an observable physical or biochemical characteristic of a living organism, resulting from its genetic makeup together with the influence of its environment. For some time now, model organism databases have used the Entity-Quality (EQ) formalism for modeling phenotypes: a phenotype is a quality that inheres in an anatomical or a behavioral entity. Phenoscape subscribes to this formalism. A phenotype concept in Phenoscape (and in OBD, from which it is inherited) is "post-composed" from concepts previously defined in an anatomical or behavioral ontology, such as the Foundational Model of Anatomy (FMA) or the GO biological process ontology, and from a quality ontology such as the Phenotype and Trait Ontology (PATO). This is a nifty way to create an RDF-style blank node with a Skolemized identifier that records the origins of the node. The post-composed phenotype is related to the quality concept by a subsumption relationship ("a round fin is round, after all") and to the corresponding anatomy or behavior concept by the inheres_in relation from OBO. Again, the comparison with RDF blank nodes is obvious: it is not the node itself but its relationships that we care about.
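Read in OWL, the post-composition described above amounts to a class defined as the intersection of a PATO quality with an inheres_in restriction on an anatomy term. A minimal sketch, reusing the identifiers from the example instance later in this post and the post's shorthand prefixes instead of full IRIs. Treating the post-composed phenotype as a named class like this is an alternative to the individual-based encoding the post adopts further on, not the approach it uses:

```xml
<!-- Sketch: a post-composed phenotype read as an OWL class.
     Identifiers are the post's shorthand, not full IRIs. -->
<owl:Class rdf:about="PATO:0000599^OBO_REL:inheres_in(TAO:0000656)">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <!-- the quality: gives the subsumption "a round fin is round" -->
        <owl:Class rdf:about="PATO:0000599"/>
        <!-- the bearer: the quality inheres in some instance of the TAO term -->
        <owl:Restriction>
          <owl:onProperty rdf:resource="OBO_REL:inheres_in"/>
          <owl:someValuesFrom rdf:resource="TAO:0000656"/>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</owl:Class>
```

Because the definition is an equivalence, a reasoner would classify this phenotype under PATO:0000599 and would recognize any class with the same quality and bearer as the same phenotype, which is the role the Skolemized identifier plays.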

So here is how it all fits together. I use phenoscape as the namespace prefix here. This is going into an ontology that will soon be posted on the Phenoscape site.
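The OWL fragments in this post are meant to sit inside an rdf:RDF document that binds their prefixes. A minimal sketch of such a wrapper follows. The TRO and phenoscape namespace IRIs and the xml:base are hypothetical placeholders, not the published ones; the rdf, rdfs, and owl IRIs are the standard W3C namespaces.

```xml
<?xml version="1.0"?>
<!-- Hypothetical TRO and phenoscape namespace IRIs; replace with the published ones. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:TRO="http://example.org/tro#"
         xmlns:phenoscape="http://example.org/phenoscape#"
         xml:base="http://example.org/phenoscape">
  <owl:Ontology rdf:about=""/>
  <!-- class, property, and individual definitions go here -->
</rdf:RDF>
```

One caveat: XML namespace declarations only expand prefixes in element and attribute names. Identifiers written inside attribute values, such as rdf:about="TRO:Taxon" or rdf:resource="PATO:0000001", are not expanded, so a file meant for a reasoner would spell those out as full IRIs (for example through XML entity declarations).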

<owl:Class rdf:ID="Phenotype">
  <rdfs:subClassOf>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Class rdf:about="PATO:0000001"/> <!-- Quality -->
        <owl:Restriction>
          <owl:onProperty rdf:resource="OBO_REL:inheres_in"/>
          <owl:someValuesFrom>
            <owl:Class>
              <owl:unionOf rdf:parseType="Collection">
                <owl:Class rdf:about="GO:0007610"/> <!-- Behavior -->
                <owl:Class rdf:about="TAO:0100000"/> <!-- Anatomical entity from TAO -->
              </owl:unionOf>
            </owl:Class>
          </owl:someValuesFrom>
        </owl:Restriction>
        <owl:Restriction>
          <!-- allValuesFrom: towards is optional, so this limits its values without requiring one -->
          <owl:onProperty rdf:resource="OBO_REL:towards"/>
          <owl:allValuesFrom>
            <owl:Class>
              <owl:unionOf rdf:parseType="Collection">
                <owl:Class rdf:about="GO:0007610"/> <!-- Behavior -->
                <owl:Class rdf:about="TAO:0100000"/> <!-- Anatomical entity from TAO -->
              </owl:unionOf>
            </owl:Class>
          </owl:allValuesFrom>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
  </rdfs:subClassOf>
</owl:Class>

Note how I use the root concept of the Teleost Anatomy Ontology (TAO) as one of the concepts in the OWL union that constrains the values of both the inheres_in and the towards properties. This is for the purposes of the Phenoscape project. For other parts of the Tree of Life, concepts from equivalent anatomy ontologies, such as the Amphibian Anatomy Ontology, the Foundational Model of Anatomy (FMA), or even the Common Anatomy Reference Ontology (CARO), can be used instead.
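As an illustration of that substitution, here is the inheres_in restriction with a CARO term in place of the TAO root. This is a sketch in the post's shorthand identifiers; it assumes CARO:0000000 is the CARO root term "anatomical entity", which is worth checking against the current CARO release, and the towards restriction would change the same way.

```xml
<!-- Sketch: inheres_in restricted to behaviors or CARO anatomical entities. -->
<owl:Restriction>
  <owl:onProperty rdf:resource="OBO_REL:inheres_in"/>
  <owl:someValuesFrom>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="GO:0007610"/>   <!-- Behavior -->
        <owl:Class rdf:about="CARO:0000000"/> <!-- Anatomical entity from CARO (assumed root) -->
      </owl:unionOf>
    </owl:Class>
  </owl:someValuesFrom>
</owl:Restriction>
```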

Now for the taxon concept. This is much simpler. I use the Continuant concept from BFO as the superconcept of taxon. I use TRO as the prefix for the Taxonomy Rank Ontology.

<owl:Class rdf:about="TRO:Taxon">
  <rdfs:subClassOf rdf:resource="BFO:Continuant"/>
</owl:Class>

Other concepts in the TRO can be defined in OWL as follows.

<owl:Class rdf:about="TRO:Genus">
  <rdfs:subClassOf rdf:resource="TRO:Taxon"/>
</owl:Class>

<owl:Class rdf:about="TRO:Species">
  <rdfs:subClassOf rdf:resource="TRO:Taxon"/>
</owl:Class>
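Family, order, and class, the other ranks named earlier as subconcepts of taxon, would follow the same pattern. A minimal sketch; TRO:Family, TRO:Order, and TRO:Class are hypothetical identifiers in the post's shorthand, and the published ontology may use numeric IDs instead.

```xml
<!-- Hypothetical identifiers for the remaining ranks. -->
<owl:Class rdf:about="TRO:Family">
  <rdfs:subClassOf rdf:resource="TRO:Taxon"/>
</owl:Class>

<owl:Class rdf:about="TRO:Order">
  <rdfs:subClassOf rdf:resource="TRO:Taxon"/>
</owl:Class>

<!-- The taxonomic rank "class", unrelated to owl:Class -->
<owl:Class rdf:about="TRO:Class">
  <rdfs:subClassOf rdf:resource="TRO:Taxon"/>
</owl:Class>
```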

Lastly, individual species, genera, and so on can be defined as OWL individuals as below. These are taken from Peter Midford's Teleost Taxonomy Ontology.

<TRO:Species rdf:about="TTO:1001979"/> <!-- Danio rerio -->

<TRO:Genus rdf:about="TTO:101040"/> <!-- Danio -->
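The partonomy between ranks described earlier could be asserted directly on these individuals. A minimal sketch in the post's shorthand, assuming an OBO_REL prefix bound to the OBO relation ontology namespace and using its part_of relation, declared transitive here as it is in that ontology:

```xml
<owl:TransitiveProperty rdf:about="OBO_REL:part_of"/>

<!-- Danio rerio is part of the genus Danio -->
<TRO:Species rdf:about="TTO:1001979">
  <OBO_REL:part_of rdf:resource="TTO:101040"/>
</TRO:Species>
```

Because part_of is transitive, a similar assertion from the genus Danio to its family would let a reasoner infer that Danio rerio is part of that family as well.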

Similarly, phenotypes with post-composed identifiers can be defined as instances of the OWL class Phenotype defined earlier:

<phenoscape:Phenotype rdf:about="PATO:0000599^OBO_REL:inheres_in(TAO:0000656)"/>

Finally, we define the exhibits relation in OWL.

<owl:ObjectProperty rdf:ID="exhibits">
  <rdfs:domain rdf:resource="TRO:Taxon"/>
  <rdfs:range rdf:resource="#Phenotype"/>
</owl:ObjectProperty>

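This definition now provides the logical underpinning for RDF triples like the following, shown in N3 syntax with abbreviated identifiers (a strict serialization would use full IRIs and would have to percent-encode the ^ in the post-composed identifier, since ^ is not a legal IRI character):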

<TTO:1001979> <phenoscape:exhibits> <PATO:0000599^OBO_REL:inheres_in(TAO:0000656)> .
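For comparison, the same assertion can be written in RDF/XML, the syntax used for the definitions in this post. A minimal sketch in the post's shorthand identifiers, assuming the TRO and phenoscape prefixes are bound on the enclosing rdf:RDF element:

```xml
<!-- Danio rerio exhibits the post-composed phenotype -->
<TRO:Species rdf:about="TTO:1001979">
  <phenoscape:exhibits rdf:resource="PATO:0000599^OBO_REL:inheres_in(TAO:0000656)"/>
</TRO:Species>
```

Note that rdfs:domain and rdfs:range license inferences rather than enforce constraints: from an exhibits assertion alone, an RDFS or OWL reasoner would conclude that the subject is a Taxon and the object is a Phenotype, even if neither type had been stated.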

I may be off on some of the syntax (I'm a bit rusty), but I hope the points I have made in this post have been reflected adequately in these definitions. As always, feedback and critique are welcome. This OWL ontology will soon be up on the Phenoscape site as I have mentioned earlier. I thank Peter Midford for his input and thoughts. In my next post, I will address the relationship between specimens and evolutionary taxa, a subject to which I have briefly alluded here. Until then, happy trails!