Friday, March 26, 2010

Modeling the Linnaean taxonomy in OWL: Where do specimens come in?

In my earlier post, I wrote about one way of modeling a Linnaean taxonomy in OWL, in which the names of species, genera, classes, and orders are modeled as instances of concepts from the Taxonomic Rank ontology. For example, Homo sapiens is modeled as an instance of the Species concept from that ontology. This approach, however, did not consider actual biological specimens and their relationships with these taxa.

Last week, I was at the Phenoscape project all-hands meeting at the Field Museum in Chicago. Matt Yoder, co-PI on the Hymenoptera Anatomy Ontology (HAO) project, spoke about the HAO's design for modeling the relationship between specimens and phenotypes, in which the names of taxa are modeled as standalone concepts. This approach addresses two issues at once. First, the recurring problems of synonymy, homonymy, and polysemy are addressed directly instead of being relegated to "annotation property" status. Second, the relationships between specimens, names, and taxonomic ranks can be represented without recourse to meta-concepts. For the record, regular concepts are instances of meta-concepts. If specimens were added to the design from my earlier post (which ignored them), taxonomic ranks (e.g. Species, Genus, Order etc.) would be meta-concepts, specific names of taxa (e.g. Brassica oleracea capitata, Canis lupus etc.) would be concepts, i.e. instances of those meta-concepts, and finally, specimens such as the big bad wolf and the head of cabbage I bought last evening at the grocery would be instances of these concepts. There could be workarounds within that design, the most obvious one being to model the relationship between specimens and taxon names as something other than a type-token relationship.
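The three levels of instantiation described above (rank, then taxon name, then specimen) have a close analogue in Python's own metaclass machinery. The sketch below is only an illustration of the meta-concept pattern, not OWL; the class names `Species` and `CanisLupus` are made up for the example.

```python
# Analogy only: Python classes and metaclasses mirror the three levels of
# instantiation (meta-concept -> concept -> individual). This is not OWL.


class Species(type):
    """Meta-concept: every instance of Species is itself a class (a taxon)."""


# Level 2: a taxon modeled as a class that is an *instance* of Species.
CanisLupus = Species("CanisLupus", (), {"binomial": "Canis lupus"})

# Level 3: a specimen modeled as an instance of the taxon.
big_bad_wolf = CanisLupus()

print(isinstance(big_bad_wolf, CanisLupus))  # True: specimen is a member of the taxon
print(isinstance(CanisLupus, Species))       # True: taxon is an instance of the rank
print(isinstance(big_bad_wolf, Species))     # False: a specimen is not itself a rank
```

The point of the HAO-style design is that it avoids this layering: the rank, the name, and the specimen all become ordinary individuals connected by ordinary relations.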

Chris Mungall also attended the meeting, as a consultant. Chris is currently working on representations of homology, and he suggested using the "hasPart" relation from the OBO Relation Ontology to model the relationship between specimens and phenotypes.

Given a definition of a Phenotype concept, the "hasPart" relation can be extended in an OWL framework to relate a Specimen concept (the domain) to a Phenotype concept (the range). An example relating a specimen to a phenotype is shown in (1). Note the post-composed representation of the Phenotype instance.

'Specimen 1' 'has part' some ('vertebra 1' and 'has quality' some 'sigmoid') --(1)
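Post-composition means the phenotype class is built on the fly from existing terms ('vertebra 1', 'has quality', 'sigmoid') instead of being a pre-named class. Strictly speaking, in OWL an existential restriction like (1) is attached to 'Specimen 1' as a type (a class assertion), which in RDF serializes to several triples involving a blank node. The sketch below, assuming hypothetical Python classes `Named`, `Some` and `And` (not any real OWL library's API), shows how such an expression nests and renders in a Manchester-like syntax.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Named:
    """A named term, e.g. an anatomy or quality term."""
    label: str


@dataclass(frozen=True)
class Some:
    """Existential restriction: <prop> some <filler>."""
    prop: str
    filler: "Expr"


@dataclass(frozen=True)
class And:
    """Intersection of two class expressions."""
    left: "Expr"
    right: "Expr"


Expr = Union[Named, Some, And]


def render(e: Expr) -> str:
    """Render an expression in a Manchester-like syntax, quoting labels."""
    if isinstance(e, Named):
        return f"'{e.label}'"
    if isinstance(e, Some):
        return f"'{e.prop}' some {render(e.filler)}"
    if isinstance(e, And):
        return f"({render(e.left)} and {render(e.right)})"
    raise TypeError(f"unknown expression: {e!r}")


# Post-compose the phenotype from existing terms instead of naming it in advance.
phenotype = And(Named("vertebra 1"), Some("has quality", Named("sigmoid")))

# The type asserted on 'Specimen 1' in (1).
specimen_1_type = Some("has part", phenotype)
print(render(specimen_1_type))
# 'has part' some ('vertebra 1' and 'has quality' some 'sigmoid')
```

Because the dataclasses are frozen and compare by value, two curators who compose the same phenotype independently end up with equal expressions, which is one practical appeal of post-composition.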

The relationship between a specimen and its taxon name would be represented as shown in (2). I have used "hasTaxonName" for want of a better label for this relation, which relates a Specimen concept to a Name concept.

'Specimen 1' 'has taxon name' 'Danio rerio' --(2)

Lastly, given a "hasRank" relation to model the relationship between a name and a taxonomic rank, the triple in (3) completes this paradigm. Note that "hasRank" is used as an annotation property in its current avatar.

'Danio rerio' 'has rank' 'Species' --(3)
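With (2) and (3) in place, getting from a specimen to its rank is just a two-hop walk over ordinary relations, with no meta-concepts involved. A minimal sketch, assuming an in-memory list of triples in which plain labels stand in for IRIs; the helper names `objects` and `ranks_of_specimen` are hypothetical.

```python
# Hypothetical in-memory triples; labels stand in for IRIs.
TRIPLES = [
    ("Specimen 1", "has taxon name", "Danio rerio"),  # (2)
    ("Danio rerio", "has rank", "Species"),           # (3)
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def ranks_of_specimen(specimen, triples=TRIPLES):
    """Follow 'has taxon name' and then 'has rank' from a specimen."""
    ranks = []
    for name in objects(specimen, "has taxon name", triples):
        ranks.extend(objects(name, "has rank", triples))
    return ranks


print(ranks_of_specimen("Specimen 1"))  # ['Species']
```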

The following 'type' triples are necessary.

'Danio rerio' 'type' 'Name'
'Species' 'type' 'Taxonomic rank'
'Specimen 1' 'type' 'Specimen'
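One reason the type triples matter is that they let domain and range declarations be checked against the data. In OWL proper, domain and range axioms drive inference under the open-world assumption, so a reasoner would infer a missing type rather than report an error. The sketch below is instead a closed-world sanity check of the kind one might run over annotation data; the `SIGNATURES` table is a hypothetical encoding of the domains and ranges described in the text.

```python
# Labels stand in for IRIs; these are (2), (3) and the type triples above.
TRIPLES = {
    ("Specimen 1", "has taxon name", "Danio rerio"),
    ("Danio rerio", "has rank", "Species"),
    ("Danio rerio", "type", "Name"),
    ("Species", "type", "Taxonomic rank"),
    ("Specimen 1", "type", "Specimen"),
}

# Hypothetical domain/range declarations for the two relations.
SIGNATURES = {
    "has taxon name": ("Specimen", "Name"),
    "has rank": ("Name", "Taxonomic rank"),
}


def types_of(node, triples):
    """All classes a node is explicitly typed with."""
    return {o for s, p, o in triples if s == node and p == "type"}


def violations(triples, signatures):
    """Closed-world check: report triples whose subject or object lacks the declared type."""
    problems = []
    for s, p, o in sorted(triples):
        if p not in signatures:
            continue
        domain, range_ = signatures[p]
        if domain not in types_of(s, triples):
            problems.append((s, p, o, f"subject not typed {domain}"))
        if range_ not in types_of(o, triples):
            problems.append((s, p, o, f"object not typed {range_}"))
    return problems


print(violations(TRIPLES, SIGNATURES))  # []

# Dropping the Name type for 'Danio rerio' breaks both relations that touch it.
incomplete = TRIPLES - {("Danio rerio", "type", "Name")}
for problem in violations(incomplete, SIGNATURES):
    print(problem)
```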

Alternative names for Danio rerio, such as Brachydanio rerio, can be represented using the RDF triple in (4), where synonym can be defined as a symmetric property between Name concepts.

'Brachydanio rerio' 'synonym' 'Danio rerio' --(4)
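Declaring synonym as symmetric (owl:SymmetricProperty in OWL) means the single asserted triple (4) can be read in both directions, so a query for either name finds specimens catalogued under the other. A minimal sketch, assuming plain Python sets; 'Specimen 2' is a hypothetical specimen recorded under the older name.

```python
# Hypothetical data: 'Specimen 2' is an invented specimen recorded under the older name.
TAXON_NAMES = {
    "Specimen 1": "Danio rerio",
    "Specimen 2": "Brachydanio rerio",
}

# Triple (4), stored as a (subject, object) pair of the synonym relation.
SYNONYM_PAIRS = {("Brachydanio rerio", "Danio rerio")}


def symmetric_closure(pairs):
    """Add the reverse of every pair, as a symmetric property implies."""
    return set(pairs) | {(b, a) for a, b in pairs}


def names_for(name, synonym_pairs):
    """A name together with its asserted or symmetrically inferred synonyms."""
    closed = symmetric_closure(synonym_pairs)
    return {name} | {b for a, b in closed if a == name}


def specimens_named(name, taxon_names, synonym_pairs):
    """Specimens whose taxon name is the given name or one of its synonyms."""
    accepted = names_for(name, synonym_pairs)
    return sorted(s for s, n in taxon_names.items() if n in accepted)


print(specimens_named("Danio rerio", TAXON_NAMES, SYNONYM_PAIRS))
# ['Specimen 1', 'Specimen 2']
```

This only applies symmetry; following chains of synonyms (A to B to C) would additionally require a transitive closure, which is a separate modeling decision.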

In the interest of sound ontology design principles, each of these concepts can be defined as a subclass of concepts from "higher-level" ontologies such as the Information Artifact Ontology (IAO).

In my next post, I shall look at use cases that can leverage these designs, both from the point of view of Phenoscape (the project I currently work on) and from that of other life science data integration and modeling projects.

As an aside, my days on the Phenoscape project are numbered and I'm currently looking for new positions. Wish me luck!