Friday, March 26, 2010

Modeling the Linnaean taxonomy in OWL: Where do specimens come in?

In my earlier post, I wrote about one way of modeling a Linnaean taxonomy in OWL, where the names of species, genera, classes, and orders could be modeled as instances of concepts from the taxonomic rank ontology. For example, Homo sapiens is modeled as an instance of the Species concept from the Taxonomy Rank ontology. This approach, however, did not consider actual biological specimens and their relationships with these taxa.

Last week, I was at the Phenoscape project all-hands meeting at the Field Museum in Chicago. Matt Yoder, co-PI on the Hymenoptera Anatomy Ontology project, spoke about the HAO's design, which models the relationship between specimens and phenotypes and treats the names of the taxa as standalone concepts. This approach addresses two issues at once. One, the recurring problems with synonymy, homonymy, and polysemy are addressed directly instead of being relegated to "annotation property" status. Two, the relationships between specimens, names, and taxonomic ranks can be represented without resorting to meta-concepts. For the record, regular concepts are instances of meta-concepts. In my earlier post, where specimens were ignored, taxonomic ranks (e.g. Species, Genus, Rank, etc.) would be meta-concepts, specific names of taxa (e.g. Brassica oleracea capitata, Canis lupus, etc.) would be concepts, or instances of the meta-concepts, and finally, specimens such as the big bad wolf and the head of cabbage I bought last evening at the grocery would be instances of these concepts. There could be workarounds for this philosophy, the most obvious one being to model the relationship between specimens and taxon names as something other than a type-token relationship.
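The three-level layering described above (rank as meta-concept, taxon name as concept, specimen as instance) can be imitated with Python metaclasses. This is only an analogy under stated assumptions: the class and variable names are hypothetical, and Python's type system is not an OWL reasoner.

```python
# Hypothetical names throughout; this mirrors the layering only,
# it is not an OWL model.

class Species(type):
    """Meta-concept: a taxonomic rank whose instances are taxon concepts."""


# A taxon concept is an instance of the rank meta-concept.
CanisLupus = Species("CanisLupus", (), {})

# A specimen is an instance of the taxon concept.
big_bad_wolf = CanisLupus()

specimen_is_taxon = isinstance(big_bad_wolf, CanisLupus)  # type-token link
taxon_is_rank = isinstance(CanisLupus, Species)           # meta-level link
specimen_is_rank = isinstance(big_bad_wolf, Species)      # not inherited
```

The last check is false: instantiation does not carry across levels, which is one reason a design that needs meta-concepts is awkward to reason over and why treating names as standalone concepts is attractive.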

Further, Chris Mungall was also at the project meeting, as a consultant. Chris is currently working on creating representations of homologies and he suggested using the "hasPart" relation from OBO relations to model the relationship between specimens and phenotypes.

Given the definition of a Phenotype concept, the relationship "hasPart" can be extended in an OWL framework to relate a Specimen concept (the domain) to a Phenotype concept (the range). An example RDF triple relating a specimen to a phenotype is shown in (1). Note the post-composed representation of the Phenotype instance.

'Specimen 1' 'has part' 'some(vertebra 1 and hasQuality some sigmoid)' --(1)

The relationship between a specimen and its taxon name would be represented as shown in (2). I have used "hasTaxonName" for want of a better label for this relation, which relates a Specimen concept to a Name concept.

'Specimen 1' 'has taxon name' 'Danio rerio' --(2)

Lastly, given a "hasRank" relation to model the relationship between a name and a taxonomic rank, the RDF triple (3) completes this paradigm. Note "hasRank" is used as an annotation property in its current avatar.

'Danio rerio' 'has rank' 'Species' --(3)

The following 'type' triples are necessary.

'Danio rerio' 'type' 'Name'
'Species' 'type' 'Taxonomic rank'
'Specimen 1' 'type' 'Specimen'

Alternative names for Danio rerio, such as Brachydanio rerio, can be represented using the RDF triple in (4), where synonym can be defined as a symmetric property between Name concepts, so that the triple also holds in the reverse direction.

'Brachydanio rerio' 'synonym' 'Danio rerio' --(4)
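As a rough illustration of how triples (2) to (4) and the 'type' triples fit together, here is a minimal Python sketch that stores them as plain tuples. The helper names (`objects`, `synonyms_of`, `names_for`) are made up for this example, triple (1) is left out because its object is a post-composed expression, and synonym is treated as holding in both directions, which is an assumption of the sketch.

```python
# Triples copied from the text; labels are used in place of real IRIs.
TRIPLES = {
    ("Specimen 1", "has taxon name", "Danio rerio"),   # (2)
    ("Danio rerio", "has rank", "Species"),            # (3)
    ("Brachydanio rerio", "synonym", "Danio rerio"),   # (4)
    ("Danio rerio", "type", "Name"),
    ("Species", "type", "Taxonomic rank"),
    ("Specimen 1", "type", "Specimen"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def synonyms_of(name):
    """Names linked to `name` by 'synonym' in either direction (one hop)."""
    forward = objects(name, "synonym")
    backward = {s for s, p, o in TRIPLES if p == "synonym" and o == name}
    return forward | backward


def names_for(specimen):
    """Asserted taxon names of a specimen plus their direct synonyms."""
    asserted = objects(specimen, "has taxon name")
    expanded = set(asserted)
    for name in asserted:
        expanded |= synonyms_of(name)
    return expanded


all_names = names_for("Specimen 1")
ranks = objects("Danio rerio", "has rank")
```

Because names are first-class nodes here, a query for the specimen by either name can succeed without the synonym being hidden in an annotation.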

In the interest of sound ontology design principles, each of these concepts can be extended from concepts from "higher-level" ontologies such as the Information Artifact Ontology.

In my next post, I shall look at use cases that can leverage these designs both from the point of view of Phenoscape (the project I currently work on) as well as other life science data integration and modeling projects.

As an aside, my days on the Phenoscape project are numbered and I'm currently looking for new positions. Wish me luck!

Friday, February 26, 2010

Modeling the Linnaean taxonomy in OWL

Following up on the Phenoscape beta release in July, I've worked primarily on warehousing the phenotype data and refactoring the data services for faster performance on the Phenoscape web interface. I'm also collaborating with Chris Mungall at Lawrence Berkeley National Laboratory on a manuscript outlining the principles of OBD and its application to the Phenoscape knowledgebase. I hope to finish writing the first draft in the next couple of weeks.

I've been pondering over ways to create representations of phenotype annotations in RDF triples using OWL concepts, instances, and object properties. A phenotype annotation is a Subject-Predicate-Object triple that relates an evolutionary taxon from a Linnaean taxonomy to an exhibited phenotype. In the Phenoscape project, phenotype annotations relate species (and sometimes higher taxa from the Linnaean taxonomy) of fish to exhibited phenotypes.

To relate these two entities, we have defined a new binary relation exhibits. The exhibits relation has been defined in an OBO framework, where only a simple ID and label are required with a text description of the intended semantics. I have been thinking about a more formal treatment for this important relation, specifically in a Semantic Web framework. How do I create an object property definition of the exhibits relation? What concepts do I define as its domain and range?

In layman's terms, the exhibits relation relates a taxon (node) from a Linnaean taxonomy to a phenotype. The taxonomy rank ontology specifies partonomy relationships between the various ranks of a Linnaean taxonomy; each instance of a rank is also an instance of the higher ranks. The taxon concept in the taxonomy rank ontology has been defined as a subconcept of the continuant concept of the Basic Formal Ontology.

Genus, species, family, order, and class are subconcepts of taxon. Species such as Ictalurus furcatus, Oryza sativa and Esox americanus are instances of the species concept. The corresponding genera Ictalurus, Oryza and Esox are instances of the genus concept.

I have not addressed the relationship between actual living organisms and Linnaean taxa; is my dog an instance of Canis familiaris, for example, or is this a different kind of relationship altogether? How about fossils that are being discovered in various corners of the Earth even today, such as the fascinating Tiktaalik roseae? How about the preserved soft tissue specimens in various life science museums? Are these instances of specific taxa? This is the subject of a very old debate in the community of evolutionary biologists and systematists. Very often, evolutionary biologists cannot decide which part of the Tree of Life to assign a newly discovered specimen to. I shall defer a discussion on this relationship to a later post.

Now let us consider phenotypes. A phenotype is defined as an observable physical or biochemical characteristic of a living organism that is caused by its genetic makeup and also by the influence of its environment. For some time now, model organism databases have used the Entity-Quality formalism for modeling phenotypes, i.e. a phenotype is a quality that inheres in an anatomical or a behavioral entity. Phenoscape subscribes to this formalism. A phenotype concept in Phenoscape (and in OBD, from which it is inherited) is "post-composed" from previously defined concepts in an anatomical ontology or a behavioral ontology, such as the Foundational Model of Anatomy (FMA) or the GO biological process ontology, and from a quality ontology such as the Phenotype and Trait Ontology (PATO). This is a nifty way to create an RDF-style blank node with a Skolemized identifier, which identifies the origins of the node. The post-composed phenotype is related to the quality concept by a subsumption relationship ("a round fin is round after all") and to the corresponding anatomy or behavior concept by the inheres_in relation from OBO. Again, the comparison with RDF blank nodes is obvious. It's not the node itself, but its relationships that we care about.

So here goes putting it all together. I use Phenoscape as the namespace prefix here. I have eliminated the angle brackets from the tags so it can be displayed here. This is going into an ontology that will soon be posted on the Phenoscape site.

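<owl:Class rdf:ID="Phenotype">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="PATO:0000001"/> <!-- Quality -->
    <owl:Restriction>
      <owl:onProperty rdf:resource="OBO_REL:inheres_in"/>
      <owl:allValuesFrom><owl:Class><owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="GO:0007610"/> <!-- Behavior -->
        <owl:Class rdf:about="TAO:0100000"/> <!-- Anatomical entity from TAO -->
      </owl:unionOf></owl:Class></owl:allValuesFrom>
    </owl:Restriction>
    <owl:Restriction>
      <owl:onProperty rdf:resource="OBO_REL:towards"/>
      <owl:allValuesFrom><owl:Class><owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="GO:0007610"/> <!-- Behavior -->
        <owl:Class rdf:about="TAO:0100000"/> <!-- Anatomical entity from TAO -->
      </owl:unionOf></owl:Class></owl:allValuesFrom>
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>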

Note how I use the root concept of the Teleost Anatomy Ontology as one of the concepts in the OWL union in the range of both the inheres_in property as well as the towards property. This is for the purposes of the Phenoscape project. For other subsets of the Tree of Life, concepts from equivalent anatomy ontologies such as the Amphibian Anatomy Ontology, the Foundational Model of Anatomy (FMA), or even the Common Anatomy Reference Ontology (CARO) can be used instead of this concept.

Now for the taxon concept. This is much simpler. I use the Continuant concept from BFO as the superconcept of taxon. I use TRO as the prefix for the Taxonomy Rank Ontology.

<owl:Class rdf:ID="TRO:Taxon">
<rdfs:subClassOf rdf:resource="BFO:Continuant"/>
</owl:Class>

Other concepts in the TRO can be defined as below in OWL.

<owl:Class rdf:ID="TRO:Genus">
<rdfs:subClassOf rdf:resource="TRO:Taxon"/>
</owl:Class>

<owl:Class rdf:ID="TRO:Species">
<rdfs:subClassOf rdf:resource="TRO:Taxon"/>
</owl:Class>

Lastly, the individual species, genera, and so on can be defined as OWL individuals as below. These are taken from Peter Midford's Teleost Taxonomy Ontology.

<TRO:Species rdf:ID="TTO:1001979"/> <!-- Danio rerio -->

<TRO:Genus rdf:ID="TTO:101040"/> <!-- Danio -->

Similarly, phenotypes with post-composed identifiers can be defined as instances of the OWL concept Phenotype defined earlier.

<phenoscape:Phenotype rdf:ID="PATO:0000599^OBO_REL:inheres_in(TAO:0000656)"/>
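To make the shape of a post-composed identifier concrete, here is a small Python sketch that builds and splits identifiers of the form quality^relation(entity). The function names and the regular expression are illustrative assumptions, and the sketch handles only a single, non-nested differentia; it is not the OBD implementation.

```python
import re

# Assumed pattern: QUALITY^RELATION(ENTITY), with no nesting.
_POST_COMPOSED = re.compile(
    r"^(?P<quality>[^^]+)\^(?P<relation>[^()]+)\((?P<entity>[^()]+)\)$"
)


def compose(quality, entity, relation="OBO_REL:inheres_in"):
    """Build a post-composed identifier from its parts."""
    return f"{quality}^{relation}({entity})"


def decompose(identifier):
    """Split a post-composed identifier into (quality, relation, entity)."""
    match = _POST_COMPOSED.match(identifier)
    if match is None:
        raise ValueError(f"not a simple post-composed identifier: {identifier!r}")
    return match.group("quality"), match.group("relation"), match.group("entity")


phenotype_id = compose("PATO:0000599", "TAO:0000656")
parts = decompose(phenotype_id)
```

The round trip shows why the identifier behaves like a Skolemized blank node: the node has no meaning of its own, but its name records exactly which quality and which entity it was built from.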

Finally, we define the exhibits relation in OWL.

<owl:ObjectProperty rdf:ID="exhibits">
<rdfs:domain rdf:resource="TRO:Taxon"/>
<rdfs:range rdf:resource="#Phenotype"/>
</owl:ObjectProperty>

This definition is now the logical underpinning for RDF triples in N3 syntax that look like:

<tto:1001979> <phenoscape:exhibits> <pato:0000599^obo_rel:inheres_in(tao:0000656)> .
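One way to see what the domain and range of exhibits buy us is a small closed-world check in Python. This is a sketch under assumptions: the dictionaries and the `is_a` and `conforms` helpers are hypothetical, the identifiers are the ones used in the individual declarations earlier in the post, and an OWL reasoner would behave differently, since it treats domain and range as grounds for inferring types rather than as constraints to validate.

```python
# Subclass links taken from the class definitions in the post.
SUBCLASS_OF = {
    "TRO:Species": "TRO:Taxon",
    "TRO:Genus": "TRO:Taxon",
    "TRO:Taxon": "BFO:Continuant",
}

# Asserted types of individuals (hypothetical lookup table).
TYPE_OF = {
    "TTO:1001979": "TRO:Species",   # Danio rerio
    "TTO:101040": "TRO:Genus",      # Danio
    "PATO:0000599^OBO_REL:inheres_in(TAO:0000656)": "phenoscape:Phenotype",
}

# Property name mapped to its (domain, range).
PROPERTIES = {"phenoscape:exhibits": ("TRO:Taxon", "phenoscape:Phenotype")}


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or reaches it via subclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def conforms(subject, predicate, obj):
    """Closed-world check that a triple respects the domain and range."""
    domain, range_ = PROPERTIES[predicate]
    s_type, o_type = TYPE_OF.get(subject), TYPE_OF.get(obj)
    if s_type is None or o_type is None:
        return False
    return is_a(s_type, domain) and is_a(o_type, range_)


phenotype = "PATO:0000599^OBO_REL:inheres_in(TAO:0000656)"
ok = conforms("TTO:1001979", "phenoscape:exhibits", phenotype)
swapped = conforms(phenotype, "phenoscape:exhibits", "TTO:1001979")
```

Because Species and Genus are both subconcepts of Taxon, the same check accepts annotations on higher taxa as well as on species, which matches how Phenoscape annotations are described above.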

I may be off on some of the syntax (I'm a bit rusty), but I hope the points I have made in this post have been reflected adequately in these definitions. As always, feedback and critique are welcome. This OWL ontology will soon be up on the Phenoscape site as I have mentioned earlier. I thank Peter Midford for his input and thoughts. In my next post, I will address the relationship between specimens and evolutionary taxa, a subject to which I have briefly alluded here. Until then, happy trails!