Friday, March 26, 2010

Modeling the Linnaean taxonomy in OWL: Where do specimens come in?

In my earlier post, I wrote about one way of modeling a Linnaean taxonomy in OWL, where the names of species, genera, classes, and orders are modeled as instances of concepts from the taxonomic rank ontology. For example, Homo sapiens is modeled as an instance of the Species concept from the Taxonomy Rank ontology. This approach, however, did not consider actual biological specimens and their relationships with these taxa.

Last week, I was at the Phenoscape project all-hands meeting at the Field Museum in Chicago. Matt Yoder, co-PI on the Hymenoptera Anatomy Ontology project, spoke about the HAO's design, which models the relationship between specimens and phenotypes and treats the names of the taxa as standalone concepts. This approach addresses two issues at once. One, the recurring problems with synonymy, homonymy, and polysemy are addressed directly instead of being relegated to "annotation property" status. Two, the relationships between specimens, names, and taxonomic ranks can be represented without resorting to meta-concepts. For the record, regular concepts are instances of meta-concepts. In my earlier post, where specimens were ignored, taxonomic ranks (e.g. Species, Genus, Rank, etc.) would be meta-concepts; specific names of taxa (e.g. Brassica oleracea capitata, Canis lupus, etc.) would be concepts, that is, instances of the meta-concepts; and finally, specimens such as the big bad wolf and the head of cabbage I bought last evening at the grocery would be instances of these concepts. There could be workarounds for this philosophy, the most obvious one being to model the relationship between specimens and taxon names as something other than a type-token relationship.

Chris Mungall was also at the project meeting as a consultant. Chris is currently working on creating representations of homologies, and he suggested using the "hasPart" relation from OBO relations to model the relationship between specimens and phenotypes.

Given the definition of a Phenotype concept, the relationship "hasPart" can be extended in an OWL framework to relate a Specimen concept (the domain) to a Phenotype concept (the range). An example RDF triple relating a specimen to a phenotype would be as shown in (1). Note the post-composed representation of the Phenotype instance.


'Specimen 1' 'has part' 'some(vertebra 1 and hasQuality some sigmoid)' --(1)

The relationship between a specimen and its taxon name would be represented as shown in (2). I have used "hasTaxonName" for want of a better label for this relation, which relates a Specimen concept to a Name concept.

'Specimen 1' 'has taxon name' 'Danio rerio' --(2)

Lastly, given a "hasRank" relation to model the relationship between a name and a taxonomic rank, the RDF triple (3) completes this paradigm. Note "hasRank" is used as an annotation property in its current avatar.

'Danio rerio' 'has rank' 'Species' --(3)

The following 'type' triples are necessary.

'Danio rerio' 'type' 'Name'
'Species' 'type' 'Taxonomic rank'
'Specimen 1' 'type' 'Specimen'
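
As a rough illustration of how triples (2) and (3) and the 'type' triples fit together, here is a minimal sketch that holds them as plain Python tuples and walks from a specimen to the rank of its name. The helper names (`objects`, `ranks_of_specimen`) and the bare string identifiers are assumptions made for this example only; they do not come from any ontology or RDF library.

```python
# Triples (2), (3) and the 'type' triples as (subject, predicate, object) tuples.
# The strings are illustrative labels, not real IRIs.
TRIPLES = {
    ("Specimen 1", "has taxon name", "Danio rerio"),
    ("Danio rerio", "has rank", "Species"),
    ("Danio rerio", "type", "Name"),
    ("Species", "type", "Taxonomic rank"),
    ("Specimen 1", "type", "Specimen"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def ranks_of_specimen(specimen, triples=TRIPLES):
    """Follow 'has taxon name' and then 'has rank' starting from a specimen."""
    return {
        rank
        for name in objects(specimen, "has taxon name", triples)
        for rank in objects(name, "has rank", triples)
    }


print(ranks_of_specimen("Specimen 1"))  # {'Species'}
```

The point of the sketch is only that the specimen never needs to be an instance of the name: the rank is reached by following two ordinary relations.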

Alternative names for Danio rerio such as Brachydanio rerio can be represented using the RDF triple in (4), where synonym can be defined as a reflexive property between Name concepts.

'Brachydanio rerio' 'synonym' 'Danio rerio' --(4)
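
One thing triple (4) makes possible is finding specimens under any of their names. The sketch below assumes, purely for illustration, that a query should match the name itself plus any name linked to it by a 'synonym' triple in either direction; whether synonymy ought to be treated that way is a modeling decision the sketch does not settle. The function names are hypothetical.

```python
# Triples (2) and (4) as (subject, predicate, object) tuples; labels are illustrative.
TRIPLES = {
    ("Specimen 1", "has taxon name", "Danio rerio"),
    ("Brachydanio rerio", "synonym", "Danio rerio"),
}


def equivalent_names(name, triples=TRIPLES):
    """Return the name plus any name one 'synonym' hop away, in either direction."""
    names = {name}  # assumption: a name always matches itself
    for s, p, o in triples:
        if p != "synonym":
            continue
        if s == name:
            names.add(o)
        if o == name:
            names.add(s)
    return names


def specimens_named(name, triples=TRIPLES):
    """Find specimens whose taxon name is the given name or a linked synonym."""
    names = equivalent_names(name, triples)
    return {s for s, p, o in triples if p == "has taxon name" and o in names}


print(specimens_named("Brachydanio rerio"))  # {'Specimen 1'}
```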

In the interest of sound ontology design principles, each of these concepts can be derived from concepts in "higher-level" ontologies such as the Information Artifact Ontology.

In my next post, I shall look at use cases that can leverage these designs both from the point of view of Phenoscape (the project I currently work on) as well as other life science data integration and modeling projects.

As an aside, my days on the Phenoscape project are numbered and I'm currently looking for new positions. Wish me luck!

5 comments:

PEM said...

Taxonomic synonymy is somewhat more complex than you indicate here. A taxon under the Zoological Code has a valid name and may have multiple synonyms - names that have been published, but lack the publication priority of the valid name. Thus synonymy is not symmetric, which the OBO treatment of synonyms captures correctly. I'm not sure what a name being (reflexively) a synonym of itself means in a taxonomic context.

Cartik said...

Hi Peter,

Thanks for your thoughts. Given the info on valid names and other names, we can subclass the Name concept to be a valid name or otherwise and define a non-reflexive synonym relation between valid names and other names. Thoughts?

- Cartik

Chris said...

Hi Cartik

Can you tell me what a "specimen concept" is?

I would have thought the natural way to represent a specimen would be an instance that instantiated a TTO (or similar ontology) class. This is provided you have the ontological commitment that TTO represents individual organisms.

OK, perhaps some subtleties. Perhaps the specimen is derived_from an instance of the TTO class (or TTO could be defined sufficiently generously to include derived entities but I don't think this is so good).

I'm actually not that sure what constitutes a specimen. Fossils? Pieces of bone? Footprints?

You also have:

'Specimen 1' 'has taxon name' 'Danio rerio' --(2)

I'm not clear on the rationale here. Why do we need a 'has taxon name' relation? What is wrong with relating the specimen to the taxon class, and using the annotation properties in TTO?

Chris said...

I recommend using Manchester Syntax as a notation. You state that the following axiom is an RDF triple:

'Specimen 1' 'has part' 'some(vertebra 1 and hasQuality some sigmoid)'

but it's not in a standard RDF syntax, and the OWL would expand to multiple triples.

It's not clear to me if this is treating specimen1 as an individual or class.

OWL MS makes the semantics absolutely clear; individual:

Individual: Specimen1
Types: has part some (vertebra 1 and hasQuality some sigmoid)

or:

Class: Specimen1
SubClassOf: has part some (vertebra 1 and hasQuality some sigmoid)

Alternatively you could write everything in a readable RDF syntax like turtle. But IMHO this obscures the OWL.

Cartik said...

Hi Chris,

In this post, I tried to address the relationship between specimens and evolutionary taxa that they belong to. This is not addressed in Phenoscape as of now, at least not in a formal manner IMHO. I have talked about the relationships between taxa and phenotypes in my last post and put up some OWL/XML syntax for how the "exhibits" relation can be defined in an OWL framework.

In this post, I tried to use a rather bastardized syntax to highlight the points I was trying to make. This syntax was neither this nor that. I apologize for my indolence. I'm currently writing everything down in N3 syntax and I'll put these up in my next post coming soon. As always, thanks for your feedback.

- Cartik