Monday, June 23, 2008

Introducing the Ontology-Based PubMed Annotator

Since my last post, I've pretty much kept my nose to the grindstone, and I have something to show for it: the new PubMed Annotator on steroids, or ahem... the Ontology-Based PubMed Annotator, OBPA for short. The OBPA, like its predecessor, the PubMed Annotator, requires the user, a biologist, to annotate biomedical experiments in RDF triple format for storage, subsequent querying, summarizing, and comparison. The difference is that the OBPA prompts the user with matching terms from a few preselected ontologies, in auto-complete mode, as she fills in the fields. The user can choose to use terms from the ontologies for her annotation work, or she can use her own terms.
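To make the auto-complete idea concrete, here is a minimal Python sketch of prefix matching over ontology term labels. It is not the OBPA's actual code (the real application uses jQuery on the client side); the `TERMS` sample data, the `suggest` function, and the short ontology tags are all made up for illustration.

```python
from bisect import bisect_left

# Hypothetical sample terms as (label, source ontology) pairs.
# Illustrative only; this is not the OBPA's real term list.
TERMS = [
    ("assay", "OBI"),
    ("investigation", "OBI"),
    ("protocol", "OBI"),
    ("BioSample", "MGED"),
    ("BioSource", "MGED"),
    ("part_of", "RO"),
]

# Sort once by lower-cased label so prefix lookups can use binary search.
INDEX = sorted((label.lower(), label, source) for label, source in TERMS)
KEYS = [key for key, _, _ in INDEX]


def suggest(prefix, limit=10):
    """Return up to `limit` (label, ontology) pairs whose label starts with `prefix`."""
    p = prefix.lower()
    if not p:
        return []  # nothing typed yet, so nothing to suggest
    start = bisect_left(KEYS, p)  # first label that sorts at or after the prefix
    matches = []
    for key, label, source in INDEX[start:]:
        if not key.startswith(p) or len(matches) >= limit:
            break
        matches.append((label, source))
    return matches


if __name__ == "__main__":
    for label, source in suggest("bio"):
        print(label, "from", source)
```

Matching is case-insensitive, and each suggestion carries its source ontology so the interface can show where a term comes from. If the user ignores the suggestions, her own free-text term is used instead.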

The OBPA keeps track of the number of terms the user borrows from each ontology as a measure of the ontology's usefulness (cf. my previous blog entry). The OBPA is also more advanced than the PubMed Annotator, with more features and enhanced security. The following OWL ontologies are currently used in the OBPA:

a) The Ontology for Biomedical Investigations (OBI)
b) The MGED Ontology
c) Barry Smith's Basic Formal Ontology
d) Heinrich Herre's General Formal Ontology
e) Barry Smith's Relation Ontology
f) Michel Dumontier's Relation Ontology
g) OWL 1.0 Ontology
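
The per-ontology usage count mentioned above could be kept with something as simple as the following Python sketch. The `UsageTally` class and the `"user-defined"` bucket are assumptions made for illustration, not the OBPA's actual bookkeeping.

```python
from collections import Counter


class UsageTally:
    """Counts how many annotation terms were borrowed from each ontology."""

    def __init__(self):
        self.counts = Counter()

    def record(self, ontology=None):
        # A term the user typed herself has no source ontology;
        # file it under a hypothetical "user-defined" bucket.
        self.counts[ontology or "user-defined"] += 1

    def ranking(self):
        """Ontologies ordered from most to least borrowed-from."""
        return self.counts.most_common()


if __name__ == "__main__":
    tally = UsageTally()
    for source in ["OBI", "MGED", "OBI", None]:
        tally.record(source)
    print(tally.ranking())
```

Counting the user's own terms alongside the borrowed ones shows how often none of the loaded ontologies had what she needed, which is arguably part of the same usefulness measure.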

The OBPA in its current version cannot handle OBO syntax. I believe ontologies such as the Foundational Model of Anatomy, Reactome, and UniProt will also be relevant to the OBPA. The OBPA, however, suffers from a significant roadblock that prevents the incorporation of more ontologies. Terms (classes and properties) from the ontologies are loaded into the OBPA at deployment time. Given the slow performance of current versions of OWL APIs such as Jena and the OWL API, server-side deployment is a very tortuous process, with the server timing out frequently. Also, the terms are not updated periodically. Given the current rate of progress on ontologies, the OBPA runs the risk of using obsolete terminology.

Ben Good's Entity Describer (E.D.), which works with ontologically defined terms, uses the interface provided by Freebase to dynamically extract terms from ontologies such as the Gene Ontology (GO). It prompts the user with a suggestion box complete with a text description of the term, the ontology it was extracted from, and sometimes even a picture! Future revisions of the OBPA may incorporate this methodology to alleviate the problem of obsolete ontological terms. Another solution may be to create a service that periodically browses a selection of ontologies and presents the extracted terms through an interface accessible to applications such as the OBPA. An application such as the Ontology Lookup Service (OLS), which is also compatible with OWL ontologies, may help as well.
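
As a rough sketch of the periodic-refresh idea, the Python class below caches extracted terms and reloads them once they are older than a chosen age. This is a lazy refresh on read, a simpler stand-in for a real background service, and every name in it (`TermCache`, `loader`, `max_age_seconds`) is hypothetical. The `loader` is whatever function does the slow ontology parsing; it is passed in so that the cache itself does not depend on any particular OWL library.

```python
import time


class TermCache:
    """Holds a snapshot of ontology terms and reloads it when it gets stale."""

    def __init__(self, loader, max_age_seconds=86400, clock=time.monotonic):
        self._loader = loader          # callable returning the current term list
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._terms = None
        self._loaded_at = None

    def terms(self):
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._max_age
        if stale:
            try:
                self._terms = self._loader()
                self._loaded_at = now
            except Exception:
                # If a reload fails, keep serving the previous snapshot.
                # With no snapshot at all, there is nothing to fall back on.
                if self._terms is None:
                    raise
        return self._terms


if __name__ == "__main__":
    cache = TermCache(lambda: ["assay", "protocol"], max_age_seconds=3600)
    print(cache.terms())
```

The point of the design is that the expensive load happens at most once per interval rather than at every deployment, and a failed reload degrades to slightly old terms instead of a server timeout.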

On a tangent, Mark Wilkinson suggested a future area of work where one could browse through the nodes of an ontology and extract the publications associated with every node. I'm putting it down here because it may be something for me to work on in the future, and also to ensure that you heard it here first!! In closing, I would like to thank Ed Kawas for his hands-on help with the jQuery part of the application, Luke McCarthy for his insightful tips on various aspects, and Ben Good for being the Dry Lab's own “thinker.”

UPDATE: It has been a while since the Wilkinson lab moved to a new server, which is why the link to the PubMed Annotator Web UI is inactive. The WAR I had on my laptop was lost forever when the laptop was stolen from my house in Vancouver. The code for the Ontology-Based PubMed Annotator is available on the Wilkinson lab's code repository, and I will be moving it to a new project on SourceForge very soon.