Thursday, June 28, 2007

The battle for LSIDs and the obsession with browsers

I've been actively following the debate over resolvable URLs in the Semantic Web Health Care and Life Sciences (HCLS) community. At the workshop in Banff, held as part of the WWW 2007 conference, some leading thinkers in the community actually questioned the utility of resolvable URLs. Shocking!

Non-resolving namespaces are the bane of semantic interoperability. DIE, example.org, DIE!! If definitions of concepts cannot be resolved to specific nodes on the Semantic Web, the best antidote is to discard them altogether, IMHO. URIs can be location-specific, as in URLs, or non-location-specific, as in URNs. Again, location is given way too much importance at this juncture. As an ontology developer, I do not really care where a concept is defined, as long as I can access the definition. In other words, I NEED the definition, but I couldn't care less whether it came out of a location in Timbuktu or Flin Flon, Manitoba.

The emphasis on location stems from the desire of life scientists to view definitions of concepts in a browser, a reluctance to let go of the browser. Ironic indeed! The utility of the Web and the Web browser was not immediately apparent to the scientific community in the early days (circa 1990). But once the benefits of Web pages became clear, they were embraced and have become a cornerstone of the scientific research community. As an HCI researcher from Microsoft said in a keynote address at WWW 2007, browsers are clearly on the way out. Tim Berners-Lee, in his seminal paper about the Semantic Web, describes a network of agents that can be invoked by interfaces (not necessarily browsers) and that can process machine-understandable content to make intelligent decisions.

The dependence upon browsers forces users to remember (if not bookmark) URLs. In its heyday, the AOL browser only needed users to type in a keyword to locate a Web page. For example, typing in the keyword "NFL" would bring up the homepage of the National Football League. On a conventional browser, users were required to remember the protocol (HTTP) as well as the complete URL to access the very same page. The use of URNs may very well follow the same procedure as the AOL keyword, freeing the user from the need to remember or bookmark URLs.

LSIDs (Life Science Identifiers) are URNs that are location independent and resolvable. To end users, LSIDs are transparent, and they can be used to access web services from registries such as BioMoby. LSIDs can handle versions of concept definitions. Because they uniquely identify concepts within an ontology, they can be used to extract specific concept definitions from ontologies without necessarily downloading the entire ontology. They also allow the capture of metadata about the concept definition. Metadata includes the identity of the authority that has defined the concept, the version of the definition, and a timestamp, among other things. On the other hand, Ben Good has pointed out some of the crucial limitations of the LSID idea. These are temporary limitations though.
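
To make the anatomy of an LSID a bit more concrete, here is a minimal Python sketch that splits one into its parts. It assumes the commonly documented layout urn:lsid:authority:namespace:object, with an optional trailing revision. The parse_lsid function, the LSID tuple, and the identifier used below are all hypothetical names made up for illustration; this only pulls the string apart and does not resolve anything.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of an LSID (illustrative container, not an official API)."""
    authority: str           # who defined the concept
    namespace: str           # grouping chosen by the authority
    object_id: str           # the concept or record itself
    revision: Optional[str]  # version of the definition, if given


def parse_lsid(lsid: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = lsid.split(":")
    # Expect 5 fields, or 6 when a revision is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {lsid!r}")
    # The 'urn' and 'lsid' prefixes are treated as case-insensitive here.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty field: {lsid!r}")
    # An absent or empty revision is reported as None.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier, used only to show the fields.
concept = parse_lsid("urn:lsid:authority.example:ontology:concept42:2")
print(concept.authority, concept.namespace, concept.object_id, concept.revision)
```

Note that nothing in the identifier says where the definition lives. The authority names who defined the concept, and finding the actual data or metadata is left to a resolver.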

Of late, the HCLS community has been discussing the use of LSIDs and LSID resolvers to address the problem of non-standard naming protocols in life science ontologies. The Banff manifesto is an initiative that hopes to address the same issue. These are very promising developments. I look forward to the day when example.org is consigned to the dustbin and lingers on as a joke... Cheers!