If you've used a web ontology before, or any other large-scale data repository, you're likely familiar with one of the chief concerns facing anyone in such a position: how do you get your data into the system? Moreover, how do you get large amounts of data into the system with (relative) ease? And if you've used a content management system before, you've likely faced a similar, albeit inverted problem: how do you get your data out?
If you can accomplish both of these without a good deal of effort, you're left with the final task: transforming the data from one system into a form the other can recognize.
If, instead, you haven't used either of these, you're likely wondering why on Earth you would want to.
Resource Sharing: Web Ontologies and Eagle-I
The Eagle-I ontology is an open source, web-based software project facilitating a growing network of linked open data.
So let's break that down. Eagle-I is fundamentally a network. The Eagle-I Network was founded by a consortium of nine academic institutions, and has since grown to 26 participating institutions. Melissa Haendel at Oregon Health and Science University led the development of the ontology, while Dr. Daniela Bourges-Waldegg at Harvard University was instrumental in the development of the software. The focus is on the promotion of biomedical resource discovery in the scientific community. So a lab at Harvard may have a resource that a research scientist in Seattle needs, or vice versa. The cost, of course, is much less for both institutions if the resources can be shared, but only if they know where to look.
The idea of a web ontology, however, goes far beyond the reach of the Eagle-I Network. Tim Berners-Lee, the Director of the World Wide Web Consortium (W3C) and inventor of the World Wide Web, remarked decades ago:
I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize.
If that sounds a little too much like an Aldous Huxley novel to you, remember that we humans build these machines. In the case of Eagle-I, the machines (or nodes, one at each participating institution) handle navigating all the data shared between them; we just have to tell them what we're looking for.
Usually we're looking for something defined by the ontology itself, and if you're not an information scientist, we should pause on that word "ontology". An ontology is basically a shared vocabulary within a specified domain: we're defining the terms the machines use to navigate this data. In the case of Eagle-I, the domain is health care research. Ontology classes, or more generally just members, include anything from research laboratories and research scientists to cell cultures and antibodies to pipettes and petri dishes. All these classes must be defined by the ontology: a laboratory has a name, an address, a phone number, a lead researcher, etc., and many of these properties may very well be classes themselves. Laboratories have scientists, cell cultures and pipettes; scientists, in turn, have names, phone numbers, etc. Many of these classes that are properties may be from the same ontology, but if names and phone numbers have already been defined elsewhere in the Semantic Web, it makes little sense to define them again here.
Once the ontology defines the classes we are free to create class instances, or more familiarly, content on the network. We can create an instance of the “Core Facility” class to represent the Antibody Development lab at Fred Hutchinson Cancer Research Center, and we can relate it to instances of the “Service” class to represent the services it provides, such as Monoclonal Antibody Development. In this way, we've created a web of Linked Data.
While traversal of this data by machines is simple, getting information about resources in and out of such a structure can be difficult. An ontology written in a knowledge representation language such as OWL is typically populated with data expressed in the Resource Description Framework (RDF), often serialized as RDF/XML, and queried with something like SPARQL (the SPARQL Protocol and RDF Query Language) to get data out.
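To make the "in" and "out" halves a little more concrete, here is a minimal sketch in plain Python. It is not RDF/XML or SPARQL, just a toy illustration of the same idea: facts go in as subject-predicate-object triples, and come out by matching a pattern. All of the identifiers (such as "lab:antibody-development" and "provides_service") are made up for illustration and are not real Eagle-I URIs or properties.

```python
# A toy in-memory triple store: each fact is a (subject, predicate, object) tuple.
# Every identifier below is illustrative, not a real Eagle-I URI or property.
triples = {
    ("lab:antibody-development", "rdf:type", "Core Facility"),
    ("lab:antibody-development", "provides_service", "svc:monoclonal-antibody-development"),
    ("svc:monoclonal-antibody-development", "rdf:type", "Service"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Which services does the Antibody Development lab provide?"
services = [o for _, _, o in match(triples, "lab:antibody-development", "provides_service")]
print(services)
```

A real SPARQL query expresses the same kind of pattern, with variables in place of the wildcards used here.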
Managing Resources: Drupal and the CMS
While the web ontologies provide us with a linked graph of data, this data is intended to be navigable by machines. If we want to browse through a catalog of all our resources as a web page, we'll need to use a Content Management System, or CMS. A good CMS will let us define the data structure for our content, and will allow us to relate content to other content.
But of the hundreds of Content Management Systems available, what makes Drupal a better choice? Let's look at a few reasons:
Drupal is open-source. In short: Drupal is free. There are no license fees of any kind, under any circumstances. This also means that the code base is entirely open and transparent; anyone can view the core code; they are free to make changes, and even contribute that code back to the community. Unlike proprietary systems, there is no “black-box” whose functionality is known and controlled by only a handful of programmers.
Drupal is modular and extensible. This allows the core code base to be relatively lean, while still allowing for nearly any functionality imaginable. You can set up an e-commerce site or multiple sites at one domain, add image carousels, slideshows and galleries, implement a CRM, or even translate all your text to Pirate Speak. And because of Drupal's modularity and its open-source status, if one of the over 22,000 modules doesn't fit your needs, extending the functionality of the CMS is relatively simple.
Drupal has a strong community of developers. As mentioned above, there are over 22,000 modules for Drupal, all contributed by community members. In addition, these members are constantly working on making the core code better and more useful with each release. DrupalCon, the twice-yearly Drupal convention, takes place alternately in North America and the EU and brings together over 3,000 community members and developers.
In short, Drupal is free, easily extended, and managed and maintained by a large community of developers. If you can imagine some functionality, chances are it's already been developed. If it hasn't, it can be.
But don't necessarily take our word for it. Consider some of the large-scale sites that are currently built in Drupal: The White House, The US House of Representatives, The US Department of Commerce, The Louvre, The World Economic Forum, ING, Zynga, Digg, McDonald's, Best Buy, PayPal and Twitter's development communities, Christina Aguilera, Chris Rock, and Rafael Nadal, to name just a few. Also, we can now consider The White House a contributing developer.
Bridging the Gap: Drupal, meet Eagle-I
Fred Hutchinson Cancer Research Center in Seattle, Washington had built an award-winning website hosting their Shared Resources, managed by the Arnold Library. Library staff, recognizing the unique potential Drupal provided for managing linked open data, chose it as the framework with which to build the site. Doing so allowed research scientists in the core labs to provide content, guidance and requirements, and it allowed librarians to write, edit, photograph and create video and other imagery to promote access to the core labs and provide training to the research community. The library staff were further interested in setting up an Eagle-I node and migrating all of these resources into the Eagle-I ontology, but creating a new framework was entirely out of the question. Furthermore, this migration had to be seamless, and it had to be ongoing. It had to happen without duplicating effort, creating extra expense, or introducing inefficiency. Ultimately, an automated workflow between the Shared Resources website and the Eagle-I ontology was the only suitable solution.
We at Freelock, in tandem with Harvard University (the original developers of the Eagle-I ontology), were tasked with making this migration a reality. We all agreed that, in the best interest of the community, this work should also be open-source and should be available to other institutions with the same needs.
Dr. Daniela Bourges-Waldegg and her team at Harvard set about creating a REST API for the Eagle-I ontology, allowing web services to connect to the ontology and migrate data and metadata. Once the API was in place, we were able to create the custom Drupal module that would transform and transfer the data to the ontology.
Using a Model-View-Controller (MVC) software architecture pattern and Object-Oriented (OO) design principles, our module transforms Drupal nodes into a structure meaningful to the Eagle-I ontology, and then passes these as JSON objects to the API. Since a Drupal content type serves as the metadata wrapper for a node, or piece of content, we are mapping the content type to a given Eagle-I ontology class. Further, we're mapping fields on the content type to “text properties” on the ontology class, and node and taxonomy references to “linked resources” on the ontology class. The module currently handles both forward and reverse node references.
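As a rough illustration of that transformation, here is a minimal sketch in Python (the module itself is a Drupal module, so this is not its code). The JSON layout, the "field_services" field, the "provides_service" property and the example URI are all assumptions made for the sake of the example; only the "Core Facility" to "Core Laboratory" mapping and the two text fields come from the Fred Hutchinson setup described in this article.

```python
import json

# Hypothetical mapping for one content type. The payload layout that results
# from it is an assumption, not the documented Eagle-I API format.
MAPPING = {
    "content_type": "Core Facility",
    "ontology_class": "Core Laboratory",
    # Drupal field -> ontology "text property"
    "text_properties": {"email": "email", "description_short": "resource_description"},
    # Drupal node/taxonomy reference field -> ontology "linked resource"
    "linked_resources": {"field_services": "provides_service"},
}


def node_to_payload(node, mapping, uri_for_nid):
    """Turn a Drupal-like node (a dict) into a JSON-serializable structure."""
    if node["type"] != mapping["content_type"]:
        raise ValueError("node type does not match the mapped content type")
    payload = {
        "class": mapping["ontology_class"],
        "label": node["title"],
        "text_properties": {},
        "linked_resources": {},
    }
    for field, prop in mapping["text_properties"].items():
        if field in node:
            payload["text_properties"][prop] = node[field]
    for field, prop in mapping["linked_resources"].items():
        # References are stored as node IDs; only already-known URIs are linked.
        payload["linked_resources"][prop] = [
            uri_for_nid[nid] for nid in node.get(field, []) if nid in uri_for_nid
        ]
    return payload


node = {
    "nid": 7,
    "type": "Core Facility",
    "title": "Antibody Development",
    "email": "lab@example.org",
    "description_short": "Develops monoclonal antibodies.",
    "field_services": [12],
}
uri_for_nid = {12: "http://example.org/i/monoclonal-antibody-development"}

payload = node_to_payload(node, MAPPING, uri_for_nid)
print(json.dumps(payload, indent=2))
```

Keeping the mapping as data rather than code is what lets the same transformation serve any content type.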
We decided to model the Eagle-I ontology data structures in the module for future extensibility; therefore we are simply serializing and de-serializing JSON objects sent to and retrieved from the API. Transfer can be controlled down to the individual node within a given content type. While the module's ability to check the integrity of class mappings against ontology structures is not yet mature, it does check whether the ontology version has been updated, and it does not transfer any nodes until the mapping can be verified against the new version's structure.
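The version check described above amounts to a gate in front of every transfer. Here is a minimal sketch of that idea in Python; the names ("verified_against", "export", "transfer_nodes") and the way the ontology version is obtained are assumptions for illustration, not the module's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    content_type: str
    ontology_class: str
    verified_against: str  # ontology version the mapping was last verified against


def transfer_nodes(nodes, mapping, current_ontology_version, push):
    """Push exportable nodes, but only if the mapping matches the ontology version.

    Returns the list of node IDs that were pushed.
    """
    if mapping.verified_against != current_ontology_version:
        # The ontology has changed: hold everything until the mapping is re-verified.
        return []
    sent = []
    for node in nodes:
        # Per-node granularity: an individual node can opt out of export.
        if node.get("export", True):
            push(node)
            sent.append(node["nid"])
    return sent


mapping = Mapping("Core Facility", "Core Laboratory", verified_against="1.0")
nodes = [{"nid": 1}, {"nid": 2, "export": False}, {"nid": 3}]

pushed = []
sent_ok = transfer_nodes(nodes, mapping, "1.0", pushed.append)
sent_stale = transfer_nodes(nodes, mapping, "1.1", pushed.append)
print(sent_ok, sent_stale)
```

Holding every transfer on a version mismatch trades availability for safety: nothing reaches the ontology in a shape it no longer expects.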
The mapping itself is accomplished via a custom XML file and accompanying schema that designate which content type is mapped to which ontology class, and further which fields and relations are mapped to which text properties and linked resources. The module will create a new ontology class instance if it has never pushed the object previously (and hence never retrieved the object URI), and will update the class instance in all other cases. The module supports batch processing using the Drupal Batch API, as well as pushing each node whenever it is saved, using the NodeAPI. Finally, the URI-to-node-ID data is exportable, so the module can be ported to other instances of the site without creating duplicate content on Eagle-I.
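The create-or-update decision and the exportable URI data can be sketched as follows. This is a simplified Python illustration, not the module's code: "FakeApi" is a stand-in for the REST API (whose real endpoints are not described here), and the JSON export format is an assumption.

```python
import json


class FakeApi:
    """Stand-in for the Eagle-I REST API; it only records what it is given."""

    def __init__(self):
        self.instances = {}
        self._next_id = 1

    def create(self, payload):
        uri = f"http://example.org/i/{self._next_id}"  # illustrative URI scheme
        self._next_id += 1
        self.instances[uri] = payload
        return uri

    def update(self, uri, payload):
        self.instances[uri] = payload


def push_node(nid, payload, uri_map, api):
    """Create the instance on first push, update it on every later push."""
    uri = uri_map.get(nid)
    if uri is None:
        uri_map[nid] = api.create(payload)  # remember the URI for next time
        return "created"
    api.update(uri, payload)
    return "updated"


def export_uri_map(uri_map):
    """Serialize the node-ID-to-URI data so another site instance can reuse it."""
    return json.dumps(uri_map, sort_keys=True)


def import_uri_map(text):
    # JSON object keys are always strings, so convert node IDs back to integers.
    return {int(nid): uri for nid, uri in json.loads(text).items()}


api = FakeApi()
uri_map = {}
first = push_node(42, {"label": "Antibody Development"}, uri_map, api)
second = push_node(42, {"label": "Antibody Development (revised)"}, uri_map, api)
restored = import_uri_map(export_uri_map(uri_map))
print(first, second, restored)
```

Because a restored map already knows each node's URI, a migrated copy of the site updates the existing instances instead of creating duplicates.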
Ok. But what in the world does all of this mean?
In the end it's relatively simple, really. First, the module needs to be configured to point to an Eagle-I node that has the API installed. Then you can select which content types to map and export to Eagle-I. Exporting can be turned off for individual nodes if there are some you don't want shared. The content types will then need to be mapped manually: Fred Hutchinson, for instance, wanted to map a content type called “Core Facility” to the Eagle-I equivalent “Core Laboratory”. They wanted the field “email” mapped to “email”, and the field “description_short” mapped to “resource_description”. Once the required XML is created from the schema document, the module is ready to use. You can batch export all your nodes, or set it to export them whenever they are saved, or both. If the ontology ever changes, the module will wait for you to update your mapping to the new structure, and if you want to move the module to a new version of the site, you can do so without jeopardizing the integrity of the data you have on Eagle-I.
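To give a feel for what such a mapping document might contain, here is a sketch that reads the Fred Hutchinson example with Python's standard XML parser. The element and attribute names ("mapping", "text-property", "content-type" and so on) are invented for illustration; the module's actual schema defines its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping document. The tag and attribute names are assumptions;
# only the mapped values come from the example described above.
MAPPING_XML = """
<mappings>
  <mapping content-type="Core Facility" ontology-class="Core Laboratory">
    <text-property field="email" property="email"/>
    <text-property field="description_short" property="resource_description"/>
  </mapping>
</mappings>
"""


def load_mappings(xml_text):
    """Parse the mapping document into a dict keyed by Drupal content type."""
    root = ET.fromstring(xml_text)
    mappings = {}
    for m in root.findall("mapping"):
        mappings[m.get("content-type")] = {
            "ontology_class": m.get("ontology-class"),
            "text_properties": {
                tp.get("field"): tp.get("property")
                for tp in m.findall("text-property")
            },
        }
    return mappings


mappings = load_mappings(MAPPING_XML)
print(mappings["Core Facility"])
```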
Moving Forward: Opportunities for the Future
The module has passed all testing and is currently in production on the Fred Hutchinson Cancer Research Center's Shared Resources website, mapping nodes to ontology class instances and passing them to the Eagle-I ontology.
At this point there are two significant areas of opportunity we see moving forward:
At present, all the mapping must be written by hand in an XML file. Since the module was developed specifically for Fred Hutchinson Cancer Research Center, the mapping of the data was specific and static. For the module to be generalized and more readily useful to a wider audience, we would like to create a user interface that generates the necessary mapping documents on the back end, without the user having to write, or otherwise have any knowledge of, XML or the accompanying schema.
The module was developed for Drupal 6, whereas the current release of Drupal is 7. We would like to port the module to Drupal 7, and even further down the line we would like to have a D8 version available.