Approaches to data integration for health care research: i2b2, SHRINE, SPARQL and OWL
i2b2 is the basis of HERON, our health care research data repository, which stores clinical observations in a fairly traditional datamart. I've done database application development of various kinds for decades, but the scale and operational challenges are new to me. I'm particularly happy that the extract/transform/load (ETL) process for our last release ran for 37 hours, lights-out, loading 450 million clinical observations from various sources, primarily a copy of Epic's Clarity store from the KU Hospital.
While i2b2 includes a modern Ajax web front-end, it makes no use of web-style linked data, let alone OWL or realist ontology, for medical terminology alignment. That alignment is a big part of the challenge in making HERON an effective platform for research and for data integration beyond the KUMC enterprise.
While I'm taking a break from heads-down development mode for this conference, and while I'm in Boston, I hope to take time to look more closely at developments such as:
- Anzo Connect: Semantic Web ETL in 5 Minutes, a SemTech conference presentation from the guys at Cambridge Semantics
- Stardog and Pellet 3: Semantics for the Enterprise, a June presentation from the guys at Clark & Parsia, with eye-opening performance numbers for query and reasoning at the size of data we deal with.
Our Oracle database on SAS drives handles simple user queries in a few seconds, but in some cases a query takes a minute or two or fails altogether. After evaluating Fusion-io's solid state storage, we ordered four fairly large units. We're still working through the operational details of setting them up, but we hope to see considerable performance improvements for both ETL and end-user queries.
I'm also interested to catch up on some W3C stuff: RDF/SQL mapping (no RIF?! darn.), and the Semantic Web Health Care and Life Sciences (HCLS) Interest Group. It should be interesting to compare SPARQL 1.1 federated query with the corresponding i2b2 approach: SHRINE.
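To make the SPARQL 1.1 side of that comparison concrete, here is a minimal sketch of what a federated per-site patient count might look like, built as a query string in Python. Everything specific in it is an assumption for illustration: the endpoint URLs, the `ex:` vocabulary (`ex:patient`, `ex:concept`), and the helper name `build_federated_count_query` are hypothetical, and none of it reflects how HERON, i2b2, or SHRINE actually model or exchange data. The only SPARQL features it relies on are the standard `SERVICE`, `UNION`, `BIND`, and `GROUP BY` keywords.

```python
def build_federated_count_query(endpoints, concept_iri,
                                prefix_iri="http://example.org/heron#"):
    """Build a SPARQL 1.1 federated query counting distinct patients per site.

    All IRIs here are hypothetical placeholders, not real HERON or SHRINE names.
    """
    branches = []
    for url in endpoints:
        # One SERVICE block per remote endpoint; BIND records which site answered.
        branches.append(
            "  {\n"
            f"    SERVICE <{url}> {{\n"
            "      ?obs ex:patient ?patient ;\n"
            f"           ex:concept <{concept_iri}> .\n"
            "    }\n"
            f"    BIND(<{url}> AS ?site)\n"
            "  }"
        )
    body = "\n  UNION\n".join(branches)
    return (
        f"PREFIX ex: <{prefix_iri}>\n"
        "SELECT ?site (COUNT(DISTINCT ?patient) AS ?patients)\n"
        "WHERE {\n"
        f"{body}\n"
        "}\n"
        "GROUP BY ?site\n"
    )


if __name__ == "__main__":
    # Hypothetical endpoints and concept IRI, for illustration only.
    print(build_federated_count_query(
        ["http://site-a.example.org/sparql", "http://site-b.example.org/sparql"],
        "http://example.org/concepts/diabetes",
    ))
```

The generated text would be submitted to a single SPARQL 1.1 endpoint, which forwards each `SERVICE` block to the named remote endpoint and merges the results. Writing one explicit `SERVICE` block per site, rather than using a variable endpoint, is a deliberate choice in this sketch because support for variable endpoints varies between implementations.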