Approaches to data integration for health care research: i2b2, SHRINE, SPARQL and OWL

A couple of us are off to Boston this week for the i2b2 Academic Users' Group First Annual Conference to present a poster and soak up all kinds of good stuff.

i2b2 is the basis of HERON, our health care research data repository, which stores clinical observations in a fairly traditional datamart. I've done database application development of various kinds for decades, but the scale and operational challenges are new to me. I'm particularly happy that the extract/transform/load (ETL) process for our last release ran for 37 hours, lights-out, loading 450 million clinical observations from various sources, primarily a copy of Epic's Clarity store from the KU Hospital.

While i2b2 includes a modern Ajax web front-end, it makes no use of web-style linked data, let alone OWL or realist ontology, for medical terminology alignment. That alignment is a big part of the challenge in making HERON an effective platform for research and for data integration beyond the KUMC enterprise.

While I'm taking a break from heads-down development mode for this conference, and while I'm in Boston, I hope to take time to look more closely at developments such as:

Our Oracle database on SAS drives handles simple user queries in a few seconds, but in some cases a query takes a minute or two, or fails altogether. After evaluating Fusion-io's solid state storage, we ordered four fairly large units. We're still working through the operational details of setting them up, but we hope to see considerable performance improvements for both ETL and end-user queries.

I'm also interested to catch up on some W3C stuff: RDF/SQL mapping (no RIF?! darn.), and the Semantic Web Health Care and Life Sciences (HCLS) Interest Group. It should be interesting to compare SPARQL 1.1 federated query with the corresponding i2b2 approach: SHRINE.
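
For a concrete picture of the SPARQL side of that comparison, here is a minimal sketch of a federated patient-count query built around the SPARQL 1.1 `SERVICE` keyword. The endpoint URLs, the `ex:` vocabulary, and the concept IRI are all made up for illustration; nothing here reflects an actual HERON or SHRINE interface. The Python function only assembles the query text and does not send it anywhere.

```python
def federated_count_query(endpoints, concept_iri):
    """Return a SPARQL 1.1 query string with one SERVICE block per endpoint.

    Each branch asks one remote endpoint for a distinct-patient count for
    the given concept, then tags the row with the endpoint it came from.
    The ex: vocabulary is hypothetical.
    """
    branches = []
    for url in endpoints:
        branches.append(
            "  {\n"
            f"    SERVICE <{url}> {{\n"
            "      SELECT (COUNT(DISTINCT ?patient) AS ?patients) WHERE {\n"
            f"        ?obs ex:patient ?patient ; ex:concept <{concept_iri}> .\n"
            "      }\n"
            "    }\n"
            f"    BIND(<{url}> AS ?site)\n"
            "  }"
        )
    body = "\n  UNION\n".join(branches)
    return (
        "PREFIX ex: <http://example.org/vocab#>\n"
        "SELECT ?site ?patients WHERE {\n"
        f"{body}\n"
        "}"
    )


# Hypothetical endpoints and concept, for illustration only.
query = federated_count_query(
    ["http://site-a.example.org/sparql", "http://site-b.example.org/sparql"],
    "http://example.org/concept/diabetes",
)
print(query)
```

Running the counts inside each `SERVICE` block means only aggregates cross the wire, not patient-level rows, which seems like the natural shape for a multi-site query.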
