Approaches to data integration for health care research: i2b2, SHRINE, SPARQL and OWL
A couple of us are off to Boston this week for the i2b2 Academic Users' Group First Annual Conference to present a poster and soak up all kinds of good stuff.
i2b2 is the basis of HERON, our health care research data repository, which stores clinical observations in a fairly traditional datamart. I've done database application development of various kinds for decades, but the scale and operational challenges are new to me. I'm particularly happy that the extract/transform/load (ETL) process for our last release ran for 37 hours, lights-out, loading 450 million clinical observations from various sources, primarily a copy of Epic's Clarity store from the KU Hospital.
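To put those ETL numbers in perspective, here is a quick back-of-the-envelope calculation of the average load rate. The only inputs are the two figures quoted above; the function name is just for illustration, and the result is an average over the whole run, not a measured peak rate.

```python
def load_rate(observations: int, hours: float) -> float:
    """Average number of observations loaded per second over the whole run."""
    return observations / (hours * 3600)


# Figures from the post: 450 million observations in a 37-hour run.
rate = load_rate(450_000_000, 37)
print(f"{rate:,.0f} observations/second")
```

That works out to roughly 3,400 observations per second, sustained for a day and a half.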
While i2b2 includes a modern Ajax web front-end, it makes no use of web-style linked data, let alone OWL or realist ontology, for medical terminology alignment, which is a big part of the challenge in making HERON an effective platform for research and for data integration beyond the KUMC enterprise.
While I'm taking a break from heads-down development mode for this conference, and while I'm in Boston, I hope to take time to look more closely at developments such as:
- Anzo Connect: Semantic Web ETL in 5 Minutes, a SemTech conference presentation from the guys at Cambridge Semantics
- Stardog and Pellet 3: Semantics for the Enterprise, a June presentation from the guys at Clark & Parsia, with eye-opening performance numbers for query and reasoning with the size of data we deal with.
Our Oracle database on SAS drives handles simple user queries in a few seconds, but some queries take a minute or two or fail altogether. After evaluating Fusion-io's solid state storage, we ordered 4 fairly large units. We're still working through the operational details of setting them up, but we hope to see considerable performance improvements for both ETL and end-user queries.
I'm also interested to catch up on some W3C stuff: RDF/SQL mapping (no RIF?! darn.), and the Semantic Web Health Care and Life Sciences (HCLS) Interest Group. It should be interesting to compare SPARQL 1.1 federated query with the corresponding i2b2 approach: SHRINE.
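For the SPARQL side of that comparison, here is a minimal sketch of what a federated patient count might look like. It builds a SPARQL 1.1 query that uses the `SERVICE` keyword to ask several endpoints the same question and groups the counts by site. Everything specific here is an assumption made for illustration: the endpoint URLs, the `ex:` vocabulary, and the `ex:diagnosis` property are hypothetical, and neither HERON nor i2b2 exposes data this way today.

```python
from typing import Iterable

# Hypothetical vocabulary, for illustration only.
PREFIX = "PREFIX ex: <http://example.org/clinical#>"


def federated_patient_count(endpoints: Iterable[str], icd9_code: str) -> str:
    """Build a SPARQL 1.1 federated query counting patients per endpoint."""
    blocks = []
    for url in endpoints:
        # One SERVICE block per site; BIND records which site answered.
        blocks.append(
            "  {\n"
            f'    SERVICE <{url}> {{ ?patient ex:diagnosis "{icd9_code}" }}\n'
            f"    BIND(<{url}> AS ?site)\n"
            "  }"
        )
    body = "\n  UNION\n".join(blocks)
    return (
        f"{PREFIX}\n"
        "SELECT ?site (COUNT(DISTINCT ?patient) AS ?patients)\n"
        "WHERE {\n"
        f"{body}\n"
        "}\n"
        "GROUP BY ?site"
    )


# Hypothetical endpoints standing in for two participating sites.
query = federated_patient_count(
    ["http://site-a.example.org/sparql", "http://site-b.example.org/sparql"],
    "250.00",
)
print(query)
```

The sketch does no escaping of its inputs, so it is only suitable for trusted, hard-coded values. The interesting design question for the comparison is where aggregation happens: here the querying endpoint pulls matching patients from each site and counts them itself.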
Announcing CTSA funding from NIH to KUMC
We're proud of our role in this week's announcement:
KUMC receives $20 million Clinical and Translational Science Award
June 14, 2011
Kansas City, Kan. — Patients will gain faster access to the benefits of health research throughout the region thanks to a grant announced today.
The University of Kansas Medical Center has received a $19,794,046 Clinical and Translational Science Award from the National Institutes of Health (NIH). The five-year grant puts the medical center among an elite, 60-member group of universities collaborating on clinical and translational research, which transforms laboratory discoveries into treatments and cures.
Launched by the NIH in 2006, the Clinical and Translational Science Awards (CTSA) program goals are to speed laboratory discoveries into treatments for patients, to work with communities in clinical research efforts, and to train a new generation of researchers to bring cures and treatments to patients faster. With its new grant, KU Medical Center will create a program called Frontiers, greatly expanding the reach of its existing Heartland Institute for Clinical and Translational Research, which has been the center of clinical and translational research for Kansas and the greater Kansas City region.
Scientists at KU have been doing translational research for years. For example, clinical trials are now being held for an ovarian cancer drug that KU researchers have reformulated so that it can be delivered in a patient's abdomen instead of intravenously, which caused negative side effects. Other scientists have discovered that DHA, the omega-3 fatty acid common in fish oil, may help infants develop better attention skills. In part, as a result of this research, DHA is now added to many infant formulas. Other researchers are studying whether exercise can slow the progression of Alzheimer's disease.
...
In fact, a big part of what's new about translational research at KUMC this year is our very own biomedical informatics division:
Biomedical Informatics accelerates scientific discovery and improves patient care by converting data into actionable information. Pharmacologists and biologists use informatics to understand how drugs and cells interact at a molecular level; scientists use software to determine what kind of patients may most benefit from a clinical trial; doctors view risk models to help individualize therapies for patients.
The specific aims from our section of the grant are:
- Provide a HICTR portal for investigators to access clinical and translational research resources, track usage and outcomes, and provide informatics consultation services.
- Create a platform, HERON (Healthcare Enterprise Repository for Ontological Narration), to integrate clinical and biomedical data for translational research.
- Advance medical innovation by linking biological tissues to clinical phenotype and pharmacokinetic and pharmacodynamic data generated by research in phase I and II clinical trials (addressing T1 translational research).
- Leverage an active, engaged statewide telemedicine and Health Information Exchange (HIE) to enable community based translational research (Addressing T2 translational research).
The presentation materials from Dr. Waitman's talk last September, Developing Clinical and Translational Informatics Capabilities for Kansas University, go into more detail on those aims.
The focus of our development work for the past year or so has been on the HERON data repository, but starting with milestone:RavenCTSA, the plan is to broaden the portal from just informatics tools for use within KUMC to a variety of tools for investigators in our community.
Want to join the fun? We're hiring.
HERON April update brings revised Lab terminology, performance increase
Since the HERON Feb 2011 snapshot release, the major enhancements are:
No results
HERON Contents Summary
The HERON repository contains approximately 445 million real observations from the hospital and clinics:
| | Facts | Patients | Source | Go-Live | Snapshot | Issues |
|---|---|---|---|---|---|---|
| Demographics | 10.6M | 1.84M | | | | |
| | | | KUH Billing (O2 via SMS) | 1980s | Apr 2011 | various* |
| | | | UKP Billing | 2000 | Apr 2011 | #406 |
| | | 5622 | HICTR participant registry | Jun 2009 | Apr 2011 | |
| Diagnoses (ICD9) | 15.7M | 540K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Apr 2011 | various* |
| | | | UKP Billing | 2000 | Apr 2011 | |
| Medications | 22.8M | 81.1K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Apr 2011 | various* |
| Nursing Observations | 349M | ? | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Apr 2011 | various* |
| Lab Results | 37.6M | 150K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Apr 2011 | various* |
| Procedures (CPT) | 8.62M | 501K | | | | |
| | | | UKP Billing | 2000 | Apr 2011 | |
| All | 445M | | | | | |
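As a quick sanity check on the summary, the per-category fact counts can be added up and compared with the stated total. This is only a sketch using the rounded figures as published (in millions), so the sum is not expected to match exactly.

```python
# Fact counts in millions, as published in the contents summary.
facts_millions = {
    "Demographics": 10.6,
    "Diagnoses": 15.7,
    "Medications": 22.8,
    "Nursing Observations": 349.0,
    "Lab Results": 37.6,
    "Procedures": 8.62,
}

total = sum(facts_millions.values())
print(f"{total:.1f}M")
```

The rounded categories sum to a little over 444 million. Since each figure is rounded to three significant digits (the nursing observations alone could be off by up to half a million), that is consistent with the "approximately 445 million" total.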
Beta Disclaimer
We are providing this early access to obtain feedback from you, the research community. While we are actively working on validating the data loaded into the system with hospital and clinic technical staff, there may be problems with our translation of data from our source systems (HospitalEpicSource and ClinicIdxSource) into HERON.
Please email us at heron-admin@kumc.edu if you discover information you believe may be erroneous.
We are actively working on enhancing the types of data included. Stay tuned to our roadmap to track progress toward upcoming releases.
Various Issues Still Apply
Keep in mind the issues noted in the original HERON beta notice, including:
- date shifting, part of our DeIdentificationStrategy
- age searching (#158)
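For readers unfamiliar with date shifting, here is a generic sketch of the idea: every date for a given patient is moved by the same per-patient offset, so the intervals between that patient's events are preserved while the real calendar dates are hidden. This illustrates the general technique only. It is not a description of HERON's actual DeIdentificationStrategy, and the function names, the keyed-hash derivation of the offset, and the 365-day window are all assumptions chosen for the example.

```python
import hashlib
import hmac
from datetime import date, timedelta
from typing import List


def patient_offset(patient_id: str, secret: bytes, max_days: int = 365) -> timedelta:
    """Derive a stable backward shift of 1..max_days days for one patient.

    A keyed hash (HMAC) makes the offset repeatable across loads without
    storing it, and unpredictable to anyone who lacks the secret.
    """
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).digest()
    days = int.from_bytes(digest[:4], "big") % max_days + 1
    return timedelta(days=-days)


def shift_dates(patient_id: str, dates: List[date], secret: bytes) -> List[date]:
    """Apply the patient's single offset to every one of their dates."""
    offset = patient_offset(patient_id, secret)
    return [d + offset for d in dates]


# Hypothetical patient and key, for illustration only.
secret = b"not-a-real-key"
visits = [date(2011, 1, 1), date(2011, 1, 15)]
shifted = shift_dates("patient-0001", visits, secret)

# The two visits are still exactly 14 days apart after shifting.
print(shifted[1] - shifted[0])
```

One consequence worth keeping in mind when querying: because every patient is shifted by a different amount, searches tied to a specific calendar date or season are only approximate, while questions about elapsed time between events within one patient's record are unaffected.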
Status of Problems/Bugs
Problems/Bugs addressed in this milestone
No results
Outstanding Problems/Defects/Issues
No results
Completed tasks, Resolved Design Issues
For a more detailed account of the development of this release, see milestone:heron-apr-update.
We have a great opportunity for a Biomedical Informatics Software Engineer!
The division of medical informatics seeks highly motivated individuals with a passion for software development, scientific discovery, and improving healthcare. This position is responsible for developing and maintaining medical informatics applications to support Kansas University Medical Center. This includes developing/interacting with clinical systems (Ex. EPIC, Cerner), data warehouses and analytics, national terminology vocabularies (UMLS, RxNorm, LOINC, FDB), clinical research systems (Ex. VELOS), and external registries and state/national datasets.