Posts for the month of March 2011

KUMC HERON is now in beta status and undergoing testing

We are providing this early access to obtain feedback from you, the research community. While we are actively working on validating the data loaded into the system with hospital and clinic technical staff, there may be problems with our translation of data from our source systems (Epic and IDX) into HERON. Please email us at heron-admin@kumc.edu if you discover information you believe may be erroneous.

We are actively working on enhancing the types of data included. Stay tuned to our roadmap to track progress toward upcoming releases.

Update Jan 2012 to acknowledge data provider for geographic de-identification

HERON Contents Summary

The repository contains approximately half a billion real observations from the hospital and clinics (KUH, UKP).

Data type | Facts/Patients | Source | Go-Live | Snapshot | Issues
Demographics | 7.2M/1.8M | KUH Billing (O2 via SMS) | 1980s* | Sep 2010 | de-identification*, historical*
 | | UKP Billing | 2000 | Planned* |
Diagnoses (ICD9) | 2.2M/190K | KUH/O2/Epic | Nov 2007 | Sep 2010 | various*
 | | UKP Billing | 2000 | Planned* |
Medications | 18M/60K | KUH/O2/Epic | Nov 2007 | Sep 2010 | various*
Nursing Observations | 390M/240K | KUH/O2/Epic | Nov 2007 | Sep 2010 | various*
Lab Results | 25M/120K | KUH/O2/Epic | Nov 2007 | Sep 2010 | various*
Procedures (CPT) | 0/0 | UKP Billing | 2000 | Planned* |

HERON, KUH EHR Coverage Varies Over Time

The Epic (O2) electronic medical record is the main source of our data. It is the EMR for KUH and, increasingly, for the clinics.

  • Epic went live at KUH on November 12, 2007.
    • It started with the pharmacy, medication administration, and flowsheet modules.
    • Physician Clinical Documentation was added in May of 2009.
    • Computerized Provider Order Entry (CPOE) began in November of 2010.
    • Some outpatient clinics owned by the hospital (e.g., Jayhawk clinics, outpatient cancer center treatment) began using Epic earlier, but the main UKP clinic rollout of Epic started in Fall 2010 and is planned to be complete in 2013.
  • Epic also contains Admission, Discharge, Transfer and Patient Registration information from the Hospital Billing System (Siemens SMS). As a result, we have basic demographics on patients going back to the 1980s, but for those earlier years we lack the rich clinical detail available since 2007.

We receive periodic copies of the Epic Clarity database to use as the foundation of our repository. Our current copy is from September 22, 2010, which, for example, pre-dates rollout of CPOE.

Date-Shifting

Our DeIdentificationStrategy includes shifting the dates of each patient record by up to one year. For example, if a patient was born July 1, 1960, their birth date is shifted back in time by between 0 and 365 days, and all subsequent observations are shifted by the same number of days.
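
As a rough illustration of the idea (not the actual HERON implementation), the sketch below assigns each patient one backward offset of 0 to 365 days and applies it to every date in that patient's record. The function names and patient identifiers are hypothetical, and a production system would need a secure source of randomness and protected storage for the offsets.

```python
import random
from datetime import date, timedelta

def make_offsets(patient_ids, seed=None):
    """Assign each patient one fixed backward shift of 0 to 365 days."""
    rng = random.Random(seed)
    return {pid: timedelta(days=rng.randint(0, 365)) for pid in patient_ids}

def shift_record(dates, offset):
    """Shift every date in one patient's record back by the same offset."""
    return [d - offset for d in dates]

# Hypothetical patient with a birth date and two later observations.
offsets = make_offsets(["patient-a", "patient-b"], seed=42)
record = [date(1960, 7, 1), date(2008, 3, 15), date(2008, 3, 20)]
shifted = shift_record(record, offsets["patient-a"])

# The gap between the two observations is unchanged by the shift.
gap_before = record[2] - record[1]
gap_after = shifted[2] - shifted[1]
```

Because the offset is constant within a patient, intervals between events (such as length of stay) are preserved even though the absolute dates are not.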

Plans to add Outpatient Billing data to HERON

We plan to incorporate outpatient billing data from GE IDX in our next release. (#306) The IDX data goes back to 2000 and will include CPT procedure codes and ICD9 diagnosis codes.

De-Identification of Patient Health Information

We have taken several steps to de-identify the patient records so that research using them may be deemed non-human subjects research. These steps include:

  • Shifting dates as described above
  • Removing address identifiers
  • Providing only gross geographic indicators (state and radius from KUMC) (#51)
    • with thanks to geonames for data relating zip codes to latitude/longitude
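
To make the "radius from KUMC" indicator concrete, here is a minimal sketch that computes the great-circle distance from a zip code's latitude/longitude to KUMC and reduces it to a coarse bucket. The KUMC coordinates are approximate, and the bucket edges, function names, and example coordinates are assumptions for illustration rather than the values HERON uses.

```python
import math

# Approximate coordinates for KUMC in Kansas City, KS (an assumption for this sketch).
KUMC_LAT, KUMC_LON = 39.056, -94.610

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))

def radius_bucket(miles, edges=(5, 10, 25, 50, 100)):
    """Collapse an exact distance into a coarse band (hypothetical edges)."""
    for edge in edges:
        if miles <= edge:
            return f"within {edge} miles"
    return f"more than {edges[-1]} miles"

# Illustrative zip-code centroid roughly in Lawrence, KS.
distance = haversine_miles(KUMC_LAT, KUMC_LON, 38.97, -95.24)
bucket = radius_bucket(distance)
```

Only the bucket, not the zip code or the exact distance, would be exposed to researchers in a scheme like this.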

By default you have obfuscated counting access. This means the counts returned by queries are not exact; each is perturbed by a random number of patients. Obfuscated access also means you can't take advantage of analysis tools and timelines. We will work with our clinical partners on providing a greater level of access for our research community. (#314)
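
The effect of count obfuscation can be sketched as follows. This is only an illustration: the noise range of plus or minus 3 and the function name are assumptions, not the settings or code used by HERON or i2b2.

```python
import random

def obfuscate_count(true_count, max_noise=3, rng=random):
    """Perturb a patient count by a random integer in [-max_noise, max_noise].

    The result is floored at zero so a query never reports a negative count.
    """
    noise = rng.randint(-max_noise, max_noise)
    return max(0, true_count + noise)

# The same underlying count of 100 can be reported differently on each query.
rng = random.Random(0)
reported = [obfuscate_count(100, rng=rng) for _ in range(5)]
```

In practice this means small differences between two query results may be noise rather than a real difference in the underlying population.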

See also details regarding our DeIdentificationStrategy.

Diagnosis Issues

These come from the Epic PAT_ENC_DX table, which is populated from the problem list and from ordered tests that require a diagnosis. We currently organize the data by ICD9.

  • There are many opinions on this as a source of information. One perspective is that these diagnoses may be more clinically relevant but lack the completeness of professionally coded diagnoses.
  • In our next release we plan to extract the full ICD9 augmented with SNOMED concepts from the IMO hierarchies stored within Epic. (#273)

Laboratory Results Issues

We have loaded the 500 most frequently resulted lab tests.

  • These are results that were "finalized" by the lab.
  • These do not include microbiology, blood bank, or pathology results, only general laboratory results.
  • The LOINC coding was not validated by laboratory personnel. We will work towards using the local Epic categorization until the laboratory has time to map their results to national standards.
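
A selection like the one described above could look roughly like this sketch, which keeps only finalized results and ranks test codes by frequency. The record layout (`test_code`, `status`) and the sample codes are hypothetical and do not reflect the Epic Clarity schema.

```python
from collections import Counter

def top_lab_tests(results, n=500):
    """Return the n most frequently resulted test codes among finalized results."""
    counts = Counter(r["test_code"] for r in results if r["status"] == "final")
    return [code for code, _ in counts.most_common(n)]

# Tiny hypothetical sample: preliminary results are ignored when ranking.
sample = (
    [{"test_code": "GLU", "status": "final"}] * 3
    + [{"test_code": "NA", "status": "final"}] * 2
    + [{"test_code": "K", "status": "final"}]
    + [{"test_code": "K", "status": "preliminary"}] * 5
)
top_two = top_lab_tests(sample, n=2)
```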

Medication Issues

  • Medications are those dispensed by the pharmacy. We believe this data is not accurate and are working towards loading the administered medications from the MAR.
    • For now, you can use medications for exploratory activities but don't take any overall volumes as truth. We believe the medication exposure is underreported.
    • The classification of medications follows the internal Epic categories (based on First Databank).

Nursing Flowsheet Data Issues

  • We have loaded all the nursing flowsheet data with the exception of some dates. (#363)
  • String values are hidden to preserve privacy, but you will at least know that the observation was recorded. (#299)
  • The ontology for flowsheets is an area of research for us. As a first step, we are displaying the internal flowsheet organization within Epic at KUH.
    • Keep in mind that an observation documented on one flowsheet may appear on many different flowsheets. The best examples of this are height/weight and vital signs.

Other Data Issues

  • We haven't really loaded procedures and CPTs yet, though the ontology hierarchy might lead you to think they are loaded.
  • Searching by age is an interesting topic in i2b2. (#158)
    • For the most part, you are searching based upon the patient's current age for events that have occurred in the past.
    • Future releases of i2b2 should allow you to specify search criteria for the age at time of event (more relevant to infant/pediatric queries).
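
The difference between the two notions of age is easy to see in a small example. The sketch below is illustrative only; the function name and dates are made up, and it does not show how i2b2 itself evaluates age criteria.

```python
from datetime import date

def age_on(birth_date, on_date):
    """Whole years elapsed between birth_date and on_date."""
    years = on_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

birth = date(2005, 6, 15)
event = date(2006, 1, 10)   # a visit during infancy
today = date(2011, 3, 1)    # the day the query is run

age_at_event = age_on(birth, event)
current_age = age_on(birth, today)
```

A query for "patients under 1 year old" evaluated on current age would miss this patient, even though the event of interest happened in infancy.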

I2B2 Software Platform Issues

For the most part, we are not modifying the i2b2 software. Any defects found may be already shared with the i2b2 development team at Partners.

  • We haven't developed training materials beyond what is provided at the i2b2 home page.
  • We are currently running i2b2 version 1.4. Keep this in mind as you compare our functionality with current and planned i2b2 releases (1.5 and 1.6rc3).
    • We plan to upgrade to version 1.6 of i2b2 when it is officially released by its developers and when we can sequence the upgrade with our production release of HERON. (#165)

Note: we have modified the i2b2 software to display the number of facts at each level of the ontology and in some cases also display the distinct number of patients at each level. (#211)
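
The kind of roll-up this display relies on can be sketched as below: for every level of a concept path, count the facts beneath it and the distinct patients who have them. This is not the modification we made to i2b2; the function name, the "/" path separator, and the sample paths are assumptions for illustration.

```python
from collections import defaultdict

def rollup(facts):
    """Count facts and distinct patients at every level of each concept path.

    facts is an iterable of (patient_id, concept_path) pairs, where the
    path uses "/" as a level separator in this sketch.
    """
    fact_counts = defaultdict(int)
    patients = defaultdict(set)
    for patient_id, concept_path in facts:
        parts = concept_path.strip("/").split("/")
        for depth in range(1, len(parts) + 1):
            node = "/".join(parts[:depth])
            fact_counts[node] += 1
            patients[node].add(patient_id)
    return {node: (fact_counts[node], len(patients[node])) for node in fact_counts}

# Hypothetical facts for two patients.
facts = [
    (1, "Labs/Chemistry/Glucose"),
    (1, "Labs/Chemistry/Sodium"),
    (2, "Labs/Chemistry/Glucose"),
    (2, "Labs/Hematology/Hemoglobin"),
]
counts = rollup(facts)
```

Each entry maps an ontology node to a pair of (number of facts, number of distinct patients), so a parent node's fact count is the sum over its children while its patient count is not.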

Address any suggestions or feedback to heron-admin@kumc.edu

Getting ahead of the ball with service monitoring

In the earliest phases of development of our HERON clinical research repository, the only users were us developers and a handful of friendly alpha testers, so it was fine to discover problems as we used the system.

But one of the features included in milestone:EpicBetai2b2 is more proactive monitoring (#150), using the popular open source opsview toolset, built on nagios.

Once it was in place for HERON, I showed it to a guy who supports CRIS, and he figured out how to get nagios working on Windows servers etc.

CRIS is a long-standing production service. Its user community consists of clinical researchers. When CRIS acts up, they don't see it as an interesting technical puzzle to solve; they just see it as the darned computer getting between them and their research goals again.

The database under the CRIS service acted up over the weekend, but this time, instead of a call from a frustrated researcher on Monday morning, the CRIS support team got an automated notice right when the problem started and had it cleaned up before any users noticed.

It's so much nicer to be ahead of the ball in the customer support game.