Posts by author rwaitman

HERON Elk City updates to i2b2 v1.6.07 and includes Ambulatory Nursing Documentation and Orders

The Elk City HERON release includes an upgrade to i2b2 version 1.6.07. The widths of the Query Tool and Navigate Terms workspaces can now be resized by dragging the vertical bar between them, and queries within queries have been fixed to return correct patient counts. We have also incorporated nursing/multidisciplinary flowsheet data from the ambulatory clinics (an additional 20 million facts) and expanded procedure orders (from 54 million to 110 million facts). For the flowsheet data, check out the "UKP" folder and the "KU AMB" and "KU GEN CARD" subfolders under "KU" in the ontology for many of the new types of clinic visit data now exposed by HERON. This release also includes many behind-the-scenes fixes that are not visible through the interface.

Note that this month, the Smart Data elements contained within physician notes were not loaded due to a data inconsistency detected by our loading process. We are investigating the root cause but do not want this fairly new data source to delay the release of the database for this month. See ticket 1516 for details.

HERON Elk City Contents Summary

This month, our tour of rivers and lakes in Kansas honors Elk City Reservoir.

The HERON repository contains approximately 955 million real observations from the hospital, clinics, and research systems:

| Data type | Observations | Patients | Source | Go-Live | Snapshot | Issues |
| Demographics | 18.5M | 1.94M | KUH Billing (O2 via SMS) | 1980s | Nov 2012 | various* |
|  |  |  | UKP Billing | 2000 | Nov 2012 |  |
|  | 14.3K | 14.3K | Frontiers participant registry | Jun 2009 | Nov 2012 |  |
|  | 186.6K | 186.6K | Social Security Death Index | 1962 | Nov 2012 |  |
| Diagnoses (ICD9) | 34.7M | 649K | KUH/O2/Epic | Nov 2007 | Nov 2012 | various* |
|  |  |  | UKP Billing | 2000 | Nov 2012 |  |
|  |  |  | University HealthSystem Consortium (UHC) | Q4 2008 | June 2012 |  |
| Medications | 77.2M | 295K | KUH/O2/Epic (Organized by VA Class) | Nov 2007 | Nov 2012 | various* |
| Nursing Observations | 557M | ? | KUH/O2/Epic | Nov 2007 | Nov 2012 | various* |
| Lab Results | 81.6M | 285K | KUH/O2/Epic | 2003 | Nov 2012 | various* |
| Procedure Orders | 110M | 436K | KUH/O2/Epic | 2003 (?) | Nov 2012 | various* |
| Procedures (CPT) | 10.5M | 574K | UKP Billing | 2000 | Nov 2012 |  |
| Reports/Notes | 9.5M | 222K | KUH/O2/Epic | ? | Nov 2012 |  |
| Specimens | 41.9K | 3.63K | KUMC Biospecimen Repository | ? | Nov 2012 |  |
| Visit Details | 14.5M | 521K | KUH/O2/Epic | Nov 2007 | Nov 2012 | #1514 |
| Cancer Cases | 9.7M | 66K | KUH Cancer Registry | 1950s | Nov 2012 | labels*, #1600 |
| Hospital Quality Metrics | 4.12M | 60.9K | University HealthSystem Consortium (UHC) | Q4 2008 | June 2012 |  |
| Triple Negative Breast Cancer Registry (BRCA) | 17.8K | 133 | REDCap | July 2011 | Sept 2012 |  |
| All | 955M |  |  |  |  |  |


Some material in the UMLS Metathesaurus is from copyrighted sources of the respective copyright holders. Users of the UMLS Metathesaurus are solely responsible for compliance with any copyright, patent or trademark restrictions and are referred to the copyright, patent or trademark notices appearing in the original sources, all of which are hereby incorporated by reference.

Beta Disclaimer

We are providing this early access to obtain feedback from you, the research community. While we are actively working on validating the data loaded into the system with hospital and clinic technical staff, there may be problems with our translation of data from our source systems (HospitalEpicSource and ClinicIdxSource) into HERON.

Please email us at if you discover information you believe may be erroneous.

We are actively working on enhancing the types of data included. Stay tuned to our roadmap to track progress toward upcoming releases.

Various Issues Still Apply

Keep in mind the issues noted in the original HERON beta notice, including:

Enhancements and Problems/Defects/Issues Addressed in this Release

multicohort survival analysis plugin
HERON patients seem to age after they die
drag-and-drop entrance criteria, outcome, censor for survival analysis plug-in
Query HERON procedure orders by inpatient/outpatient modifiers
some detailed diagnoses from Epic are hidden under ICD9 leaves
flatten top of UMLS ICD9 Diagnosis hierarchy
Harvest ambulatory nursing flowsheet data
names in HERON "reports" hierarchy are long and redundant
Explosion of diagnosis codes leads to poor performance

Outstanding Problems/Defects/Issues

Want to swim in Big Data? We are seeking summer interns

The Division of Medical Informatics, in the Department of Biostatistics, at the University of Kansas Medical Center seeks highly motivated individuals with a passion for software development, scientific discovery, and improving healthcare. We are offering a paid student intern position for students in engineering and science to develop software and systems to support clinical and medical informatics research. Our environment and topics include:

  • our clinical data warehouse, which contains over 600 million facts for over 1.9 million patients
  • integrating statistical analysis and machine learning with data warehousing
  • understanding the impact of electronic medical records on patient care
  • clinical trial management systems and integration of biospecimens to support personalized medicine
  • developing methods for increasing clinical research participation in the community

You will gain experience working in a software engineering environment using open source technologies.


Desirable skills include:

  • Experience programming in a higher-level language (C, C++, Java) and with scripting languages (Python).
  • Proficiency with Microsoft Office products (Word, Excel).
  • Experience developing in a Linux/UNIX environment and working as part of a team using code management systems (e.g., Subversion, Hg).
  • Experience developing relational database-driven applications (Oracle, MySQL), database design, and SQL.
  • Experience programming dynamic web pages (such as JSP, PHP, Drupal, or CGI scripts) and client-side scripting with JavaScript.
  • Apache web server administration and Linux or UNIX system administration.
  • Experience in knowledge discovery in databases (KDD) and data mining, especially data preprocessing and transformation and frameworks for analysis.
  • Previous experience in, or knowledge of, medicine, biology, and healthcare systems.


To learn more about what medical informatics is doing, visit: and our blog


If you are interested in a position, please forward your resume and a cover letter to: rwaitman@…


The Great Blue Herons at Cornell are very prolific

We are very excited that one of our mascots, the Great Blue Heron, is currently in nesting season. Cornell has a terrific web cam.

They now have five eggs in their nest, which is more than usual.

See the Cornell Lab of Ornithology's Great Blue Heron Nest Web Cam!

We are recruiting for a Research Assistant Professor in Medical Informatics

Position Summary: This Research Assistant Professor faculty position is in the Division of Medical Informatics, Department of Biostatistics, within the School of Medicine.

The Division of Medical Informatics seeks highly motivated individuals with a passion for software development, scientific discovery, and improving healthcare. Be part of a rapidly growing team developing informatics to further translational research (KUMC just received an NIH Clinical and Translational Science Award beginning June 2011) and serving a dynamic community (Kansas City was selected by Google as the first ultra high-speed fiber connected community).

Responsibilities will include developing informatics infrastructure capabilities, conducting research, and especially collaborative research. The position is expected to engage in collaborative research with other faculty from programs and departments within the School and University. The candidate will also be expected to collaborate with the State of Kansas and affiliate organizations such as the University of Kansas Hospital. The candidate may also be expected to teach courses for graduate students in biostatistics and other disciplines.

Key Roles and Responsibilities:

  • Work as an independent informatician to provide collaborative research support related to the development of informatics solutions for KUMC researchers and affiliates.
  • Possess the ability to design/develop key components of the informatics infrastructure, including terminology, data models, knowledge resources, dynamic end-user interfaces, and aggregations of data designed to support research.
  • Participate in team based software development and system management.
  • Evaluate clinical and research information systems and interventions at KUMC and affiliate organizations.
  • Contribute to the writing of grants to support new projects and the writing of manuscripts to publish findings from ongoing projects and system evaluations.
  • Teach informatics courses for graduate students in biostatistics and other disciplines.
  • Perform other duties as may be assigned by the division director or chair.

Required Qualifications:

Ph.D., M.D., or D.O. and training in biomedical informatics through advanced degree programs, fellowship training, or comparable experiences in knowledge management, clinical decision support, translational informatics, and relevant informatics standards. The ability to promote effective teamwork in a rapidly changing multidisciplinary research environment. Superior interpersonal and communication skills as demonstrated by excellence in speaking, writing, and listening. Informatics research experience with clinical, public health, and medical administrative systems.

Preferred Qualifications:

The individual's expertise should draw from the broader field of biomedical informatics to complement and expand the department's capabilities. Desirable experiences and domains include:

  • clinical information systems and clinical decision support
  • public health informatics
  • quantitative/qualitative informatics evaluation methods
  • statistics, biostatistics, and quality management methods
  • database and application development
  • HL7 data integration/system architecture
  • ontology management (UMLS, LOINC, SNOMED, FDB, RxNORM)
  • data warehousing; the division's HERON clinical repository utilizes i2b2 (over 500 million observations) to store information from our affiliate clinical organizations
  • clinical research informatics; both Velos and REDCap are used for clinical research information systems
  • knowledge discovery and statistical learning methods
  • natural language processing
  • laboratory/pathology information systems, especially in support of tissue management and cancer research

Feel free to contact me at rwaitman@… to learn more, or visit the rest of this wiki to learn about our work. Or just go ahead and apply for position M0203705: Research Assistant Professor in Medical Informatics.

Our Executive Vice Chancellor has an amazing bird.

Another feathered friend of our informatics initiative.

KUMC's Barbara Atkinson, MD, and Buddy the "Rock Chalk"-singing parrot

Social Security Death Index integration expands vital statistics available in HERON

The Department of Biostatistics is excited to provide linkage between our clinical and administrative records and the Social Security Death Master File, distributed by the National Technical Information Service (NTIS).

Previously, we only had a record that a patient was deceased based upon follow-up at KUH/UKP or if they died while cared for at KUH/UKP (23,850 patients indicated as "Deceased" within the "Demographics" ontology). Now we have an additional indicator that the patient has died based upon records reported to the federal government (177,706 patients indicated as "Deceased per SSA" within the "Demographics" ontology).

Note that we currently match patients based upon an exact match of both their Social Security number AND their date of birth.
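The matching rule above amounts to an exact join on a composite key. Here is a minimal Python sketch of that idea; the record types `Patient` and `DeathRecord` and their fields are hypothetical stand-ins, not HERON's actual tables. A patient is flagged only when both the SSN and the birth date agree, so a record with the right SSN but a different birth date is not matched.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple


class Patient(NamedTuple):
    # Hypothetical record layout, for illustration only.
    patient_num: int
    ssn: str
    birth_date: date


class DeathRecord(NamedTuple):
    # Hypothetical subset of a death master file entry.
    ssn: str
    birth_date: date
    death_date: date


def match_deaths(patients: Iterable[Patient],
                 deaths: Iterable[DeathRecord]) -> Dict[int, date]:
    """Return {patient_num: death_date} for exact SSN AND birth-date matches."""
    # Index death records by the composite (ssn, birth_date) key.
    index = {(d.ssn, d.birth_date): d.death_date for d in deaths}
    return {
        p.patient_num: index[(p.ssn, p.birth_date)]
        for p in patients
        if (p.ssn, p.birth_date) in index
    }


if __name__ == "__main__":
    patients = [
        Patient(1, "123456789", date(1940, 5, 1)),
        Patient(2, "987654321", date(1950, 1, 1)),
    ]
    deaths = [
        DeathRecord("123456789", date(1940, 5, 1), date(2010, 3, 2)),
        # Same SSN as patient 2 but a different birth date: not matched.
        DeathRecord("987654321", date(1951, 1, 1), date(2009, 1, 1)),
    ]
    print(match_deaths(patients, deaths))
```

Requiring both fields favors precision over recall: a data-entry error in either field means the death is missed rather than attributed to the wrong patient.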

We think this will be a powerful addition that allows preliminary hypothesis generation regarding mortality rates between different patient cohorts. In the future, we hope to provide analysis plugins that perform routine survival analysis.

For a neurological example: HERON has 1,356 patients who have ever been diagnosed with amyotrophic lateral sclerosis (ALS, also known as Lou Gehrig's disease) at KUMC since 2000. Of those, the hospital knew that 206 had died. Now that we can also check Social Security Administration records, we know that 821 have died.
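Expressed as shares of the 1,356-patient cohort, the SSA linkage raises the proportion with a known death from roughly 15% to roughly 60%. A quick Python check of that arithmetic, using only the counts quoted above:

```python
# Counts quoted in the ALS example above.
cohort = 1356            # patients ever diagnosed with ALS at KUMC since 2000
known_to_hospital = 206  # deaths known to KUH/UKP
known_with_ssa = 821     # deaths known after adding the SSA death file


def percent(part: int, whole: int) -> float:
    """Share of the cohort, as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(percent(known_to_hospital, cohort))  # 15.2
print(percent(known_with_ssa, cohort))     # 60.5
```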

Using i2b2 Timeline and other analysis plug-ins in HERON

We recently worked out the technical and regulatory issues to take you beyond counting to detailed analysis of the data in HERON (#314).

Consider this example from my recent Internal Medicine Research Committee Presentation on HERON:

To access these capabilities, check the box to also create a "patient set" when you run your query. Then use the Analysis Tools tab at the top, which takes you to a series of i2b2 plugins. We have found the "Demographics (1 Patient Set)", "Demographics (2 Patient Sets)", and "Timeline" plugins to be very illuminating.

Stay tuned for more training materials or join us at the bi-weekly informatics clinic for help.

KUMC HERON is now in beta status and undergoing testing

We are providing this early access to obtain feedback from you, the research community. While we are actively working on validating the data loaded into the system with hospital and clinic technical staff, there may be problems with our translation of data from our source systems (Epic and IDX) into HERON. Please email us at if you discover information you believe may be erroneous.

We are actively working on enhancing the types of data included. Stay tuned to our roadmap to track progress toward upcoming releases.

Update Jan 2012 to acknowledge data provider for geographic de-identification

HERON Contents Summary

The repository contains approximately half a billion real observations from the hospital and clinics (KUH, UKP).

| Data type | Facts/Patients | Source | Go-Live | Snapshot | Issues |
| Demographics | 7.2M/1.8M | KUH Billing (O2 via SMS) | 1980s* | Sep 2010 | de-identification*, historical* |
|  |  | UKP Billing | 2000 | Planned* |  |
| Diagnoses (ICD9) | 2.2M/190K | KUH/O2/Epic | Nov 2007 | Sep 2010 | various* |
|  |  | UKP Billing | 2000 | Planned* |  |
| Medications | 18M/60K | KUH/O2/Epic | Nov 2007 | Sep 2010 | various* |
| Nursing Observations | 390M/240K | KUH/O2/Epic | Nov 2007 | Sep 2010 | various* |
| Lab Results | 25M/120K | KUH/O2/Epic | Nov 2007 | Sep 2010 | various* |
| Procedures (CPT) | 0/0 | UKP Billing | 2000 | Planned* |  |

HERON, KUH EHR Coverage Varies Over Time

Epic (O2) electronic medical record is the main source of our data. It is the EMR for KUH and increasingly the clinics.

  • Epic went live at KUH on November 12, 2007.
    • It started with the pharmacy, medication administration, and flowsheet modules
    • Physician Clinical Documentation was added in May of 2009.
    • Computerized Provider Order Entry (CPOE) began in November of 2010.
    • Some outpatient clinics owned by the hospital (ex: Jayhawk clinics, outpatient cancer center treatment) began using Epic earlier, but the main UKP clinic rollout of Epic started in Fall 2010 and is planned to be complete in 2013.
  • Epic also contains Admission, Discharge, Transfer (ADT) and Patient Registration information from the hospital billing system (Siemens SMS). As a result, we have basic demographics on patients going back to the 1980s, but we lack the rich clinical detail available since 2007.

We receive periodic copies of the Epic Clarity database to use as the foundation of our repository. Our current copy is from September 22, 2010, which, for example, pre-dates rollout of CPOE.


Our DeIdentificationStrategy includes shifting the dates of each patient record by up to one year. For example, if a patient was born July 1, 1960, their birth date is shifted back in time by between 0 and 365 days, and all subsequent observations are shifted by the same number of days.
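The key property of this scheme is that one offset applies to all of a patient's dates, so intervals between that patient's events survive the shift. A minimal Python sketch, assuming the offset is drawn uniformly from 0 to 365 days; the function name `shift_patient_dates` is hypothetical, and the actual procedure is the one described on the DeIdentificationStrategy page.

```python
import random
from datetime import date, timedelta
from typing import List


def shift_patient_dates(dates: List[date], rng: random.Random) -> List[date]:
    """Shift every date of ONE patient back by the same random 0..365 days.

    A single offset per patient preserves the intervals between that
    patient's events (e.g. age at a visit) while hiding the true dates.
    """
    offset = timedelta(days=rng.randint(0, 365))  # randint is inclusive
    return [d - offset for d in dates]


if __name__ == "__main__":
    rng = random.Random(2010)  # seeded only so the example is repeatable
    birth, visit = date(1960, 7, 1), date(2009, 3, 15)
    shifted_birth, shifted_visit = shift_patient_dates([birth, visit], rng)
    # The interval between the two events is unchanged by the shift.
    print(shifted_visit - shifted_birth == visit - birth)  # True
```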

Plans to add Outpatient Billing data to HERON

We plan to incorporate outpatient billing data from GE IDX in our next release (#306). The IDX data goes back to 2000 and will include CPT procedure codes and ICD9 diagnosis codes.

De-Identification of Patient Health Information

We have taken several steps to de-identify the patient records so they may be deemed non-human subjects research. This includes:

  • shifting dates as above
  • Removing address identifiers
  • Only providing gross geographic indicators (state and radius from KUMC) (#51)
    • with thanks to geonames for data relating zip codes to latitude/longitude
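To illustrate the "radius from KUMC" indicator, here is a minimal Python sketch that turns a zip code's latitude/longitude (the kind of data geonames provides) into a coarse distance band using the haversine great-circle formula. The KUMC coordinates below are approximate, and the band boundaries are hypothetical; neither is taken from HERON's configuration, and the zip-to-coordinate lookup itself is not shown.

```python
import math

# Approximate coordinates of KUMC in Kansas City, KS (an assumption for this sketch).
KUMC_LAT, KUMC_LON = 39.056, -94.610
EARTH_RADIUS_MILES = 3958.8

# Hypothetical distance bands in miles; HERON's actual bands are not given here.
BANDS = [(0, 25), (25, 50), (50, 100), (100, 250)]


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def radius_band(zip_lat: float, zip_lon: float) -> str:
    """Replace a zip code's centroid with a coarse distance band from KUMC."""
    miles = haversine_miles(KUMC_LAT, KUMC_LON, zip_lat, zip_lon)
    for low, high in BANDS:
        if low <= miles < high:
            return f"{low}-{high} miles"
    return f"{BANDS[-1][1]}+ miles"


if __name__ == "__main__":
    # Roughly Wichita, KS (about 175 miles away).
    print(radius_band(37.69, -97.34))  # 100-250 miles
```

Releasing only the band, never the zip code or coordinates, is what keeps the indicator "gross".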

By default you have obfuscated counting access. This means the exact counts returned by queries are perturbed by a random number of patients. Obfuscation access also means you can't take advantage of analysis tools and timelines. We will work with our clinical partners on providing this greater level of access for our research community. (#314)
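A minimal Python sketch of what "perturbed by a random number of patients" can look like. The uniform noise in the range [-3, 3] and the function name `obfuscate_count` are assumptions for illustration; this is not the documented i2b2 obfuscation algorithm or its parameters.

```python
import random


def obfuscate_count(true_count: int, rng: random.Random, max_noise: int = 3) -> int:
    """Return a patient count perturbed by a random integer in
    [-max_noise, max_noise], never below zero.

    max_noise and the uniform distribution are assumptions for this sketch;
    they are not the documented i2b2 obfuscation parameters.
    """
    noisy = true_count + rng.randint(-max_noise, max_noise)
    return max(noisy, 0)  # never report a negative count


if __name__ == "__main__":
    rng = random.Random()
    # Each call returns some value between 117 and 123.
    print([obfuscate_count(120, rng) for _ in range(5)])
```

One design consideration: if fresh noise is drawn on every run, repeating the same query many times and averaging the answers drifts toward the true count, so noise alone does not prevent precise counting by a determined user.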

See also details regarding our DeIdentificationStrategy.

Diagnosis Issues

These come from the Epic PAT_ENC_DX table, which is populated from the problem list and from orders for tests that require a diagnosis. We currently organize the data by ICD9.

  • There are many opinions on this as a source of information. One perspective is that they may be more clinically relevant but lack the completeness of professionally coded diagnoses.
  • In our next release we plan to extract the full ICD9 augmented with SNOMED concepts from the IMO hierarchies stored within Epic. (#273)

Laboratory Results Issues

We have loaded the 500 most frequently resulted lab tests.

  • These are results that were "finalized" by the lab.
  • These do not include microbiology, blood bank, or pathology results; only general laboratory results are included.
  • The LOINC coding was not validated by laboratory personnel. We will work towards using the local Epic categorization until the laboratory has time to map their results to national standards.
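The selection described above (finalized results only, general laboratory only, top 500 by frequency) can be sketched in a few lines of Python. The `LabResult` fields and section names are a hypothetical, simplified layout, not the Epic Clarity schema.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class LabResult(NamedTuple):
    # Hypothetical, simplified result row (not the Epic Clarity schema).
    test_code: str
    status: str       # e.g. "Final" or "Preliminary"
    lab_section: str  # e.g. "General", "Microbiology"


EXCLUDED_SECTIONS = {"Microbiology", "Blood Bank", "Pathology"}


def top_lab_tests(results: Iterable[LabResult], n: int = 500) -> List[str]:
    """Codes of the n most frequently resulted tests, counting only
    finalized general-laboratory results."""
    counts = Counter(
        r.test_code
        for r in results
        if r.status == "Final" and r.lab_section not in EXCLUDED_SECTIONS
    )
    return [code for code, _ in counts.most_common(n)]
```

Filtering before counting matters: a high-volume microbiology or preliminary result would otherwise crowd a general lab test out of the top 500.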

Medication Issues

  • Medications are those dispensed by the pharmacy. We believe this data is not accurate and are working towards loading the administered medications from the MAR.
    • For now, you can use medications for exploratory activities but don't take any overall volumes as truth. We believe the medication exposure is underreported.
    • The classification of medications follows the internal Epic categories (based on First Databank).

Nursing Flowsheet Data Issues

  • We have loaded all the nursing flowsheet data with the exception of some dates. (#363)
  • Strings are not visible for privacy preservation but you will at least know that the observation was recorded. (#299)
  • The ontology for flowsheets is an area of research for us. As a first step, we are displaying the internal flowsheet organization within Epic at KUH.
    • Keep in mind that one observation which was documented on one flowsheet may appear on many different flowsheets. The best examples of this are height/weight and vital signs.
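Because one documented observation can be listed under several flowsheets, adding up counts across flowsheet folders can overstate how many observations exist. A small Python sketch of the difference; the `(observation_id, flowsheet_name)` rows are a hypothetical simplification, not Epic's data model.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical rows: (observation_id, flowsheet_name). The same documented
# observation can be listed under several flowsheets.
Row = Tuple[int, str]


def summarize(rows: List[Row]) -> Tuple[int, int]:
    """Return (rows across all flowsheets, distinct observations)."""
    return len(rows), len({obs_id for obs_id, _ in rows})


def flowsheets_by_observation(rows: List[Row]) -> Dict[int, Set[str]]:
    """Group the flowsheets on which each observation appears."""
    grouped: Dict[int, Set[str]] = defaultdict(set)
    for obs_id, flowsheet in rows:
        grouped[obs_id].add(flowsheet)
    return dict(grouped)


if __name__ == "__main__":
    rows = [
        (1, "Vital Signs"), (1, "ICU Flowsheet"), (1, "Admission"),  # one pulse
        (2, "Vital Signs"), (2, "Admission"),                         # one weight
    ]
    print(summarize(rows))  # (5, 2)
```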

Other Data Issues

  • We haven't really loaded procedures and CPTs yet though the ontology hierarchy might lead you to think they are loaded.
  • Searching by age is an interesting topic in i2b2 (#158)
    • For the most part, you are searching based upon the patient's current age for events that occurred in the past.
    • Future releases of i2b2 should allow you to specify search criteria for the age at time of event (more relevant to infant/pediatric queries).
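The gap between current age and age at the time of an event is easy to see with a worked example. In this Python sketch (the function `age_on` and the dates are illustrative, not HERON code), a child seen as an infant is 7 years old by late 2012, so a query for patients under 2 that uses current age would miss that infant encounter.

```python
from datetime import date


def age_on(birth_date: date, on: date) -> int:
    """Completed years of age on a given date."""
    years = on.year - birth_date.year
    # Birthday not yet reached in that year: subtract one.
    if (on.month, on.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


if __name__ == "__main__":
    born = date(2005, 6, 1)
    event = date(2006, 2, 1)    # an infant encounter
    today = date(2012, 11, 1)   # passed explicitly so the example is fixed
    print(age_on(born, event))  # 0 (age at time of event)
    print(age_on(born, today))  # 7 (current age, what the query sees)
```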

I2B2 Software Platform Issues

For the most part, we are not modifying the i2b2 software. Any defects you find may already have been shared with the i2b2 development team at Partners.

  • We haven't developed training materials beyond what is provided at the i2b2 home page.
  • We are currently running i2b2 version 1.4. Keep this in mind as you compare current functionality with current and planned i2b2 releases (1.5 and 1.6rc3).
    • We plan to upgrade to version 1.6 of i2b2 when it is officially released by its developers and when we can sequence the upgrade with our production release of HERON. (#165)

Note: we have modified the i2b2 software to display the number of facts at each level of the ontology and in some cases also display the distinct number of patients at each level. (#211)
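As an illustration of what those per-level numbers mean (not the actual i2b2 modification), this Python sketch rolls facts up a backslash-delimited concept hierarchy, counting facts and distinct patients at every folder. The paths like `\Dx\Circ\A\` are simplified, hypothetical examples rather than real i2b2 metadata.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def rollup(facts: Iterable[Tuple[int, str]]) -> Dict[str, Tuple[int, int]]:
    """Count facts and distinct patients at every level of a path hierarchy.

    facts: (patient_num, concept_path) pairs, where concept_path is a
    backslash-delimited path such as '\\Diagnoses\\Circulatory\\I10\\'.
    Returns {node_path: (fact_count, distinct_patient_count)}.
    """
    fact_counts: Dict[str, int] = defaultdict(int)
    patients: Dict[str, Set[int]] = defaultdict(set)
    for patient_num, path in facts:
        parts = [p for p in path.split("\\") if p]
        # Credit the fact to the leaf and to every ancestor folder.
        for depth in range(1, len(parts) + 1):
            node = "\\" + "\\".join(parts[:depth]) + "\\"
            fact_counts[node] += 1
            patients[node].add(patient_num)
    return {node: (fact_counts[node], len(patients[node])) for node in fact_counts}


if __name__ == "__main__":
    facts = [
        (1, "\\Dx\\Circ\\A\\"),
        (1, "\\Dx\\Circ\\B\\"),
        (2, "\\Dx\\Circ\\A\\"),
        (2, "\\Dx\\Resp\\C\\"),
    ]
    for node, (n_facts, n_patients) in sorted(rollup(facts).items()):
        print(node, n_facts, n_patients)
```

Fact counts can be summed from children, but distinct patient counts cannot: patient 1 has facts under both `A` and `B`, yet counts once at `\Dx\Circ\`. That is why the sketch keeps a set of patients per node.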

Address any suggestions or feedback to

Updated KUMC clinical chairs on HERON's status

This morning we shared the current status of HERON with the clinical department chairs at the university. Despite loading over 200 million nursing observations into i2b2 yesterday, the system performed adequately for our standard "hypokalemia-loop diuretic" example. I am sharing the PowerPoint and Timeline Word documents, which provide a good picture of our current state.

  • Feedback from the chairs: late afternoons would be the best time to engage clinical faculty conducting research.

Here's a current i2b2 snapshot:

Informatics at KU Med Center

A person working in partnership with an information resource is "better" than that same person unassisted.

-- Friedman 2009

We referenced that "fundamental theorem" of biomedical informatics to introduce the discipline in a recent clinical research presentation:

Friedman goes on to explain that informatics is more about people than technology:

NOT: information resource > person

One can view the whole EHR as one giant intervention where the members of the healthcare system (providers, nurses, pharmacists, allied health professionals) are the subjects. Will it be a chewable Flintstone vitamin or a barium enema?