Want to swim in Big Data? We are seeking summer interns

The Division of Medical Informatics, in the Department of Biostatistics at the University of Kansas Medical Center, seeks highly motivated individuals with a passion for software development, scientific discovery, and improving healthcare. We are offering a paid internship for students in engineering and science to develop software and systems to support clinical and medical informatics research. Our environment and topics include:

  • our clinical data warehouse, which contains over 600 million facts for over 1.9 million patients
  • integrating statistical analysis and machine learning with data warehousing
  • understanding the impact of electronic medical records on patient care
  • clinical trial management systems and integration of biospecimens to support personalized medicine
  • developing methods for increasing clinical research participation in the community.

You will gain experience working in a software engineering environment using open source technologies.

 

Desirable skills include: Experience programming in a higher-level language (C, C++, Java) and in scripting languages (Python). Proficiency with Microsoft Office products (Word, Excel). Experience developing in a Linux/UNIX environment and working as part of a team using code management systems (e.g., Subversion, Hg). Experience developing relational database-driven applications (Oracle, MySQL), database design, and SQL. Experience programming dynamic web pages (such as JSP, PHP, Drupal, or CGI scripts). Client-side scripting experience with JavaScript. Apache web server administration and Linux or UNIX system administration. Experience in knowledge discovery in databases (KDD) and data mining is also desirable, especially data preprocessing and transformation and frameworks for analysis. Previous experience in or knowledge of medicine, biology, and healthcare systems.

 

To learn more about what medical informatics is doing, visit:  http://informatics.kumc.edu and our blog http://informatics.kumc.edu/work/blog

 

If you are interested in a position, please forward your resume and a cover letter to: rwaitman@…

 

HERON: Big Hill release adds home medications, other medication orders, and expands UHC data

The Big Hill release adds support for home medications and expands UHC coverage. Previously, we only had medications that are dispensed by the hospital pharmacy; now, you'll be able to see

  • medications that the patient reports taking at home (so-called "historical medications"), as well as
  • all other orders, which includes prescriptions, inpatient medication orders, and discharge medication orders.
  • data for an additional 100K patients

The UHC data, although primarily administrative, provides a new view on a patient's interaction at the hospital. New UHC concepts include:

  • ICU length of stay: Search on time spent in ICU.
  • Admission and Discharge status concepts: Search where patients came from prior to admission or went to upon discharge.
  • Readmission concept: Helps you search for patients readmitted after discharge to their home, allowing specification to a desired number of days.
  • Clinical Classification Software (CCS) ICD-9: These codes, provided by AHRQ, collapse ICD-9 diagnosis and procedure codes into a smaller number of categories useful in analyzing data. See AHRQ web site.
  • All Patient Refined Diagnosis Related Groups (APR DRGs): DRG codes expanded to include 4 subclasses in severity of illness and mortality subgroups for each code. See  web site.
  • Major Diagnostic Categories (MDCs): Search these diagnosis categories created by combining ICD-9 diagnosis codes into 25 MDCs. These codes, which are used primarily for administrative and billing purposes, provide another view on patient data.
  • Comorbidity: Search 29 comorbidity categories.

Currently the UHC data is limited to a 1-year time period (Nov. 2010-Nov. 2011). Look for additional years in future releases. This data is limited to hospital encounters and lacks clinic data.

HERON Big Hill Contents Summary

This month, our tour of rivers and lakes in Kansas honors Big Hill Lake.

The HERON repository contains approximately 630 million real observations from the hospital, clinics, and research systems:

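| | Observations | Patients | Source | Go-Live | Snapshot | Issues |
|---|---|---|---|---|---|---|
| Demographics | 18.0M | 1.90M | | | | |
| | | | KUH Billing (O2 via SMS) | 1980s | Feb 2012 | various* |
| | | | UKP Billing | 2000 | Feb 2012 | |
| | 9.5K | 9.5K | Frontiers participant registry | Jun 2009 | Feb 2012 | |
| | 183K | 183K | Social Security Death Index | 1962 | Feb 2012 | |
| Diagnoses (ICD9) | 18.7M | 602K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Feb 2012 | various* |
| | | | UKP Billing | 2000 | Feb 2012 | |
| Medications | 78.1M | 245K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Feb 2012 | various* |
| Nursing Observations | 463M | ? | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Feb 2012 | various* |
| Lab Results | 72.6M | 257K | | | | |
| | | | KUH/O2/Epic | 2003 | Feb 2012 | various* |
| Procedures (CPT) | 9.6M | 542K | | | | |
| | | | UKP Billing | 2000 | Feb 2012 | |
| Specimens | 27.8K | 2.80K | | | | |
| | | | KUMC Biospecimen Repository | ? | Jan 2012 | |
| Cancer Cases | 9.1M | 62.8K | | | | |
| | | | KUH Cancer Registry | 1950s | Jan 2012 | labels* |
| Hospital Quality Metrics | 0.97M | 19.7K | | | | |
| | | | University HealthSystem Consortium (UHC) | N/A | Nov 2011 | #997 |
| All | 630M | | | | | |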

Beta Disclaimer

We are providing this early access to obtain feedback from you, the research community. While we are actively working on validating the data loaded into the system with hospital and clinic technical staff, there may be problems with our translation of data from our source systems (HospitalEpicSource and ClinicIdxSource) into HERON.

Please email us at heron-admin@kumc.edu if you discover information you believe may be erroneous.

We are actively working on enhancing the types of data included. Stay tuned to our roadmap to track progress toward upcoming releases.

Various Issues Still Apply

Keep in mind the issues noted in the original HERON beta notice.

Enhancements and Problems/Defects/Issues Addressed in this Release

  • #819: dropping a patient set on KM plugin failed
  • #917: comprehensive query of UHC data
  • #1004: Search by Home Meds
  • #1028: HERON offers useless Medication Dispense search by value criterion

Outstanding Problems/Defects/Issues

Ticket Summary Keywords
#701 stats missing for Malignant neoplasm of prostate counts
#840 Dates for HERON Demographics observations are irrelevant/misleading demographics, data
#867 HERON flattens multiple race facts out of Epic/O2 to 1 demographics, data
#997 UHC data in HERON limited to 1-year snapshot
#1096 R plug-in failed with error dialog due to lack of write permission, X server
#1101 Missing patient numbers marital status concepts
#1109 patient set query for burn dx and any FLO #s takes over 170 CPU minutes performance
#334 HERON prevents system access based on incomplete chalk records regulatory
#441 ICD9 ontology is incomplete; e.g. code 426.82 Long qt syndrome
#450 cannot find patients based on UKP/IDX procedure codes that are not CPT codes
#502 Stats update truncates concept labels containing [] counts
#553 stats not showing properly in the Place ontology counts
#591 i2b2 hangs when you modify the "occurs" criteria multiple times and setvalues are involved i2b2-upstream
#699 patient sets are invalidated by monthly HERON refresh previous-queries
#779 HERON stats aren't integrated with the way i2b2 1.6 handles stats counts
#821 rgate includes misleading data points in KM analysis
#1003 not all Flowsheet measures should prompt for numeric value constraint
#1036 Timeline Plugin: Border Case timeline

A medical informatics perspective on the role of metadata in the data lifecycle

Our group has been invited to a panel discussion:

  •  Metadata Forum
    A discussion of the role of metadata in the data lifecycle
    Friday April 13, 2012
    11:30am - 1:00pm
    Watson Library, 503A and 503B

The panel questions have inspired this bit of thinking out loud:

What is your research area or discipline?

Our discipline is medical informatics. We're involved in two kinds of research:

  1. informatics services to support KUMC researchers, including areas such as cancer center, health of the public, etc.
  2. research in medical informatics per se; that is: looking at the electronic medical record (EMR) as a medical intervention and studying its impact

What do your data look like?

To our customers, we present a large and growing set of medical observations -- currently over 630 million observations -- using a tool called i2b2, developed at Harvard/Partners with NIH funding. It presents a hierarchy of terms:

  • under demographics it has age, gender, etc.;
  • diagnoses are organized using the ICD9 terminology;
  • there are terms for medications, lab results, procedures, etc.

This allows cohort identification queries such as "how many patients does the University of Kansas Hospital (KUH) see each year that are over the age of 35, diagnosed with diabetes, and had an abnormal glucose lab result?"
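To give a feel for what such a cohort query boils down to, here is a minimal sketch in Python using an in-memory SQLite database. The table and column names follow the general shape of the i2b2 star schema (patient_dimension, observation_fact), but the sample rows, the concept codes (`ICD9:250.00`, `LAB:GLUCOSE`), the abnormal-flag values, and the `count_cohort` helper are all made up for illustration. The queries that i2b2 actually generates against HERON will differ.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database shaped loosely like the i2b2 star schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        create table patient_dimension (
            patient_num integer primary key, age_in_years_num integer);
        create table observation_fact (
            patient_num integer, concept_cd text,
            start_date text, valueflag_cd text);
    """)
    # Hypothetical patients: (patient_num, age)
    db.executemany("insert into patient_dimension values (?, ?)",
                   [(1, 52), (2, 29), (3, 61), (4, 47)])
    # Hypothetical facts; concept codes are invented for this sketch.
    db.executemany("insert into observation_fact values (?, ?, ?, ?)", [
        (1, "ICD9:250.00", "2011-03-01", None),
        (1, "LAB:GLUCOSE", "2011-03-01", "H"),   # abnormal (high) result
        (2, "ICD9:250.00", "2011-05-10", None),  # too young for the cohort
        (2, "LAB:GLUCOSE", "2011-05-10", "H"),
        (3, "ICD9:250.00", "2011-07-22", None),
        (3, "LAB:GLUCOSE", "2011-07-22", None),  # normal result
        (4, "LAB:GLUCOSE", "2011-08-02", "H"),   # no diabetes diagnosis
    ])
    return db


def count_cohort(db, min_age, dx_code, lab_code):
    """Count patients older than min_age with the diagnosis and an abnormal lab."""
    sql = """
        select count(*) from patient_dimension p
        where p.age_in_years_num > ?
          and exists (select 1 from observation_fact f
                      where f.patient_num = p.patient_num
                        and f.concept_cd = ?)
          and exists (select 1 from observation_fact f
                      where f.patient_num = p.patient_num
                        and f.concept_cd = ?
                        and f.valueflag_cd in ('H', 'L', 'A'))
    """
    return db.execute(sql, (min_age, dx_code, lab_code)).fetchone()[0]


if __name__ == "__main__":
    db = build_demo_db()
    print(count_cohort(db, 35, "ICD9:250.00", "LAB:GLUCOSE"))
```

Each criterion becomes one `exists` test against the fact table, which is roughly how dragging more terms into a query narrows the cohort.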

The data is not necessarily “ours” in that we take data from multiple sources, aggregate it, and provide a tool for knowledge discovery. For example, we integrate vital statistics from the U.S. Social Security Administration, so that the query above can be refined a la "... and how many of them are dead, according to the SSA?"

Are they structured or unstructured?

So far, we have our hands full with structured data (pulled from EMR, billing system, tumor registry, etc.).

A lot of work in our field is concerned with natural language processing of physicians' notes.

We haven't begun work in that direction, but we are among the first to make use of i2b2 to explore nursing observations. They dominate our database (over 400 million observations) and quite likely they dominate EHR usage in the hospital. Plus, they contain basic information such as height and weight that is essential to screening for many studies.

Are they typically represented in tables or some other form (audio, video, transcripts)?

Integrating medical imaging with i2b2 has been done elsewhere, but we haven't gone beyond brainstorming about it. We were tangentially involved in a project to collect video samples from patients for one study.

But the vast majority of our work is with data stored in tables.

How are your data typically documented - in the form of a document, or in some structured form?

The bulk of our data comes from the KUH EMR. Much of our data is documented by the EMR vendor, and following long-standing billing practice, standards for diagnoses (ICD9, soon to be ICD10) and procedures (CPT) are used for much of the data in the EMR. But the hospital heavily customizes the installation as well. For example, the formulary of medicines and the list of labs are curated by the hospital.

Moving nursing flowsheets from paper to the EMR initially involved a huge number of design decisions made in very short order; many of those decisions are being reconsidered as the hospital gains experience. There is some overlap between the terms used in KUH flowsheets and standards such as SNOMED-CT and LOINC, but we have only scratched the surface of the work of mapping these terminologies.

Sources other than the EMR also vary as to the level of standardization of terminology. Our integration of the KUH tumor registry makes fairly straightforward use of the national standard for cancer registries, NAACCR. But our biospecimen repository uses a locally-curated terminology.

The bulk of this documentation is in tables and spreadsheets, with some documents and diagrams mixed in.

If your metadata are structured please describe that structure. Is it defined by something like a formal XML schema?

One way or another, we fit all of our metadata into i2b2's database schema. As a byproduct, i2b2 can produce an XML form of the metadata, following one of its XML schemas.

Is it common in your area to think in terms of a data lifecycle?

If so, what does that view include (concepts and measures shared across studies, data reuse)?

We reload our data repository from the source systems monthly. This is something of a compromise between real-time updates from the EMR and one-time data gathering exercises such as chart reviews.

Our process for updating metadata is something of a patchwork. For flowsheets, we update it monthly along with the data. For ICD9 and CPT, we plan to update as they are republished annually, but we haven't tackled that just yet.

Are there tools available which help manage lifecycle metadata?

Various tools are under development in the i2b2 community; e.g. Health Ontology Mapper (HOM) by Rob Wynden et al. at UCSF. We haven't investigated them in much depth yet.

Can the metadata be expressed in Resource Description Framework (RDF) format as part of Linked Open Data?

NCBO is developing ontology services that integrate with i2b2 and provide RDF mappings. Again, we haven't investigated them in much depth yet.

Is there an archive offering ongoing curation of your data available to you?

How does that operate? Are there issues with privacy, data size, financing, etc.?

Are there requirements from that archive for how data and metadata are represented?

We interact with varying sorts of metadata curation, as discussed under documentation above.

Setting up a governance structure was a major task that took several months in the start-up phase of our clinical data repository project. We have a data request oversight committee (DROC) with representation from

  • the hospital (which provides the bulk of the EMR data),
  • the clinics (which originally provided diagnosis and procedure information from billing systems, but are increasingly adopting the EMR), and
  • KU medical center itself (which manages the biospecimen repository etc.).

To address HIPAA requirements for dealing with protected health information, not to mention institutional liability, we have technical approaches to de-identification, network security, etc.

Sources such as the tumor registry and biospecimen repository are curated data as such. The hospital is an institution of long standing that has robust systems for long-term EMR storage, though perhaps recording vital signs wouldn't normally be called curation.

The governance policies include being able to trace all data in our system back to its source. The i2b2 database schema includes auditing fields (import_date, update_date, sourcesystem_cd, ...) that make this reasonably straightforward.
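To make the tracing idea concrete, here is a minimal sketch in Python with SQLite. It assumes a much-simplified observation_fact table that keeps only a few columns, including the auditing fields named above. The sample rows, the source-system labels (`epic`, `idx`, `tumor_registry`), and the `facts_by_source` helper are hypothetical and are not taken from our actual load scripts.

```python
import sqlite3


def demo_db():
    """Build a tiny in-memory fact table that carries i2b2-style auditing fields."""
    db = sqlite3.connect(":memory:")
    db.execute("""create table observation_fact (
        patient_num integer, concept_cd text,
        import_date text, update_date text, sourcesystem_cd text)""")
    # Hypothetical rows; the source labels are invented for this sketch.
    db.executemany("insert into observation_fact values (?, ?, ?, ?, ?)", [
        (1, "DX:1", "2012-02-05", "2012-01-20", "epic"),
        (1, "LAB:1", "2012-02-05", "2012-01-21", "epic"),
        (2, "DX:2", "2012-02-06", "2011-12-30", "idx"),
        (3, "TR:1", "2012-01-15", "2011-11-02", "tumor_registry"),
    ])
    return db


def facts_by_source(db):
    """Summarize fact counts per source system with the latest import date."""
    sql = """
        select sourcesystem_cd, count(*), max(import_date)
        from observation_fact
        group by sourcesystem_cd
        order by sourcesystem_cd
    """
    return db.execute(sql).fetchall()


if __name__ == "__main__":
    for source, n, latest in facts_by_source(demo_db()):
        print(source, n, latest)
```

Because every fact carries its own source and load date, a question like "where did this observation come from, and when was it loaded?" is a plain group-by or filter rather than a forensic exercise.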

Moving forward – Would it be useful for us to have more sessions?

A number of i2b2 sites participate in federated query networks which allow researchers to broaden their cohort identification queries and validate their findings more widely. In the medium to long term, we're interested in the sort of terminology alignment that it takes to participate in these networks, but it's not yet high on our list of priorities.

Another motivation for terminology alignment is health information exchange. We're monitoring HIE efforts in Kansas, but again, it's not yet high on our list of priorities.

As we complete other projects and make room for more work on terminology alignment and data interchange, we hope to be able to participate more actively.

  • Posted: 2012-04-12 13:35 (Updated: 2012-04-13 17:37)
  • Author: dconnolly

The Great Blue Herons at Cornell are very prolific

We are very excited as one of our mascots, the Great Blue Heron, is currently in nesting season. Cornell has a terrific web cam

 http://www.allaboutbirds.org/page.aspx?pid=2433

They now have 5 eggs in their nest which is more prolific than usual.

See Cornell Lab of Ornithology's Great Blue Heron Nest Web Cam!

We are very excited as one of our mascots, the Great Blue Heron, is currently in nesting season. Cornell has a terrific web cam

 http://www.allaboutbirds.org/page.aspx?pid=2433

HERON El Dorado Release incorporates searching by MSDRG and LOS

This release integrates hospital quality measures (UHC) data, which enhances search capabilities by introducing new concepts such as length of stay and MSDRG. It includes 109K UHC observations from November 2010 through November 2011. This is our first step in making UHC data available. Look forward to additional UHC data in future releases.

HERON El Dorado Contents Summary

This month, our tour of rivers and lakes in Kansas honors El Dorado Lake.

The HERON repository contains approximately 630 million real observations from the hospital, clinics, and research systems:

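| | Observations | Patients | Source | Go-Live | Snapshot | Issues |
|---|---|---|---|---|---|---|
| Demographics | 18.0M | 1.90M | | | | |
| | | | KUH Billing (O2 via SMS) | 1980s | Feb 2012 | various* |
| | | | UKP Billing | 2000 | Feb 2012 | |
| | 9.1K | 9.1K | Frontiers participant registry | Jun 2009 | Feb 2012 | |
| | 183K | 183K | Social Security Death Index | 1962 | Feb 2012 | |
| Diagnoses (ICD9) | 18.4M | 596K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Feb 2012 | various* |
| | | | UKP Billing | 2000 | Feb 2012 | |
| Medications | 29.1M | 107K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Feb 2012 | various* |
| Nursing Observations | 452M | ? | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Feb 2012 | various* |
| Lab Results | 71.5M | 253K | | | | |
| | | | KUH/O2/Epic | 2003 | Feb 2012 | various* |
| Procedures (CPT) | 9.5M | 539K | | | | |
| | | | UKP Billing | 2000 | Feb 2012 | |
| Specimens | 27.3K | 2.77K | | | | |
| | | | KUMC Biospecimen Repository | ? | Jan 2012 | |
| Cancer Cases | 9.1M | 62.6K | | | | |
| | | | KUH Cancer Registry | 1950s | Jan 2012 | labels* |
| Hospital Quality Metrics | 109K | 19.7K | | | | |
| | | | University HealthSystem Consortium (UHC) | N/A | Nov 2011 | #997 |
| All | 630M | | | | | |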

Enhancements and Problems/Defects/Issues Addressed in this Release

  • #724: Glucose lab test concepts 2010 and 2011 are duplicates?
  • #736: Tumor registry search by SEER site recode, combining primary site, histology
  • #775: using a 1.4 query from the Workplace locked up the i2b2 1.6 client
  • #834: query by MSDRG, length of stay, service line from UHC 1-year snapshot
  • #842: HERON terms for Biospecimen repository missing "Viable Lymphocytes" etc.
  • #852: custom HERON terms (race, ethnicity, flowsheets, labs, ...) lack tooltips
  • #859: "Medications" label is misleading, since HERON only contains Dispensed Medications
  • #863: Tumor Registry Vital Status and other Followup data should be associated with date of last contact, not date of diagnosis
  • #864: getting results from long-running i2b2 queries
  • #866: "Run Query" dialog should only show working options

We are recruiting for a Research Assistant Professor in Medical Informatics

Position Summary:

This Research Assistant Professor faculty position is in the Division of Medical Informatics, Department of Biostatistics, within the School of Medicine.

The Division of Medical Informatics seeks highly motivated individuals with a passion for software development, scientific discovery, and improving healthcare. Be part of a rapidly growing team developing informatics to further translational research (KUMC just received an NIH Clinical and Translational Science Award beginning June 2011) and serving a dynamic community (Kansas City was selected by Google as the first ultra high-speed fiber connected community).

Responsibilities will include developing informatics infrastructure capabilities, conducting research, and especially collaborative research. The position is expected to engage in collaborative research with other faculty from programs and departments within the School and University. The candidate will also be expected to collaborate with the State of Kansas and affiliate organizations such as the University of Kansas Hospital. The candidate may also be expected to teach courses for graduate students in biostatistics and other disciplines.

Key Roles and Responsibilities:

  • Work as an independent informatician to provide collaborative research support related to the development of informatics solutions for KUMC researchers and affiliates.
  • Possess the ability to design/develop key components of the informatics infrastructure, including terminology, data models, knowledge resources, dynamic end-user interfaces, and aggregations of data designed to support research.
  • Participate in team based software development and system management.
  • Evaluate clinical and research information systems and interventions at KUMC and affiliate organizations.
  • Contribute to the writing of grants to support new projects and the writing of manuscripts to publish findings from ongoing project and system evaluations.
  • Teach informatics courses for graduate students in biostatistics and other disciplines.
  • Perform other duties as may be assigned by the division director or chair.

Required Qualifications:

Ph.D., M.D., or D.O. and training in biomedical informatics through advanced degree programs, fellowship training, or comparable experiences in knowledge management, clinical decision support, translational informatics and relevant informatics standards. The ability to promote effective teamwork in a rapidly changing multidisciplinary research environment. Superior interpersonal and communications skills as demonstrated by excellence in speaking, writing and listening. Informatics research experience with clinical, public health, and medical administrative systems.

Preferred Qualifications:

The individual's expertise should draw from the broader field of biomedical informatics to complement and expand the department's capabilities. Desirable experiences and domains include:

  • clinical information systems and clinical decision support
  • public health informatics
  • quantitative/qualitative informatics evaluation methods
  • statistics, biostatistics and quality management methods
  • database and application development
  • HL7 data integration/system architecture
  • ontology management (UMLS, LOINC, SNOMED, FDB, RxNORM)
  • data warehousing; the division's HERON clinical repository utilizes i2b2 (over 500 million observations) to store information from our affiliate clinical organizations
  • clinical research informatics; both Velos and REDCap are used for clinical research information systems
  • knowledge discovery and statistical learning methods
  • natural language processing
  • laboratory/pathology information systems, especially in support of tissue management and cancer research

Feel free to contact me (rwaitman@…) to learn more, or visit the rest of this wiki to learn about our work. Or, just go ahead and apply for position M0203705: Research Assistant Professor in Medical Informatics.

HERON Clinton Release includes Race/Ethnicity fix

In this release the long-standing race/ethnicity issue is resolved. Previously, both were erroneously combined. With this fix, race and ethnicity can be searched individually.

HERON Clinton Contents Summary

This month, our tour of rivers and lakes in Kansas honors Clinton Lake.

The HERON repository contains approximately 600 million real observations from the hospital, clinics, and research systems:

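| | Observations | Patients | Source | Go-Live | Snapshot | Issues |
|---|---|---|---|---|---|---|
| Demographics | 16.4M | 1.89M | | | | |
| | | | KUH Billing (O2 via SMS) | 1980s | Jan 2012 | various* |
| | | | UKP Billing | 2000 | Jan 2012 | |
| | 8.36K | 8.36K | Frontiers participant registry | Jun 2009 | Jan 2012 | |
| | 182K | 182K | Social Security Death Index | 1962 | Jan 2012 | |
| Diagnoses (ICD9) | 18.1M | 590K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Jan 2012 | various* |
| | | | UKP Billing | 2000 | Jan 2012 | |
| Medications | 28.3M | 104K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Jan 2012 | various* |
| Nursing Observations | 439M | ? | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Jan 2012 | various* |
| Lab Results | 70.3M | 250K | | | | |
| | | | KUH/O2/Epic | 2003 | Jan 2012 | various* |
| Procedures (CPT) | 9.5M | 535K | | | | |
| | | | UKP Billing | 2000 | Jan 2012 | |
| Specimens | 26.2K | 2.71K | | | | |
| | | | KUMC Biospecimen Repository | ? | Jan 2012 | |
| Cancer Cases | 8.96M | 61.9K | | | | |
| | | | KUH Cancer Registry | 1950s | Jan 2012 | labels* |
| All | 614M | | | | | |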

Enhancements and Problems/Defects/Issues Addressed in this Release

  • #491: i2b2 help refers Partners for help; does not credit geonames
  • #698: Race/Ethinicity mixed up in HERON

HERON Bow Creek Release brings Cancer Survival Analysis, R integration

The highlights for this release include cancer survival analysis and R integration.

Our cancer survival analysis plug-in is based on work by Segagni et al.

HERON Bow Creek Contents Summary

This month, our tour of rivers and lakes in Kansas honors Bow Creek.

The HERON repository contains approximately 600 million real observations from the hospital, clinics, and research systems:

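| | Observations | Patients | Source | Go-Live | Snapshot | Issues |
|---|---|---|---|---|---|---|
| Demographics | 16.0M | 1.89M | | | | |
| | | | KUH Billing (O2 via SMS) | 1980s | Dec 2011 | various* |
| | | | UKP Billing | 2000 | Dec 2011 | |
| | 8.18K | 8.18K | Frontiers participant registry | Jun 2009 | Dec 2011 | |
| | 182K | 182K | Social Security Death Index | 1962 | Dec 2011 | |
| Diagnoses (ICD9) | 17.9M | 586K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Dec 2011 | various* |
| | | | UKP Billing | 2000 | Dec 2011 | |
| Medications | 28.0M | 103K | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Dec 2011 | various* |
| Nursing Observations | 434M | ? | | | | |
| | | | KUH/O2/Epic | Nov 2007 | Dec 2011 | various* |
| Lab Results | 69.8M | 248K | | | | |
| | | | KUH/O2/Epic | 2003 | Dec 2011 | various* |
| Procedures (CPT) | 9.4M | 531K | | | | |
| | | | UKP Billing | 2000 | Dec 2011 | |
| Specimens | 25.5K | 2.66K | | | | |
| | | | KUMC Biospecimen Repository | ? | Dec 2011 | |
| Cancer Cases | 8.96M | 61.9K | | | | |
| | | | KUH Cancer Registry | 1950s | Dec 2011 | labels* |
| All | 606M | | | | | |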

Enhancements and Problems/Defects/Issues Addressed in this Release

  • #497: When you "reuse" queries, you get a different result the second time
  • #748: i2b2 analysis plug-in supporting Kaplan Meier survival curves
  • #787: HERON to i2b2 link should be disabled when training is not current
  • #789: why are the flowsheet stats down from 407M to 328M?

Adding SEER Site Recode to HERON Tumor Registry integration

Our HERON Tuttle Creek release a couple of months ago included initial integration of data on ~60,000 cancer cases from the KUMC tumor registry. We organized the NAACCR terms based on work by colleagues at the Kimmel Cancer Center in Philadelphia and Group Health Cooperative in Seattle:

screenshot of NAACCR terms for tumor registry

But if you want to find, for example, brain cancer cases, due to an outstanding issue (#733), you have to be an expert in codes for primary site, histology, etc.

For our next release, based on work with John Keighley, we're providing query by SEER Site Recode, a state-of-the-art method for combining primary site and histology:

screenshot of SEER Site Recode term hierarchy

Under the hood: Using python to convert the rules table to SQL

The SEER Site Recode ICD-O-3 (1/27/2003) Definition lays out the rules in a fairly convenient HTML table:

Converting that table to code manually might have been straightforward, but it would have been repetitive and error-prone; so like so many  Geeks and repetitive tasks, I wrote a script to automate it.

source:tumor_reg/seer_recode.py weighs in at about 200 lines, including whitespace and a handful of test cases. It reads the HTML page (well, I feed it through  tidy first to clean up some table markup) and produces

  1. A term hierarchy in CSV format (source:heron_load/curated_data/seer_recode_terms.csv)
  2. Rules to recode our ~60K cancer cases as a SQL case statement (source:heron_load/seer_recode.sql).

The resulting SQL weighs in at about 500 lines. Handling all the different kinds of rules in the table was fun; a lot more fun than writing this sort of SQL by hand:

case
/* Lip */ when (site between 'C000' and 'C009')
  and  not (histology between '9590' and '9989'
   or histology between '9050' and '9055'
   or histology = '9140') then '20010'

...

/* Melanoma of the Skin */ when (site between 'C440' and 'C449')
  and (histology between '8720' and '8790') then '25010'

...

/* Cranial Nerves Other Nervous System */ when (site between 'C710' and 'C719')
  and (histology between '9530' and '9539') then '31040'

/* ... */ when (site between 'C700' and 'C709'
   or site between 'C720' and 'C729')
  and  not (histology between '9590' and '9989'
   or histology between '9050' and '9055'
   or histology = '9140') then '31040'
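
To give a flavor of the approach, here is a minimal sketch of turning one parsed rule into a `when` clause like the ones above. It is not the code in seer_recode.py: the function names and the 'C000-C009' range notation are assumptions made for illustration.

```python
def predicate(column, spec):
    """Render one code or code range, e.g. '9140' or 'C000-C009' (assumed notation)."""
    low, dash, high = spec.partition('-')
    if dash:
        return f"{column} between '{low}' and '{high}'"
    return f"{column} = '{low}'"


def any_of(column, specs):
    """OR together the predicates for every code or range in specs."""
    return '(' + '\n   or '.join(predicate(column, s) for s in specs) + ')'


def when_clause(label, sites, histologies, recode, excluding=False):
    """Build one branch of the SQL case statement for a single rule."""
    hist = any_of('histology', histologies)
    if excluding:
        # Rules phrased as "excluding ..." negate the histology test.
        hist = ' not ' + hist
    return (f"/* {label} */ when {any_of('site', sites)}\n"
            f"  and {hist} then '{recode}'")


print(when_clause('Lip', ['C000-C009'],
                  ['9590-9989', '9050-9055', '9140'], '20010',
                  excluding=True))
```

Run on the Lip rule, this prints the same text as the first branch shown above. Keeping range rendering in one small function means each new kind of rule only needs a new way of combining predicates.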

Our Executive Vice Chancellor has an amazing bird.

Another feathered friend of our informatics initiative.

 KUMC's Barbara Atkinson, MD, and Buddy the "Rock Chalk"-singing parrot

AMIA 2011 Highlight: Dr. Bill Tierney's 10 year story on health care in Africa

Tierney's inspiring closing keynote was truly a highlight of #amia2011. Standing ovations for a great guy and great speaker.
--  Gunther Eysenbach, Oct 26

That's one tweet among a  chorus of #amia2011 tweets about Tierney, including:

  • Death by HIPAA: shouldn't sacrifice care on altar of privacy #AMIA2011 keynote by Tierney
  • LIVE: #AMIA2011 Bill Tierney uses Clem McDonald's 1998 JAMA " Canopy Computing" paper; great metaphor for connected health data, no silos!

 AMIA 2011 keynote recordings are now available:

"Dr. Tierney’s work has taken him far afield—to Kenya, Africa—to use electronic health records and to gather information from patients, applying the data to critical points in the patient–provider relationship to improve the quality and cost-effectiveness of health care. He led the effort to develop the first ambulatory electronic medical record system in sub-Saharan Africa, which has evolved into a comprehensive, open-source electronic medical record system that has been implemented in more than a dozen developing countries."

The video editing is a little rough, with quite a bit of conference administrivia at the beginning. But by the time he gets to "a Case" at 9:40, I'm sure you'll be hooked. Even if you're not an informatics geek, I'm sure you'll find the "10 year story" (starting at 37:30 into the video) inspiring.

HERON Pawnee release paves the way to a richer data repository

Our Pawnee release is the first to use i2b2 1.6. This allows us to take advantage of i2b2 1.6 features in future releases:

  • visit information, such as length of stay, age at visit
  • provider information
  • primary diagnosis vs secondary; billing vs clinical
  • medication routes, frequency

We are still in the process of re-verifying data integrity and functionality after the upgrade, so consider these statistics provisional:

CATEGORY            FACTS    PATIENTS
i2b2                464M
Demographics        14.1M    1.88M
HICTR Participant   7.65K    7.65K
Diagnoses           17.5M    578K
Flowsheet           328M
Labtests            68.3M    244K
Medications         27.0M    99K
Procedures          9.2M     527K
Specimens           24.1K    2.56K

Note: our count of flowsheet observations dropped significantly, from 470 million to 328 million. We are investigating the cause, as we think the data is intact (see ticket #789). Please give us feedback if you notice discrepancies with any prior queries where the counts are down significantly relative to last month's build.

Notice on the right-hand side of the user interface that i2b2 1.6 allows you to constrain observations to the same encounter. For example, you might require that patients have drug exposure (furosemide) and laboratory monitoring (serum potassium) during the same encounter, while another observation like "diagnosis of diabetes" can be treated independently of encounter. Right now, we are not sure we have this working (see defect #790), but we are looking into it.
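
The difference between the two kinds of constraint can be sketched in a few lines of Python. This only illustrates the semantics and is not how i2b2 evaluates queries; the (patient, encounter, concept) tuples, the function name, and the concept labels are made up.

```python
from collections import defaultdict

# Made-up facts: (patient, encounter, concept)
facts = [
    (1, 'e1', 'furosemide'), (1, 'e1', 'serum potassium'), (1, 'e2', 'diabetes'),
    (2, 'e3', 'furosemide'), (2, 'e4', 'serum potassium'), (2, 'e4', 'diabetes'),
]


def patients_with(facts, same_encounter, anywhere=()):
    """Patients with all `same_encounter` concepts in one encounter
    and each `anywhere` concept in any encounter."""
    by_encounter = defaultdict(set)   # (patient, encounter) -> concepts
    by_patient = defaultdict(set)     # patient -> concepts
    for patient, encounter, concept in facts:
        by_encounter[(patient, encounter)].add(concept)
        by_patient[patient].add(concept)
    together = {patient for (patient, _), concepts in by_encounter.items()
                if set(same_encounter) <= concepts}
    return sorted(p for p in together if set(anywhere) <= by_patient[p])


print(patients_with(facts, ['furosemide', 'serum potassium'], ['diabetes']))
```

Here only patient 1 qualifies: patient 2 has both the drug and the lab test, but in different encounters.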

This release also fixed several outstanding bugs:

#25
i2b2 web client gives cryptic message in case of insufficient privileges
#180
expanding Patient record set tree view hangs
#355
Queries for "less than or equal to" and "greater or equal" fail (use "less than" and "greater than" instead)
#732
i2b2 login fails for new users
#764
bsr_load fails due to too many MRN mismatches
#783
i2b2 1.6 client gives 0 results for 1.4 queries

REDCap Upgrade

We are excited to move to a newer version of REDCap, 4.7.0. This upgrade brings many substantial and useful changes, for which we are really thankful to the REDCap team at Vanderbilt (http://project-redcap.org/).

First let us list our favorite features. Based on our interactions with the users, these are features we believe our users will benefit most from:

Data Quality module New module to help quickly find discrepancies and errors in your project data.

Tablet Compatibility More compatibility with tablets/iPads. Project pages now have a simpler layout with a single-page scroll, rather than the page being broken up into three different windows/frames, which would make it difficult to navigate when using some mobile devices and tablets.

Ability to edit survey responses Users can now edit survey responses that have been created by participants on the survey page. Previously, survey responses were read-only.

Better calculated fields Checkbox fields can now be referenced inside calc field equations (previously they could not, although they could always be used in branching logic). Checkboxes can be referenced in a calc field equation in the same format as in branching logic (e.g. [variable(option_code)] ), in which the resulting value for the checkbox will be either 1 or 0 for "checked" or "unchecked", respectively.

New default graphical plotting service for the Graphical Data View & Stats page When this service is enabled, the Graphical Data View & Stats page will display both the graphical plots and the descriptive stats (with some new additional metrics) all on a simple, single page (rather than as two separate tabs). For the graphical plots displayed, if you click any of the points on the plot, it will navigate you over to that exact record on the data entry form. You can toggle any bar chart to view it as a pie chart. You have the option to select an individual record/response, which will then be highlighted on the page with respect to the other values in the plots.

New graphical and statistical display options for surveys Users administering surveys can enable the option that allows their survey participants to view aggregate survey results after they have completed the survey, in which it allows respondents to view the aggregate survey data as graphical plots (bar charts and scatter plots), as descriptive statistics (missing, unique, min, max, mean, standard deviation, etc.), or as both.

External Links functionality New project module that allows users to set up custom links to websites outside REDCap or to other REDCap projects (similar to creating bookmarks). These links will appear on the left-hand project menu and can be accessed at any time by users who are given privileges to do so. There are several custom settings that may be toggled to allow one to control the look and behavior of the link as desired.

Copy a project with records The Copy Project page in any given project now has the option to allow users to copy all records within the project when making a copy of the project. This also copies any documents uploaded to "file" upload fields for records/responses on a form or survey.

New real-time data search feature All projects now have a data search feature on data entry pages that allows users to search the project data in real time. The user must first select the field/question to search under, then they can begin typing in the text box to perform the search. It will begin bringing back results immediately as they type. When they see the record they’re looking for, they can click it to navigate directly to it.

Text Size on Surveys New “resize font” setting at the top of all survey pages allows the participant to easily change the size of all the text on the survey page.

Better Timeouts More robust and aesthetically pleasing implementation of auto-logout messages to users when they have been inactive in REDCap for the designated amount of time. If the user has been away from the computer longer than the auto-logout time and they return, it will not have 3 pop-up messages piled on top of each other as in previous versions, but will instead have just one message letting the user know that they have been automatically logged out.

Calendar events On the Calendar, it now displays the name of the Data Access Group to which the record belongs if the calendar event is attached to a record that is in a Data Access Group.

New calculation functions for “calc” fields Many more functions can now be utilized in calculations. These include sum(), mean(), median(), min(), max(), roundup(), rounddown(), abs(), stdev().

Ability to use branching logic and calculations across multiple events It has always been possible to use branching logic and calculations across multiple data entry forms, but it was not possible (until now) to use branching logic or calculations across multiple events/time-points in longitudinal projects. Now this powerful new feature has been added. Cross-event branching/calculations can be implemented for a field by using a slightly different form of the syntax normally used for branching/calcs (the variable must be prepended with the unique event name in square brackets, e.g. [unique_event_name][variable_name], in which the unique event name can be found on the Define My Events page in every longitudinal project)
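
As a rough illustration of how such references read, here is a small Python sketch that substitutes stored values for `[variable_name]` and `[unique_event_name][variable_name]` references. REDCap evaluates equations in its own engine; the record layout, event names, and `resolve` function here are hypothetical, and checkbox references of the form [variable(option_code)] are not handled.

```python
import re

# Made-up longitudinal record: unique event name -> {variable: value}
record = {
    'baseline_arm_1': {'weight': 80.0},
    'month_6_arm_1': {'weight': 76.5},
}

# One bracketed name, optionally followed by a second bracketed name.
REFERENCE = re.compile(r'\[(\w+)\](?:\[(\w+)\])?')


def resolve(expression, record, current_event):
    """Replace [variable] and [event][variable] references with stored values."""
    def lookup(match):
        first, second = match.groups()
        # A single bracket refers to the current event; two brackets name the event.
        event, variable = (first, second) if second else (current_event, first)
        return repr(record[event][variable])
    return REFERENCE.sub(lookup, expression)


print(resolve('[month_6_arm_1][weight] - [baseline_arm_1][weight]',
              record, 'month_6_arm_1'))
```

The same expression with plain `[weight]` would only ever see the value from the event currently being edited, which is why the event prefix is needed for comparisons across time points.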

“Start Over” feature for survey participants invited via Participant List The survey page now allows participants invited via the Participant List to start over and re-take the entire survey if they return to the survey when they did not complete it fully, but the “Start Over” feature is only available if the Save & Return Later feature is disabled or if it is enabled and the participant did not click the Save & Return Later button. In previous versions, if the participant left the survey and did not use the Save & Return Later feature (whether or not the Save & Return Later feature was enabled), then they would not be allowed to return to the survey at all, in which their response would forever be counted as “incomplete”/“partial”.

Along with the features above, this version includes many other features and fixes for bugs that you might have faced.

If you have any questions related to these changes or any other changes that might not have been listed above, do not hesitate to contact us at CRISSupport@….

Happy Holidays

  • Posted: 2011-12-12 11:53 (Updated: 2011-12-12 12:13)
  • Author: badagarla

Automatically populating REDCap fields from earlier forms

In our work on the Alzheimer's Disease Core Center, we had information entered into one REDCap form that we wanted to see in another.

REDCap doesn't offer this out of the box, so we added a little code (attachment:calc_text.patch; #569). The way it works is a little quirky:

  1. In the usual REDCap fashion,
    1. Make a new field
    2. Choose Calculated Field for Field Type.
    3. Put the name of the source field in square brackets in the Calculation Equation. For example, [last_name]
  2. Now for the quirk: start the Variable Name for this new automatically populated field with
    • text_ for a single-line text field; for example: text_display_last_name
    • textarea_ for a multi-line text area.

OK, so using the variable name like this is sort of cheating, but hey... it seems to work for now.
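
The naming convention boils down to a prefix check. The Python below is a sketch of that idea only, not the contents of calc_text.patch; the function name and the widget labels are made up for illustration.

```python
# Prefixes come from the steps above; the widget labels are illustrative.
PREFIX_TO_WIDGET = {'text_': 'text', 'textarea_': 'textarea'}


def widget_for(variable_name):
    """Return the widget implied by the variable-name prefix, or None."""
    for prefix, widget in PREFIX_TO_WIDGET.items():
        if variable_name.startswith(prefix):
            return widget
    return None   # an ordinary calculated field


print(widget_for('text_display_last_name'))
```

A calculated field whose name starts with neither prefix falls through to `None`, that is, it behaves like any other calculated field.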

If you would like us to show it to you in person, feel free to come to our FrontiersInformaticsClinic, which meets today at 4pm in Dykes 410. If today doesn't work, we're there every other Tuesday. Check the  KUMC calendar.

HERON Tuttlecreek release brings initial Tumor Registry integration

This month's HERON release integrates data from the KUH Tumor Registry, with 65,000 cases dating back to the 1950s (#547). We have also added support for finding patients in KCK county school districts (#531).

Russ regularly gives presentations on our work, describing the integration of various sources into HERON. Since this diagram from How Medical Informatics and HERON Can Help Your Research?, given on November 17, we have integrated the Social Security death master file as well as the tumor registry:

HERON Tuttlecreek Contents Summary

This month, our tour of rivers and lakes in Kansas honors  Tuttle Creek Lake.

The HERON repository contains approximately 570 million real observations from the hospital, clinics, and research systems:

Observation                   Patients  Source                            Go-Live   Snapshot  Issues
Demographics          15.9M   1.88M     KUH Billing (O2 via SMS)          1980s     Oct 2011  various*
                                        UKP Billing                       2000      Oct 2011
                      6.64K   6.64K     Frontiers participant registry    Jun 2009  Oct 2011
Diagnoses (ICD9)      17.2M   571K      KUH/O2/Epic                       Nov 2007  Oct 2011  various*
                                        UKP Billing                       2000      Oct 2011
Medications           26.3M   96K       KUH/O2/Epic                       Nov 2007  Oct 2011  various*
Nursing Observations  407M    ?         KUH/O2/Epic                       Nov 2007  Oct 2011  various*
Lab Results           67.2M   240K      KUH/O2/Epic                       Nov 2007  Oct 2011  various*
Procedures (CPT)      9.1M    523K      UKP Billing                       2000      Oct 2011
Specimens             23.1K   2.48K     KUMC Biospecimen Repository       ?         Oct 2011
Cancer Cases          6.21M   60.4K     KUH Cancer Registry               1950s     Aug 2011  labels*
All                   570M

Beta Disclaimer

We are providing this early access to obtain feedback from you, the research community. While we are actively working on validating the data loaded into the system with hospital and clinic technical staff, there may be problems with our translation of data from our source systems (HospitalEpicSource and ClinicIdxSource) into HERON.

Please email us at heron-admin@kumc.edu if you discover information you believe may be erroneous.

We are actively working on enhancing the types of data included. Stay tuned to our roadmap to track progress toward upcoming releases.

Various Issues Still Apply

Keep in mind the issues noted in the original HERON beta notice, including:

Problems/Defects/Issues Addressed in this Release

No major issues addressed.

Outstanding Problems/Defects/Issues

#363
cannot find patients based on some common KUH Flowsheet measures
#701
stats missing for Malignant neoplasm of prostate
#840
Dates for HERON Demographics observations are irrelevant/misleading
#867
HERON flattens multiple race facts out of Epic/O2 to 1
#997
UHC data in HERON limited to 1-year snapshot
#1096
R plug-in failed with error dialog due to lack of write permission, X server
#1101
Missing patient numbers marital status concepts
#1109
patient set query for burn dx and any FLO #s takes over 170 CPU minutes

AMIA 2011: Nursing Flowsheets data and the wild west of terminology

Like a number of other CTSA sites, we're using I2B2 in our HERON research data repository. The original domain of I2B2 was genome/phenome integration and personalized medicine. Genomic stuff is on our long-term radar, but one of our earliest wins has been mining the vast amount of data (~400M observations as of our latest release) recorded by nurses and other practitioners in flowsheets in our EMR.

Height, weight, and BMI are quite common inclusion/exclusion criteria for clinical trials, and those aren't available in the term hierarchy that comes with I2B2 out of the box.

It's been a big challenge because unlike diagnoses with ICD9 codes and procedures with CPT codes, the MedicalTerminologyMarketplace has no widespread norms for flowsheets.

We're presenting some results in Washington D.C. on Wednesday at  AMIA 2011:

  • Expressing Observations from Electronic Medical Record Flowsheets in an i2b2-based Clinical Data Repository to Support Research and Quality Improvement
    L. Waitman, J. Warren, E. Manos, D. Connolly

Stay tuned to KUBMIPresentations for presentation materials.

p.s. I'm new to AMIA. As a  long-time Web guy, there's a bit of culture shock: the  conference program only seems to be available as a big hunk of PDF or a goofy  mobile flash thing, and the "Join the conversation on twitter" box inside gives the hash tag as  AMIA2011#, with the hash at the end. Chuckle. And no open-access to the full text of the article. Sigh.

Perry is here with data through September 2011

This month, the release of our HERON research data repository honors Perry Lake for our regular monthly refresh of the data.

Join us at the Frontiers Clinical/Translational Informatics Clinic for a full demonstration and discussion. We hold it bi-weekly in 1040 Dykes Library from 4-5 pm. The next workshop will be on Nov 1st, 2011.

The HERON repository contains approximately 551 million real observations from the hospital and clinics:

Observation                   Patients  Source                            Go-Live   Snapshot  Issues
Demographics          15.4M   1.8728M   KUH Billing (O2 via SMS)          1980s     Sep 2011  various*
                                        UKP Billing                       2000      Sep 2011
                      6.4K    6.4K      Frontiers participant registry    Jun 2009  Sep 2011
Diagnoses (ICD9)      16.9M   564K      KUH/O2/Epic                       Nov 2007  Sep 2011  various*
                                        UKP Billing                       2000      Sep 2011
Medications           25.6M   93.7K     KUH/O2/Epic                       Nov 2007  Sep 2011  various*
Nursing Observations  396.7M  ?         KUH/O2/Epic                       Nov 2007  Sep 2011  various*
Lab Results           66.2M   237K      KUH/O2/Epic                       Nov 2003  Sep 2011  various*
Procedures (CPT)      9.02M   519K      KUH/O2/Epic                       Nov 2007  Sep 2011
Specimen              22.2K   2.4K      KUMC Bio Specimen Repository      Apr 1996  Sep 2011
Death Index
                      24K     24k       KUH Hospital Data                 Nov 2007  Sep 2011
                      178K    178k      Social Security Administration    Apr 1996  Jan 1962
All                   551M

Beta Disclaimer

We are providing this early access to obtain feedback from you, the research community. While we are actively working on validating the data loaded into the system with hospital and clinic technical staff, there may be problems with our translation of data from our source systems (HospitalEpicSource and ClinicIdxSource) into HERON. Please email us at heron-admin@kumc.edu if you discover information you believe may be erroneous. We are actively working on enhancing the types of data included. Stay tuned to our roadmap to track progress toward upcoming releases.

Various Issues Still Apply

Keep in mind the issues noted in the original HERON beta notice, including:

Problems/Defects/Issues Addressed in this Release

No major issues addressed.

Outstanding Problems/Defects/Issues

#363
cannot find patients based on some common KUH Flowsheet measures
#701
stats missing for Malignant neoplasm of prostate
#840
Dates for HERON Demographics observations are irrelevant/misleading
#867
HERON flattens multiple race facts out of Epic/O2 to 1
#997
UHC data in HERON limited to 1-year snapshot
#1096
R plug-in failed with error dialog due to lack of write permission, X server
#1101
Missing patient numbers marital status concepts
#1109
patient set query for burn dx and any FLO #s takes over 170 CPU minutes

Social Security Death Index integration expands vital statistics available in HERON

The Department of Biostatistics is excited to provide linkage between our clinical and administrative records and the Social Security Death Master File distributed by the National Technical Information Service.

Previously, we only had a record that a patient was deceased based upon follow-up at KUH/UKP or if they died while cared for at KUH/UKP (23,850 patients indicated as "Deceased" within the "Demographics" ontology). Now we have an additional indicator that the patient has died based upon records reported to the federal government (177,706 patients indicated as "Deceased per SSA" within the "Demographics" ontology).

Note, we currently are matching patients based upon an exact match of their social security number AND their date of birth.
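
In code terms, the matching rule amounts to a join on the pair (SSN, date of birth). The sketch below shows that rule on made-up records. The field names and the `link_deaths` function are assumptions for illustration, not the HERON load code.

```python
def link_deaths(patients, death_file):
    """Return ids of patients whose (SSN, date of birth) pair appears in the death file."""
    deceased = {(d['ssn'], d['dob']) for d in death_file}
    return [p['id'] for p in patients if (p['ssn'], p['dob']) in deceased]


# Made-up records for illustration only.
patients = [
    {'id': 'A', 'ssn': '000-00-0001', 'dob': '1930-05-01'},
    {'id': 'B', 'ssn': '000-00-0002', 'dob': '1941-07-12'},
]
death_file = [
    {'ssn': '000-00-0001', 'dob': '1930-05-01'},   # matches A on both fields
    {'ssn': '000-00-0002', 'dob': '1941-07-21'},   # same SSN as B, different birth date
]

print(link_deaths(patients, death_file))
```

Requiring both fields to agree keeps false matches down, at the cost of missing deaths where either value was recorded with an error.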

We think this will be a powerful addition that allows preliminary hypothesis generation regarding mortality rates between different patient cohorts. In the future, we hope to provide analysis plugins that perform routine survival analysis.

For a neurological example: HERON has 1356 patients who've ever been diagnosed with Amyotrophic lateral sclerosis (ALS, aka Lou Gehrig's disease) at KUMC since 2000. Of those, the hospital knew 206 had died. Now that we can also check Social Security Administration records, we know that 821 have died.

Using i2b2 Timeline and other analysis plug-ins in HERON

We recently worked out the technical and regulatory issues to take you beyond counting to detailed analysis of the data in HERON (#314).

Consider this example from my recent Internal Medicine Research Committee Presentation on HERON:

To access these capabilities, check the box to also create a "patient set" when you run your query. Then use the analysis tools tab at the top, which takes you to a series of i2b2 plugins. We have found the "Demographics (1 Patient Set)", "Demographics (2 Patient Sets)", and "Timeline" plugins to be very illuminating.

Stay tuned for more training materials or join us at the bi-weekly informatics clinic for help.