wiki:HERONTrainingMaterials/HeronDataSet

HERON users may make a DataAccess request for data that can be downloaded into programs such as Excel, SPSS or SAS.

The data is delivered in a few files in comma-separated values (CSV) format.

Patient Table

The patient table has one record for each patient in the patient set you created with your i2b2 query. Most fields are self-explanatory:

patient_numvital_statusbirth_datedeath_dateagesexlanguageracemarital_statusreligionlast_visit
5090n1960-04-1253fenglishblackschristian2011-07-20 00:00:00
17067n1962-11-25 00:00:0051menglishothermcatholic2013-02-08 00:00:00

Details:

  • vital_status: n means not dead and y means dead.
  • sex: f = female, m = male o = other
  • language: patient's language from the EMR
  • race: Race of the patient
  • marital_status: s = single m= married w = widow d = divorce x = separated p = life partner u = unknown
  • last_visit: the date of the last visit or encounter record from any of our sources about this patient

Note dateshifting for birth_date etc.

Variable Table

The variable table has one record for each of the i2b2 "concepts" you requested:

pathnamecountsindextooltip
\i2b2\Demographics\KUMC radius\Place: distance from KUMCPlace: distance from KUMC [1,281,302 facts; 1,281,302 patients]1Demographics
\i2b2\Demographics\Marital Status\Marital StatusMarital Status [2,002,289 facts; 2,002,289 patients]2Demographics
\i2b2\Labtests\KUH\CHEMISTRY\198-GENERAL CHEMISTRY\146: BILIRUBIN,INDIRECT\BILIRUBIN, INDIRECT (#146)BILIRUBIN, INDIRECT (#146) [6,190 facts; 4,172 patients]3Laboratory Tests \ CHEMISTRY (KUH) \ 198-GENERAL CHEMISTRY
  • path - the internal i2b2 concept path. \a\b\ is under \a\ and so on.
  • name - as seen in the i2b2 query tool user interface but with the count part removed
  • counts - as seen in the i2b2 query tool user interface
  • tooltip - as seen in the i2b2 query tool user interface
    • By design, tooltips correspond with paths. But our tooltip coverage is not complete.

Data Table

All of the variable data is in one large table in an entity/attribute/value (EAV) structure, demonstrated below. The entities are the patients you selected, the attributes are your variables, and the values are numbers and dates and such.

Note that Heart Rhythm appears multiple times for the same patient. Raw data sets include all facts matching the variables and patients you request; constraints on dates and modifiers are not supported. (TODO: constrain by date.)

A "cooked" summary of the data, where only a count of the Heart Rhythms and the first and last values appear, is available in REDCap CRFs. This summary is sufficient for some forms of analysis. If you need further post-processing before making use of the data with popular statistics packages, see HERONSupport for consulting possibilities.

patient_numencounter_numvariablevaltypetvalnvalunitscodecode_labelmodifiermodifier_labelinstancestart_dateend_datevariable_index
1461920715ACUTE RHEUMATIC FEVERcodeRheumatic fever without mention of heart involvementICD9:390Rheumatic fever without mention of heart involvementDiagObs:ClinicClinic (IDX)32006-04-062006-04-061
1435416174Black or African AmericancodeBlack or African AmericanDEM|RACE:blackBlack or African American12007-07-202007-07-202
9802138187VITALScode02. IrregularKUH|FLO_MEAS_ID+LINE:205682_202. Irregular1389909000442010-09-12 17:00:002010-09-12 17:00:001
9802199120VITALScode02. IrregularKUH|FLO_MEAS_ID+LINE:205682_202. Irregular1434595002362010-10-04 12:00:002010-10-04 12:00:001
9802199120VITALScode02. IrregularKUH|FLO_MEAS_ID+LINE:205682_202. Irregular1434595000312010-10-04 00:43:002010-10-04 00:43:001
2306107512004- #301070 BMI (Calculated)NE22.14KUH|FLO_MEAS_ID:301070004- #301070 BMI (Calculated)1890675000892011-01-03 08:48:002011-01-03 08:48:002
10731962183[IM600] IMMUNE SUPPRESSANTSmgKUH|MEDICATION_ID:147162GENGRAF POMedObs:HistoricalHistorical Medications1031645618000012009-09-08 15:57:002009-09-08 15:57:0014
28265258751250390 Date of Diagnosiscode0390 Date of DiagnosisNAACCR|390:0390 Date of Diagnosis12011-05-09 00:00:002011-05-09 00:00:003
84378149149WBC COUNT (#3009)NE0.4K/ULKUH|COMPONENT_ID:3009WBC COUNT (#3009)107448408200012012-09-16 21:56:002012-09-16 21:56:0013
47290910973785[IM600] IMMUNE SUPPRESSANTSNE0.5mgKUH|MEDICATION_ID:82958TACROLIMUS 0.5 MG PO CAPMedObs:Dose|mgDose Unit: mcg/mg/g1060464046000012011-10-30 09:00:002011-10-31 12:57:0014
195899163693004- #301070 BMI (Calculated)NE25.16KUH|FLO_MEAS_ID:301070004- #301070 BMI (Calculated)52884000252007-04-16 12:21:002007-04-16 12:21:002

The columns are:

  • patient_num -- an arbitrary identifier for correlating facts from the same patient
    • patients are given new numbers every month, though we keep old numbers for reference for three months
  • encounter_num -- an arbitrary identifier for correlating facts from the same encounter/visit
  • variable -- the name of the query item that this fact matched (e.g. Gender)
  • valtype -- N for numeric, or code
    • other such as (T)ext and (D)ate may be introduced in due course
    • For codes such as date of diagnosis, the value appears in the start_date column.
  • tval -- for code values, the code label appears here as well
    • When valtype=(N)umeric, E is for Equal. (We don't yet support other relationships)
    • When the relationship between the variable and the value isn't clear, e.g. `VITALS` and `02. Irregular`, you may need to consult the code info table.
  • nval -- value of the observation, when valtype is N
  • units -- goes along with nval; e.g. mmHg; null in many cases
  • code -- the internal i2b2 code for the concept (e.g. DEM|SEX:m)
    • some are standard, such as ICD9:nnn ; but many are not
  • code_label -- e.g. Black or African American for code DEM|RACE:black
  • modifier -- used to distinguish source of diagnosis (billing vs. clinical), historical meds vs med orders, to note medication doses, etc.
  • modifier_label -- e.g. Primary billing diagnosis (IDX, UHC) for DiagObs:Primary.
  • instance -- an arbitrary identifier to correlate aspects (modifiers) of one observation
  • start_date -- the start date for the observation
  • end_date -- end date of the event observed; e.g. end date of medication dispense
  • variable.index -- index into variable table

TODO: consider restoring update_date feature.

Identified Data Sets

in progress

In identified data sets, each row also contains:

  • patient_ide -- medical record number (MRN)

Code Info Table

Some analysis uses not only relationships between variables and code values, but between the codes themselves or between other parts of the term hierarchy. The code-info table enumerates all of the codes relevant to the variables in your query:

For example, one of the variables is Vitals, and one of the resulting codes is "01. Regular" which is far from self-explanatory. The tooltip column shows that 01. Regular is under 007- #205724 Vitals: \ 002- #205682 Heart Rhythm >. So now we see that Heart Rhythm is the vital sign that was measured, and the measurement value was Regular.

variable_indexvariablecodecode_labelcountstooltipvariable_pathcode_path
6GLUCOSE (#2010)KUH|COMPONENT_ID:2010GLUCOSE (#2010)GLUCOSE (#2010) [726,600 facts; 120,404 patients]Laboratory Tests \ CHEMISTRY (KUH) \ 198-GENERAL CHEMISTRY\i2b2\Labtests\KUH\CHEMISTRY\198-GENERAL CHEMISTRY\2010: GLUCOSE,RANDOM\\i2b2\Labtests\KUH\CHEMISTRY\198-GENERAL CHEMISTRY\2010: GLUCOSE,RANDOM\
8LeukemiaSEER_SITE:35021Acute Myeloid LeukemiaAcute Myeloid Leukemia [946 facts; 946 patients]Cancer Cases \ SEER Site Summary \ Leukemia \ Myeloid and Monocytic Leukemia\i2b2\naaccr\SEER Site\Leukemia\\i2b2\naaccr\SEER Site\Leukemia\Myeloid and Monocyti\A cute Myeloid Leukem\
1VITALSKUH|FLO_MEAS_ID+LINE:205682_101. Regular01. Regular [1,734,744 facts; 111,485 patients]Flowsheets \ SAMPLE : \ Polysomnography (Complete Night/Child) #20133 \ 007- #205724 Vitals: \ 002- #205682 Heart Rhythm >\i2b2\Flowsheet\SAMPLE\SLEEP POLYSOMNOGRAPHY-COMP. NIGHT T:20133\SAMPLE G POLYSOMNOGRAPHY VITALS G:205724 I#7\\i2b2\Flowsheet\SAMPLE\SLEEP POLYSOMNOGRAPHY-COMP. NIGHT T:20133\SAMPLE G POLYSOMNOGRAPHY VITALS G:205724 I#7\KU IP ROW HEART RHYTHM M:205682 I#2\Regular C#1\

The columns are:

  • variable_index and variable correlate with the variable table.
  • code is either
    • a code located under the variable
    • a modifier that may be used with the variable
  • code_label and counts are text labels, as above
  • tooltip shows the ancestry of the code
  • variable_path corresponds to variable_index
  • code_path is either
    • a concept path starting with the variable_path
    • a modifier path
Last modified 3 years ago Last modified on 11/13/14 10:24:57