Posts in category training

Adding SEER Site Recode to HERON Tumor Registry integration

Our HERON tuttlecreek release a couple months ago included initial integration of data on ~60,000 cancer cases from the KUMC tumor registry. We organized the NAACCR terms based on work by colleagues at the Kimmel Cancer Center in Philadelphia and Group Health Cooperative in Seattle:

NAACCR terms for tumor registry

But because of an outstanding issue (#733), finding brain cancer cases, for example, requires you to be an expert in the codes for primary site, histology, etc.:

For our next release, based on work with John Keighley, we're providing query by SEER Site Recode, a state-of-the-art method for combining primary site and histology:

screenshot of SEER Site Recode term hierarchy

Under the hood: Using Python to convert the rules table to SQL

The SEER Site Recode ICD-O-3 (1/27/2003) Definition lays out the rules in a fairly convenient HTML table:

Converting that table to code manually might have been straightforward, but it would have been repetitive and error-prone. So, like so many geeks faced with a repetitive task, I wrote a script to automate it.

source:tumor_reg/seer_recode.py weighs in at about 200 lines, including whitespace and a handful of test cases. It reads the HTML page (well, I feed it through tidy first to clean up some table markup) and produces

  1. A term hierarchy in CSV format (source:heron_load/curated_data/seer_recode_terms.csv)
  2. Rules to recode our ~60K cancer cases as a SQL case statement (source:heron_load/seer_recode.sql).

The resulting SQL weighs in at about 500 lines. Handling all the different kinds of rules in the table was fun; a lot more fun than writing this sort of SQL by hand:

case
/* Lip */ when (site between 'C000' and 'C009')
  and  not (histology between '9590' and '9989'
   or histology between '9050' and '9055'
   or histology = '9140') then '20010'

...

/* Melanoma of the Skin */ when (site between 'C440' and 'C449')
  and (histology between '8720' and '8790') then '25010'

...

/* Cranial Nerves Other Nervous System */ when (site between 'C710' and 'C719')
  and (histology between '9530' and '9539') then '31040'

/* ... */ when (site between 'C700' and 'C709'
   or site between 'C720' and 'C729')
  and  not (histology between '9590' and '9989'
   or histology between '9050' and '9055'
   or histology = '9140') then '31040'
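
To give a feel for the approach, here is a minimal sketch of generating one `when` clause from one rule. It is not the code in seer_recode.py: the function names `range_condition` and `when_clause` are hypothetical, and it assumes each rule has already been pulled out of the HTML as comma-separated codes and ranges such as `9590-9989, 9050-9055, 9140`.

```python
def range_condition(column, spec):
    """Turn a spec such as '9590-9989, 9050-9055, 9140' into a SQL condition.

    Assumed input format: comma-separated items, each either a single
    code or a 'low-high' range.
    """
    parts = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            low, high = (s.strip() for s in item.split("-", 1))
            parts.append(f"{column} between '{low}' and '{high}'")
        else:
            parts.append(f"{column} = '{item}'")
    return "(" + "\n   or ".join(parts) + ")"


def when_clause(label, site_spec, histology_spec, recode, exclude=False):
    """Build one branch of the case statement (hypothetical helper).

    exclude=True means the histology codes are exclusions, so the
    histology condition is negated.
    """
    site = range_condition("site", site_spec)
    histology = range_condition("histology", histology_spec)
    keyword = "and not" if exclude else "and"
    return f"/* {label} */ when {site}\n  {keyword} {histology} then '{recode}'"


print(when_clause("Lip", "C000-C009",
                  "9590-9989, 9050-9055, 9140", "20010", exclude=True))
print(when_clause("Melanoma of the Skin", "C440-C449",
                  "8720-8790", "25010"))
```

The codes are interpolated straight into the SQL text, which is only reasonable because the input is a small curated table rather than user input.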

Automatically populating REDCap fields from earlier forms

In our work on the Alzheimer's Disease Core Center, we had information entered into one REDCap form that we wanted to see in another.

REDCap doesn't offer this out of the box, so we added a little code (attachment:calc_text.patch ; #569). The way it works is a little quirky:

  1. In the usual REDCap fashion,
    1. Make a new field
    2. Choose Calculated Field for Field Type.
    3. Put the name of the source field in square brackets in the Calculation Equation. For example, [last_name]
  2. Now for the quirk: start the Variable Name for this new automatically populated field with
    • text_ for a single-line text field; for example: text_display_last_name
    • textarea_ for a multi-line text area.

OK, so using the variable name like this is sort of cheating, but hey... it seems to work for now.
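
The naming convention boils down to a prefix check on the variable name. The sketch below only illustrates that convention; it is not the contents of calc_text.patch (REDCap itself is not written in Python), and the function name `display_type` and the fallback value `"calc"` are made up for the example.

```python
# Prefixes from the convention above, mapped to how the field is shown.
PREFIXES = (
    ("textarea_", "textarea"),  # multi-line text area
    ("text_", "text"),          # single-line text field
)


def display_type(variable_name):
    """Return how a calculated field would be shown, based on its name.

    Hypothetical helper: any name without one of the prefixes is treated
    as an ordinary calculated field, labeled "calc" here.
    """
    for prefix, field_type in PREFIXES:
        if variable_name.startswith(prefix):
            return field_type
    return "calc"


print(display_type("text_display_last_name"))
print(display_type("textarea_visit_notes"))
print(display_type("bmi"))
```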

If you would like us to show it to you in person, feel free to come to our FrontiersInformaticsClinic, which meets today at 4pm in Dykes 410. If today doesn't work, we're there every other Tuesday. Check the KUMC calendar.

Social Security Death Index integration expands vital statistics available in HERON

The Department of Biostatistics is excited to provide linkage between our clinical and administrative records and the Social Security Death Master File released by the National Technical Information Service.

Previously, we only had a record that a patient was deceased based upon follow-up at KUH/UKP or if they died while cared for at KUH/UKP (23,850 patients indicated as "Deceased" within the "Demographics" ontology). Now we have an additional indicator that the patient has died based upon records reported to the federal government (177,706 patients indicated as "Deceased per SSA" within the "Demographics" ontology).

Note that we currently match patients only on an exact match of both their Social Security number AND their date of birth.
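
As a rough illustration of what an exact match on both fields means, here is a small sketch. It is not the HERON load code: the record layout (`patient_id`, `ssn`, `birth_date`), the function name `deceased_per_ssa`, and the sample records are all made up for the example.

```python
from datetime import date


def deceased_per_ssa(patients, death_file):
    """Return ids of patients whose SSN and birth date both match exactly.

    Patients with no SSN on file are never matched.
    """
    deaths = {(rec["ssn"], rec["birth_date"]) for rec in death_file}
    return [
        p["patient_id"]
        for p in patients
        if p["ssn"] and (p["ssn"], p["birth_date"]) in deaths
    ]


# Entirely fictional records, for illustration only.
patients = [
    {"patient_id": 1, "ssn": "000-00-0001", "birth_date": date(1940, 5, 1)},
    {"patient_id": 2, "ssn": "000-00-0002", "birth_date": date(1951, 7, 9)},
    {"patient_id": 3, "ssn": None, "birth_date": date(1940, 5, 1)},
]
death_file = [
    {"ssn": "000-00-0001", "birth_date": date(1940, 5, 1)},
    # Same SSN as patient 2 but a different birth date: not a match.
    {"ssn": "000-00-0002", "birth_date": date(1951, 7, 19)},
]

print(deceased_per_ssa(patients, death_file))
```

Requiring both fields to agree trades recall for precision: a typo in either one drops the match, but a mistyped SSN alone cannot mark the wrong patient as deceased.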

We think this will be a powerful addition that allows preliminary hypothesis generation regarding mortality rates between different patient cohorts. In the future we hope to provide analysis plugins that perform routine survival analysis.

For a neurological example: HERON has 1356 patients who've been diagnosed with amyotrophic lateral sclerosis (ALS, also known as Lou Gehrig's disease) at KUMC since 2000. Of those, the hospital knew 206 had died. Now that we can also check Social Security Administration records, we know that 821 have died.

Using i2b2 Timeline and other analysis plug-ins in HERON

We recently worked out the technical and regulatory issues to take you beyond counting to detailed analysis of the data in HERON (#314).

Consider this example from my recent Internal Medicine Research Committee Presentation on HERON:

To access these capabilities, check the box to also create a "patient set" when you run your query. Then use the analysis tools tab at the top, which takes you to a series of i2b2 plugins. We have found the "Demographics (1 Patient Set)", "Demographics (2 Patient Sets)", and "Timeline" plugins to be very illuminating.

Stay tuned for more training materials or join us at the bi-weekly informatics clinic for help.