Posts for the month of January 2012

Adding SEER Site Recode to HERON Tumor Registry integration

Our HERON tuttlecreek release a couple months ago included initial integration of data on ~60,000 cancer cases from the KUMC tumor registry. We organized the NAACCR terms based on work by colleagues at the Kimmel Cancer Center in Philadelphia and Group Health Cooperative in Seattle:

NAACCR terms for tumor registry

But due to an outstanding issue (#733), finding brain cancer cases, for example, requires you to be an expert in the codes for primary site, histology, etc.

For our next release, based on work with John Keighley, we're providing query by SEER Site Recode, a state-of-the-art method for combining primary site and histology:

screenshot of SEER Site Recode term hierarchy

Under the hood: Using Python to convert the rules table to SQL

The SEER Site Recode ICD-O-3 (1/27/2003) Definition lays out the rules in a fairly convenient HTML table.

Converting that table to code manually might have been straightforward, but it would have been repetitive and error-prone. So, like so many geeks faced with a repetitive task, I wrote a script to automate it.

source:tumor_reg/seer_recode.py weighs in at about 200 lines, including whitespace and a handful of test cases. It reads the HTML page (well, I feed it through tidy first to clean up some table markup) and produces:

  1. A term hierarchy in CSV format (source:heron_load/curated_data/seer_recode_terms.csv)
  2. Rules to recode our ~60K cancer cases, as a SQL case expression (source:heron_load/seer_recode.sql).
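To give a flavor of the table-reading step, here is a minimal sketch using only the Python standard library. It is not the actual seer_recode.py: the class name TableRows, the helper table_rows, and the four-column sample row (site group, site codes, histology codes, recode) are assumptions made for illustration, and it is written for Python 3.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each table cell, grouped by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            # Collapse line breaks and runs of spaces inside a cell.
            self._row.append(' '.join(''.join(self._cell).split()))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return the rows of every table in html as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Assumed sample markup, not copied from the SEER page.
SAMPLE = """
<table>
  <tr><th>Site Group</th><th>Site</th><th>Histology</th><th>Recode</th></tr>
  <tr><td>Melanoma of the Skin</td><td>C440-C449</td>
      <td>8720-8790</td><td>25010</td></tr>
</table>
"""

if __name__ == '__main__':
    for row in table_rows(SAMPLE):
        print(row)
```

This relies on the markup being well-formed, with every row and cell explicitly closed, which is the kind of cleanup tidy provides.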

The resulting SQL weighs in at about 500 lines. Handling all the different kinds of rules in the table was fun; a lot more fun than writing this sort of SQL by hand:

case
/* Lip */ when (site between 'C000' and 'C009')
  and  not (histology between '9590' and '9989'
   or histology between '9050' and '9055'
   or histology = '9140') then '20010'

...

/* Melanoma of the Skin */ when (site between 'C440' and 'C449')
  and (histology between '8720' and '8790') then '25010'

...

/* Cranial Nerves Other Nervous System */ when (site between 'C710' and 'C719')
  and (histology between '9530' and '9539') then '31040'

/* ... */ when (site between 'C700' and 'C709'
   or site between 'C720' and 'C729')
  and  not (histology between '9590' and '9989'
   or histology between '9050' and '9055'
   or histology = '9140') then '31040'
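For the curious, the core of the rule-to-SQL idea can be sketched in a few lines of Python. This is an illustration rather than the code in seer_recode.py: the function names ranges_to_sql and rule_to_when, the comma-and-hyphen spec format, and the excluding flag are all assumptions, and the codes are treated as trusted input, so nothing is escaped.

```python
def ranges_to_sql(column, spec):
    """Turn a spec such as 'C700-C709, C720-C729' into a SQL condition.

    A hyphenated item becomes a `between` test; a single code becomes `=`.
    """
    clauses = []
    for item in spec.split(','):
        item = item.strip()
        if not item:
            continue
        if '-' in item:
            lo, hi = (part.strip() for part in item.split('-', 1))
            clauses.append("%s between '%s' and '%s'" % (column, lo, hi))
        else:
            clauses.append("%s = '%s'" % (column, item))
    return '(' + '\n   or '.join(clauses) + ')'


def rule_to_when(label, site_spec, histology_spec, recode, excluding=False):
    """Build one `when ... then ...` branch of the case expression.

    With excluding=True the histology codes are ones the rule rules out.
    """
    site = ranges_to_sql('site', site_spec)
    histology = ranges_to_sql('histology', histology_spec)
    if excluding:
        histology = 'not ' + histology
    return "/* %s */ when %s\n  and %s then '%s'" % (
        label, site, histology, recode)


if __name__ == '__main__':
    print(rule_to_when('Melanoma of the Skin', 'C440-C449', '8720-8790', '25010'))
    print(rule_to_when('Lip', 'C000-C009', '9590-9989, 9050-9055, 9140',
                       '20010', excluding=True))
```

Generating each branch from data like this keeps the repetitive part in one place, so a fix to the formatting applies to every rule at once.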

Our Executive Vice Chancellor has an amazing bird.

Another feathered friend of our informatics initiative.

KUMC's Barbara Atkinson, MD, and Buddy the "Rock Chalk"-singing parrot

AMIA 2011 Highlight: Dr. Bill Tierney's 10 year story on health care in Africa

Tierney's inspiring closing keynote was truly a highlight of #amia2011. Standing ovations for a great guy and great speaker.
-- Gunther Eysenbach, Oct 26

That's one tweet among a chorus of #amia2011 tweets about Tierney, including:

  • Death by HIPAA: shouldn't sacrifice care on altar of privacy #AMIA2011 keynote by Tierney
  • LIVE: #AMIA2011 Bill Tierney uses Clem McDonald's 1998 JAMA "Canopy Computing" paper; great metaphor for connected health data, no silos!

AMIA 2011 keynote recordings are now available:

"Dr. Tierney’s work has taken him far afield—to Kenya, Africa—to use electronic health records and to gather information from patients, applying the data to critical points in the patient–provider relationship to improve the quality and cost-effectiveness of health care. He led the effort to develop the first ambulatory electronic medical record system in sub-Saharan Africa, which has evolved into a comprehensive, open-source electronic medical record system that has been implemented in more than a dozen developing countries."

The video editing is a little rough, with quite a bit of conference administrivia at the beginning. But by the time he gets to "a Case" at 9:40, I'm sure you'll be hooked. Even if you're not an informatics geek, I expect you'll find the "10 year story" (starting at 37:30 into the video) inspiring.