Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#2468 closed defect (fixed)

Some ICD9 codes with 2 digits before the dot (e.g. 41.12) missing from HERON concepts

Reported by: ngraham Owned by: ngraham
Priority: major Milestone: heron-arkansas-update
Component: data-repository Keywords: wrong-results, public-web
Cc: dconnolly, mhoag, badagarla, tmcmahon Blocked By:
Blocking: Sensitive: no


For milestone:heron-toronto-update, we found several ICD9 code facts with no concepts (see ticket:2401#comment:5).

ICD9:38.9180512013-12-23 14:23:20
ICD9:41.11216302013-12-23 14:23:20
ICD9:41.12296422013-12-23 14:23:20
ICD9:41.85160182013-12-23 14:23:20
ICD9:70.542741492013-12-26 15:53:42

Possibly related to #441.

Change History (13)

comment:1 Changed 4 years ago by ngraham

  • Owner changed from ngraham to dconnolly
  • Status changed from new to assigned

comment:2 Changed 4 years ago by dconnolly

  • Type changed from problem to defect

Looks like another leading 0s issue. Where the hospital data has 41.12, ICD9-CM from UMLS has 041.12.

Design Sketch

When forming ICD9 concept codes from hospital ICD9 codes (source:heron_load/epic_dx_concepts.sql#L103), prepend a 0 if there are fewer than 3 digits before the dot.

p.s. As noted in ticket:2401#comment:5, Epic isn't the only source for these concept codes. They're also formed from ClinicIdxSource and UHCSource.

Bonus points if any of you (Matt, Bhargav, Nathan) beat me to it.


To look at ICD9-CM, I used the subquery starting at source:heron_load/umls_dx_concepts.sql#L77 ...

 select hier.ptr
      , folder.code folder_code, folder.str folder_str
      , chd.code chd_code, chd.str chd_str, chd.aui
 from umls.mrhier hier
 join umls.mrconso folder on hier.paui = folder.aui
 join umls.mrconso chd on hier.aui = chd.aui
 where folder.sab in ('ICD9CM')
   and hier.ptr not like 'A18090800.A8352133%' -- exclude procedures

... and added and chd.code like '%41.12'. Result:

A18090800.A8359006.A8355078.A8360922.A8342781.A8340737041.1Staphylococcus infection in conditions classified elsewhere and of unspecified site041.12Methicillin resistant Staphylococcus aureus in conditions classified elsewhere and of unspecified siteA15575274

comment:3 Changed 4 years ago by dconnolly

I think seeing this issue would be valuable to our peers in the i2b2/PCORI community.

Currently, it's got detailed counts from KUH, which may be sensitive. Here's hoping for time to scrub them and un-check the sensitive flag (and edit out this paragraph) before closing this ticket.

comment:4 Changed 4 years ago by dconnolly

  • Blocking set to 2462
  • Milestone changed from heron-lovewell-update to gpc-ts-1

comment:5 Changed 4 years ago by dconnolly

  • Priority changed from major to minor

comment:6 Changed 4 years ago by dconnolly

  • Blocking 2462 deleted

(In #2462) The HERON counts are on babel, so for our site, this is done.

See gpc #1 for follow-up w.r.t. other sites.

comment:7 Changed 4 years ago by dconnolly

  • Milestone changed from gpc-ts-1 to heron-beaver-update

comment:8 Changed 4 years ago by dconnolly

  • Cc tmcmahon added
  • Owner changed from dconnolly to tmcmahon
  • Priority changed from minor to major

Tamara, would you please drop by some time to talk about the impact and priority of this ticket?

I guess one upcoming opportunity is when we close beaver (HeronLoad#milestone-close).

comment:9 Changed 4 years ago by dconnolly

  • Keywords wrong-results added
  • Milestone changed from heron-beaver-update to heron-arkansas-update
  • Owner changed from tmcmahon to ngraham

comment:10 Changed 4 years ago by ngraham

I compared ICD9 facts (problematic concepts noted in the ticket description) from this month with facts from last month for the same patient/encounter. It appears that (nearly all) facts in i2b2 that had ICD9 concepts without the leading zero last month now have the leading zeros with the latest production data. For example, facts for the same patient/encounter between the two months had concept_cd=ICD9:70.54 last month and concept_cd=ICD9:070.54 this month. Thus, the failing test that started this ticket passes with current production data.


My first step was to add a failing test case in development. However, when running the all_facts_use_known_concepts test in production (to see an example failure) after removing the exception noted for this ticket, the test passes. Note, test is in source:heron_load/concept_stats.sql#L120.

It looks like there are now only 9 facts that meet the criteria. Running the following:

select * from BlueHeronMetadata.counts_by_concept 
where concept_cd in ('ICD9:41.12', 'ICD9:70.54', 'ICD9:41.11', 'ICD9:38.9', 'ICD9:41.85');
  • In current production (A1):
    ICD9:70.54992014-02-16 05:51:53
  • In production last month (B2)
    ICD9:38.9180512014-01-16 18:41:55
    ICD9:41.11216302014-01-16 18:41:55
    ICD9:41.12296422014-01-16 18:41:55
    ICD9:41.85134172014-01-16 18:41:55
    ICD9:70.542741492014-01-21 15:52:56

Perhaps ICDs that were being mapped to an ICD9 without leading zeros last month are now being mapped to ones with leading zeros now?

Join last months production data with this months (patient and encounter) and see if the same patient/encounter maps differently. Take ICD9:70.54 as an example:

The following run in NHERON production from last month (B2):

last_month as (
  select obs.concept_cd, obs.patient_num, obs.encounter_num, pm.patient_ide, em.encounter_ide 
  from blueherondata.observation_fact@deid obs 
  join nightherondata.patient_mapping pm on pm.patient_num = obs.patient_num
  join nightherondata.encounter_mapping em on em.encounter_num = obs.encounter_num
  where obs.concept_cd = 'ICD9:70.54'
pat_enc_this_month as (
  select distinct pm.patient_num, em.encounter_num
  from last_month lm
  join nightherondata.patient_mapping@ida1 pm on pm.patient_ide = lm.patient_ide
  join nightherondata.encounter_mapping@ida1 em on em.encounter_ide = lm.encounter_ide
similar_icds as (
  select obs.concept_cd
  from blueherondata.observation_fact@deid_a1 obs
  join pat_enc_this_month pe on (pe.patient_num = obs.patient_num and pe.encounter_num = obs.encounter_num)
  where obs.concept_cd like 'ICD9:%70.54'
select count(*) from similar_icds where concept_cd = 'ICD9:070.54' -- <-- Run with both ICD9:070.54 and ICD9:70.54
; --results in 9 without leading zeros, 265 with leading zeros - adds up perfectly with query above - so, the 70.54 is getting mapped to 070.54 now.

Results in 9 without the leading zero in the final "where" clause and 265 with the leading zero. 265 + 9 = 274 which matches the total count of ICD9:70.54 in data from last month (B2) - see query at the top of this comment.

So, it would seem that facts in clarity that were coming out to be ICD9:70.54 are now coming out to be ICD9:070.54 which means we do have concepts and that's why the "known concepts" test passes with data from this current production.

comment:11 Changed 4 years ago by ngraham

  • Resolution set to fixed
  • Status changed from assigned to closed

Removed exceptions to known concepts test in [dcae987b82cb]. If we see this problem again, the known concepts test should fail to alert us.

comment:12 Changed 4 years ago by mhoag

  • Keywords public-web added

comment:13 Changed 4 years ago by ngraham

  • Sensitive unset
Note: See TracTickets for help on using tickets.