Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#2822 closed enhancement (fixed)

Prefix diagnosis hierarchy labels with ICD-9 codes to make it easier to navigate

Reported by: dconnolly Owned by: ssuman
Priority: major Milestone: heron-webster-update
Component: data-repository Keywords: public-web
Cc: ngraham, tmcmahon, mhoag, ssuman, badagarla, bhamlin Blocked By:
Blocking: 3223 Sensitive: no

Description


Attachments (1)

Diagnosis hierarchy prefixed, and sorted by ICD-9 code.docx (189.6 KB) - added by ssuman 3 years ago.

Download all attachments as: .zip

Change History (32)

comment:1 Changed 3 years ago by dconnolly

  • Owner set to ssuman
  • Status changed from new to assigned

Suman, are you interested to fix this? If not, re-assign it to somebody like me or Bhargav.

I've actually written the relevant code:

but it's on a branch that also renames ICD9:NNN to ICD9CM:NNN.

Options include:

  • integrate the idc9cm_gpc63 branch by studying and testing the impact of the rename
  • separate the prefixing code from the ICD9CM rename

comment:2 Changed 3 years ago by ssuman

ran heron_etl_tiny_no_DBA with idc9cm_gpc63

2014-07-16 10:25:42,417 ERROR heron.concept_tooltips ORA-02437: cannot validate (HERON_ETL_1.TOOLTIP_PK) - primary key violated

concept_tooltips.sql:89: 42...tooltip_pk primary key (child_id)
Traceback (most recent call last):
  File "structured_logging.py", line 184, in event
    yield sub
  File "db_util.py", line 369, in transaction_event
    yield c, tx_event
  File "db_util.py", line 356, in transaction
    yield t_e[0]
  File "db_util.py", line 801, in run_script
    save_results='DBMS_XPLAN.DISPLAY()' in se)
  File "db_util.py", line 575, in execute
    ignore_error=ignore_error, save_results=save_results)
  File "db_util.py", line 562, in _proxy_sql
    z_remainder=remainder)
EventFailure: ORA-02437: cannot validate (HERON_ETL_1.TOOLTIP_PK) - primary key violated

concept_tooltips.sql:89: 42...tooltip_pk primary key (child_id)

comment:3 Changed 3 years ago by dconnolly

  • Owner changed from ssuman to mhoag

Matt, you were in this tooltip stuff recently, yes? Do you have a quick diagnosis? It's not obvious to me what's going on.

For reference:

comment:4 Changed 3 years ago by mhoag

  • Priority changed from major to minor

Looked into it.

It appears that in 14994 cases out of ~600,000 that 2 tooltips are getting created for the same term_id. This causes the application of a primary key on term_id (a.k.a child_id in the new_tooltips table) to fail.

SQL that finds the offending duplicates

select * from
(select child_id, count(*) from heron_etl_1.new_tooltips group by child_id having count(*) > 1 order by count(*) desc) tips
join heron_etl_1.new_tooltips refl on refl.child_id = tips.child_id
order by refl.child_id;

yields:

CHILD_IDCOUNT(*)CHILD_ID_1TOOLTIPC_NAMEC_FULLNAME
36842710236842710Diagnoses \ 001-999.99 DISEASES AND INJURIES \ 990-995.99 OTHER AND UNSPECIFIED EFFECTS OF EXTERNAL CAUSES [<10 facts] \ 995 Certain adverse effects not elsewhere classified [<10 facts]995.3 Allergy, unspecified, not elsewhere classified [<10 facts]\i2b2\Diagnoses\A18090800\A8359006\A8359898\A8360921\A8358507\995.3\
36842710236842710Diagnoses \ 800-999.99 INJURY AND POISONING \ 990-995.99 OTHER AND UNSPECIFIED EFFECTS OF EXTERNAL CAUSES [<10 facts] \ 995 Certain adverse effects not elsewhere classified [<10 facts]995.3 Allergy, unspecified, not elsewhere classified [<10 facts]\i2b2\Diagnoses\A18090800\A8359006\A8359898\A8360921\A8358507\995.3\
36842711236842711Diagnoses \ 740-759.99 CONGENITAL ANOMALIES [<10 facts] \ 750 Other congenital anomalies of upper alimentary tract [<10 facts]750.5 Congenital hypertrophic pyloric stenosis\i2b2\Diagnoses\A18090800\A8359006\A8358370\A8348329\750.5\
36842711236842711Diagnoses \ 001-999.99 DISEASES AND INJURIES \ 750 Other congenital anomalies of upper alimentary tract [<10 facts]750.5 Congenital hypertrophic pyloric stenosis\i2b2\Diagnoses\A18090800\A8359006\A8358370\A8348329\750.5\

And a quick perusal shows that the offending tooltips are all in the Diagnoses Hierarchy.

Dan wanted me to take a look at this so I am changing the priority to "minor" and sticking it in my someday pile.

comment:5 Changed 3 years ago by dconnolly

  • Milestone changed from heron-verdigris-update to heron-kanopolis-update

postponing tickets not identified as Verdigris priorities in today's heron-weekly meeting

comment:6 Changed 3 years ago by dconnolly

  • Milestone changed from heron-kanopolis-update to dconnolly

Milestone renamed

comment:7 Changed 3 years ago by dconnolly

  • Milestone changed from dconnolly to heron-kanopolis-update

Milestone renamed

comment:8 Changed 3 years ago by dconnolly

  • Milestone changed from heron-kanopolis-update to dconnolly

Milestone renamed

comment:9 Changed 3 years ago by dconnolly

  • Milestone changed from dconnolly to heron-kanopolis-update

Milestone renamed

comment:10 Changed 3 years ago by dconnolly

  • Milestone changed from heron-kanopolis-update to heron-webster-update
  • Owner changed from mhoag to ngraham
  • Priority changed from minor to major

Nathan, I mentioned this while we were talking about getting the ICD9 codes out of our paths in the CDM work. Are you interested to take a crack at it soon? If not, feel free to demote back to minor.

comment:11 Changed 3 years ago by ngraham

  • Priority changed from major to minor

Minor for now.

comment:12 Changed 3 years ago by ngraham

  • Owner changed from ngraham to ssuman
  • Priority changed from minor to major

Suman to take a look at this for an intro HERON ETL task.

Meeting with Nathan, Matt or Dan to get tools working, etc. 3-4 coding chunks Dan suggests.

comment:13 Changed 3 years ago by ngraham

heron-weekly: Matt says a meeting is on the calendar for him to meet with Suman to get this going.

comment:14 follow-up: Changed 3 years ago by ssuman

[0a92aa504f15] prefix ICD9 diagnosis c_name by ICD9 code, created branch and ran heron_etl_tiny_no_DBA on branch Diagnosis_hierarchy_2822

Details-

2015-02-02 14:25:23,722 ERROR heron.concept_tooltips ORA-02437: cannot validate (HERON_ETL_1.TOOLTIP_PK) - primary key violated
concept_tooltips.sql:82: 42...tooltip_pk primary key (child_id)

user demo/XXXX does not work for i2b2 webclient on <test-app-server>...briefly discussed with Matt, to find that datasource passwd might need to be synched up for jboss deployment and perhaps restart jboss.

comment:15 in reply to: ↑ 14 Changed 3 years ago by ngraham

Replying to ssuman:

user demo/XXXX does not work for i2b2 webclient on <test-app-server>...briefly discussed with Matt, to find that datasource passwd might need to be synched up for jboss deployment and perhaps restart jboss.

Suman, Jboss wasn't running. I started it and could login and run at least a simple query.

ngraham@<dev-app-server>:~> sudo /etc/init.d/jboss start
...

Query results:

Finished Query: "Frontiers Resea@08:29:44"
[3.8 secs]
Compute Time: 1 secs
Number of patients for "Frontiers Resea@08:29:44"patient_count: 3

comment:16 Changed 3 years ago by mhoag

  • Blocking set to 3223

comment:17 Changed 3 years ago by ssuman

Duplicate records are inserted in new_tooltips, hence cannot validate (HERON_ETL_1.TOOLTIP_PK)

Only tooltip like 'Diagnoses%' are duplicated

select * from heron_etl_1.new_tooltips where child_id=2102233;

concepts_merge.sql has a case statement for c_name like 'DISEASES AND INJURIES%'. With c_name being prefixed with icd9 in umls_icd9, these is a mismatch in c_name and parent_term, child_term relationship between heron_terms and term_tree is broken.

insert into BlueHeronMetadata.heron_terms
SELECT
  C_HLEVEL,
  C_FULLNAME ,
  C_NAME ,
  C_SYNONYM_CD ,
  CASE
    WHEN c_name = 'DISEASES AND INJURIES' THEN 'FH'
    WHEN ett.icd9_code IS NOT NULL THEN 'FA'
    ELSE uicd.C_VISUALATTRIBUTES

Made appropriate updates (a828c4a2b142)
ran heron_etl_tiny_no_DBA on Friday at 5.08pm successfully, did not get tooltip error /view/HERON%20nightly/job/heron_etl_tiny_no_DBA/386/console

2015-02-05 17:12:43,987 DEBUG heron.umls_dx_concepts run(umls_dx_concepts.sql)
2015-02-05 17:12:45,609 INFO heron.umls_dx_concepts run(umls_dx_concepts.sql)
...
2015-02-05 17:15:20,731 DEBUG heron.concepts_merge run_scripts(concepts_merge)
2015-02-05 17:15:20,732 DEBUG heron.concepts_merge.concepts_merge run(concepts_merge.sql)
2015-02-05 17:17:30,885 INFO heron.concepts_merge.concepts_merge run(concepts_merge.sql)
2015-02-05 17:17:30,885 INFO heron.concepts_merge run_scripts(concepts_merge)

did not see c_name prefixed in the morning...because nightly build ran on default branch

comment:18 Changed 3 years ago by ssuman

ran jenkins job again /job/config_tiny_data/412/console, failed

  File "/usr/lib64/python2.6/ConfigParser.py", line 377, in set
    raise NoSectionError(section)
ConfigParser.NoSectionError: No section: 'i2b2_deid'
  • I had to accept new changes made by Nathan and "merge with local" on default
  • reran the job /job/heron_etl_tiny_no_DBA/391/
  • I get pm cell is down error for "http://<dev-app-server>/i2b2/webclient/"
  • <dba>@<test-app-server>:~> ps -ef|grep jboss ssuman 25234 717 0 13:11 pts/1 00:00:00 grep jboss
  • restarted jboss
  • still get error
  • look at logs
    [2/6/2015 1:22 PM] Suman Suman: 
    13:20:20,725 ERROR [edu.harvard.i2b2.ontology.dao.SchemesDao] (Thread-96) Could not get JDBC Connection;
    
  • Nathan regenerates datasource files
  • will update jboss config and will restart the service

I do see ICD9 codes prefixed to c_name in heron_terms

(304.62)Other specified drug dependence, episodic use
(304.51)Hallucinogen dependence, continuous use

Diagnosis hierarchy is prefixed,and sorted by ICD-9 code, confirmed using webclient on <test-app-server>, enclosed screen shot Diagnosis hierarchy prefixed, and sorted by ICD-9 code.docx.

comment:19 Changed 3 years ago by ssuman

  • With Matt, confirmed Diagnosis hierarchy is prefixed,and sorted by ICD-9 code, using webclient on <dev-app-server>. Matt suggested, that I should ask Tamara, how she would like this to look, since she is the customer.
  • I will speak to Tamara

comment:20 Changed 3 years ago by ssuman

  • Spoke with Tamara, she will take a look
  • ran a query on <test-app-server>
    Finished Query: "(751.4)Anomalie@11:13:26"
    [1.5 secs]
    Compute Time: 0 secs
    Patient Set for "(751.4)Anomalie@11:13:26"
    Number of patients for "(751.4)Anomalie@11:13:26"
    patient_count: 0
    

comment:21 Changed 3 years ago by tmcmahon

Any chance you can remove the parenthesis and place a space between the icd9 and title? It would be consistent with the format in the tumor registry portion of HERON.

comment:22 Changed 3 years ago by ssuman

  • Thanks Tamara, I have removed parenthesis and placed a space between the icd9 and title.
  • Checked in the code
  • reran heron_etl_no_dba /view/HERON%20nightly/job/heron_etl_tiny_no_DBA/400 /console ETL Test
  • ran a query on <test-app-server>
Finished Query: "240-246.99 DISO@08:51:31"
[5.3 secs]
Compute Time: 4 secs
Number of patients for "240-246.99 DISO@08:51:31"
patient_count: 2

select from heron_terms

5	\i2b2\Diagnoses\A18090800\A8359006\A8354357\A12966666\A16987315\A19383727\	488.8 Influenza due to novel influenza A	N	FA 				
5	\i2b2\Diagnoses\A18090800\A8359006\A8354357\A8360927\A8345617\A19384279\ 	516.6 Interstitial lung diseases of childhood	N	FA 				

comment:23 Changed 3 years ago by ssuman

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:24 Changed 3 years ago by ngraham

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:25 Changed 3 years ago by ngraham

  • Owner changed from ssuman to ngraham
  • Status changed from reopened to assigned

Next steps, code review and merge.

comment:26 Changed 3 years ago by ngraham

  • Owner changed from ngraham to ssuman

Suman,

There are two items I would like addressed before the code is merged - see details below for more explanation of these two:

  1. Use the c_fullname rather than c_name when collapsing the diagnosis hierarchy in source:heron_load/concepts_merge.sql.
  2. Add a test case in source:heron_load/concepts_merge.sql where we collapse the hierarchy to make sure that we find the c_fullname we're looking for. If such a test had existed before, we wouldn't have gotten hung up on the duplicate tooltips issue in comment:17.

Review Details

We decided to collapse the diagnosis hierarchy in ticket:1380 so I'm moving forward with the assumption that we still want it that way.

With CodeReviewNotes in mind:

  • Is the use case clear?
    • To me, yes - the description of this ticket spells out the fact that it's difficult to browse the ICD-9 hierarchy since they aren't sorted by code.
  • Is the use-case reasonably well captured in automated tests?
    • As I recall, it took us a non-trivial amount of time to figure out that the tooltips bug (comment:17) was caused by the hierarchy collapsing code in source:heron_load/concepts_merge.sql. I think there should be a 1/0 test in here that will fall over if what we're looking for to collapse the hierarchy isn't found. Otherwise, we might waste time on this again. See also my comment below about using c_fullname in a view.
    • I'm not sure how to capture the actual use case in a test - you could write code that uses regex to make sure all the ICD9 terms have c_name fields start with a number, but in my opinion it's not really worth it (and might not even work?). Doing so wouldn't really capture the use case itself as the "sorted" aspect is done by the webclient code when displaying to the user.
  • Is the code something you would be happy to maintain? (cf. WritingQualityCode)
    • Not as-is. See note about adding a test above and... We clearly have at least a dual maintenance issue (using up time with the tooltip issue for example). The dual-maintenance aspect is exemplified in [68b6862b8139] also when you changed the way the prefix looks as per Tamara's comment. You have to modify both source:heron_load/concepts_merge.sql and source:heron_load/umls_dx_concepts.sql if you want to change either. Instead, I recommend using the c_fullname for finding the "Diseases and Injuries". (\i2b2\Diagnoses\A18090800\A8359006\). By policy, we try to avoid changing c_fullname (to avoid breaking past queries) so it's less likely to change. It also makes the code simpler (get rid of substr(), rtrim(), magic number 6).


To make testing easier, maybe create a view that is has the c_fullname string in it called diagnosis_folder_to_hide or something along those lines. That way, you can use the same view for both the test noted above and for setting the folder hidden.

  • Are external design constraints clear?
    • It might be worth a note where we used the c_fullname to collapse the diagnosis hierarchy that it's made up in part with UMLS AUIs.

To double-check that it's working as-is, I ran a test build with the Diagnosis_hierarchy_2822 branch build 412(view/HERON%20nightly/job/heron_etl_tiny_no_DBA/412/)). The result looks good to me.

comment:27 follow-up: Changed 3 years ago by ssuman

  • used the c_fullname for finding the "Diseases and Injuries" in (afaeb9024b22)
  • test heron_etl_tiny_no_DBA (view/HERON%20nightly/job/heron_etl_tiny_no_DBA/414/console)

comment:28 in reply to: ↑ 27 Changed 3 years ago by ngraham

Suman,

It looks like Jenkins build 414 that you referenced in comment:27 was built using the default branch and therefore we couldn't see the new prefixes on <test-app-server>.

I made a couple of tweaks in [2778548ad8f4]. Then, I merged default into your branch in [686572faea2d] and kicked off build 415(/view/HERON%20nightly/job/heron_etl_tiny_no_DBA/415/)] but it failed:

concepts_merge.sql:187: 769...uicd.C_BASECODE
Traceback (most recent call last):
...
EventFailure: ORA-00904: "UICD"."C_BASECODE": invalid identifier

I've seen this error before - I'm not exactly sure why, but the order of aliased tables specified in the "where" clause seems to matter - at least when joining with other tables. I swapped the order in [5c4da1945560] and ran build 416 (view/HERON%20nightly/job/heron_etl_tiny_no_DBA/416/) successfully.

The results look good to me, so I vote for merging to default and closing this ticket.

comment:29 Changed 3 years ago by ssuman

  • Resolution set to fixed
  • Status changed from assigned to closed

Thank you Nathan, I have merged the branch to default [934f5b72c20a]

comment:30 Changed 3 years ago by ngraham

  • Type changed from problem to enhancement

comment:31 Changed 3 years ago by mprittie

  • Keywords public-web added
Note: See TracTickets for help on using tickets.