Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#2607 closed defect (fixed)

few_missing_dx_ids fails due to multiple values in current_icd9_list column

Reported by: ngraham Owned by: mhoag
Priority: major Milestone: heron-arkansas-update
Component: data-repository Keywords: public-web
Cc: mhoag, dconnolly, tmcmahon Blocked By:
Blocking: #2592 Sensitive: no


It appears that the clarity_edg table finally made good on its promise (from the Clarity2010 upgrade) to insert comma separated values into the current_icd9_list column.

Dan, reporting for Nathan, based ticket:2592#comment:5

Change History (6)

comment:1 Changed 7 years ago by dconnolly

Owner: changed from dconnolly to ngraham
Status: newassigned

Yes, the relationship between DX_ID and ICD9 code is evidently one-to-many.

I suppose I'd split the ICD9 data much like we did in source:heron_load/epic_flowsheets_multiselect.sql:

make one row with a;b;c into 3 rows with
  a;b;c a
  a;b;c b
  a;b;c c

Nathan, would you please take a crack at it?

I'd like to, but I'd be inclined to tackle the rest of GPC:ticket:92 while I'm at it.

comment:2 Changed 7 years ago by ngraham

Owner: changed from ngraham to mhoag

Matt was nice enough to tackle this for me since I had meetings scheduled much of the day...notes to follow...

comment:3 in reply to:  1 Changed 7 years ago by mhoag

Replying to dconnolly:

I suppose I'd split the ICD9 data much like we did in source:heron_load/epic_flowsheets_multiselect.sql ...

I exploded the clarity_edg table as suggested so that there is 1-to-1 relationship between dx_ids and ICD9 codes per row. Changes in 87592f820077/heron_load.

I considered making a separate task for generating this view - under the assumption that it may be used multiple times, but it appears to only be needed for concept building in epic_dx_concepts.sql (specifically, current_icd9_list's only other reference is in epic_diag_tx.sql#L24 but this reference works fine if the value of current_icd9_list happens to be a csv string).

comment:4 Changed 7 years ago by ngraham

To test the new code, we moved the new clarity_edg over to <old ID server> (source data for test build). Export clarity_edg from production and put it on our test server:

  • Backup the old version:
    ngraham@ID_OLD:/var/lib/jenkins/deid_oracle_export> ORACLE_SID=nheron1 expdp DIRECTORY=DEID_ORACLE_EXPORT DUMPFILE=clarity_edg_backup.dmp LOGFILE=clarity_edg_backup.log tables=clarity.clarity_edg
    . . exported "CLARITY"."CLARITY_EDG"                     50.31 MB  323264 rows
    Job "NGRAHAM"."SYS_EXPORT_TABLE_01" successfully completed at 15:00:56
  • Export from production:
    ngraham@ID_NEW:/d1/kuh_export> ORACLE_SID=<id_b2> expdp DIRECTORY=CLARITY_IMPORT DUMPFILE=clarity_edg.dmp LOGFILE=clarity_edg.log tables=clarity.clarity_edg
    . . exported "CLARITY"."CLARITY_EDG"                     153.7 MB  920015 rows
    Job "NGRAHAM"."SYS_EXPORT_TABLE_01" successfully completed at 15:05:23
  • Move to <old ID server>:
    ngraham@ID_NEW:/d1/kuh_export/data_pump_dir> scp clarity_edg.dmp <old ID server>:
    ngraham@ID_OLD:/var/lib/jenkins/deid_oracle_export> mv ~/clarity_edg.dmp ./
  • Import:
    ngraham@ID_OLD:/var/lib/jenkins/deid_oracle_export> ORACLE_SID=nheron1 impdp DIRECTORY=DEID_ORACLE_EXPORT DUMPFILE=clarity_edg.dmp LOGFILE=clarity_edg_imp.log
    ORA-39171: Job is experiencing a resumable wait.
    ORA-01653: unable to extend table CLARITY.CLARITY_EDG by 128 in tablespace EPICSMALL


Couldn't extent the tablespace, but we've got a lot free in "USERS" so let's just remap:

ngraham@ID_OLD:/var/lib/jenkins/deid_oracle_export> #ORACLE_SID=nheron1 impdp DIRECTORY=DEID_ORACLE_EXPORT DUMPFILE=clarity_edg.dmp LOGFILE=clarity_edg_imp.log  REMAP_TABLESPACE=EPICSMALL:USERS
. . imported "CLARITY"."CLARITY_EDG"                     153.7 MB  920015 rows
Job "NGRAHAM"."SYS_IMPORT_FULL_01" successfully completed at 15:46:20

Showing free space in USERS:

select b.tablespace_name, tbs_size SizeMb, a.free_space FreeMb
(select tablespace_name, round(sum(bytes)/1024/1024 ,2) as free_space 
from dba_free_space group by tablespace_name) a, 
(select tablespace_name, sum(bytes)/1024/1024 as tbs_size 
from dba_data_files group by tablespace_name
select tablespace_name, sum(bytes)/1024/1024 tbs_size
from dba_temp_files
group by tablespace_name ) b
where a.tablespace_name(+)=b.tablespace_name and b.tablespace_name = 'USERS';

comment:5 Changed 7 years ago by mhoag

Resolution: fixed
Status: assignedclosed

Ran into an additional issue with a validation test (dx_with_children_are_fa) expecting 5 dx_ids to have exactly 1 parent. Since the spirit of the test did not seem to care how many parents the dx_ids had (only that they were visually leafs instead of folder). I modified the test to disregard the number of parents the dx_ids had de593bdaedf1/heron_load.

Also fixed another minor annoyance where clean up failed on the epic_dx_terms table if any of the validation tests failed (e2cbe3121fce/heron_load and bc6d39be3c92/heron_load).

[trunk_test_etl_anybranch/191/ Test ETL Passed]

I believe this is ready for merge (Nathan reviewed over my shoulder).

comment:6 Changed 7 years ago by mhoag

Keywords: public-web added
Note: See TracTickets for help on using tickets.