Opened 4 years ago

Closed 3 years ago

Last modified 3 years ago

#2947 closed enhancement (fixed)

search HERON labs by LOINC terminology including GPC usability design

Reported by: rwaitman Owned by: ngraham
Priority: critical Milestone: heron-cheyenne-bottoms-update
Component: data-repository Keywords: public-web
Cc: rwaitman, badagarla, mhoag, schandaka, bzschoche Blocked By:
Blocking: 3395 Sensitive: no

Description

As part of his Health Info Practicum course Belay Demeke(Masters in Health Informatics student)will be working with Russ,Sravani,Bhargav on LOINC coding at KUMC and comparison with other GPC sites.

Attachments (1)

missing_parents.py (1.6 KB) - added by ngraham 3 years ago.

Download all attachments as: .zip

Change History (61)

comment:1 Changed 4 years ago by schandaka

Meeting notes from 09/12/2014

TODO:
Walk Belay through TRAC. (BA)
Russ, Belay and SC every two weeks (BZ)
Belay and SC every week (BD)

Goals:

  1. LOINC codes in Clarity: (No HERON access needed for this)
    • Does clarity have LOINC codes?
    • Verify where LOINC code sits in Clarity and is populated.
    • Identify Which component IDs that do not have it.
  2. Bring LOINC into i2b2.
  3. Learn & Explore GPC Lab ontologies.
  4. Evaluate Microbiology & organism LOINC linkage.
  5. Evaluate Microbiology & organism SNOMED linkage.
  6. Evaluate Microbiology & organism LOINC antibiotic susceptability vs. RxNorm?.

Notes:
Get Belay access to a test server where he can upload concepts via a SQL Developer?
Find out who is the person who blesses the concepts at GPC level.

comment:2 Changed 4 years ago by schandaka

There is a column LOINC_CODE in clarity_component. However,looks like it is currently not getting populated.

select count(*) cnt from clarity.clarity_component --7421 rows

select count(*) hasnotcnt from clarity.clarity_component where LOINC_CODE is null--7382 rows

99% of the components do not have their corresponding LOINC_CODEs.

comment:3 Changed 4 years ago by rwaitman

Can you contact Belay and see if he can get us a link spreadsheet to component ID as part of his project?

comment:4 Changed 4 years ago by schandaka

From: Sravani Chandaka 
Sent: Thursday, September 18, 2014 10:06 AM
To: Belay Demeke
Subject: LOINC Codes for Components
Importance: High

Hi Belay,
Can you get us a mapping spreadsheet to map LOINC codes to the components in O2? Let me know if you need any information from me.

Thanks
Sravani 

comment:5 Changed 4 years ago by rwaitman

Belay has 2 classes this semester:
1 credit practical

  • Get laboratory tests in HERON mapped to LOINC and a usable ontology

2 credit research

  • Analyze LOINC varability across the GPC on babel
  • Document information flow from Sunquest to HL7 to Chronicles to Clarity at KUMC

Get backend access to babel (read account on the DB) to you can do the second project.

Review paper from Prakash.

Check with Nathan and Dan on where the correspondence lies on this issue with GPC-DEV

  • For the 2 credit research part, check also on who’s working creating a usable ontology. For example Potassium LOINC 2823-3 is all over the place.

o That’s secondary though to getting the basic component ids mapped.

Get the map from LOINC to componentID.

If you’re having issues here, let me know and I’ll email with Lowell Tilzer to see if he can help.

Also, you know what the names are that you send on the HL7 transaction and compare to the names in Clarity. Could be you can just match on the name since Clarity and Epic aren’t natively displaying the LOINC name, they are probably showing the name that got shipped out by the lab to be CLIA compliant or something.

Side note for Sravani: don’t worry about refreshing lab hierarchy because we want to use loinc and perhaps UMN or WISC’s hierarchy.

For cardiovascular, are they in LOINC? Yes

  • See GPC ontology for clinical measurement

comment:6 Changed 4 years ago by dconnolly

Sravani, as we discussed, please go over GroupOnly/TracTraining with Belay so that he'll know how to use the permissions I just added:

The subject bdemeke has been added to the group bmidev.

comment:7 Changed 4 years ago by dconnolly

  • Cc bdemeke added

Russ just dropped by to ask if a leader for a usable lab hierarchy has come forward in the GPC work.

I swapped in our leading sites by domain discussions and we remember that it's UMN.

See also GPC:ticket:158.

comment:8 Changed 4 years ago by schandaka

Belay and Sravani Meeting notes from 09/24/2014
Belay showed Sravani how actual HL7 messages look like in the Interface Explorer. I will give a high level summary of it and have Belay document the actual Sunquest->Clarity workflow with all the intermediate steps.

  • Belay mentioned that there are two message types for lab orders: ORM - for order placed and ORU - for order result. ORM has one segment OBR(which gives the details of the procedure ordered). ORU has two segments OBR(which gives the details of the procedure ordered) and OBX(which gives the details of the order result).
  • OBX again has all the information about the components that are part of that order. Each component section again has two sections
    1. Local code, local name and local coding system ("L")
    2. LOINC code, LOINC name and LOINC coding system("LN")
  • Sravani looked at the Local code and pointed that it is the component abbreviation(from clarity_component) using which it is mapped to LOINC. For example: Potassium has a component ID 2002 in clarity and is mapped to corresponding LOINC code 2823-3 using the abbreviation "NA". Belay mentioned that Dave Weroha has a mapping file ready and he will be requesting him that file.
  • Sravani showed Belay different flavors of Lab ontologies on Babel.Specially,UMN and GPC.
  • Sravani thinks the multiple LOINC codes for Potassium 2823-3,2821-7,2820-9,6940-1,2828-2 and 2829-0 are all having the same abbreviation code "NA". They are different by the specimen source.
LOINC Specimen SourceComponent Abbreviation
2823-3 Ser/ PlasNA
2821-7 Body fluidNA
2820-9 Dialysis FluidNA
6940-1 StoolNA
2828-2 Point in time UrineNA
2829-0 24 Hrs UrineNA

However, Belay would do some spot checks on some HL7 messages to confirm the same.

  • Sravani trained Belay on how to use trac.
  • Belay mentioned about his HERON sponsorship not yet approved.(which I think Russ worked on it this morning)

Next Steps for Belay:

  • Check that multiple LOINC codes for a component(example:Potassium) share the same component abbreviation.
  • Request Dave Weroha for the component-LOINC mapping file.
  • Also, since he contacted Dolores Boring earlier about the component-LOINC mapping file he could follow up on that request and check with her if they can bring LOINC code into Clarity.
  • Go through Prakash Nadkarni's paper on LOINC and Lab data before we meet for the next time.
  • Analyze LOINC variability across the GPC on babel.

Next Steps for Sravani:

  • Give Belay an overview of HERON and the development workflow.
  • Make sure Belay has access to Babel backend.

comment:9 Changed 4 years ago by schandaka

  • Owner changed from schandaka to bdemeke
  • Status changed from new to assigned

comment:10 Changed 4 years ago by schandaka

Russ dropped by and suggested Belay and I should get in touch with Supreet and Justin from UMN.

comment:11 Changed 4 years ago by bdemeke

I readched out Dave Weroha and he provided the necessary file.

comment:12 Changed 4 years ago by mhoag

  • Cc mhoag added

comment:13 Changed 4 years ago by schandaka

Belay will be working with Sravani on LOINC mapping the components. Work for that will be tracked in #2989.

comment:14 Changed 4 years ago by schandaka

Belay and Sravani Meeting notes from 09/30/2014

  • Sravani walked Belay through HERON, bmidev repository and version control.
  • Explained him the ORDERS data model in Clarity as he had questions about how results are stored in Clarity.
  • Walked him through epic_labs_transform.sql and epic_concepts_load.sql scripts.
  • Showed him how to browse through the source code.
  • Worked with him on the LOINC mapping piece. Sravani will finish this as Belay do not have access to id database server.

comment:15 Changed 4 years ago by schandaka

Russ,Belay and Sravani Meeting notes from 10/08/2014

Have the mapping files from the lab

Map based on abbreviation from component id to loinc

Belay works all on babel.
Stage these tables. Create the link from Component ID to LOINC

Use UMNs hierarchy for now.

  • But, preserve the KUH|COMPONENT_ID and park under the terminal node of the UMN loinc ontology

The nice thing is we don’t have to re-ETL anything because we’ll still have the component_ids in the concept code.

Create a new ontology on babel called KUMC:LOINC LABS or something

  • Copy in the UMN ontology
  • Splice in the componentID path under the parent UMN path

Once that works, we could copy back the babel path to production.

Once that’s done, for research, compare the ontologies between regular KUMC, UMN, Marshfield, Nebraska.

Find out who in the lab is clinically most responsible for interpretation of this data (ex: Rebecca Horvat leads Micro; who does chemistries).
Find a clinical researcher or two who could sit with you and get their perspective on how usable the KUMC versus UMN versus Marshfield versus GPC/Nebraska trees look

  • Ask Tamara who would be good to interview

comment:16 Changed 4 years ago by schandaka

Meeting notes from 10/20/2014
Get either babel access or a dump of the concept codes today

Practicum

  • Get the mappings deployed

o Just park the loinc codes under the current tree (Sravani has done).
o Implement the UMN tree from Babel at KUMC

  • Measure how many things don’t map.

o Count both concepts as well as facts/patients.

Research

  • Using babel, compare the different lab hierarchies for the places that do use LOINC against the “GPC: Laboratory Measurements tree”

o Note that the GPC hierarchy has redundancy

  • Devise an algorithm and measurement heuristic.

o Include comparison of the concept space but also the number of tests/patients. Note that MCRF has a slot for potassium 2823-3 but there seem to be zero facts and patients!

  • The following places can be compared:

o CMH, MCRF (odd they don’t have K), MCW, UMN, WISC, UIOWA
o Also, the i2b2 demo site has an ontology https://www.i2b2.org/webclient/

Follow up

  • Once we have our labs mapped to loinc, we could compare the KUMC Epic based tree structure with the LOINC hierarchy.
  • Note that UTSW has parked all their lab tests under the HCPCS tree and has the CPT code that was billed at the lowest level.

o Will want to follow up with UTSW to understand if they actually have the “result” or just the fact it was billed.Seems like it’s just the CPT code as the query by value doesn’t work.

Could look at code here for inspiration on navigating trees and comparing similarity
https://informatics.kumc.edu/work/browser/heron_load/flowsheet_concepts_prune.sql

https://informatics.kumc.edu/work/browser/heron_load/flowsheet_concepts_reorg.sql

comment:17 Changed 4 years ago by dconnolly

See also notes from Russ

  • Wednesday, November 12, 2014 8:40 AM

comment:18 Changed 3 years ago by bzschoche

  • Cc schandaka added; bdemeke removed
  • Keywords heron-weekly added
  • Owner changed from bdemeke to schandaka
  • Reporter changed from schandaka to rwaitman

Reassigning to Sravani as the work has transferred to her. Tagging heron-weekly to discuss in this afternoon's meeting and confirm with Russ.

comment:19 Changed 3 years ago by bzschoche

  • Cc bzschoche added

comment:20 follow-up: Changed 3 years ago by ngraham

  • Milestone set to heron-mcmillan-marsh-update
  • Type changed from task to enhancement

RW: Is GPC LOINC hierarchy ready? Did we get a mapping from the hospital to LONIC?
SC: Yes.
RW: I see 2 tasks:

  1. Our stuff tagged with LONIC
  2. The hierarchy work

RW: Are we waiting for the master GPC ontology? Or, go ahead and load something now.
DC: Nate Apathy should be delivering something.
RW: Addendum to the concept paths?
DC: Breaks old user queries.
DC: I think it's reasonable to ask Nate Apathy. I'll show Sravani the e-mail chain/etc.

comment:21 Changed 3 years ago by ngraham

  • Blocking set to 3309

comment:22 Changed 3 years ago by ngraham

  • Keywords heron-weekly removed

comment:23 in reply to: ↑ 20 Changed 3 years ago by dconnolly

Replying to ngraham:

...
DC: I think it's reasonable to ask Nate Apathy. I'll show Sravani the e-mail chain/etc.

I intend to drop by; meanwhile, some notes...

GPC:ticket:158 esp GPC:ticket:158#comment:20 in preparation for GPC:HackathonTwo

I gather the cerner folks wrote some Java code that takes a flat file from the LOINC folks and generates i2b2 metadata. I'm pretty sure Matt has successfully reproduced their results.

comment:24 Changed 3 years ago by ngraham

heron-weekly:
SC: I was hoping to push this to the next release - got the LONIC files stuff from Nathan Apathy at Cerner. I need more time to work on that.
DC: Cerner sites are using this so it's nice for us to run the same code.
RW: Is Cerner ok to share this code openly? Where did the source file come from?
DC: Aren't they using a file from Regenstrief?
MH: Don't know.
RW: Ask Nathan A. where that curated file comes from.
NG: We can talk about it next Heron weekly if it's at risk.

comment:25 Changed 3 years ago by dconnolly

Yes, the flat file comes from Regenstrief. And yes, Cerner is happy to license the code liberally.

cf gpc-dev March 17 (GPC:ticket:12)

comment:26 Changed 3 years ago by dconnolly

  • Blocking 3309 deleted
  • Milestone changed from heron-mcmillan-marsh-update to heron-cheyenne-bottoms-update

comment:27 Changed 3 years ago by ngraham

  • Blocking set to 3395

comment:28 Changed 3 years ago by schandaka

  • Keywords heron-weekly added

comment:29 follow-up: Changed 3 years ago by ngraham

heron-weekly:
SC thinks this is still possible for this release. Progress made with Nate's code.

DC: Percent of lab results covered?
SC: Not sure - I'll check and put in a ticket.

comment:30 in reply to: ↑ 29 Changed 3 years ago by ngraham

Replying to ngraham:

DC: Percent of lab results covered?
SC: Not sure - I'll check and put in a ticket.

Sravani, any update? Leaving heron-weekly for now.

comment:31 Changed 3 years ago by schandaka

Majority of top ~30 or so components have the LOINC mapping.

select cor.component_id,lm.loinccode,count(cor.component_id) from clarity.order_results cor
left join
(
select cc.component_id,cc.name,cc.abbreviation,ltc.loinccode from clarity.clarity_component cc
left join schandaka.loinctestcodes ltc on cc.abbreviation=ltc.testcode -- loincmapping components
) lm on cor.component_id=lm.component_id --- orderresults loinc mapping
group by cor.component_id,lm.loinccode
order by count(cor.component_id) desc

Components 18,8,9,10 are all Radiology related tests.

comment:32 Changed 3 years ago by ngraham

  • Keywords heron-weekly removed

comment:33 Changed 3 years ago by schandaka

LOINC work is looking good except for a minor change. As per Nathan's suggestion, I have added KUH Compoenent IDs at leaves to get the counts. For the components that are not LOINC mapped, I marked them as 'Component Not Found' in the tree. Code for that has been committed in (fb42b1fb4aef). I have showed the hierarchy on test server to Nathan. One small change needs to be done because 'Component Not Found' is showing up for all the Component folders. I need to tweak the code a little bit. I am hoping to finish that work in the next hour.

comment:34 Changed 3 years ago by schandaka

  • Owner changed from schandaka to ngraham

Nathan, I made the changes and committed in (8b14ccabe245). I checked the changes on test server and it looked good to me. Assigning this to you.

comment:35 Changed 3 years ago by dconnolly

I just thought of some use cases and I wonder if we documented them:
Does it look feasible to validate these cases for this release?

  • make query at KUMC using LOINC terms; PortQuery to babel; run at other GPC site
  • measure how much of KUMC's lab volume is mapped to LOINC
    • Yes, make sure we do this as part as validate and release
  • build CDM from KUMC HERON and run CDM/PopMedNet query including LOINC terms
    • No, we're still thinking about how this works with codes and paths; see GPC:ticket:237. Also, the current CDM ETL code doesn't handle LOINC.

reviewing goals in comment:1, it looks like these should get their own ticket(s) before we close this one:

  • Evaluate Microbiology & organism LOINC linkage.
  • Evaluate Microbiology & organism SNOMED linkage.
  • Evaluate Microbiology & organism LOINC antibiotic susceptability vs. RxNorm

Upon review, see comment:20 for reduced scope.

comment:36 Changed 3 years ago by dconnolly

It's not clear where blueherondata.loinctestcodes comes from (line 2853 from fb42b1fb4aef). Looks like an undocumented external design constraint.

  • Please document in BulkTransfer, and
  • add a comment about where the data comes from

comment:37 Changed 3 years ago by dconnolly

Sigh (i.e. this is not a request; just a regretful observation). This piles on to some known technical debt:

  • load_epic_concepts.sql is horribly un-modular; It would have been nice to move lab concepts to its own file. I started work on it last June (GPC:ticket:127#comment:5). I wonder if that branch is salvageable.
  • reaching back from de-id to id (#482) ... from clarity.clarity_component@id ...

comment:38 Changed 3 years ago by dconnolly

  • Resolution set to fixed
  • Status changed from assigned to closed

Nathan and I took a(nother) look at the UTHSCSA_loinc branch (4e7644367eb8). It's an interesting approach, but we don't think it includes the usability improvements discussed at GPC:HackathonTwo and it doesn't seem to meet any requirements this code doesn't.

This code is succinct. Automating the staging from UMLS would be nice, but given how rarely we expect to do it, it can wait.

CodeReviewNotes:

Item Ready? Detail
Use Case Clear? yes comment:35 etc.
Automated Feature Tests? no a test_heron_query.py test would be nice.
Usable by Peer Developers? yes? the design is documented in the (public) gpc-dev context; incorporating the gist of it as comments would be ideal but isn't critical.
Code Secure? pass no more/less than previous work (#482)
Performance OK? yes presumably; we're about to find out
Deployment OK? yes GroupOnly/BulkTransfer#LOINC should be elaborated in due course
Copyright and Acknowledgements? not yet let's be sure to acknowledge gpc-dev and Cerner in the release blog item
Happy to Maintain? yes close enough for rock'n'roll

merged to default in 3d39b46f97ec, to Cheyenne in 5d18f1f7d4ec

comment:39 Changed 3 years ago by schandaka

comment:40 Changed 3 years ago by dconnolly

  • Resolution fixed deleted
  • Status changed from closed to reopened

We're getting paths that our stats code (source:heron_load/concept_stats.sql#L235) can't handle; e.g. Mold Allerg Mix2 IgE Ql (15234-8)\ for LOINC:15234-8.

This query shows them:

select *
from blueheronmetadata.heron_terms ch
where length(regexp_replace(ch.c_fullname, '[^\\]', '')) < 2
;

comment:41 Changed 3 years ago by ngraham

  • Owner changed from ngraham to schandaka
  • Priority changed from major to critical
  • Status changed from reopened to assigned

It looks like we neglected to test the enhance_concepts task with the new LOINC hierarchy. Sravani, are you free to take a look this morning?

April is nearly over and this is currently blocking the release so I'm upping the priority.

comment:42 Changed 3 years ago by schandaka

Looks like some folder names are missing from the paths. Here is one example.

select *
from blueheronmetadata.heron_terms ch
where c_fullname like '%\i2b2\Laboratory Tests\Allergy\Animal\Fish\Abalone (Haliotis spp)%';

I think the issue is with the code where I changed the leaves in CMH's file to folders. I am working on it.

comment:43 Changed 3 years ago by schandaka

  • Owner changed from schandaka to ngraham

comment:44 Changed 3 years ago by schandaka

Build is still failing on all_terms_have_parents test. Assigned it to Nathan.

comment:45 Changed 3 years ago by ngraham

  • Status changed from assigned to accepted

Missing parents exist in the hierarchy generated from the CMH java code, but missing when we join the hierarchy from CMH with KUH results.

details

Likely, all_terms_have_parents should be converted to our new test methodology (#2781), but anyway...

An example path that fails the test is:

\i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\Carp IgE Qn\Carp IgE Qn (21143-3)\

So, let's look for parents:

select c_hlevel, c_fullname, c_name
from blueheronmetadata.heron_terms where c_fullname like '\i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\%'
order by c_hlevel
;

Results in

C_HLEVELC_FULLNAMEC_NAME
5\i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\Carp | Bld-Ser-Plas (LP46763-6)
6\i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\Carp IgE RAST Ql\Carp IgE RAST Ql (21144-1)\Carp IgE RAST Ql (21144-1)
6\i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\Carp IgE Qn\Carp IgE Qn (21143-3)\Carp IgE Qn (21143-3)
7\i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\Carp IgE RAST Ql\Carp IgE RAST Ql (21144-1)\Component Not Found\Component Not Found
7\i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\Carp IgE Qn\Carp IgE Qn (21143-3)\Component Not Found\Component Not Found

Indeed, intermediate nodes such as \i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\Carp IgE RAST Ql\ are missing. Also, the following yields 0 rows:

select *
from blueheronmetadata.heron_terms 
where c_fullname = '\i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\Carp IgE RAST Ql\';

Does it exist in the source data? Source data meaning the hierarchy that was a product of the Java code from CMH. Yup:

select c_hlevel, c_fullname 
from blueherondata.lab_results 
where c_fullname = '\i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\Carp IgE RAST Ql\';

Results in:

C_HLEVELC_FULLNAME
6\i2b2\Laboratory Tests\Allergy\Animal\Fish\Carp | Bld-Ser-Plas\Carp IgE RAST Ql\

So, apparently something went wrong when we joined the hierarchy from CMH with KUH results.

Queries joining against the CMH hierarchy (lab_results) on c_basecode were taking FOREVER, but adding the following index fixed it right up:

--create index lr_loinc_basecodes_idx on blueherondata.lab_results(c_basecode);

comment:46 Changed 3 years ago by ngraham

We simplified the LOINC hierarchy code in [471d54485dca] - Sravani reviewed over my shoulder. We ran a test build with load_all_concepts and enhance_concepts tasks in build 638.

However, it failed with duplicate keys inserted into the concept dimension. It appears that there are 235 rows of the LOINC hierarchy with duplicate c_fullname (which becomes the concept_path in the concept_dimension).

It looks like there are multiple codes in the CMH hierarchy with the same path. For example:

select c_hlevel, c_fullname, c_name from  blueherondata.lab_results where c_fullname = '\i2b2\Laboratory Tests\Blood Bank\Hemolytic disease of newborn screen | Bld-Ser-Plas\HDNB Scn Bld-Imp\';

Results in:

C_HLEVELC_FULLNAMEC_NAME
4\i2b2\Laboratory Tests\Blood Bank\Hemolytic disease of newborn screen | Bld-Ser-Plas\HDNB Scn Bld-Imp\HDNB Scn Bld-Imp (49315-5)
4\i2b2\Laboratory Tests\Blood Bank\Hemolytic disease of newborn screen | Bld-Ser-Plas\HDNB Scn Bld-Imp\HDNB Scn Bld-Imp (1051-2)

Ref: 49315-5: Hemolytic disease of newborn screen (interpretation) in Blood Narrative and 1051-2 : Hemolytic disease of newborn screen (interpretation) in Blood.
One has a type "Nom" and the other has type "Nar".

As Dan pointed out, it looks like the last level in the c_fullname column should be replaced by the c_name with the code in it. That breaks the GPC paths, but the design hasn't been agreed upon - these duplicates seem to vote for putting the code in the path.

If we're going with codes in the path, I think we should make the path up of _just_ the LOINC codes. We don't think that's easy to do based on the CMH file as-is (in SQL anyway) so we're going to try replacing the last level with the name for now.

comment:47 Changed 3 years ago by ngraham

Replaced the last part of the LOINC paths with the c_name for leaf nodes in [6e81c3eae657]. Build 639 was failed with the all_terms_have_parents test again.

comment:48 Changed 3 years ago by dconnolly

  • Keywords heron-weekly added

I'd like to reconsider syncing up with UTHSCSA's approach of going from UMLS, integrating the usability aspects of the Cerner design as well. This would involve backing this feature out of Cheyenne. We would clean up our CPT hierarchy (#2138) and sync with the GPC procedure ontology while we're at it.

I took shortcuts in my code review in comment:38 and the risk didn't pay off.

  • I didn't even run (or look over the results of) a full test build (cf Can you reproduce the results, i.e. can you run integration tests? in CodeReviewNotes)
  • We left the use of Cerner's Java code as a manual part of BulkTransfer rather than checking it in, making a Jenkins job out of it, and making it part of code review. I suspect it's equivalent to a page of SQL code but none of us studied it well enough to know.

comment:49 follow-up: Changed 3 years ago by ngraham

It appears that there are 6 missing parents in the lab_results table (loaded from output of Java code). We can either work around these (insert missing parents, make exception in the test, ...) or take the approach Dan notes in comment:48 and back this out in favor of another approach.

details

The test indicates 5 paths don't have parents:

TERM_ID,"C_VISUALATTRIBUTES","C_FULLNAME","CONCEPT_CD","C_BASECODE"
23826714,"FA ","i2b2Laboratory TestsChemistryMineral, bone, joint, connective tissueCollagenCollagen crosslinked C-telopeptide,","LOINC:LP35730-8"
23826754,"FA ","i2b2Laboratory TestsChemistryMineral, bone, joint, connective tissueCollagenCollagen crosslinked N-telopeptide,","LOINC:LP18564-2"
23827083,"FA ","i2b2Laboratory TestsChemistryMineral, bone, joint, connective tissueCollagenCollagen.injectable Ab,","LOINC:LP62438-4"
23833696,"FA ","i2b2Laboratory TestsDrug/ToxHalogenated organic moleculesAcetyleneAcetylene | AirAcetylene Air-mCnc (38660-7),","LOINC:38660-7"
23833695,"FA ","i2b2Laboratory TestsDrugToxHalogenated organic moleculesAcetyleneAcetylene | Air,","LOINC:LP51651-5"

I took one example and indeed, the parent is missing:

select * from blueheronmetadata.heron_terms where c_Fullname like '\i2b2\Laboratory Tests\Chemistry\Mineral, bone, joint, connective tissue\Collagen\%' order by c_hlevel;
select * from blueheronmetadata.heron_terms where c_Fullname = '\i2b2\Laboratory Tests\Chemistry\Mineral, bone, joint, connective tissue\Collagen\'; 

Let's look at the staged LOINC ontology - indeed, they're missing from there too:

select * from blueherondata.lab_results where c_fullname like '\i2b2\Laboratory Tests\Chemistry\Mineral, bone, joint, connective tissue\Collagen\%' order by c_hlevel;
select * from blueherondata.lab_results where c_fullname = '\i2b2\Laboratory Tests\Chemistry\Mineral, bone, joint, connective tissue\Collagen\' order by c_hlevel;

Using attachment:missing_parents.py (csv from the java code used as input), it looks like the missing parents are:

\i2b2\Laboratory Tests\Drug\Tox\Halogenated organic molecules\
\i2b2\Laboratory Tests\Drug/Tox\Halogenated organic molecules\Acetylene\Acetylene | Air\
\i2b2\Laboratory Tests\Drug\Tox\Halogenated organic molecules\Acetylene\
"\i2b2\Laboratory Tests\Chemistry\Mineral, bone, joint, connective tissue\Collagen\"
\i2b2\Laboratory Tests\Drug\
\i2b2\Laboratory Tests\Drug\Tox\

comment:50 in reply to: ↑ 49 Changed 3 years ago by dconnolly

Replying to ngraham:

... We can either work around these (insert missing parents, make exception in the test, ...) or take the approach Dan notes in comment:48 and back this out in favor of another approach.

Unless you have some other pressing task to work on, don't let re-considering the other approach interrupt this work.

comment:51 Changed 3 years ago by ngraham

I used attachment:missing_parents.py to create the missing parents and inserted them in test. Build 642 was successful.

Let's consider inserting these parent paths into production data and kick off the concepts load for Cheyenne Bottoms.

Also, to simplify staging, why don't we put this hierarchy somewhere stable? Like i2b2metadata2 where the old medication hierarchy (that we load as "hidden" terms) is stored. That way, we don't have to reload the table every month.

details

python missing_parents.py lab_results.csv lab_results_missing.csv
79541 unique paths in input
Found 6 missing parents.
create table blueherondata.lab_results_missing as 
  select * from blueherondata.lab_results where 1=0;
... -- IMPORT
insert into blueherondata.lab_results
select * from blueherondata.lab_results_missing;
--6 rows inserted.

Changed 3 years ago by ngraham

comment:52 Changed 3 years ago by ngraham

heron-weekly:
RW: Put in production. Have meeting with Tamara and us for usability,etc. Then consider how we'd go with UMLS.
DC: It's a file coming from Indiana. Might be close to work to move our CPT hierarchy such that we get it from UMLS. (#2138)
NG: Having the LOINC codes in the path would be nice for future CDM work.
DC: CDM v2 has smaller number of labs.
RW: Could imagine that future iterations would require more labs.
NG: Schedule meeting with Sravani, Tamara (optional), Li (optional), Dan, Nathan, Russ to review the design/results.
DC: I don't expect user interface to change.
NG: I plan to close after scheduling the meeting.
NG: Put draft in the name?
RW: Yes.

For follow-up, see ticket:2138#comment:24

comment:53 Changed 3 years ago by ngraham

  • Keywords heron-weekly removed

comment:54 Changed 3 years ago by ngraham

After chatting with Sravani, moved the staged tables to i2b2metadata2 schema so we don't have to stage them each month (in test for now - TODO: production A/B along with missing parents and update GroupOnly/BulkTransfer):

create table i2b2metadata2.loinctestcodes as
select * from blueherondata.loinctestcodes;

create table i2b2metadata2.lab_results as
select * from blueherondata.lab_results;

create index i2b2metadata2.lr_loinc_basecodes_idx on i2b2metadata2.lab_results(c_basecode);

drop table blueherondata.loinctestcodes;
drop table blueherondata.lab_results;

Changed the code to match in [dde0c00f9117].

Added draft keyword as per comment:52 in [5a5e7898d2c8].

Test build 644.

comment:55 Changed 3 years ago by ngraham

As per comment:52, I've scheduled a meeting for May 5 and invited noted people.

comment:56 Changed 3 years ago by ngraham

While consulting Sravani, I loaded i2b2metadata2.loinctestcodes and i2b2metadata2.lab_results in production (A/B) including the missing parents. I then dropped the blueherondata versions of the tables.

comment:57 Changed 3 years ago by ngraham

  • Resolution set to fixed
  • Status changed from accepted to closed

I updated wiki:GroupOnly/BulkTransfer#UMLS. Test build noted in comment:54 was successful. Merged to default in [041d15f87f08]. Merged to Cheyenne in [c311143a2851].

comment:58 Changed 3 years ago by ngraham

  • Keywords public-web added
  • Sensitive unset

comment:59 Changed 3 years ago by mprittie

  • Sensitive set

comment:60 Changed 3 years ago by badagarla

  • Sensitive unset

We got permission during heron-weekly from team and documented it in ticket:3557#comment:7.

Note: See TracTickets for help on using tickets.