Ticket #1246 (closed problem: fixed)

Opened 21 months ago

Last modified 9 months ago

Approximately 2% of Medication Facts are Not Covered by our VA Med Hierarchy

Reported by: ngraham Owned by: ngraham
Priority: major Milestone: heron-walnut-update
Component: data-repository Keywords: public-web
Cc: dconnolly, rwaitman, tmcmahon Blocked By:
Blocking: Sensitive: no

Description

Summary

Around 12% of all medication facts from Clarity don't map to a medication concept in the new VA hierarchy. This means that the end-user cannot see these medications in the ontology and therefore cannot perform any queries to access 12% of the medication facts.

The ideal goal would be to match 100% of the medications found in Clarity to a location in the VA hierarchy. That way, all medications (and subsequently all medication-related facts) would be available to the end-user. 100% coverage is likely unrealistic, but we believe there are ways to improve coverage such as:

Background

As per #1048, we have implemented a medication ontology based on VA Drug Class (NDF-RT) and the RxNorm Semantic Clinical Dose Form. This approach is based on this  2010 paper on mashing up RxNorm and NDF-RT hierarchy.

Other TODOS:

  • Some of the SCDFs have very long text strings (see ticket:1048#comment:41). Right now we truncate, but there may be a better way to abbreviate).

Useful links:
 RxNav
 BioPortal NDF
 RxNorm technical documentation

Attachments

epic_med_mapping.sql Download (15.6 KB) - added by ngraham 21 months ago.
Hacked-up version of epic_med_mapping.sql used to test out NDC normalization.
med_name_extract.py Download (1.1 KB) - added by ngraham 21 months ago.
Pre-process script (just a hack for testing)
meds_with_no_rxcui.xlsx Download (29.8 KB) - added by ngraham 20 months ago.
Excel sheet with med facts that don't have an rxcui.
meds_with_no_rxcui_2.xlsx Download (38.2 KB) - added by ngraham 20 months ago.
Meds with no RxCUI after MedEx? NLP: Fixed for improper NULL handling.
results_extract.py Download (1.9 KB) - added by ngraham 20 months ago.
Post-proc script (hack for testing).
Clarity Medication Mapping.pptx Download (340.4 KB) - added by ngraham 20 months ago.
For the meeting today (few last-minute tweaks)
epic_med_mapping.2.sql Download (33.0 KB) - added by dconnolly 20 months ago.
scratchpad
clarity_under_va.png Download (13.2 KB) - added by ngraham 19 months ago.
Clarity medication directly under VA class example.
unmapped.png Download (27.9 KB) - added by ngraham 19 months ago.
Unmapped example.

Change History

comment:1 Changed 21 months ago by ngraham

  • Cc tmcmahon added

comment:2 Changed 21 months ago by dconnolly

Let's try to make it clear from ticket summaries what it means to be done.
I expect there will always be ways to "improve coverage...".

Suggestions:

  • change this to a problem ticket, with an explanation that (a) helps users understand the limitation, and (b) helps us judge the priority of fixing the problem:
    • "cannot search for [some important med] among others" or
    • "23% of Hospital med orders are not covered by our VA med hierarchy" (I made up 23%, of course)
  • make separate tickets for Medex, one for "Normalization"

Having thought through it a bit, the separate tickets approach doesn't seem worthwhile in this
case.

comment:3 Changed 21 months ago by ngraham

  • Type changed from enhancement to problem

comment:4 Changed 21 months ago by ngraham

Summary

Normalized NDCs:
Loading the normalized NDCs does give us an advantage - we match 5,595 more distinct medication ids (from 13,200 to 18,795).

However, with the 2012.06 data (Cimarron), that only gets us 11,960 more facts (though a lot more meds map to the hierarchy). This is only .01% or so :)

Attached is a hacked-up version of the epic_med_mapping.sql file I used to experiment with.

Med Ex
Spot-checking the results of scanning Clarity medication names shows seemingly good matches (eye-balling the Clarity name with the RxNorm? name).

I picked the top 3 missing medications (totally 1,393,758 facts) and checked against the results from the Med Ex - got hits for all of them (including NDFRT sources).

Med Ex NLP Details

Downloaded:
DB2828D7CFFCADB8D514AC2D1AF563A9 medex.zip from  Vanderbilt site.

Extracted the Medication_id and name fields from the clarity.clarity_medication table. Ran as per readme.

It appears to take a non-trivial amount of time (30 minutes on my PC).

Interestingly, it appears that the RxNorm? data is included in the download. I suspect it wouldn't be too swap out with newer data from RxNorm?.

Anyway, from the output, spot-checking shows seemingly good matches (eye-balling the Clarity name with the RxNorm? name). The Med Ex code does appear to reject quite a few things, which is probably good (avoiding false-positives) but I haven't inspected the code to know any details on the matching.

I picked the top 3 missing medications (totally 1,393,758 facts) and checked against the results from the Med Ex - got hits for all of them (including NDFRT sources):

CONCEPTFACTS
KUH|MEDICATION_ID:210319637501
KUH|MEDICATION_ID:210264387127
KUH|MEDICATION_ID:210767369130
select * from clarity.clarity_medication@epic where medication_id in (210319, 210264, 81928);
select * from rxnorm.rxnconso@kumc where rxcui in ('2180', '5966', '11124') and tty='PT';

NDC Details

A glance at the breakdown of values in clarity.clarity_ndc_codes and how many we can convert to the RxNorm?/HIPAA 11 digit format (using  Sample normalization code from nlm).

150,312 lines processed, 30,439 could not be converted.

Can convert

format (digits-digits...)Count
1142
5-3-260082
5-4-129141
5-4-25325
4-4-225282

Cannot convert (unknown format)

5-524438
4-65383
10618
/* for testing NDC normalization.  manually loaded CSV for testing. */
create table norm_ndc (
  MEDICATION_ID	NUMBER(18,0),
  LINE	NUMBER(38,0),
  CLARITY_NDC_CODE	VARCHAR2(15 BYTE),
  NORM_NDC_CODE	VARCHAR2(15 BYTE)
  );
  
--TOTAL number of entries (multiple NDCs for a single med id)
select count(*) from clarity.clarity_ndc_codes@epic cnc;
--150311 (same lines processed by script - note header)

--TOTAL in source table
select count(distinct cnc.medication_id) from clarity.clarity_ndc_codes@epic cnc;
--38563

--TOTAL med ids
select count(distinct nndc.medication_id) from norm_ndc nndc;
--26776

-- NON-NORMALIZED
select count(distinct nndc.medication_id) from norm_ndc nndc
join rxnorm.rxnsat@kumc rxs 
on nndc.clarity_ndc_code=rxs.atv
where rxs.atn='NDC';
-- 13200

-- NORMALIZED
select count(distinct nndc.medication_id) from norm_ndc nndc
join rxnorm.rxnsat@kumc rxs 
on nndc.norm_ndc_code=rxs.atv
where rxs.atn='NDC';
--18795

Changed 21 months ago by ngraham

Hacked-up version of epic_med_mapping.sql used to test out NDC normalization.

comment:5 Changed 21 months ago by dconnolly

How many names are left to match?

Have you see the  Approximate Match String Search in the RxNorm API? (That API is a web service.)

comment:6 Changed 21 months ago by ngraham

Summary

I've loaded the results from the MedEx NLP on both the clarity_medication.name and the clarity_medication.generic_name. The number of RxCUI matches we get looks promising (approx 800K facts are left without RxCUIs from the original 10.6M facts that weren't mapped to the hierarchy before).

Work is still in progress to determine how many facts we actually get mapped to the hierarchy using MedEx (saving now to preserve my notes).

  • We have 100,395,527 medication facts total
  • 10,636,780 facts don't yet fit in the VA hierarchy (missing NDC/GCN)
    • This is 89.4% mapped.
  • If we use MedEx on the clarity_medication.name field, we don't have RxCUIs for 1,382,749 of those missing facts. If we include the clarity_medication.generic_name the missing go down to 826,445 facts.

Details

  • As per suggestions from the MedEx authors at Vanderbilt, I pre-processed the table output from the SQL dump (clarity_medication table) to add a period and an extra new-line to each medication name.
    • I seem to always get this error, though I do get a lot of output
      MedParser_V1.exe -i input_dir -o output_dir
      
      Exception KeyError: KeyError(7520,) in <module 'threading' from '<frozen>'> ignored
      
  • Post-process result to extract "CLARITY_NAME","UMLSCUI","RXCUI".
    • MedEx appears to escape quotes with double quotes - so replace '"' with "
    • Also, remove the '.' from the name in the post-process
    • 81,040 lines
    • Max medication name length: 116 characters
  • Import into KUMC.RXNORM.clarity_name_to_rxcui_medex
    • format: csv
    • skip header
    • left/right enclosure: None
    • 81,040 rows successfully imported
  • I found that some med names have a period in them (ie "A.E.R. WITCH HAZEL 12.5-50 % TP PADM"). So, MedEx strips that out in the output file so it becomes "WITCH HAZEL 12.5-50 % TP PADM" so we can't match back to clarity names (can't match 2,472 clarity names).
    • To combat this, I also ran MedEx? on the generic_name saved in the clarity_medication table.
    • 78,634 rows imported into kumc.rxnorm.g_clarity_name_to_rx_cui_medex.
  • Preliminary results show that we now have RxCUIs for all but 1,382,749 facts (as opposed to 10,636,780 before) without generic name matching. Including generic name matching takes us to 826,445. This doesn't mean they map to the VA hierarchy yet...that's still to come...but at least we have an RxCUI

Query used for testing these matches:

--Facts we're missing now (from above): 10636780
--Total facts (from above): 100395527
with med_facts as(
  --Distinct counts by concept wrt medications
  select distinct concept_cd, facts
  from blueheronmetadata.counts_by_concept@deid cbc
  where cbc.concept_cd like 'KUH|MEDICATION_ID:%'
  ),
rxnorm_med_concepts as(
  --Distinct concepts that link to RxNorm
  select distinct(c_basecode) concept_cd
  from BLUEHERONMETADATA.rxnorm_terms@deid 
  where c_visualattributes = 'LA'
  ),
joined_med_concepts as(
  -- Find facts where we don't map to RxNorm
  select distinct mfct.concept_cd facts_concept_cd, mfct.facts, rxnc.concept_cd rxn_concept_cd
  from med_facts mfct
  left join rxnorm_med_concepts rxnc
  on mfct.concept_cd = rxnc.concept_cd
  where rxnc.concept_cd is null
  ),
medex_concepts as(
  select 
    distinct(ccm.medication_id), rxcui
  from 
    rxnorm.clarity_name_to_rxcui_medex@kumc medex
  join --skip no-matches
    clarity.clarity_medication@epic ccm
  on 
    ccm.name = medex.clarity_name
),
g_medex_concepts as(
  select 
    distinct(ccm.medication_id), rxcui
  from 
    rxnorm.g_clarity_name_to_rxcui_medex@kumc medex
  join --skip no-matches
    clarity.clarity_medication@epic ccm
  on 
    ccm.generic_name = medex.clarity_name
)
select sum(facts) from(
  select * from joined_med_concepts
  join medex_concepts
  on 'KUH|MEDICATION_ID:' || medex_concepts.medication_id = joined_med_concepts.facts_concept_cd
  join g_medex_concepts
  on 'KUH|MEDICATION_ID:' || g_medex_concepts.medication_id = joined_med_concepts.facts_concept_cd
  where medex_concepts.rxcui is null
); --1,382,749 without generic, 826,445 including generic.

Changed 21 months ago by ngraham

Pre-process script (just a hack for testing)

comment:7 Changed 20 months ago by rwaitman

Can you send me a file of what's missing with the different names and counts?

comment:8 Changed 20 months ago by ngraham

Summary

I counted incorrectly in the last comment. It appears that only 645,308 facts (370 concepts) don't have an RxCUI after using MedEx.

Attached is the requested file: concept_cd,fact_count,clarity_name,clarity_generic_name.
attachment:meds_with_no_rxcui.xlsx Download.

Next Step

I will now attempt to map all of these medications to the VA hierarchy.

Details

NOTE missing_meds table is a a table containing CONCEPT_CD,FACTS,MEDEX_RXCUI,GENERIC_MEDEX_RXCUI from the counts_by_concept table where we don't have an RxCUI from matching the name.

The error I made last night was that the table will have 2 entries when MedEx doesn't match the brand name but does match the generic name. The below query should filter for those that we can't find a match on either one.

select sum(fact_count) from(

select 
  mm1.facts_concept_cd concept_cd, mm1.facts fact_count, ccm.name clarity_name, 
  ccm.generic_name clarity_generic_name, mm1.medex_rxcui medex_rxcui, 
  mm1.generic_medex_rxcui generic_medex_rxcui  
from 
  missing_meds mm1
join 
  clarity.clarity_medication@epic ccm
on 
  'KUH|MEDICATION_ID:' || ccm.medication_id = mm1.facts_concept_cd
where
   mm1.medex_rxcui is null and mm1.generic_medex_rxcui is null
order by 
  mm1.facts
desc

); --count is: 645,308

Note that in some cases, I've seen that we have a UMLS CUI when we don't have an RX CUI. This may help us if we bring in the UMLS tables and look for synonyms.

Example:

From clarity:

concept_cdfact countclarity nameclarity generic name
KUH|MEDICATION_ID:5244281338MULTIVITAMIN PO CAPmultivitamin Cap

From Medex matching:

clarity nameUMLS CUIRxNORM CUI
MULTIVITAMIN PO CAPC0301532null

I'll attach a dump file of the inner query which has the concept_cd,fact_count,clarity_name,clarity_generic_name.

Changed 20 months ago by ngraham

Excel sheet with med facts that don't have an rxcui.

Changed 20 months ago by ngraham

Meds with no RxCUI after MedEx? NLP: Fixed for improper NULL handling.

comment:9 Changed 20 months ago by ngraham

Oops. I was improperly handling some of the null values from the output of MedEx. attachment:meds_with_no_rxcui_2.xlsx Download is what I believe to be correct now. The counts increased some but not by a huge amount (671,160 facts, 513 concepts).

Changed 20 months ago by ngraham

Post-proc script (hack for testing).

comment:10 Changed 20 months ago by ngraham

comment:11 Changed 20 months ago by ngraham

The only non-obsolete terms in RxNorm for a med-id is seemingly unrelated?

While working on medication mapping, I found some oddities in the data. In the following example, the clarity medication_id has been linked to RxNorm via GCN sequence number. Looking at the result, we can see that the only term that is not suppressed is one that is labeled as an "ingredient" (TTY=IN) and doesn't appear to be related.

Note: N=Not suppressable, O=Obsolete as per  RxNorm Documentation.

Clarity Med IDClarity NameRxCUISABTTYRxNorm NameSuppress
85352CHLORPHENIRAMINE-PHENYLEPHRINE PO ELIX19143NDDFINbeta-caroteneN
85352CHLORPHENIRAMINE-PHENYLEPHRINE PO ELIX237311NDDFCDAPHENYLEPHRINE HCL/CHLOR-MAL ORAL ELIXIRO
85352CHLORPHENIRAMINE-PHENYLEPHRINE PO ELIX237311NDDFCDCPHENYLEPHRINE HCL/CHLORPHENIRAMINE MALEATE ORAL ELIXIRO
85352CHLORPHENIRAMINE-PHENYLEPHRINE PO ELIX237311NDDFCDDPHENYLEPHRINE HCL/CHLORPHENIRAMINE MALEATE@@ORAL@ELIXIRO

The query:

select 
   ccm.medication_id, ccm.name, rxc.rxcui, rxc.sab, rxc.tty, rxc.str, rxc.suppress
from 
  clarity.clarity_medication@epic ccm
join
  clarity.rx_med_gcnseqno@epic crmg 
on 
  crmg.medication_id=ccm.medication_id
join
  rxnorm.rxnconso@kumc rxc
on
  lpad(crmg.gcn_seqno, 6, '0') = rxc.code
where 
  rxc.sab = 'NDDF' and ccm.medication_id = '85352';


comment:12 Changed 20 months ago by ngraham

I've been working on this for the past few days but I've not been able to make much improvement in the linking. Additionally, I've found some questionable data (see comment:11 for an example).

I don't believe that I'll be able to provide improvement to the end-user for this next release (milestone:heron-cedarbluff-update).

Note that isn't dependent on ETL - we could update the terms separately as the facts will be loaded with the current code.

Dustin Key (AUG member) has provided some information (SAS code) on how he mapped the medications to NDF-RT, so I plan to study that next.

comment:13 Changed 20 months ago by ngraham

Similar to comment:11, I found this:

GCNRXCUITTYSTRSUPPRESS
008039259436INDandelion RootN
0080391000990CDAOXYMETAZOLINE HCL 0.05 % NASAL SPRAY, NON-AEROSOL (ML)N
0080391000990CDCOXYMETAZOLINE HCL 0.05 % NASAL SPRAY, NON-AEROSOL (ML)N
0080391000990CDDOXYMETAZOLINE HCL@0.05 %@NASAL@SPRAY, NON-AEROSOL (ML)N

I've matched clarity medications to RxCUIs via GCN but this results in "ingredients" as well. So, as before, I get Dandelion Root as an RxCUI for the clarity name of 12 HOUR NASAL SPRAY 0.05 % NA SPRA.

comment:14 Changed 20 months ago by ngraham

I'm dumping some notes here that I've accumulated during this investigation.

Other Tables

Here is a list of tables that may help our medication mapping as per Clarity Data Dictionary:

Tables that don't exist in the full extract

RX_MED_ONE (table not found)

  • NDDF_PLUS_MEDID "The NDDF plus MED Medication ID."
  • GCN_GMED_FLAG_C "FDB GCN_SEQNO and generic MED ID assignment indicator"

RX_MED_TWO (table not found)
RX_MED_THREE (table found, but no data)

Tables to consider bringing in next extract

RX_SIMPLE_MED_IDS "This table extracts the list of the medications that are linked to simple generic medications"

RX_FRMLRY_MEDS

rx_ndc

Other Notes

We do have AHFS codes via RX_MED_AHFS: (65% of medication ids) but we don't have AHFS as a source in RxNorm?.

comment:15 Changed 20 months ago by ngraham

After using MedEx to match the missing medications to dose form via ingredient, it looks like we have 94% of our facts matched. At first glance, it appears as though many of them are vitamins, containers/bags, etc. It might be fairly easy to pick off quite a few more facts.

Note that missing_meds_20120822.csv is an extract of the missing medications. Please note that will be more than one row for a given clarity concept if we found more than one RxCUI for that concept.

Code is here [400ffc4a0684]. Probably not production ready, but the med mapping does run via paver if you take the next version [b49abf833db1] which removes some code I accidentally included in the branch.

comment:16 Changed 20 months ago by rwaitman

Great work Nathan. I'll check it out but you might skim it with Tamara who can no doubt use her Informatics/Information? Science skills to map those 12000 rows ;) Seriously, could you just run it one more time but also print out a column that shows the description of the RxCUI that was mapped?

comment:17 Changed 20 months ago by ngraham

Russ,

As per your request in comment:16, I've attached the new .csv that contains the RxNorm name (using "preferred term" where available). unmapped_meds_20120823.csv.

comment:18 Changed 20 months ago by ngraham

I've attached the presentation I've prepared for the meeting today:  Clarity Medication Mapping. I'll probably review it again before the meeting and perhaps tweak it slightly.

comment:19 Changed 20 months ago by ngraham

  • Status changed from new to accepted

comment:20 Changed 20 months ago by ngraham

I queried the UMLS list serve regarding ingredients and GCN mapping issues (comment:11 and comment:13). The reply is here:  UMLS list serve archives

John Kilbourne, MD:

The CODE value in RXNCONSO for SAB=NDDF is not a GCNSEQNO when the TTY is IN; instead it is the First Data Bank code for ingredients. To find only GCNSEQNOs, you will need to filter out TTY= IN.

Now filtering out ingredients [44f13fed6795].

comment:21 Changed 20 months ago by dconnolly

In the medex issues, I see

  • Not integrated into our ETL (“manual technical-debt”)

I'm not sure that's much of an issue, really. I look at it as more of a manual curation tool. I don't see that much value in running a fuzzy-matching tool in lights-out mode. I expect we'd just run it manually every six months or so and keep the results in source:heron_load/curated_data.

Changed 20 months ago by ngraham

For the meeting today (few last-minute tweaks)

comment:22 Changed 20 months ago by ngraham

Scribbled notes from the meeting today:

  • Look at whether or not we can map SDCF/SBDF directly to the VA class
  • Consider "helping MedEx?" out with manual string replacement (ie IJP = Injectable Product)
  • "NO HOME MEDICATIONS" is a clarity medication. Is a concept that would be useful - saying patient doesn't have documented medications.

Next Steps:

  • Highlight what MedEx? chose for the missing meds in spreadsheet
    • Give MedEx? a chance when we don't map to the hierarchy
  • Look at Multivitamin PO (152255 med id) - one GCN or multiple? Why so many RxCUIs?
    • clarity GCN always a 1:1? Or, can it be 1:many?
  • Once past these steps, Brian Barns.
  • Create "unspecified" bucket eventually.

Looking at the code with Dan:

  • Top-Down: building hierarchy first
  • Do SCDFs/SBDFs have any relationships to the VA classes? Thought no...but...
  • Consider putting modifiers in folders for medications.
  • Ask about tables in comment:14

Changed 20 months ago by dconnolly

scratchpad

comment:23 Changed 20 months ago by dconnolly

attachment:epic_med_mapping.2.sql Download is a scratchpad based on [44f13fed6795]... a form of code review, I suppose.

One of the more interesting results:

SUM(FACTS)PCT_FOUNDCOUNT(DISTINCTMEDICATION_ID)FOUND_GCNFOUND_NDCFOUND_RXCUI
7788734277.58057001111
1494234114.88351481711
50244635.00476191
25413812.53141340

comment:24 Changed 20 months ago by dconnolly

FWIW, I managed to take our blackboard drawing and render it using graphviz on MedMapping. This is perhaps more relevant to #1048, as it covers only the initial, more straightforward techniques.

comment:25 Changed 20 months ago by rwaitman

  • Milestone changed from heron-cedarbluff-update to heron-waconda-update

comment:26 Changed 19 months ago by ngraham

  • Blocking 1391 added

I talked with Russ today - he asked that I add an ontology folder for the meds that don't fit anywhere else for now. Also, he'd like to get this change into the milestone:heron-waconda-update.

This change is only to the ontology and doesn't affect the ETL.

comment:27 Changed 19 months ago by ngraham

We now get 98.6% of the medication facts somewhere in the hierarchy whereas before we had 96.4% that were accessible with therapeutic/pharmaceutical classes. Changeset [e0005ea88aec].

The following changes were made to achieve this:

  • Added medications that map directly to the VA Hierarchy but not to a semantic clinical dose and form (SCDF).
  • Added an "unmapped" bucket in the VA Hierarchy for the top 500 medications (by order frequency) that aren't mapped anywhere else.

100% of the clarity medications existed in the hierarchy before (using therapeutic/pharmaceutical classes) but 24,000 of them fell under the "No Theraputic Class Available" category. Therefore, these medications aren't selectable by the i2b2 end-user since there are so many (see related ticket:1082).

Coverage based on fact count for the milestone:heron-cedarbluff-update:

Fact Percent in VA HierarchyTotal FactsFact Percent thera/pharma * (see note)
98.5911989421796.36

Note that with the thera/pharma classification method, all medications were selectable using the "find terms" tab, but modifiers were not selectable from this interface.

Consider bringing in some of the there/pharma classes for some things we can't map to VA? Things that may be KU specific like "MISCELLANEOUS MEDICAL SUPPLIES, DEVICES, NON-DRUG [248,444 facts; 21,236 patients]"?

Clarity medication directly under VA class example.


Unmapped example.

Changed 19 months ago by ngraham

Clarity medication directly under VA class example.

Changed 19 months ago by ngraham

Unmapped example.

comment:28 Changed 19 months ago by ngraham

If I grab the top 500 medications based on fact count rather than order popularity, we get 99.2% coverage with our "other medication" folder. The counts_by_concept table isn't built until the end of the ETL. I created ticket #1411 to possibly change the design to built the counts_by_concept table earlier in the ETL.

For milestone:heron-waconda-update, we'll stick with order popularity as in comment:27 (41c44654d6a6).

comment:29 Changed 19 months ago by ngraham

  • Blocking 1373 added; 1391 removed

comment:30 Changed 19 months ago by ngraham

  • Blocking 1373 removed
  • Milestone changed from heron-waconda-update to heron-walnut-update

With the "other concepts" folder based on order popularity, we now have around 98% of the medication facts somewhere in the hierarchy. There is more work to do but this is what we have for milestone:heron-waconda-update so continue this work for the next milestone (milestone:heron-walnut-update).

comment:31 Changed 19 months ago by ngraham

As per comment:30, changing summary to reflect that we have about 2% of the facts remaining unmapped.

comment:32 Changed 19 months ago by ngraham

Russ / Tamara,

Could you comment on a criteria for closing this ticket? I think I can get 99.2% with a design change (#1411). This of course includes the "other" medication bucket.

My thoughts are:

  • Work with Dan on #1411 - try to get 99%
    • This includes the "other" medication bucket.
  • Retire the "old" medication hierarchy for milestone:heron-walnut-update.
    • Though technically 100% of med facts exist in the "old" hierarchy, many were under the "No Theraputic Class Available" category and weren't selectable by the i2b2 end-user (see comment:27) except through the search interface which does not show modifiers.
      • Excluding the "No Theraputic Class Available" category, the "old" hierarchy covered 96.4% of the facts.
    • With the new hierarchy, the medications in the "other" category are searchable and can then be selected via the i2b2 ontology (including modifiers).
  • Close this ticket and open a new ticket for investigating ways of moving more medications from the "Other" category to more specific categories.


comment:33 Changed 19 months ago by tmcmahon

Here are my thoughts...

  • Close the ticket and link to #1411.
  • Retire the old medication hierarchy in walnut.
  • I defer to Russ re: opening a new ticket.

comment:34 Changed 19 months ago by ngraham

Note: Changing the design so that counts_by_concepts is built before medication mapping (#1411) also helps with automating coverage tests (to make sure we get 99%+ of the facts covered in the hierarchy).

comment:35 Changed 19 months ago by ngraham

As per ticket:1411#comment:2, I created a branch for building the concept count earlier in the ETL. Then, I used the concept count to determine which top 500 meds go in the "other" folder [09e16cc62ef6].

As per data from milestone:heron-waconda-update, we get 99.12% fact coverage. Also added a test to make sure we have at least 99% coverage in the future.

comment:36 Changed 19 months ago by ngraham

On the same MEDS_CONCEPTS_COUNT_1411_1246 branch, I removed the old therapeutic / pharmaceutical class medication tree and put the NDF-RT tree under "Medications" in anticipation that we'll decide to deploy it this way for milestone:heron-walnut-update. See also comment:33.

[2a3182d5010e]

comment:37 Changed 19 months ago by ngraham

  • Status changed from accepted to closed
  • Resolution set to fixed

I talked with Russ - agreed that we can remove the old hierarchy, close this ticket, and create a new one to map the remaining medications in the "other" folder. Changes noted comment:36 merged to default in [a24dd707e4d1].

comment:38 Changed 9 months ago by kcrane2

Approved for public release

comment:39 Changed 9 months ago by ngraham

  • Keywords public-web added
Note: See TracTickets for help on using tickets.