Opened 6 years ago

Closed 3 years ago

Last modified 2 years ago

#582 closed enhancement (fixed)

identified DUA fulfillment based on original dates

Reported by: tmcmahon Owned by: dconnolly
Priority: major Milestone: heron-webster-update
Component: data-repository Keywords: DUA public-web
Cc: rwaitman, dconnolly, tmcmahon, vleonardo, mhoag Blocked By: 2650, 3107
Blocking: 2534, 2618, 2710, 2728, 2804, 2940, 3176 Sensitive: no

Description

There are two ways to go about identified data requests based on queried developed in the de-identified i2b2 service:

  • take the set of patients from the de-identified query result and get their info from the identified repository
  • re-run the query in the identified repository to get dates exactly right and such.

The former is ticket #503.
This ticket is for the latter.

Change History (66)

comment:1 Changed 5 years ago by dconnolly

  • Cc changed from rwaitman,achoudhary,dconnolly to rwaitman, achoudhary, dconnolly
  • Milestone changed from HERONv.Next to heron-neosho-update
  • Owner set to bhamlin
  • Status changed from new to assigned

comment:2 Changed 5 years ago by rwaitman

The main issue is probably getting the indexes and space correctly on night heron. Also, managing the space.

comment:3 Changed 5 years ago by rwaitman

  • Milestone changed from heron-neosho-update to heron-cow-creek-update

comment:4 Changed 5 years ago by bhamlin

  • Milestone changed from heron-cow-creek-update to heron-smoky-hill-update

Didn't get to this yet.

comment:5 Changed 4 years ago by dconnolly

  • Cc tmcmahon bhamlin added; achoudhary removed
  • Milestone changed from heron-smoky-hill-update to heron-medicine-lodge-update
  • Owner changed from bhamlin to tmcmahon
  • Type changed from enhancement to problem

Tamara,

I'd rather Brandon worked on other stuff first (e.g. #1386, #1494).

After this morning's discussion of customer engagement and training, I see this more as a problem that we're asking our users to deal with than as an enhancement they didn't know they don't have. So I'm assigning it to you to manage as a training issue, at least until we look at development planning for Medicine Lodge.

comment:6 Changed 4 years ago by mhoag

  • Milestone changed from heron-medicine-lodge-update to HERONv.Next

comment:7 Changed 4 years ago by dconnolly

  • Milestone changed from HERONv.Next to heron-solomon-update

GroupOnly/HeronForCancerCenter discussions motivate doing this sooner.

comment:8 Changed 4 years ago by dconnolly

  • Milestone changed from heron-solomon-update to heron-milford-update
  • Owner changed from tmcmahon to dconnolly

comment:9 Changed 4 years ago by dconnolly

  • Milestone changed from heron-milford-update to heron-toronto-update

Let's get settled into the new hardware (#2323) before we take on setting up indexes and such to re-run queries in the identified DB.

comment:10 Changed 4 years ago by dconnolly

  • Cc badagarla added; bhamlin removed
  • Milestone changed from heron-toronto-update to HERONv.Next
  • Owner changed from dconnolly to mnair
  • Reporter changed from dconnolly to tmcmahon

Tamara,

I'm putting you down as the reporter, since you're in a better position than I am to represent the customer.

This was not identified as a priority in HeronProjectTimeline#November2013Planning, so I'm postponing it indefinitely. Feel free to renegotiate.

The next step is to add indexes on the new hardware, so I'm assigning it to you, Mani, as proxy for the new DBA.

comment:11 Changed 4 years ago by dconnolly

  • Milestone changed from HERONv.Next to heron-kirwan-update
  • Owner changed from mnair to ssuman
  • Reporter changed from tmcmahon to dconnolly

Suman, as noted in comment:10:

The next step is to add indexes ...

Please review the history of it with myself or Nathan or Russ sometime in the next month or two.

This isn't urgent yet, but we're discussing it in the GPC project, so I'm taking it out of the someday pile and making myself the customer.

comment:12 Changed 4 years ago by ngraham

Suman,

I've built the indexes on NightHeron before for my own exploration/testing. I believe the relevant file is source:heron_load/i2b2_facts_index.sql. Normally, we just run this against BlueHeronData.

comment:13 Changed 4 years ago by ngraham

In a meeting with Dan, Tamara and Nathan, we sketched out one possible way to run queries in ID that were originally run in DEID:

  • Stand up an identified i2b2 webclient pointed to NightHeron
  • Make sure the relevant indexes are built on NightHeron (see comment:12) as part of the usual ETL/release process.
  • Create some process (Jenkins?) to transfer user queries from the QT_QUERY_MASTER table in the DEID database to the ID database (note danger of query_id collisions!).
  • Re-run the saved query on the ID database and use the R Data Builder to extract the resulting patient set.

comment:14 Changed 3 years ago by dconnolly

Suman, please schedule a design/operations meeting with Bhargav and Nathan and then postpone this to a later milestone.

comment:15 Changed 3 years ago by dconnolly

  • Blocked By set to 2650

comment:16 Changed 3 years ago by dconnolly

  • Owner changed from ssuman to dconnolly
  • Reporter changed from dconnolly to tmcmahon

I'm factoring the database-oriented stuff (#2650) out, since there's another ticket (#2649) that depends on it. I can imagine that Suman didn't get around to scheduling the design meeting we talked about because the summary of this ticket wasn't a good reminder.

I'm also putting Tamara down as the customer affected by this problem.

comment:17 Changed 3 years ago by ssuman

(with Nathan watching)
Per Dan, I ran source:heron_load/i2b2_facts_index.sql for NIGHTHERONDATA

I have scheduled a design meeting.

comment:18 follow-up: Changed 3 years ago by dconnolly

  • Milestone changed from heron-kirwan-update to heron-republican-update
  • Type changed from problem to enhancement

Now that the indexes are largely in place, a bit of design brainstorming:

  1. Honest broker runs Query and R Data builder as before in order to specify variables of interest
    1. We're not actually interested in building data at this point; perhaps R Data Builder is enhanced with a "don't bother generating data" option.
    2. The query can be aborted; likewise, we're just trying to capture the query itself, not the results.
  2. Build patient set based on identified data.
    1. Jenkins job, say dua-id-2, to run on <prod-ided-db-server> takes query lookup parameters (e.g. name, date, user_id)
    2. python tool, say id_rerun.py queries qt_query_master
    3. id_rerun.py fixes references to blue heron
      • would a simple string replacement work?
    4. id_rerun.py re-runs the edited generated SQL and captures the resulting patient set id where Jenkins can archive it.
  3. Build R Data based on identified dates etc.
    1. id_rerun.py calls rgate to run R Data Builder given the patient set and variables of interest above
      • Fortunately, I did some work to save the parameters that the user entered into the plug-in in JSON format for easy parsing and re-running.
    2. Ensure that rgate is flexible enough to work in the identified i2b2 star schema as well
  4. Run the rest of the DUA Jenkins jobs much as usual
    1. Ensure rest of the DUA Jenkins jobs (i.e. source:heron_extract R code) accommodate data where the dates have already been un-shifted.

This feels like 4 to 8 coding chunks. So I'm not likely to get it done in one release. But we'll see.

comment:19 Changed 3 years ago by ssuman

bitmap indexes are successfully created for NIGHTHERONDATA

OBS_FACT_ENC_NUM_BI
OBS_FACT_PAT_NUM_BI
OBS_FACT_CON_CODE_BI
OBS_FACT_VALTYP_CD_BI
OBS_FACT_TVAL_CHAR_BI
OBS_FACT_NVAL_NUM_BI
OBS_FACT_MOD_CODE_BI

comment:20 Changed 3 years ago by dconnolly

  • Milestone changed from heron-republican-update to heron-marmaton-update

Tamara, in the Tuesday GPC software planning meeting with Russ, he made it pretty clear that my priority is #2418.

Let us know it today's meeting if postponing this isn't OK.

comment:21 Changed 3 years ago by dconnolly

  • Milestone changed from heron-marmaton-update to heron-hackberry-update

I don't see any/many more coding chunks in Marmaton.

comment:22 Changed 3 years ago by mhoag

  • Blocking set to 2710

(In #2710) The root cause of this issue is that none of the id only facts (e.g. zip code?) or values (e.g. real age - When over 90) are available when we run the id-eav.sql.

Specifically id-eav.sql does not supply you with an identified dataset, but instead supplies you with a re-identified dataset. In a practical sense, this distinction is important for two reasons:

  1. A re-identified dataset only uniformally re-identifies de-id dates and date-facts (date shifts), it does not re-identify any other types of fact like date-deltas (current age).
  2. A re-identified dataset will not contain any id-only facts that do not exist in the patient dimension (e.g. zip code?)

The real solution to this issue is implementing #582.

After speaking with Tamara, the work around for this particular problem is education, instruct the users of the data to ignore the age and use the birth_date instead. I also spoke with Vincent and he said that age being correct was not critical to him (he does not use it) and that his top priority is some implementation of #582 preferably via #2741

Further, Tamara mentioned that while the age fact was not useful as raw data, it did help paint a pretty visualization when the age data was summarized in Redcap.

So, since there is an acceptable work-around and this problem should be fixed by #582 I am demoting it to minor.

comment:23 Changed 3 years ago by dconnolly

  • Blocked By changed from 2650 to 2650, 2741
  • Cc vleonardo added; badagarla removed
  • Owner changed from dconnolly to vleonardo

scheduled for Oct in HeronProjectTimeline#Aug2014Planning.

comment:24 Changed 3 years ago by vleonardo

  • Blocking changed from 2710 to 2710, 2804

comment:25 Changed 3 years ago by vleonardo

  • Blocking changed from 2710, 2804 to 2710

comment:26 Changed 3 years ago by vleonardo

  • Blocking changed from 2710 to 2710, 2804

comment:27 Changed 3 years ago by dconnolly

  • Milestone changed from heron-wakarusa-update to heron-saline-update

scheduled for Oct in HeronProjectTimeline#Aug2014Planning.

comment:28 Changed 3 years ago by dconnolly

  • Blocked By changed from 2650, 2741 to 2650, 2741, 2940

comment:29 Changed 3 years ago by vleonardo

  • Blocked By changed from 2650, 2741, 2940 to 2352, 2650, 2741, 2940

This ticket is related to #2352 and might not even be required based on the solution of #2352. Also, given the re-engineering of the DUA process, this ticket is being demoted to minor.

comment:30 Changed 3 years ago by vleonardo

  • Priority changed from major to minor

comment:31 Changed 3 years ago by dconnolly

  • Milestone changed from heron-saline-update to heron-verdigris-update
  • Owner changed from vleonardo to ngraham
  • Priority changed from minor to major

In today's weekly HERON planning meeting, we discussed use cases for identified i2b2. This seems important enough to Tamara that it shouldn't be postponed without discussion, though she accepts that Saline is aimed at infrastructure prerequisite to this feature; for now, she'll have to re-build queries by hand to move them from the main de-id HERON to identified i2b2.

Nathan, would you like to investigate the feasibility of automatically moving queries as sketched in comment:18? I'm not talking about writing any code; just manually simulate what the code might do.

comment:32 Changed 3 years ago by ngraham

  • Milestone changed from heron-verdigris-update to heron-saline-update
  • Owner changed from ngraham to vleonardo
  • Priority changed from major to minor

Batch update from file HERON_saline_meeting_todos.csv

comment:33 Changed 3 years ago by dconnolly

  • Milestone changed from heron-verdigris-update to heron-kanopolis-update

postponing tickets other than those identified as Verdigris priorities in today's heron-weekly meeting

comment:34 Changed 3 years ago by dconnolly

The design notes in ticket:2940#comment:44 and ticket:2940#comment:45 are about using the data builder based on queries run on the identified i2b2, which is this ticket.

So the id_deid_cdw_2940 branch should have been called id_deid_cdw_582. Hmm.

comment:35 in reply to: ↑ 18 Changed 3 years ago by dconnolly

  • Cc mhoag added
  • Owner changed from vleonardo to dconnolly
  • Priority changed from minor to major
  • Status changed from assigned to accepted

Replying to dconnolly:

Now that the indexes are largely in place, a bit of design brainstorming:

  1. Honest broker runs Query and R Data builder as before ...
  2. Build patient set based on identified data.
    1. Jenkins job, say dua-id-2 ...
    2. python tool, say id_rerun.py queries qt_query_master
    3. id_rerun.py fixes references to blue heron
      • would a simple string replacement work?

Now that we have identified i2b2, it's much easier than that: we just use a webclient UI plug-in to port the XML query definition. From [ee0929ba0076/portaquery]:


Port Query

  1. In the other i2b2:
    1. Run a query.
      • Make sure debug mode is on: look for the Show XML Message Stack button: <>, especially in the top right Query Tool section.
    2. Open the XML Message Stack and find the runQueryInstance_fromQueryDefinition message for your query (recent messages are at the bottom of the list)
    3. Copy the query_definition element, including its tags.
  2. Then, in this i2b2,
    1. Paste below.
    2. Hit Load Query below.
    3. Select the Find Patients tab to see your query.

It's deployed on our identified i2b2 webclient. I went over it with Tamara. The usability of picking out the "<query_definition>" element is not great, but it works.

Next up: deploy data builder there.

comment:36 Changed 3 years ago by vleonardo

  • Blocking changed from 2710, 2804 to 2618, 2710, 2804

comment:37 Changed 3 years ago by mhoag

Factoring out the system administration (deployment) piece to #3107

comment:38 Changed 3 years ago by dconnolly

  • Blocked By changed from 2352, 2650, 2741, 2940 to 2650, 2741, 2939
  • Blocking changed from 2618, 2710, 2804 to 2618, 2710, 2804, 2940

sorting out dependencies a bit

comment:39 Changed 3 years ago by dconnolly

  • Blocking changed from 2618, 2710, 2804, 2940 to 2534, 2618, 2710, 2804, 2940

comment:40 Changed 3 years ago by dconnolly

  • Blocking changed from 2534, 2618, 2710, 2804, 2940 to 2534, 2618, 2710, 2728, 2804, 2940

comment:41 Changed 3 years ago by dconnolly

  • Blocked By changed from 2650, 2741, 2939 to 2650

updating dependencies w.r.t. the change of plans in ticket:2940#comment:48

The dependencies on #2741 and #2939 are cluttering up the Kanopolis plan.

comment:42 Changed 3 years ago by dconnolly

  • Blocked By changed from 2650 to 2650, 3107

comment:43 Changed 3 years ago by dconnolly

  • Resolution set to fixed
  • Status changed from accepted to closed

Testing of #3107 seems to show that this is done.

I lost track of where we were, but on review, I don't see anything that's left to do.

comment:44 Changed 3 years ago by dconnolly

  • Milestone changed from heron-kanopolis-update to dconnolly

Milestone renamed

comment:45 Changed 3 years ago by dconnolly

  • Milestone changed from dconnolly to heron-kanopolis-update

Milestone renamed

comment:46 Changed 3 years ago by dconnolly

  • Keywords heron-weekly DUA added

Tamara, we made an "Identfied DUA" yellow sticky this morning; that suggests this isn't done, but I'm not aware of anything that's left to do.

I hope to get clarification in heron-weekly.

comment:47 Changed 3 years ago by ngraham

  • Resolution fixed deleted
  • Status changed from closed to reopened

Matt notes we don't supply MRNs.

comment:48 Changed 3 years ago by dconnolly

  • Milestone changed from heron-kanopolis-update to heron-webster-update
  • Owner changed from dconnolly to mhoag
  • Status changed from reopened to assigned

comment:49 Changed 3 years ago by dconnolly

  • Keywords heron-weekly removed

comment:50 Changed 3 years ago by mhoag

  • Status changed from assigned to accepted

Frontburner for me. Vince indicated that having the MRN in identified data sets is really more important than getting the databuilder to run more quickly (#2365). Maybe Vince should be the customer on this ticket?

comment:51 Changed 3 years ago by mhoag

  • Blocking changed from 2534, 2618, 2710, 2728, 2804, 2940 to 2534, 2618, 2710, 2728, 2804, 2940, 3222

comment:52 Changed 3 years ago by mhoag

Nathan reviewed changes to the NightHeronData.PATIENT_DIMENSION in changeset 862131829305 (added MRN column to the table), making no objects for merging to default.

Ran test etl (/job/heron_etl_tiny_no_DBA/404) with the branch and manually validated that the column was added:

select case when (count(*) > 0) then 'MRN exists' else 'MRN does not exist' end
from all_tab_columns 
where table_name = 'PATIENT_DIMENSION'
  and owner = 'NIGHTHERONDATA'
  and column_name = 'MRN';
CASEWHEN(COUNT(*)>0)THEN'MRNEXISTS'ELSE'MRNDOESNOTEXIST'END
MRN exists

comment:53 Changed 3 years ago by mhoag

  • Blocking changed from 2534, 2618, 2710, 2728, 2804, 2940, 3222 to 2534, 2618, 2710, 2728, 2804, 2940, 3176

Merged to default in ebadb6680d2e. Unblocking ETL (As this as the only piece that needed to be delivered before ETL was kicked off) and blocking validate and release.

comment:54 Changed 3 years ago by mhoag

  • Owner changed from mhoag to dconnolly
  • Status changed from accepted to assigned

Feature was implemented on feature branch id_pat_dim_582 (feature branch diff) alongside feature work from ticket:2728#comment:42.

Support for adding the mrn can be found in all of the changesets (022e1a0ded1b/heron_extract 0868db836811/heron_extract 0ff2f5498193/heron_extract c43b1f22fc88/heron_extract) with 16bb7bf41731/heron_extract demonstrating the use case in test.

Note that I opted for an architectural change when dealing with SQLAlchemy's metadata. Basically I encouraged hiding the global metadata so it doesn't get modified while the program is running, instead using a small API to get a copy of the metadata c43b1f22fc88/heron_extract. This made testing an ID Patient Dimension alongside a DEID Patient Dimension possible, among other things (less reliance on global mutable state).

Assigning to Dan for review.

comment:55 Changed 3 years ago by mhoag

  • Resolution set to fixed
  • Status changed from assigned to closed

Dan merged to default on 7afc445fc6ab/heron_extract. Closing.

comment:56 Changed 3 years ago by mhoag

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:57 Changed 3 years ago by mhoag

Turns out there is one instance in clarity where an MRN is actually not a number.

-- ack. http://stackoverflow.com/questions/5082176/check-if-its-a-number-function-in-oracle

CREATE OR REPLACE FUNCTION is_number( p_str IN VARCHAR2 )
  RETURN VARCHAR2 DETERMINISTIC PARALLEL_ENABLE
IS
  l_num NUMBER;
BEGIN
  l_num := to_number( p_str );
  RETURN 'Y';
EXCEPTION
  WHEN value_error THEN
    RETURN 'N';
END is_number;

----

select count(*) from patient_dimension pd where IS_NUMBER(pd.mrn) = 'N'; 

yields

COUNT(*)IS_NUMBER(PD.MRN)
2095905Y
1N

where the "number" in the one case is 0XXXXXO (notice that trailing zero is actually the letter "O"). Boo.

comment:58 Changed 3 years ago by mhoag

There seems to me to be three options:

1) Notify the hospital about their bad MRN
2) Add logic to our ETL to handle bad string-like MRNs
3) Assume that MRNS can be Strings

After speaking with Nathan I am leaning towards option 3. MRNs are identifiers and and even "bad" ones could have some limited potential use. Cleaning them out (as would be the likely implementation in option 2) would not gain us anything except the solace that all our MRNs are actually numbers. Lastly, option 1 would only have temporary benefit since this same problem would occur again if a bad MRN was inserted again on the Hospital side.

Going with option 3 for now.

comment:59 Changed 3 years ago by dconnolly

Note also #2385 "preserving leading 0s in MRNs is tedious".

comment:60 follow-ups: Changed 3 years ago by mhoag

Option 3 implemented in 066e851c5d3a merged to webster on 46a41dccab4a (as good a default during release time). Going to scribble down a few more notes regarding comment:59 then close this ticket.

comment:61 in reply to: ↑ 60 Changed 3 years ago by mhoag

Replying to mhoag:

Option 3 implemented in 066e851c5d3a merged to webster on 46a41dccab4a (as good a default during release time). Going to scribble down a few more notes regarding comment:59 then close this ticket.

Also realized I should reflect this change in source:heron_extract/cdr2edc/i2b2_star.py before closing as well.

comment:62 Changed 3 years ago by mhoag

Implemented the corresponding change to 066e851c5d3a in f401bda4aef9/heron_extract. Some tests needed to be updated to reflect the change.

comment:63 in reply to: ↑ 60 Changed 3 years ago by mhoag

Replying to mhoag:

Option 3 implemented in 066e851c5d3a merged to webster on 46a41dccab4a (as good a default during release time). Going to scribble down a few more notes regarding comment:59 then close this ticket.

My main issue with treating MRNs as string is that mapping can be difficult across source systems. #2385 is an exemplar of that fact. However since we have dealt with it up to this point and our source systems appear to treat MRN as a free text instead of a Number it seems like we should stay the course. Maybe if we get bit by this too many times in the future we will change our mind.

comment:64 Changed 3 years ago by mhoag

Had to change the requirements.txt (6925df7a9a72/heron_extract) so that we get the correct version of PyCap for source:heron_extract/cdr2edc/redcap_upload.py to work.

comment:65 Changed 3 years ago by mhoag

  • Resolution set to fixed
  • Status changed from reopened to closed

changes noted in comment:60, comment:62, and comment:64 were merged to default on 3d1a2cb43c3d/heron_extract. Closing.

comment:66 Changed 2 years ago by mprittie

  • Keywords public-web added
Note: See TracTickets for help on using tickets.