#834 closed enhancement (fixed)
query by MSDRG, length of stay, service line from UHC 1-year snapshot
Reported by: | rwaitman | Owned by: | achoudhary |
---|---|---|---|
Priority: | major | Milestone: | heron-eldorado-update |
Component: | data-repository | Keywords: | public-web |
Cc: | tmcmahon, achoudhary, badagarla, mnair, ngraham | Blocked By: | |
Blocking: | #907 | Sensitive: | no |
Description
Arvinder and Tamara to take the lead but with Bhargav and Dan looped in.
Dan to review his notes
One query might be
prevalence of a MSDRG
Traumatic stupor & coma, coma >1 hr w MCC|Traumatic stupor & coma, coma >1 hr w MCC|17|1|medical |
joined with our EMR data such as flowsheet documentation of ventilator settings, o2sat, resp rate, and death
Change History (30)
comment:1 Changed 9 years ago by
Milestone: | heron-clinton-update → heron-eldorado-update |
---|
comment:2 Changed 9 years ago by
Owner: | changed from dconnolly to achoudhary |
---|---|
Status: | new → assigned |
comment:3 Changed 9 years ago by
Sensitive: | unset |
---|
comment:4 Changed 9 years ago by
In the informatics meeting yesterday, Mani mentioned doing some exploration into the data.
Russ said that he wants to take a "rip the whole thing" approach, much like we did for NAACCR (ticket:547#comment:34).
comment:5 Changed 9 years ago by
Cc: | mnair added |
---|
comment:6 Changed 9 years ago by
Owner: | changed from achoudhary to mnair |
---|
comment:7 Changed 9 years ago by
Test data helps immensely in developing ETL code. (see #910) So we'll want the same
DB structure in the test DB.
If the staging of the UHC is separated between
- building tables
- loading public metadata (i.e. lookup tables etc.)
- loading the PHI
then we can re-use 1 and 2 in the test environment, and we only
need to manually mock up data for part 3.
comment:8 Changed 9 years ago by
Arvinder, Russ, Dan, Tamara, and Mani met today; we put some notes in UHCSource
comment:9 Changed 9 years ago by
Type: | enhancement → task |
---|
At some point, we should phrase this or some other ticket as an end-user enhancement, but for now, it's pretty clearly a task.
comment:10 Changed 9 years ago by
Mani, it sounds like you're making progress that's worth checking in. Bummer there are so
many meetings today; here's hoping we can chat about this soon.
comment:11 Changed 9 years ago by
Cc: | ngraham added |
---|
In order to get Mani, Tamara, and Nathan going on ETL, I cleaned up HeronLoad#dev-notes.
Bonus points to the first one of you who gets paver heron_load
to run in your dev environment!
I'll look for chances to help you out with it.
comment:12 Changed 9 years ago by
I added paver tasks for make_uhc_views
and load_uhc_clinical_facts
[1b71785cbc84],
though I haven't tested them. Test data for UHC (#910) would be ideal, but I suppose
production data and "B" instance would work for testing.
p.s. This morning, Tamara collected the bonus points for getting through HeronLoad#dev-notes
and getting paver heron_load
to run.
comment:13 Changed 9 years ago by
Transform and load scripts are completed and checked in. Following four concepts are loaded in this eldorado release
'UHC|Gender:'
'UHC|ServiceLine:'
'UHC|BaseMSDRG:'
'UHC|LOS:'
comment:14 Changed 9 years ago by
Blocking: | 907 added |
---|
comment:15 Changed 9 years ago by
Blocked By: | 910, 916 added |
---|---|
Type: | task → enhancement |
I suggest:
- in parallel
- point non-prod application server to bmid using
paver configure_i2b2_datasources
and such - Check that queries work, and provided they do, close this ticket as fixed.
- proceed with production ETL (#907)
- As part of validation (#906), check the use cases in the description of this ticket to see that the results make sense from a clinical point of view
- for bonus points, check the results of the i2b2 queries against the source staged data using SQL
- for double-bonus points, develop an automated test wrapper so that we can run that test again next month with the push of a button
- for bonus points, check the results of the i2b2 queries against the source staged data using SQL
As a fallback, we could test with production data using BHeronFB instead of bmid.
comment:16 Changed 9 years ago by
re Nathan's patch_for_dan_review_20120314.txt
UHC data has default values in the code as well, but put it here too...
yeah, I haven't made up my mind about the best way to keep those source codes DRY.
How about using datetime.now() in the py code if the dump file is set to None?
That would mean that the date that goes into the database no longer reliably represents the
date of the bulk transfer. If we don't need to keep that audit info, we should just get rid
of the code that handles it altogether.
It's probably worth catching the exception and producing a comprehensible diagnostic, though.
comment:17 Changed 9 years ago by
Owner: | changed from mnair to achoudhary |
---|
Arvinder, it looks like the next step is to point the non-prod application server to bmid using paver configure_i2b2_datasources
and such.
I think you and I are the only ones who have experience with that. Go ahead and set it up, OK?
Bonus points for explaining it to Nathan, Mani, etc. while you're at it.
comment:18 Changed 9 years ago by
Blocked By: | 910 removed |
---|
comment:19 Changed 9 years ago by
I got a query to work, but it's not finding any data.
Do the MRNs in the UHC test data match the Epic/Clarity? MRNs?
- Finished Query: "Gender@13:05:44"
Number of patients for "Gender@13:05:44"
ah... the problem wasn't MRNs but propagating UHC concepts to the concept_dimension.
win!
- Finished Query: "Gender@13:11:14"
[13.6 secs]
Compute Time: 11 secs
Number of patients for "Gender@13:11:14"
patient_count: 13
comment:20 Changed 9 years ago by
Blocked By: | 916 removed |
---|
comment:21 Changed 9 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I think Mani meant to close this ticket: ticket:916#comment:13
comment:22 Changed 9 years ago by
Resolution: | fixed |
---|---|
Status: | closed → reopened |
oops! duplicate key in the facts. ticket:907#comment:11
Time to add a test case etc. ...
comment:23 Changed 9 years ago by
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
it was easy enough to use recordid to ensure uniqueness. [d59a708bcedf]
comment:24 Changed 9 years ago by
After a couple tweaks[18b0096cc635], query for length of stay greater than 2 days works!
- Finished Query: "Hospital LOS [2@17:08:12" [4.1 secs]
Compute Time: 2 secs
Number of patients for "Hospital LOS [2@17:08:12"
patient_count: 12382
comment:25 Changed 9 years ago by
From the ticket description, it looks like we need to query the following for verification:
- 'UHC|Gender:'
- 'UHC|ServiceLine:'
- 'UHC|BaseMSDRG:'
- 'UHC|LOS:'
For verification, it looks like Dan has validated some of this using the non-prod application server already.
comment:26 Changed 9 years ago by
Results noted below:
Gender:
Number of patients for "Gender [27,197 @16:05:32"
patient_count: 19735
ServiceLine:
Number of patients for "Service Line [2@16:06:32"
patient_count: 19735
Base MS DRG:
Number of patients for "Base MS DRG [27@16:08:31"
patient_count: 19735
LOS:
Number of patients for "Length of Stay @16:04:00"
patient_count: 19735
comment:27 Changed 9 years ago by
Querying by Gender, LOS, etc. independently is one thing, but the use cases in the
ticket description and in UHCSource are about correlating UHC data with data from HospitalEpicSource, SSA, etc.
We did a certain amount of that in Russ's office just now, ad-hoc.
Here's hoping we manage to automate more of it in due course.
comment:28 Changed 9 years ago by
comment:29 Changed 7 years ago by
Keywords: | public-web added |
---|
The way the ETL code is currently designed, The first goal is to come
up with one or more queries that transform the source data to look like
the i2b2 observation_fact table (see figure 4 in
the i2b2 paper).
For example,
create or replace view observation_fact_bsr as ...
at source:heron_load/bsr_i2b2_transform.sql#L107
In most cases, this involves a visit transformation too to get an
encounter_ide
.(see
bsr_visit
etc. above). As long as the source data has MRNs,you can just use the MRN as the patient_ide and take advantage of the existing
code to map MRNs.
After those queries are captured as views in the source db, separate scripts
run in the identified db to load the data; in the case of BSR:
Another example is the tumor registry; the transformation and load scripts are:
I thought we had a similar situation a while ago where parallelized
development of ETL work, but reviewing those tickets (#399, #400)
shows the situation was a little different and there isn't much in
the way of notes anyway.