wiki:UpdateValidation

We have a variety of techniques to validate the results of each HeronLoad ETL iteration:

See #419 for original work in this area.

Contents Summary for Release Notes

Review stats while drafting blog item; see also HeronReleaseNotes

For HeronReleaseNotes, a query to summarize top-level stats (with rounding...) is in source:heron_load/concept_stats.sql . e.g. for 2011/07/heron-pomona-update it produces:

CATEGORYFACTSPATIENTS
i2b2 455M
Demographics 10.7M 1.85M
HICTR Participant 5.72K 5.72K
Diagnoses 16.1M 546K
Flowsheet 357M
Labtests 38.9M 154K
Medications 23.3M 83.2K
Procedures 8.71M 505K

Counting Queries via the I2B2 Hive

From the heron_load directory, run:

python test_heron_query.py <HERON URL>

Check for errors in the output and attach the results to the validation ticket.

Toward automated performance testing

For now, see notes in validate/releast tickets, e.g. ticket:1466#comment:9

Toward Automated Browser Testing

  • Cancer Survival plugin:
    • From the kmstat\plugin_tests directory, run
      python surv_multi1.py <HERON URL>
      
    • A web browser should open to the CAS login page. Log in.
    • The browser should then open to the i2b2 client - click on "Start Query Tool"
    • The test script should then run though the automated tests in the browser - the output to the console should then show which tests passed an which failed - if all were successful, the output will be something like
      Ran 2 tests in 58.823s
      
      OK
      
    • For the above tests to work, you may need to install additional modules. From the same kmstat\plugin_tests directory, enter the following command:
      pip install -r requirements.txt
      
    • If problems with the test script are encountered, check with Dan.

Toward end-to-end verification

Example queries:

  • run in i2b2 all CPTs (384933 patients), then look at what's in KUPI.IDX_TABLE (507,149)
    • out of date? would seem we may be missing a big number of people either because our ontology is missing many codes or we have some other underlying problem.
select count(distinct hospital_medical_record)
from kupi.idx_table ki
where ki.proc_code_number is not null

  • count the number of patients with a diagnoses in i2b2 (521727 patients)
  • count the number of distinct patients in CLARITY.PAT_ENC_DX (206778 patients)
  • count the number of distinct patients in KUPI.IDX_TABLE where the ICD90 code is not null (506,892 patients)
select count(distinct hospital_medical_record)
from kupi.idx_table ki
where ki.diagnosis_number_1 is not null

select count(distinct pat_id)
from clarity.pat_enc_dx

The key areas to look at for overlap between the sources are:

  • diagnoses

For just looking at a source against itself over time, we'd expect to see more stuff (more labs, flowsheets, meds) but relative to clinic volume.

  • note that the outpatient rollout started with more volume in November
  • also CPOE went live so we could potentially see more orders, or at least they should be ordered entered by an MD as opposed to a nurse or pharmacist.

Specific areas to verify:

Last modified 5 years ago Last modified on 04/29/13 15:29:32