Changes between Initial Version and Version 8 of TumorRegistry

Remember: No patient names, identifiers, or other PHI


Timestamp: Oct 8, 2014 4:01:50 PM
Author: dconnolly

[[PageOutline]]

[[HERON]] integrates data from the KUH Tumor Registry, with 65,000 cases dating back to the 1950s.

== Accomplishments ==

''newest first...''

[[BlogList(category=NAACCR)]]

== Source code and development notes for NAACCR ETL == #naaccr-etl-dev

The data comes from the KUH Tumor Registry in the [http://www.naaccr.org/StandardsandRegistryOperations/VolumeIIArchive.aspx NAACCR format]:

 * Thornton M, (ed). [http://www.naaccr.org/LinkClick.aspx?fileticket=LJJNRVo4lT4%3d&tabid=133&mid=473 Standards for Cancer Registries Volume II: Data Standards and Data Dictionary], Record Layout Version 12.1, 15th ed. Springfield, Ill.: North American Association of Central Cancer Registries, June 2010.

''todo: update to [http://www.naaccr.org/StandardsandRegistryOperations/VolumeII.aspx current data dictionary materials] (v13 as of this writing), which has relational data and [https://github.com/naaccr/vol2_dd_export Python export tools].''

Our NAACCR ETL SQL scripts are designed for use in the HeronLoad ETL process (see also source:heron_load/README.rst).

We gratefully acknowledge contributions from Dustin Key of GHC and Jack London of the Kimmel Cancer Center.

The code is not (yet) designed to run independently of the KUMC environment, but peers in the informatics community have managed to port these scripts to their environments:

 - source:heron_load/metadata_init.sql
   - TODO: fix "see also: naacr_init.sql"
 - source:heron_load/naaccr_txform.sql
 - source:heron_load/seer_recode.sql
   - source:heron_load/curated_data/seer_recode_terms.csv
   - source:heron_load/curated_data/NOTICE
 - source:heron_load/naaccr_load.sql

''not yet released; stay tuned (#1254)''

 - source:heron_staging/tumor_reg: converts the NAACCR specification to a SQL view and an Oracle SQL*Loader control file
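
As a rough illustration of that kind of conversion (not the heron_staging code itself, which is not yet released), a fixed-width record layout can be turned into both a SQL view and SQL*Loader field clauses. In this sketch the `Field` tuple, the helper functions, the `naaccr_raw` table with a single `line` column, and the column positions are all assumptions made for the example; real positions come from the NAACCR record layout.

```python
import re
from typing import NamedTuple


class Field(NamedTuple):
    """One fixed-width item from a record layout (hypothetical structure)."""
    item: int    # NAACCR item number
    name: str    # item name as given in the data dictionary
    start: int   # 1-based start column
    length: int  # width in characters


def column_name(field: Field) -> str:
    """Turn an item name into a plain SQL identifier, e.g. 'Primary Site' -> 'primary_site'."""
    return re.sub(r"[^a-z0-9]+", "_", field.name.lower()).strip("_")


def view_ddl(view: str, table: str, fields: list[Field]) -> str:
    """Build a view that slices each field out of the raw record text with substr()."""
    cols = ",\n".join(
        f"  substr(line, {f.start}, {f.length}) as {column_name(f)}" for f in fields
    )
    return f"create view {view} as\nselect\n{cols}\nfrom {table}"


def sqlldr_clause(field: Field) -> str:
    """Build a SQL*Loader field clause of the form: name POSITION(start:end) CHAR."""
    end = field.start + field.length - 1
    return f"{column_name(field)} POSITION({field.start}:{end}) CHAR"


# Illustrative layout only; these column positions are made up for the example.
EXAMPLE_FIELDS = [
    Field(390, "Date of Diagnosis", 5, 8),
    Field(400, "Primary Site", 13, 4),
]

if __name__ == "__main__":
    print(view_ddl("naaccr_fields", "naaccr_raw", EXAMPLE_FIELDS))
    for f in EXAMPLE_FIELDS:
        print(sqlldr_clause(f))
```

Driving both outputs from one field list keeps the view and the loader control file from drifting apart when the record layout version changes.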

Design notes include:

[[TicketQuery(id=547|1803|1835|1632|782|1804|2112|863)]]

We reviewed the data we get, section by section, to eliminate potentially sensitive data, including free text; the sections commented out with a leading `--` below are not loaded into HERON:

{{{
    and ns.SectionID in (
      1 -- Cancer Identification
    , 2 -- Demographic
    -- , 3 -- Edit Overrides/Conversion History/System Admin
    , 4 -- Follow-up/Recurrence/Death
    -- , 5 -- Hospital-Confidential
    , 6 -- Hospital-Specific
    -- , 7 -- Other-Confidential
    -- , 8 -- Patient-Confidential
    -- , 9 -- Record ID
    -- , 10 -- Special Use
    , 11 -- Stage/Prognostic Factors -- TODO: numeric stuff
    -- , 12 -- Text-Diagnosis
    -- , 13 -- Text-Miscellaneous
    -- , 14 -- Text-Treatment
    -- , 15 -- Treatment-1st Course
    , 16 -- Treatment-Subsequent & Other
    , 17 -- Pathology
    )
}}}
 -- source:heron_load/naaccr_txform.sql#L67
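
The effect of that filter can be checked with a small allow-list sketch. The `section` table and the `demo_connection` and `loaded_sections` helpers are hypothetical stand-ins, not part of naaccr_txform.sql; the section IDs and names are taken from the excerpt above, and section 11 is treated as loaded since it is not commented out.

```python
import sqlite3

# Section IDs and names as listed in the excerpt above.
SECTIONS = {
    1: "Cancer Identification",
    2: "Demographic",
    3: "Edit Overrides/Conversion History/System Admin",
    4: "Follow-up/Recurrence/Death",
    5: "Hospital-Confidential",
    6: "Hospital-Specific",
    7: "Other-Confidential",
    8: "Patient-Confidential",
    9: "Record ID",
    10: "Special Use",
    11: "Stage/Prognostic Factors",
    12: "Text-Diagnosis",
    13: "Text-Miscellaneous",
    14: "Text-Treatment",
    15: "Treatment-1st Course",
    16: "Treatment-Subsequent & Other",
    17: "Pathology",
}

# Sections that are NOT commented out in the excerpt.
LOADED = (1, 2, 4, 6, 11, 16, 17)


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory table of sections (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table section (SectionID integer primary key, name text)")
    con.executemany("insert into section values (?, ?)", SECTIONS.items())
    return con


def loaded_sections(con: sqlite3.Connection) -> list:
    """Return (SectionID, name) rows that pass the allow-list filter."""
    marks = ", ".join("?" for _ in LOADED)
    query = (
        "select ns.SectionID, ns.name from section ns "
        f"where ns.SectionID in ({marks}) order by ns.SectionID"
    )
    return con.execute(query, LOADED).fetchall()


if __name__ == "__main__":
    for section_id, name in loaded_sections(demo_connection()):
        print(section_id, name)
```

Listing the allowed sections explicitly, rather than listing the excluded ones, means that any section not named in the filter stays out by default.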

== Requirements Gathering ==

December 1, 2011 Planning

Meeting with Tim Metcalf, Russ, Arvinder, Bhargav

Where exactly is the "RX Summary" info? It should be after the RX-Summ

Longer term, we want the site-specific items.

Note: some fields are not required.

Subsq RX 2nd Course and other such fields all seem to be 00 or 0. Arvinder thinks this may be an error in the first part of the ETL involving varchars. Arvinder will work with John and Tim to run frequency counts on those columns.

- The SEER site recode: John asks whether Tim can check if his vendor already does that and has the data available for Dan. Tim suspects they do. This would really help with ontology creation.

'''Brainstorming with Tim on how this could help the registrar out of the box'''
- Death index and death data from the hospital could save them time.

- Validate that the data coded by his team is accurate. For example, are they using very old codes and radiation therapy technologies when they should be using a newer or more accurate term (beam radiation)? Class of case is another example, as is collaborative staging.

- For the annual report, integrating data from HERON, such as BMI, could really add value.

- HERON for investigators is a win because he doesn't have to run all their exploratory queries. For example, Steve Williamson asks routine questions every year about how many patients have particular histology and site combinations. Tim wins as well because he can incorporate additional data.

- A follow-up report could be useful. What has changed since he last coded the case? Note: some of this kind of work is already provided by merged reports coming from KUH IT staff. We don't want to duplicate that work. We would also need to understand the operational commitment to fund this kind of work.

- Longer term, though it might take a lot of work, HERON could find or check for things automatically, like CA19-9 over 1000, PSA over 7, or clinical recurrence. First level: present the clinical data to the registrar. Second level: auto-populate. Follow-up: has anything cropped up since the case was last coded? Of those, does anything need recoding or a note that there is recurrence?