Changes between Initial Version and Version 57 of HeronStatsPlugins


Timestamp: 07/25/14 16:51:06
Author: dconnolly

     1[[PageOutline]]
     2
     3In [[HERON]], we aim to supplement i2b2 with a range of statistical analysis methods for
     4secure, on-line analysis. We have developed two main software components to support this aim: [#rgate rgate] and [#r-builder R Data Builder].
     5
     6== rgate: gateway between i2b2 plugins and R == #rgate
     7
     8**rgate** is an approach to using the R statistical platform in interactive i2b2 analysis tool plugins. i2b2 is carefully designed by **informaticians** to give **clinical and translational researchers** a single query interface to data integrated from a wealth of sources while addressing the concerns that **privacy officers** have about privacy and security for their patients and healthcare institutions. `rgate` lets us supplement the query interface with analysis tools written by **biostatisticians** while minimizing any increase in privacy and security risk through careful application of the principle of least authority.
     9
     10[[Image(rgate-useR-2012-conclusion.png)]]
     11
     12`rgate` is based on the 2011 [http://www.ncbi.nlm.nih.gov/pubmed/21262924 R Engine Cell] work by Segagni and colleagues but shortens the path that data follows from the data warehouse to R.
     13
     14We adapted their survival plugin for use with our cancer TumorRegistry data and made it available to our users in the [blog:2012/02/heron-bowcreek February 2012 HERON "Bow Creek" release].
     15
     16In the [blog:heron-cow-creek-update March 2013 HERON "Cow Creek" release], we added a generalized survival plug-in:
     17 1. The seminal event may be any observation in our repository, not just a cancer diagnosis. (#1304)
     18 2. Rather than comparing cohorts by grade or stage, any two cohorts can be compared. (#833, #1347)
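
As a rough illustration of the comparison described above, the following Python sketch computes a Kaplan-Meier survival curve for each of two arbitrary cohorts. It is not the plug-in's code (the plug-in runs its analysis in R); the function name `kaplan_meier` and the sample data are hypothetical.

```python
from collections import Counter


def kaplan_meier(observations):
    """Return [(time, survival probability)] for (time, event) pairs.

    `time` is the time from the seminal event to the outcome or to censoring;
    `event` is True if the outcome was observed, False if censored.
    """
    deaths = Counter(t for t, e in observations if e)
    exits = Counter(t for t, _ in observations)
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who left at time t (event or censored) is no longer at risk.
        at_risk -= exits[t]
    return curve


# Hypothetical cohorts: (days since seminal event, outcome observed?)
cohort_a = [(5, True), (8, False), (12, True), (20, False)]
cohort_b = [(3, True), (4, True), (9, True), (15, False)]
curve_a = kaplan_meier(cohort_a)
curve_b = kaplan_meier(cohort_b)
```

Because the estimator only needs a time and an event flag per patient, the same function applies whether the cohorts are defined by grade, stage, or any other query.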
     19
     20
     21=== R Users Conference presentation June 2012 === #useR2012
     22
     23Abstract:
     24 - Connolly, D. W., Adagarla, B., Keighley, J. & Waitman, L. R. [http://biostat.mc.vanderbilt.edu/wiki/pub/Main/UseR-2012/141-Connolly.pdf Integrating R efficiently to allow secure, interactive analysis within a clinical data warehouse]. in [http://biostat.mc.vanderbilt.edu/wiki/Main/UseR-2012 8th Int. UseR Conf.] (2012).
     25
     26Presentation slides: [attachment:rgate-useR-2012.pdf:wiki:KUBMIPresentations rgate-useR-2012.pdf]
     27
     28=== Source code and development notes === #rgate-source-dev
     29
     30While the `rgate` code is not yet packaged for use outside of HERON, you are welcome to study the source code, which is split across two repositories:
     31 - source:rgate: the gateway itself and the back-end R scripts
     32 - source:kmstat: the survival analysis plugins that use rgate
     33
     34Development notes, some of which cover outstanding issues:
     35
     36[[TicketQuery(id=748|833)]]
     37[[TicketQuery(id=803|809|847|1131|1972|1226|1304)]]
     38
     39Design options for R integration with i2b2/HERON that we explored:
     40
     41 1. as per R-engine-cell, where all data goes via the client
     42 2. as per #803, where rgate gets data via CRC, does the transformation in Python, and does just the final Kaplan-Meier (KM) function and plotting in R
     43 3. doing the transformation in R, as in source:rgate/rgate/km_analysis.R
     44 4. doing the SQL query right from Python, cutting out the CRC cell
     45 5. **doing the SQL query right from R**
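
To make the "transformation" step in option 2 concrete, here is a minimal Python sketch that turns observation facts into one survival row per patient, ready to hand to a KM function. The tuple layout, the concept codes `"dx"` and `"death"`, and the function name `survival_rows` are assumptions for illustration, not the rgate implementation.

```python
from datetime import date


def survival_rows(facts, seminal, outcome, study_end):
    """Turn (patient, concept, date) facts into (patient, days, event) rows."""
    # Keep the earliest date per (patient, concept).
    first = {}
    for patient, concept, when in facts:
        key = (patient, concept)
        if key not in first or when < first[key]:
            first[key] = when
    rows = []
    for (patient, concept), start in sorted(first.items()):
        if concept != seminal:
            continue
        end = first.get((patient, outcome))
        # The outcome counts only if it happens on or after the seminal event;
        # otherwise the patient is censored at the end of the study.
        event = end is not None and end >= start
        stop = end if event else study_end
        rows.append((patient, (stop - start).days, event))
    return rows


# Hypothetical facts for three patients.
facts = [
    (1, "dx", date(2012, 1, 1)),
    (1, "death", date(2012, 3, 1)),
    (2, "dx", date(2012, 2, 1)),
    (3, "death", date(2012, 5, 1)),  # no seminal event, so no row
]
rows = survival_rows(facts, "dx", "death", date(2012, 12, 31))
```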
     46
     47''deployment details, as of Oct 2012: ticket:822#comment:4''
     48
     49== R Data Builder for RStudio Server Integration == #RStudioIntegration
     50
     51[http://www.rstudio.com/ide/ RStudio Server] provides an integrated development environment (IDE) for R over the web. In HERON, we have integrated i2b2 with RStudio via an **R Data Builder plug-in**. It lets investigators (''limited to the HERON study team at this time'') query the data repository and save the results on the server. They can then log in to RStudio Server and load the data for further analysis. Like i2b2 interactive queries, and unlike bulk export, this approach is designed to **keep the data on the server** in our data center.
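
The "save the results on the server" step might look roughly like the Python sketch below, which writes a query result as CSV into a per-user directory. The directory layout, the name checks, and the function `save_result` are assumptions made for illustration; the actual back end is the R script listed under the source code notes.

```python
import csv
import tempfile
from pathlib import Path


def save_result(root, username, dataset_name, header, rows):
    """Write a query result as CSV under root/username and return its path."""
    # Assumption: restricting names to alphanumerics keeps files inside root.
    if not username.isalnum() or not dataset_name.isalnum():
        raise ValueError("names must be alphanumeric")
    user_dir = Path(root) / username
    user_dir.mkdir(parents=True, exist_ok=True)
    target = user_dir / (dataset_name + ".csv")
    with open(target, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows)
    return target


# A temporary directory stands in for the server-side storage area.
root = tempfile.mkdtemp()
path = save_result(root, "alice", "cohort1", ["patient", "age"], [[1, 64], [2, 71]])
```

An investigator could then load such a file from an RStudio Server session with R's `read.csv`, without the data leaving the server.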
     52
     53[[Image(r-data-builder-summary-slide.png)]]
     54
     55=== I2B2 Academic User Group Presentation June 2013 === #i2b2-2013-talk
     56
     57Abstract:
     58
     59 - Connolly, D. W. & Waitman, L. R. [attachment:IntegratingRwithI2B2.pdf Extending an I2B2-based Clinical Data Repository with the R Statistical Platform]. in 3rd Annu. [https://www.i2b2.org/work/aug.html I2b2 Acad. User Group] Conf. NLP Work. (2013).
     60
     61Presentation slides: [attachment:extending-i2b2-with-R.pdf:wiki:I2B2Community extending-i2b2-with-R.pdf]
     62
     63=== Source code and development notes === #r-data-builder-dev
     64
     65The R Data Builder is part of [#rgate rgate]. While it is not yet packaged for use outside of HERON, you are welcome to study the source code:
     66  - source:kmstat/DFBuilder: the plug-in front end
     67  - source:rgate/rgate/dfbuilder.R: the R back-end script
     68    - source:rgate/rgate/dfbuilder_test.R
     69
     70Development notes include:
     71
     72[[TicketQuery(id=1485|1622)]]
     73
     74attachment:rgate_api_tour.html is a note on an earlier design.
     75
     76[[RGateDataFlows]]
     77
     78=== Applications: Data Extract summaries in REDCap Case Report Forms ===
     79
     80[[TicketQuery(id=1759)]]
     81
     82
     83== Future Work ==
     84
     85  * how to handle risk of, e.g. sending a dataset from RStudio by email?
     86    - limit access to HERON study team, for now (#1656)
     87  * limit DB queries? #1471 (no)
     88  * perhaps use the Oracle `SAMPLE` clause to bound query result size?
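
One way the `SAMPLE` idea could work is sketched below in Python: given a row count and a target size, build a query that asks Oracle for a matching percentage of rows. The function name `sampled_query` and the choice to sample only when the table exceeds the target are assumptions. Note that `SAMPLE` is probabilistic, so it bounds the expected result size rather than enforcing a hard cap.

```python
def sampled_query(table, total_rows, max_rows):
    """Build a SELECT that returns roughly max_rows rows from table."""
    if not table.isidentifier():
        raise ValueError("unexpected table name")
    if total_rows <= 0 or max_rows <= 0:
        raise ValueError("row counts must be positive")
    if total_rows <= max_rows:
        # Small enough already; no sampling needed.
        return f"SELECT * FROM {table}"
    pct = 100.0 * max_rows / total_rows
    # Oracle requires the sample percentage to be below 100;
    # clamp to a small positive floor as well.
    pct = min(max(pct, 0.000001), 99.999999)
    return f"SELECT * FROM {table} SAMPLE ({pct:.6f})"


query = sampled_query("observation_fact", 1_000_000, 10_000)
```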
     89
     90
     91=== Machine Learning ===
     92
     93possible UI: define cohorts in I2B2, hand-off a la [[#RStudioIntegration]]
     94
     95input spec:
     96
     97  - 2 cohorts as input
     98    - R implementation note: another column, 0/1
     99  - list of features (columns)
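
The input spec above could be realized along these lines; the Python sketch stacks two cohorts into one table with the selected feature columns plus a 0/1 cohort column. The dictionary-per-patient layout and the name `labeled_rows` are hypothetical.

```python
def labeled_rows(cohort_a, cohort_b, features):
    """Stack two cohorts into rows of selected features plus a 0/1 label."""
    rows = []
    for label, cohort in ((0, cohort_a), (1, cohort_b)):
        for patient in cohort:
            # Keep only the requested feature columns, then append the label.
            rows.append([patient[f] for f in features] + [label])
    return rows


# Hypothetical patients; "sex" is present but not selected as a feature.
cohort_a = [{"age": 64, "bmi": 27.1, "sex": "F"}]
cohort_b = [{"age": 58, "bmi": 31.4, "sex": "M"}]
table = labeled_rows(cohort_a, cohort_b, ["age", "bmi"])
```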
     100
     101 - start with inventory of machine learning algorithms in R (logistic regression, SVM? C4/C5? neural net?)
     102   - RW: neural nets are hard to explain to people
     103  - system engineering? compete for CPU, RAM with production I2B2
     104   - RW: don't worry about that much; we'll think about shipping off to a high-performance computing cluster
     105
     106== Literature Review ==
     107
     108typical outcomes research paper?
     109 - http://jama.jamanetwork.com/article.aspx?articleid=900247
     110   - (nomination from Russ, Apr 4 2013) looks at length of stay (LOS) and mortality after a tele-ICU process change
     111 - http://jama.jamanetwork.com/article.aspx?articleid=1674237
     112
     113 - Contemp Clin Trials. 2012 Sep;33(5):1088-93. doi:10.1016/j.cct.2012.06.007. Epub 2012 Jun 30. [http://www.ncbi.nlm.nih.gov/pubmed/22750086 Sample size re-estimation in an on-going NIH-sponsored clinical trial: the secondary prevention of small subcortical strokes experience]. McClure LA, Szychowski JM, Benavente O, Coffey CS.
     114   - I (Dan) tried to reproduce parts of it in HERON Feb 5, 2013, using what I learned in Jo Wick's class.