
Version 57 (modified by dconnolly, 3 years ago)

--

In HERON, we aim to supplement i2b2 with a range of statistical analysis methods for secure, on-line analysis. We have developed two main software components to support this aim: rgate and R Data Builder.

rgate: gateway between i2b2 plugins and R

rgate is an approach to using the R statistical platform in interactive i2b2 analysis tool plugins. i2b2 is carefully designed by informaticians to give clinical and translational researchers a single query interface to data integrated from a wealth of sources, while addressing the privacy and security concerns that privacy officers have for their patients and healthcare institutions. rgate lets us supplement the query interface with analysis tools written by biostatisticians while minimizing any increase in privacy and security risk through careful application of the principle of least authority.

It is based on the 2011 R Engine Cell work by Segagni and colleagues but shortens the path that data follows from the data warehouse to R.

We adapted their survival plugin for use with our cancer TumorRegistry data and made it available to our users in the February 2012 HERON "Bow Creek" release.

In the March 2013 HERON "Cow Creek" release, we added a generalized survival plug-in:

  1. The seminal event may be any observation in our repository, not just a cancer diagnosis. (#1304)
  2. Rather than comparing cohorts by grade or stage, any two cohorts can be compared. (#833, #1347)
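
To illustrate what comparing two cohorts involves, here is a minimal Kaplan-Meier sketch in plain Python. It is not the rgate code (the plug-in does its KM estimation and plotting in R); the function name and the sample numbers are hypothetical, and durations are assumed to be measured from each patient's seminal event.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with an observed event.

    durations: time from the seminal event to outcome or censoring
    events: 1 if the outcome was observed, 0 if the record was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # each record leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical cohorts: days since the seminal event, and whether the outcome occurred
cohort_a = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
cohort_b = kaplan_meier([3, 4, 9, 9, 15], [1, 1, 1, 0, 1])
```

Each cohort yields its own step curve; the two curves can then be plotted on the same axes for comparison.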

R Users Conference presentation June 2012

Abstract:

Presentation slides: rgate-useR-2012.pdf

Source code and development notes

While the rgate code is not yet packaged for use outside of HERON, you are welcome to study the source code, which is split across two repositories:

Development notes, including some outstanding issues:

#833
multicohort survival analysis plugin

#1304
drag-and-drop entrance criteria, outcome, censor for survival analysis plug-in

Design options for R integration with i2b2/HERON that we explored:

  1. as per R-engine-cell, where all data goes via the client
  2. as per #803, where rgate gets data via CRC, does transformation in python, and does just the final KM function and plotting in R
  3. doing the transformation in R, as in source:rgate/rgate/km_analysis.R
  4. doing the SQL query right from python, cutting out the CRC cell
  5. doing the SQL query right from R
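
As a rough sketch of the transformation step in option 2 (done in Python before the final KM function runs in R), the following turns one patient's entrance, outcome, and censor dates into a duration and an event flag. The function name, its parameters, and the sample dates are hypothetical; the actual rgate transformation may differ.

```python
from datetime import date

def survival_record(entrance, outcome=None, censor=None):
    """Turn one patient's dates into (duration_in_days, event_flag).

    entrance: date of the entrance (seminal) observation
    outcome: date of the outcome observation, or None if never observed
    censor: last date the patient was known to be under observation
    """
    if outcome is not None and (censor is None or outcome <= censor):
        return ((outcome - entrance).days, 1)
    if censor is None:
        raise ValueError("need an outcome date or a censor date")
    return ((censor - entrance).days, 0)

rows = [
    survival_record(date(2012, 1, 10), outcome=date(2012, 3, 10)),
    survival_record(date(2012, 2, 1), censor=date(2012, 12, 31)),
]
```

The resulting (duration, event) pairs are the usual input shape for a Kaplan-Meier fit.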

deployment details, as of Oct 2012: ticket:822#comment:4

R Data Builder for RStudio Server Integration

RStudio Server provides an integrated development environment (IDE) for R over the web. In HERON, we have integrated i2b2 with RStudio via an R Data Builder plug-in. It lets investigators (limited to the HERON study team at this time) query the data repository and save the results on the server. They can then log in to RStudio Server and load the data for further analysis. Like i2b2 interactive queries, and unlike bulk export, this approach is designed to keep the data on the server in our data center.

R Data Builder summary slide (screenshot)

I2B2 Academic User Group Presentation June 2013

abstract:

presentation slides: extending-i2b2-with-R.pdf

Source code and development notes

The R Data Builder is part of rgate. While it is not yet packaged for use outside of HERON, you are welcome to study the source code:

Development notes include:


attachment:rgate_api_tour.html is a note on an earlier design.

RGateDataFlows

Applications: Data Extract summaries in REDCap Case Report Forms


Future Work

  • how to handle the risk of, e.g., someone sending a dataset from RStudio by email?
    • limit access to HERON study team, for now (#1656)
  • limit DB queries? #1471 (no)
  • perhaps use Oracle SAMPLE to bound query result size?
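
One way the SAMPLE idea might look, as a sketch only: pick a sampling percentage from an estimated row count and a row budget, then add Oracle's SAMPLE clause to the query. The helper names and the row estimate are hypothetical, and the table name is assumed to be a trusted identifier rather than user input. SAMPLE returns an approximate fraction of rows, so this bounds the expected result size, not the exact size.

```python
def sample_percent(estimated_rows, max_rows):
    """Percentage for Oracle's SAMPLE clause so that roughly max_rows of
    estimated_rows come back; None means no sampling is needed."""
    if estimated_rows <= max_rows:
        return None
    pct = 100.0 * max_rows / estimated_rows
    # Oracle requires the percentage to be in [0.000001, 100)
    return min(max(pct, 0.000001), 99.999999)

def bounded_query(table, estimated_rows, max_rows):
    """Build a SELECT that samples the table only when it exceeds the budget."""
    pct = sample_percent(estimated_rows, max_rows)
    if pct is None:
        return f"SELECT * FROM {table}"
    return f"SELECT * FROM {table} SAMPLE ({pct:.6f})"

query = bounded_query("observation_fact", 2_000_000, 10_000)
```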

Machine Learning

possible UI: define cohorts in I2B2, hand-off a la #RStudioIntegration

input spec:

  • 2 cohorts as input
    • R implementation note: another column, 0/1
  • list of features (columns)
  • start with inventory of machine learning algorithms in R (logistic regression, SVM? C4/C5? neural net?)
    • RW: neural nets are hard to explain to people
  • system engineering? compete for CPU, RAM with production I2B2
    • RW: don't worry about that much; we'll think about shipping off to a high-performance computing cluster

Literature Review

typical outcomes research paper?
