wiki:HeronPerformance

With over a billion facts, performance is a challenge in HERON.

See also GroupOnly/OracleMeetingNov, attachment:HERON_Performance.pptx

See also HeronSizingGrowth#ColumnStoreOptions.

Performance tickets

Ticket | Type | Summary | Resolution | Priority | Status | Owner
#554 | design-issue | Performance issue in the IDX load | fixed | major | closed | mhoag
#1613 | defect | Explosion of diagnosis codes leads to poor performance | fixed | major | closed | ngraham
#2132 | design-issue | Improve efficiency of term_tree creation for enhance_concepts | fixed | major | closed | mhoag
#2285 | defect | HERON performance page doesn't include timeline queries | fixed | major | closed | dconnolly
#2313 | problem | Navigating HERON term hierarchy is slow for users with REDCap access | fixed | major | closed | badagarla
#2394 | design-issue | Creating tooltips takes from seconds to hours depending on table statistics | fixed | major | closed | ngraham

Performance testing

from our 10 Feb meeting; Nathan, is this obsolete?

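-- Compare each query's bitmap-index run against its average over all test runs.
select
  case when compare.factor < 0.75 then 'FASTER'
       when compare.factor > 1.25 then 'SLOWER'
  end summary,  -- null when within 25% of the average (or factor is null)
  compare.*
from (
  select agg.querydesc, mx, agg.t agg_t, bit.t eval_t,
         -- ratio of the bitmap-index run to the average; avoid dividing by zero
         case when agg.t != 0 then round(bit.t/agg.t, 2)
              else null end factor,
         agg.queryname
  from (
    -- average and maximum duration per query across all test runs
    select round(avg(testduration), 2) t, round(max(testduration), 2) mx,
           queryname, querydesc
    from ngraham.timingtests
    group by queryname, querydesc
  ) agg
  join (
    -- duration of the run that used bitmap indexes
    select round(testduration, 2) t, queryname
    from ngraham.timingtests
    where testrundesc = 'With bitmap indexes.'
  ) bit on bit.queryname = agg.queryname
) compare
order by factor;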
  • develop SQL scripts to characterize performance that our users are seeing (#1069)

Adding instance_num to observation_fact pk in 1.6

  • Should the indexes on blueherondata include instance_num or not? (#858). Agreed (10 Feb 2012): stick with vanilla 1.6 for the clinton release, to get a baseline.

Hardware contention, tuning

  • note: test app server and prod app server are connected to different Oracle instances, but the instances are on the same host, so there are possible interactions.
  • Oracle CPU tuning?
    • Do we have Oracle support?
      • yes. We could ask them to look at our tuning
      • mnair to look into memory/cpu tuning (#865)
      • mnair to open the communication channel with Oracle support

Techniques to try, be aware of

Star Transformations

see OracleTips#star-tx

Range Queries/Bind Variables

advice from Brandon:

Bind variables are usually a good thing, especially when you've got lots of unique queries and the overhead of parsing adds up and puts a burden on the CPU. When you've got fewer, longer-running queries, as in a DW, they may not be a good thing. The reason is that the optimizer doesn't have access to the values from the query to base its decisions on. In a range scan, the statistics should tell it the max and min values for a given column in a table. If it has the values (not using bind vars), then it should be able to tell if the query is searching in the middle of the possible values (full table scan probably best) or if it is towards the beginning/end of the possible range (index scan best). There are also histograms you can use, which can give the optimizer better info on the layout of information in a column, but they also require you not to use bind variables. I also think you can use fast index scans for range values; these basically pull the index from disk in larger chunks.
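To make the reasoning above concrete, here is a minimal Python sketch of how a cost-based optimizer might estimate the selectivity of a range predicate from a column's min/max statistics, and why that estimate degrades to a fixed guess when the bounds are hidden behind bind variables. It is a toy model, not Oracle's actual algorithm: `ColumnStats`, `range_selectivity`, `choose_access_path`, the 5% default and the 10% cutoff are all assumptions made up for illustration, and it assumes values are spread uniformly between the min and max.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical numbers for illustration only; a real optimizer uses its own
# defaults and a far richer cost model.
DEFAULT_RANGE_SELECTIVITY = 0.05  # assumed guess when the bounds are unknown
INDEX_SCAN_THRESHOLD = 0.10       # assumed cutoff below which an index scan wins


@dataclass
class ColumnStats:
    low: float   # minimum value recorded in the column statistics
    high: float  # maximum value recorded in the column statistics


def range_selectivity(stats: ColumnStats,
                      lo: Optional[float],
                      hi: Optional[float]) -> float:
    """Estimate the fraction of rows with lo <= col <= hi.

    None stands for a bind variable whose value the optimizer cannot see.
    Assumes values are uniformly distributed between stats.low and stats.high.
    """
    if lo is None or hi is None:
        # Bounds unknown at optimization time: fall back to a fixed guess.
        return DEFAULT_RANGE_SELECTIVITY
    span = stats.high - stats.low
    if span <= 0:
        return 1.0  # degenerate statistics: assume every row matches
    # Portion of the requested range that overlaps the known value range.
    overlap = min(hi, stats.high) - max(lo, stats.low)
    return max(0.0, min(1.0, overlap / span))


def choose_access_path(selectivity: float) -> str:
    """Pick a plan from the estimated selectivity alone (toy decision rule)."""
    if selectivity < INDEX_SCAN_THRESHOLD:
        return "index range scan"
    return "full table scan"


if __name__ == "__main__":
    lab_values = ColumnStats(low=0, high=1000)
    for lo, hi in [(0, 10), (100, 900), (None, None)]:
        sel = range_selectivity(lab_values, lo, hi)
        print(lo, hi, round(sel, 3), choose_access_path(sel))
```

With literal bounds the narrow range and the wide range get different plans; with bind variables both collapse to the same guessed selectivity and therefore the same plan, which is the risk described in the advice above.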

"Supercharging i2b2"

Bhargav attended a presentation about skipping the hive middle tier and using (T-SQL) stored procedures instead (among other optimizations), resulting in up to a 10x performance boost:

in proceedings of:

History

With 5k patients, performance was fine.

After loading from Epic, life got hard, especially for lab value less-than/greater-than X queries (#418).

Then we got solid-state storage.

see also:

  • solid state storage notes (blog item, #474)

Sad performance April 2012

  • Here's a query to look at the recent poor system performance.
  • We also noticed several runaway processes hitting Oracle, which we killed this afternoon.
select avg((qqri.end_date - qqri.start_date)*24*60*60) as elapsedsecondsavg,  -- date difference is in days
       to_char(qqm.create_date, 'YYYY-MM'),
       count(qqm.query_master_id) as num_queries,
       avg(qqri.real_set_size) as avg_set_size
from BLUEHERONDATA.qt_query_master qqm
join BLUEHERONDATA.qt_query_instance qqi
  on qqm.query_master_id = qqi.query_master_id
join BLUEHERONDATA.qt_query_result_instance qqri
  on qqi.query_instance_id = qqri.query_instance_id
group by to_char(qqm.create_date, 'YYYY-MM')
order by to_char(qqm.create_date, 'YYYY-MM') desc
;
Output (averages rounded to two decimal places):

ELAPSEDSECONDSAVG | TO_CHAR(QQM.CREATE_DATE,'YYYY-MM') | NUM_QUERIES | AVG_SET_SIZE
4645.84 | 2012-04 | 147 | 35950.08
108.94 | 2012-03 | 470 | 6465.27
191.07 | 2012-02 | 716 | 4970.54
46.80 | 2012-01 | 860 | 6556.33
49.09 | 2011-12 | 423 | 5571.74
35.37 | 2011-11 | 802 | 2756.90
13.02 | 2011-10 | 623 | 3792.19
32.81 | 2011-09 | 705 | 10941.64
10.19 | 2011-08 | 535 | 11668.21
199.03 | 2011-07 | 201 | 34075.66
20.98 | 2011-06 | 164 | 9324.92
106.10 | 2011-05 | 155 | 98652.79
120.78 | 2011-04 | 291 | 124424.74
9.61 | 2011-03 | 151 | 38593.66
16.28 | 2011-02 | 261 | 41893.82
31.73 | 2011-01 | 41 | 19947.49
5.53 | 2010-12 | 76 | 14132.59
4.11 | 2010-11 | 109 | 18352.80

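Our upgrade to 1.6 was in late December. Since then, February, March, and April of 2012 have been significantly worse (average elapsed times of roughly 191, 109, and 4646 seconds, versus about 47 in January), with some horrible performance this past month.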

We would want to add this kind of monitoring, and possibly stratify by the depth of the query and by whether it included conjunctions with numeric "greater than" or "less than" criteria, as well as "occurs" and "date threshold" criteria.
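As a rough illustration of what such stratified monitoring could look like, the Python sketch below groups query runs by creation month and by a single stratum (whether the query used numeric criteria) and reports the count and average elapsed seconds for each group. Everything here is hypothetical: the `summarize` function, the record fields (`created`, `start`, `end`, `numeric`) and the sample data are invented for the example, and how a flag such as `numeric` would be derived from the stored query definitions is not addressed.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def summarize(runs):
    """Return {(month, numeric_flag): (num_queries, avg_elapsed_seconds)}.

    Each run is a dict with hypothetical keys:
      created - datetime the query was created
      start   - datetime execution started
      end     - datetime execution finished, or None if it never finished
      numeric - True if the query used numeric greater/less-than criteria
    """
    buckets = defaultdict(list)
    for run in runs:
        if run["end"] is None:
            continue  # unfinished runs have no elapsed time to average
        elapsed = (run["end"] - run["start"]).total_seconds()
        key = (run["created"].strftime("%Y-%m"), run["numeric"])
        buckets[key].append(elapsed)
    # Sort by month, then by stratum, for a stable report order.
    return {key: (len(vals), mean(vals)) for key, vals in sorted(buckets.items())}


if __name__ == "__main__":
    # Made-up sample data, not real HERON timings.
    sample = [
        {"created": datetime(2012, 3, 5), "start": datetime(2012, 3, 5, 9, 0, 0),
         "end": datetime(2012, 3, 5, 9, 0, 5), "numeric": False},
        {"created": datetime(2012, 4, 2), "start": datetime(2012, 4, 2, 9, 0, 0),
         "end": datetime(2012, 4, 2, 9, 0, 10), "numeric": True},
        {"created": datetime(2012, 4, 3), "start": datetime(2012, 4, 3, 9, 0, 0),
         "end": datetime(2012, 4, 3, 9, 0, 30), "numeric": True},
        {"created": datetime(2012, 4, 4), "start": datetime(2012, 4, 4, 9, 0, 0),
         "end": None, "numeric": True},
    ]
    for (month, numeric), (count, avg) in summarize(sample).items():
        print(month, "numeric" if numeric else "plain", count, avg)
```

Further strata such as query depth, "occurs" or "date threshold" criteria could be added as extra elements of the grouping key.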

Last modified on 01/28/14 13:38:17