Apex Schema Optimizer Statistics

Providing the Oracle optimizer with accurate statistics on the Apex schema tables can be challenging.

Many of the tables tracking Apex session state information see such a huge turnover in data that statistics gathered overnight can quickly become stale, resulting in poor execution plan choices by the optimizer and seriously degraded performance.

There are several ways to tackle the issue, e.g. locking the stats on some of the tables at values that are known to perform reliably.
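
For example, a minimal sketch of the stats-locking approach might look like this (using WWV_FLOW_DATA, one of the session state tables, purely as an illustration – pick the tables that matter at your site):

begin
  -- gather stats at a point when the data volume is representative...
  dbms_stats.gather_table_stats(ownname => 'APEX_040200',
                                tabname => 'WWV_FLOW_DATA');
  -- ...then lock them so scheduled stats jobs leave them alone
  dbms_stats.lock_table_stats(ownname => 'APEX_040200',
                              tabname => 'WWV_FLOW_DATA');
end;
/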

However I believe it’s better to give the optimizer the most accurate information you can, so it can make more informed decisions based on the actual data in the tables.

The problem is that even if you refresh the optimizer statistics often, you also need to ensure that any execution plans stored in the shared pool are invalidated, or you risk continuing to use a plan that does not reflect the new statistics.

Fortunately you can achieve all this with one relatively straightforward job:

conn / as sysdba

variable v_jobno number;

begin
  dbms_job.submit(
    job       => :v_jobno,
    -- gather stats only for tables flagged as stale, and invalidate
    -- any dependent cursors immediately (no_invalidate=>false)
    what      => 'dbms_stats.gather_schema_stats(ownname=>''APEX_040200'',options=>''GATHER STALE'',cascade=>true,no_invalidate=>false);',
    next_date => sysdate + (5/(24*60)),   -- first run in five minutes
    interval  => 'sysdate+(5/(24*60))',   -- then every five minutes
    no_parse  => TRUE);
  commit;
end;
/

This will create a job that runs every five minutes and refreshes statistics on any Apex tables that are marked as stale; and because no_invalidate is set to false, it also invalidates any cursors in the shared pool that depend on the freshly analysed tables, so their plans will be rebuilt using the new statistics.
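
If you want to check what the GATHER STALE option would actually pick up at any point (it relies on table monitoring, which is on by default from 10g, flagging tables where roughly 10% or more of the rows have changed), a query along these lines will show you:

select table_name, stale_stats, last_analyzed
from   dba_tab_statistics
where  owner = 'APEX_040200'
and    stale_stats = 'YES'
order  by table_name;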

You should make sure the schema name reflects your Apex version (APEX_040200 here is the Apex 4.2 schema), but with this job in place you should have no worries about stale statistics on the Apex session state tables impacting your application performance.
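
If you’re not sure which schema your Apex installation uses, the component registry will tell you:

select comp_id, version, schema
from   dba_registry
where  comp_id = 'APEX';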

Apex, Oracle 11g & Ultra SPARC T2

In my experience the early Ultra SPARC T series chips are not great when it comes to supporting Apex applications; their single-thread performance is poor, which shows up as significantly higher average page times when compared to more conventional CPUs.

However I recently encountered an unusual problem after a site upgraded a database running on a T5240 from Oracle RDBMS v10.2.0.5 to v11.2.0.2.

Average page times had gone from 1s to around 4s since the upgrade, and users were complaining loudly about performance.

Statspack was telling us that there was an unusually high number of cursor: pin S wait on X waits – but why?

Top 5 Timed Events                                          Avg %Total
                                                       wait   Call
Event                               Waits    Time (s)   (ms)   Time
--------------------------- ------------ ----------- ------ ------
CPU time                                       24,208          85.6
cursor: pin S wait on X           24,177        1,361     56    4.8
db file sequential read          659,410          943      1    3.3
External Procedure call              206          861   4177    3.0
log file parallel write           93,792          251      3     .9

Well apparently it’s all down to mutexes – and once again it was Tanel Poder that set me on the right path.

The above link describes a problem he had encountered at 10g – the bit that set me thinking was this:

When a latch getter can’t get the latch after spinning, it will go to sleep to release the CPU. Even if there are many latch getters in the CPU runqueue before the latch holder, they all spin quickly and end up sleeping again. But when a mutex getter doesn’t get the mutex after spinning, it will not go to sleep!!! It will yield() the CPU instead, which means that it will go to the end of runqueue and try to get back onto CPU as soon as possible. So, mutex getters in 10.2 are much less graceful, they can burn a lot of CPU when the mutex they want is held by someone else for long time.

On a T series there’s nearly always a CPU thread available (particularly if like this customer you don’t have that many users accessing the system) so even if a mutex getter yields then it will (nearly always) get a CPU thread back straightaway – so will pretty much be constantly on CPU trying to get hold of the mutex (which explained the high cpu usage and large number of waits – and presumably the performance problems experienced by the customer).

Tanel suggested this was fixed at 11g (mutex getters should sleep instead of just yielding), but it turns out that’s not quite the whole story – the default behaviour is still to always yield, but there are some hidden parameters that provide some control:

Bug 10411618 – Enhancement to add different “Mutex” wait schemes

In order to encourage the mutex getters to sleep a bit instead of just yielding all the time, I got the customer to add the following parameters:

alter system set "_mutex_wait_scheme" = 1 scope = spfile;
alter system set "_mutex_wait_time" = 8 scope = spfile;

which means they should yield the first time but will sleep for 8ms after each subsequent attempt to acquire the mutex before trying again.
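
One way to confirm the values once they are in effect is to query the undocumented x$ksppi / x$ksppcv fixed tables as SYS (a sketch, since these are hidden parameters):

select p.ksppinm  as parameter,
       v.ksppstvl as current_value
from   x$ksppi  p,
       x$ksppcv v
where  p.indx = v.indx
and    p.ksppinm in ('_mutex_wait_scheme', '_mutex_wait_time');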

The customer confirmed that since applying the parameters performance has improved to around the same level as prior to the upgrade to 11g.

Apex Monitoring Scripts

Since you may not always have access to the Apex administration pages to monitor the activity and performance of your Apex application, here are some scripts that you can use to extract the headline figures directly from the log tables.

apex-by-day.sql – the Page Views by Day report
SQL> @apex-by-day

Date        Avg Time Max Time Page Views Users
----------- -------- -------- ---------- -----
04-APR-2011   0.2315     30.2      21582   111
03-APR-2011   0.3673     11.4       9080    11
02-APR-2011   0.4219      6.4        549     2
01-APR-2011   0.8711      4.1         45     2
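
If you can’t download the script, a query along these lines produces a similar by-day summary; it assumes the APEX_WORKSPACE_ACTIVITY_LOG view and its view_date, apex_user and elapsed_time columns, which may differ between Apex releases, so check the view definition in your environment:

select to_char(trunc(view_date), 'DD-MON-YYYY') as "Date",
       round(avg(elapsed_time), 4)              as "Avg Time",
       round(max(elapsed_time), 1)              as "Max Time",
       count(*)                                 as "Page Views",
       count(distinct apex_user)                as "Users"
from   apex_workspace_activity_log
group  by trunc(view_date)
order  by trunc(view_date) desc;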

apex-by-hour.sql – the Hourly Usage report
SQL> @apex-by-hour
Enter value for which_date: sysdate

Hour                 Avg Time Max Time Page Views Users
-------------------- -------- -------- ---------- -----
04-APR-2011 14:00:00   0.1983     30.2       3074    46
04-APR-2011 13:00:00   0.3098     30.2       2237    48
04-APR-2011 12:00:00   0.2637     30.2       3580    60
04-APR-2011 11:00:00   0.2398     30.2       5611    71
04-APR-2011 10:00:00   0.2148     30.1       2942    54
04-APR-2011 09:00:00   0.1968     30.1       2683    62

apex-by-weight.sql – the Weighted Page Performance report

These are the first reports that I check when looking at the current performance levels of an Apex application.

Apex Library Cache Latch Contention

I recently encountered an unusual contention issue on the library cache latch, and thought it was worth recording what happened and how the issue was resolved.

The problem occurred with one of our Oracle Apex applications running on v3.1.2 of Oracle Apex and v10.2.0.5 of Oracle RDBMS. For the sake of completeness the application was hosted on an IBM AIX 6 server but that was not relevant to the investigation.

The site that reported the problem is a relatively high-volume one, with over 800 users and around 400,000 page views per day. They were reporting a serious degradation in performance under load, to the point where the application was almost unusable, so an urgent investigation was required.

The site had already run off some AWR reports that clearly showed the problem lay with the library cache latch – the Top 5 Timed Events from the report are shown below.

Top 5 Timed Events                                          Avg %Total
~~~~~~~~~~~~~~~~~~                                     wait   Call
Event                               Waits    Time (s)   (ms)   Time
---------------------------- ----------- ----------- ------ ------
latch: library cache              692,012      86,615    125   20.0
CPU time                                       41,692           9.7
db file sequential read         3,656,449      30,389      8    7.0
latch: row cache objects          164,161      23,585    144    5.5
latch: library cache lock         176,975      21,011    119    4.9

So the problem was contention for the library cache latch – but what was causing it?

The site had recently applied a patch to the application, so naturally that was carrying the blame – but the patch was relatively minor and did not seem like it could be responsible for the problems we were seeing.

So what are the most likely causes of contention on the library cache latch?

Experience tells us that we should be looking for SQL statements that are not being reused (maybe using literals rather than bind variables), or possibly that the shared pool is simply too big. The site in question had similar problems in the past due to poorly configured Automatic Shared Memory Management parameters that had allowed the shared pool to grow to over 10GB in size.

The shared pool was a reasonable size, so the focus initially was on looking for SQL statements that weren’t being reused. Sure enough, we found a couple of candidate statements that were showing a very high version count – statements with a high version count are reported in AWR and Statspack reports, but the script top10.sql is also useful for identifying them.
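
If you don’t have top10.sql to hand, a rough equivalent is to query v$sqlarea directly (the threshold of 20 versions is arbitrary):

select sql_id,
       version_count,
       executions,
       substr(sql_text, 1, 60) as sql_text
from   v$sqlarea
where  version_count > 20
order  by version_count desc;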

However the SQL statements we had identified were not new and were also showing similar high version counts at other sites who were not experiencing the same contention.

We needed to find out exactly what was causing the contention on the library cache latch – and I was under the impression that was going to be difficult.

Enter the frankly invaluable latchprof.sql script from Tanel Poder which does exactly that.

There’s even a handy guide that tells you how to use it to troubleshoot latch contention – how had I not found this before?

Running latchprof.sql against the database showed the following:

NAME                                SQLID               Held       Gets  Held %     Held ms Avg hold ms
----------------------------------- ------------- ---------- ---------- ------- ----------- -----------
library cache lock allocation       4dqasqjrvudqn       7880       3548   78.80   17367.520       4.895
library cache lock allocation                           6472       3034   64.72   14264.288       4.701
library cache pin allocation                             408        373    4.08     899.232       2.411
library cache lock allocation       g78p6p30n7srb         87         11     .87     191.748      17.432

So that was pretty conclusive: the contention was down to one particular SQL statement – but what was it?

Well it turns out this was the call to

dbms_session.reset_package

that is invoked by every request made to the Oracle Apex application – it’s a consequence of the Oracle Apex architecture.

There is simply no way we can stop the application from calling this routine – but why was it suddenly causing such crippling contention?

To cut a long story short, the customer had set the following database initialisation parameter:

*.session_cached_cursors=75

Now on the face of it that doesn’t seem unreasonable – caching session cursors is a good thing, isn’t it?

Well actually when you’re dealing with an Oracle Apex application caching session cursors is largely irrelevant.

As we have already discovered, Oracle Apex calls dbms_session.reset_package for essentially every page view recorded in the application, and one of the many things this routine does is close all session cursors. All of which means that Oracle Apex applications will hardly ever reuse cursors, and you should expect to see a relatively high number of Soft Parses in your database.
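
You can get a feel for this from the instance-wide parse counters – soft parses are 'parse count (total)' minus 'parse count (hard)':

select name, value
from   v$sysstat
where  name in ('parse count (total)',
                'parse count (hard)',
                'session cursor cache hits');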

Consequently the recommendation for our application has always been to stick with the default value for the session_cached_cursors parameter – at Oracle v10.2.0.5 this is 20, although it increases to 50 at Oracle v11.2.0.2.
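
To confirm what a site is actually running with, and how much of the cursor cache its sessions are really using, something along these lines works:

-- current setting
select value from v$parameter where name = 'session_cached_cursors';

-- cursors currently held in each session's cursor cache
select s.sid, st.value as cached_cursors
from   v$sesstat  st,
       v$statname sn,
       v$session  s
where  st.statistic# = sn.statistic#
and    st.sid        = s.sid
and    sn.name       = 'session cursor cache count'
order  by st.value desc;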

The theory we were now operating under was that since each session in the Oracle Apex session pool was potentially caching up to 75 cursors, the repeated calls to dbms_session.reset_package were that much more expensive and were therefore resulting in the latch contention that we were seeing.

The fix was obvious – we needed to remove the session_cached_cursors parameter from the database and revert to the recommended configuration for the Oracle Apex application.
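
In this case that meant removing the explicit setting so the default applies again – something along these lines, followed by an instance restart since the parameter is not system-modifiable on the fly:

alter system reset session_cached_cursors scope = spfile sid = '*';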

After making that change the latch contention disappeared and performance at the site returned to normal – although the number of Soft Parses did increase!

The lessons to be learned here are

  1. Oracle Apex applications hardly ever reuse session cached cursors
  2. If you’re suffering from latch contention then you need latchprof!

Hope all that proves useful should you find yourself in a similar situation.  🙂