Below, I will explain how bind peeking played a disastrous role in one of my databases.
For an introduction to the usage of bind variables, a good post by Arup Nanda is available here.
From one of my OLTP databases (10.2.0.4), I suddenly started receiving OEM alerts for “run queue length” as follows (my server has 24 CPUs in total):
Target Name=<TARGET_NAME>
Target type=Host
Host=<HOST_NAME>
Occurred At=11-Sep-2014 06:04:13 SGT
Message=CPU Load is 27.08, crossed warning (18) or critical (24) threshold.
Metric=Run Queue Length (5 minute average)
Metric value=27.08
Severity=Critical
Acknowledged=No
Notification Rule Name=Host Availability and Critical States
Notification Rule Owner=SYSMAN
------------------------------------------------
While investigating the issue further, I found that most of the time was being consumed by a single query that was part of every transaction running at that time. Every session was executing this query, so overall most of the resource consumption came from this one statement. In OEM (or by querying v$sql_plan using the SQL_ID; a related query against v$sql is shown below), I could see multiple plans listed for this SQL, and I suspected that the currently loaded plan was causing the query to take longer to finish. Since every session had to execute this query, a session would not return until the query completed; sessions were therefore not leaving the CPU, new sessions were waiting for CPU to be released, and the “Run Queue Length” metric value kept rising.
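For reference, a quick way to see how many plans a statement currently has in the cursor cache is to query v$sql by SQL_ID. This is only a sketch; '&sql_id' is a placeholder for the actual SQL_ID:

-- Lists each child cursor with its plan hash value and basic work statistics
SELECT sql_id,
       child_number,
       plan_hash_value,
       executions,
       buffer_gets,
       disk_reads
FROM   v$sql
WHERE  sql_id = '&sql_id'
ORDER  BY child_number;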
Since my SQL was using bind variables, the first thing that came to mind was that this SQL had probably been aged out of the shared pool and, when it was reloaded, a new execution plan (the bad one) was loaded into the shared pool (the plan can change because of bind peeking performed during plan generation). So I issued “alter system flush shared_pool” (not a very good idea, as it invalidates all cursors) to invalidate the cursors in the shared pool and let them all be reloaded, including this SQL. This worked: I saw a new plan for this SQL, and the “run queue length” immediately came down to less than 10. A more targeted alternative is sketched below.
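For completeness, a more surgical alternative (which I did not use here) is to purge only the offending cursor with DBMS_SHARED_POOL.PURGE instead of flushing the whole shared pool; note that on 10.2.0.4 this procedure may require an additional patch/event to be usable, and the address/hash value below are placeholders taken from v$sqlarea:

-- Find the address and hash value of the cursor to purge
SELECT address, hash_value
FROM   v$sqlarea
WHERE  sql_id = '&sql_id';

-- Purge only that cursor ('C' = cursor) rather than the entire shared pool
EXEC DBMS_SHARED_POOL.PURGE('&address, &hash_value', 'C');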
From dba_hist_sql_plan, I found that there were a total of 6 different execution plans for this SQL. I then used the awrsqrpt.sql script to fetch the execution details of this SQL against each of these plans from the AWR repository. The following are the interesting details I extracted from the SQL AWR reports (a query to pull the same figures directly from AWR is sketched after the list):
Plan with hash value 465940006: average physical reads per execution => 319
Plan with hash value 1122690139: average physical reads per execution => 36
Plan with hash value 976138646: average physical reads per execution => 1285
Plan with hash value 2677530476: average physical reads per execution => 144
Plan with hash value 3841102669: average physical reads per execution => 37
Plan with hash value 2920758856: average physical reads per execution => 155
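The same per-plan figures can also be pulled directly from the AWR repository instead of generating one awrsqrpt.sql report per plan. A sketch, again with '&sql_id' as a placeholder:

-- Average physical reads per execution, grouped by plan hash value
SELECT plan_hash_value,
       SUM(executions_delta) AS total_execs,
       ROUND(SUM(disk_reads_delta) / NULLIF(SUM(executions_delta), 0)) AS avg_phys_reads_per_exec
FROM   dba_hist_sqlstat
WHERE  sql_id = '&sql_id'
GROUP  BY plan_hash_value
ORDER  BY avg_phys_reads_per_exec DESC;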
You can see that plan 976138646 does a lot of physical reads, and it was indeed the culprit this time, slowing down every execution of this SQL. After I flushed the shared pool, this plan (976138646) was gone and the next plan loaded was 1122690139, which made things better for me. The other good plan here is 3841102669, which had also been used in the past.
Solution to the problem
Invalidating the cursors can help to load a new and better plan, but it is not a permanent solution. Starting with 11g, the adaptive cursor sharing feature is very helpful, as Oracle may select a different execution plan based on the bind variable values seen during different executions of the same SQL.
In 10g, SQL profiles, and from 11g onwards, SQL Plan Management, can be used to attach a known good plan to a particular SQL and avoid this kind of situation, for example as sketched below.
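As an illustration only (assuming 11g or later and that the good plan is still in the cursor cache), a known good plan such as 1122690139 could be captured as an accepted SQL plan baseline; '&sql_id' is a placeholder:

-- Load the good plan from the cursor cache as an accepted SQL plan baseline
DECLARE
  n PLS_INTEGER;
BEGIN
  n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
         sql_id          => '&sql_id',
         plan_hash_value => 1122690139);
  DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || n);
END;
/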