1.Let me  create a test case : I will take different OS statistics and run time statistics to compare with AWR report generated after execution.

Case A.About test case in lower CPU:-

I will create a table first

create table t (id number, sometext varchar2(50),my_date date) tablespace data;

Now I will create a simple procedure to load bulk data
create or replace procedure manyinserts as
v_m number;
begin
for i in 1..10000000 loop
select round(dbms_random.value() * 44444444444) + 1 into v_m from dual t;
insert /*+ new2 */ into t values (v_m, ‘DOES THIS’||dbms_random.value(),sysdate);
commit;
end loop;
end;
This insert will be executed in 10 parallel sessions using dbms_job.This will fictitiously increase load on database.

create or replace procedure manysessions as
v_jobno number:=0;
begin
FOR i in 1..10 LOOP
dbms_job.submit(v_jobno,’manyinserts;’, sysdate);
END LOOP;
commit;
end;
/

Now I will execute manysessions which will fork 10 parallel sessions

exec manysessions;

Case B.About test case in higer CPU:-

I will create a table first

create table t (id number, sometext varchar2(50),my_date date) tablespace data;

Now I will create a simple procedure to load bulk data.I will make commit out of loop to decrease frequent commit.
create or replace procedure manyinserts as
v_m number;
begin
for i in 1..10000000 loop
select round(dbms_random.value() * 44444444444) + 1 into v_m from dual t;
insert /*+ new2 */ into t values (v_m, ‘DOES THIS’||dbms_random.value(),sysdate);
–commit;
end loop;
commit;
end;
This insert will be executed in 10 parallel sessions using dbms_job.This will fictitiously increase load on database.

create or replace procedure manysessions as
v_jobno number:=0;
begin
FOR i in 1..10 LOOP
dbms_job.submit(v_jobno,’manyinserts;’, sysdate);
END LOOP;
commit;
end;
/

Now I will execute manysessions which will fork 10 parallel sessions.

exec manysessions;

I will execute the test case on server with low CPU (A) as well as high CPU (B).

2.I will use oratop tool to monitor the sessions

oratop – Utility for Near Real-time Monitoring of Databases, RAC and Single Instance (Doc ID 1500864.1)
./oratop -f -d -i 10 / as sysdba
optionally you press x
Then you can put sql_id to see the execution plan as well.

For A.

For B

Section 1 - database
        Global Database information
   Version        : Oracle major version
   role           : database_role
   db name        : db_unique_name
   time        [s]: time as of the most recent stats (hh24:mi:ss)
   up          [T]: database uptime
   ins         [N]: total number of instance(s)
   sn        [c,N]: total user sessions (active/inactive)
   us        [c,N]: number of distinct users
   mt        [s,N]: global database memory total (sga+pga)
   fra         [N]: flashback recovery area %used, (red > 90%)
   er          [N]: diag active problem count (faults)
   % db      [s,N]: database time as %(dbtime/cpu) (red if > 99%)
Section 2 - instance
        Top 5 Instance(s) Activity
        o Ordered by Database time desc
   ID        [c,N]: inst_id (instance id)
   %CPU      [m,N]: host cpu busy %(busy/busy+idle). (red if > 90%)
   LOAD      [m,N]: current os load. (red if > 2*#cpu & high cpu)
   %DCU      [m,N]: db cpu otusef as %host cpu. (red if > 99% & high AAS)
   AAS       [s,N]: Average Active Sessions. (red if > #cpu)
   ASC       [c,N]: active Sessions on CPU
   ASI       [c,N]: active Sessions waiting on user I/O
   ASW       [c,N]: active Sessions Waiting, non-ASI (red if > ASC+ASI)
   ASP       [m,N]: active parallel sessions (F/G)
   AST       [c,N]: Active user Sessions Total (ASC+ASI+ASW)
   UST       [c,N]: user Sessions Total (ACT/INA)
   MBPS      [m,N]: i/o megabytes per second (throughput)
   IOPS      [m,N]: i/o requests per second
   IORL      [m,T]: avg synchronous single-block read latency. (red > 20ms)
   LOGR      [s,N]: logical reads per sec
   PHYR      [s,N]: physical reads per sec)
   PHYW      [s,N]: physical writes per sec
   %FR       [s,N]: shared pool free %
   PGA       [s,N]: total pga allocated
   TEMP      [s,N]: temp space used
   UTPS      [s,N]: user transactions per sec
   UCPS    [c,m,N]: user calls per sec
   SSRT    [c,m,T]: sql service response time (T/call)
   DCTR      [m,N]: database cpu time ratio
   DWTR      [m,N]: database wait time ratio. (red if > 50 & high ASW)
   %DBT      [s,N]: instance %Database Time (e.g. non-rac shows 100%)
Section 3 - db wait events
        Top 5 Timed Events
        o Cluster-wide, non-idle
        o Ordered by wait time desc
  EVENT      : wait event name. (red if active)
        (RT) : Real-Time mode
  WAITS      : total waits
  TIME(s)    : total wait time in seconds)
  AVG_MS     : average wait time in milliseconds
  PCT        : percent of wait time (all events)
  WAIT_CLASS : name of the wait class
Section 4 - process
        o Non-Idle processes
        o Ordered by event wait time desc
   ID          [N]: inst_id. (red if blocking)
   SID         [N]: session identifier. (red if blocking)
   SPID        [N]: server process os id
   USERNAME       : Oracle user name
   PROGRAM        : process program name
   SRV            : SERVER (dedicated, shared, etc.)
   SERVICE        : db service_name
   PGA         [N]: pga_used_mem. (red if continuously growing)
   SQL_ID/BLOCKER : sql_id or the final blocker's (inst:sid, in red)
   OPN            : operation name, e.g. select
   E/T         [T]: session elapsed time (active/inactive)
   STA            : ACTive|INActive|KILled|CAChed|SNIped
   STE            : process state, e.g. on CPU or user I/O or WAIting
   WAIT_CLASS     : wait_class for the named event
   EVENT/*LATCH   : session wait event name. Auto toggle with *latch name.
                    (red if process is hung/spin)
   W/T         [T]: event wait time. (red if > 1s)

3.TOP output during activity

 For Case A:-
We had 4 CPU,so our 6 jobs (status :-S means Sleeping)  had to wait for CPU.Other 4 jobs are running (Status :-R means Running). Also note 86% CPU used because we are consuming 4 CPU out of 4 CPU)There is almost no idle CPU.

For Case B:-

We had 20 CPU,so our all 10 jobs are running (Status :-R means Running).Also note 44.4 % CPU is used because out of 20 CPU we are using 10 CPU.Hence there is 50% idle CPU.

4.VMSTAT report

 For Case A:-

For Case B:-

5.The table size after 10 jobs completed.

SQL> select bytes/1024/1024/1024  from dba_segments where segment_name='T';

BYTES/1024/1024/1024

-------------------

7.375

6.AWR report 360 degree analysis

A.Header Section IN AWR REPORT

For Case A:-

For Case B:-

Same load completed 5 times faster when we have more CPU.
Sessions:=No of active sessions
Cursors/Session:=No of open cursor per session.If this value is increasing or high indicates potential problem of cursor leaking.
Elapsed :=The elapsed time indicates the duration of the report between the 2 selected snapshots. Any other duration figures can be compared back to this. A 30-60 minute reporting period is usually recommended.
DB time:=db time is the time spent in the database.Ideally DB time=CPU time+Non-idle wait time.For example,If we have 10 parallel session each taking 20 minutes to execute,our DB time will be 200 minutes+other query execution time in 20 minutes snapshot period.The DB Time is the time spent in the database for the period of the report. If this is significantly higher than the Elapsed time then this is a good indicator of a heavily loaded system.

B.Load Profile IN AWR REPORT

For Case A:-

For Case B:-

DB Time:-No of active sessions average during snapshot period.This is derived by DB time/Elapsed time in previous section.If the number his high,then probably many active session are there in database on particular point which may indicate bottleneck or opportunity to deep analysis.
DB Cpu:-DB CPU is quite less than DB time ,so sessions are waiting not much working on cpu.This may indicate a problem if your DB cpu percent is quite less than DB time.
Redo size:-11 MB per second.An increase in Redo size and Block changes indicate increased DML(INSERT/UPDATE/DELETE) activity.
Logical reads:-Logical reads is simply the number of blocks read by the database, including physical (i.e. disk) reads, and block changes is fairly self-descriptive.If you find those number higher than expected (based on usual numbers for this database, current application workload etc.), then you can drill down to the “SQL by logical reads” and “SQL by physical reads” to see if you can identify specific SQL responsible.
Block Changes:-This indicates high block change means many  transactions going on.
User Calls:-In particular, when the database is executing many times per a user call, this could be an indication of excessive context switching (e.g. a PL/SQL function in a SQL statement called too often because of a bad plan). In such cases looking into “SQL ordered by executions” will be the logical next step.
Hard parse:-A hard parse occurs when a SQL statement is executed and is not currently in the shared pool.A hard parse rate greater than 100 per second could indicate that bind variables are not being used effectively; the CURSOR_SHARING initialization parameter should be used; or you have a shared pool–sizing problem.
Parse:-This is soft parse+hard parse.A soft parse occurs when a SQL statement is executed and it is currently in the shared pool. A very high soft parse rate could lead to problem of programmed application.
Logons:-Establishing a new database connection is also expensive (and even more expensive in case of audit or triggers). “Logon storms” are known to create very serious problem.In fact log off is more expensive.This indicates problem of not using proper connection pooling.
Executes:-High no of executes always indicates potential load to database.
Transactions:-High number of transactions indicates it is OLTP system.

C.Time Model Statistics IN AWR REPORT

for Case A:

For Case B:

For Case A:-

sql execute elapsed time is 60%.That means DB spent 60% time of DB_TIME is executing SQL query which is OK for now .
DB CPU is used 20% of DB_TIME.
Now We can see PL/SQL execution elapsed time is 11% means many PL/SQL procedure is being executed.
You can see which PL/SQL executed in SQL ordered by Elapsed Time.
Parse time elapsed and hard parse elapsed time is less which is good sign.If these are bigger,we need to check why query is taking more time to parse.

D.Instance Efficiency Percentages IN AWR REPORT

For Case A:-

For Case B:-

Buffer Nowait %:-Less than 99 percent may indicate a problem . This value is the ratio of hits on a request for a specific buffer where the buffer was immediately available in memory. If the ratio is low, then there are (hot) blocks being contented for that should be found in the Buffer Wait section.
Buffer Hit %:- Less than 95 percent may indicate a problem. This value is the ratio of hits on a request for a specific buffer when the buffer was in memory and no physical I/O was needed.A hit ratio that is steadily at 95 percent and then drops to 45 percent should be checked for bad SQL or a dropped index (check the top physical reads SQL) causing a surge in physical reads that are not using an index or an index that has been dropped
Library Hit % :-Less than 95 percent indicates problem. A lower library hit ratio usually indicates that SQL is being pushed out of the shared pool early (could be due to a shared pool that is too small). A lower ratio could also indicate that bind variables are not used or some other issue is causing SQL not to be reused (in which case, a smaller shared pool may only be a bandage that potentially fixes a resulting library latch problem)
In-memory Sort % :- less than 95 percent in OLTP indicates problem. In an OLTP system, you really don’t want to do disk sorts. Setting the MEMORY_TARGET or PGA_AGGREGATE_TARGET (or SORT_AREA_SIZE in previous versions) initialization parameter effectively eliminates this Problem.
Soft Parse %:- Less than 95 percent indicates problem. A soft parse ratio that is less than 80 percent indicates that SQL is not being reused and needs to be investigated.
Latch Hit %:- Less than 99 percent is usually a big problem. Finding the specific latch will lead you to solving this issue.

E.Foreground Events by total wait time IN AWR REPORT

For Case A:-

For Case B:-

This is one of the most important section.You must be concerned if any wait event take abnormally high %DB time apart for DB CPU.If DB CPU is high like 80%,then your application is CPU bound.

For case A,

DB CPU :-It is showing 19.5% means it is may be CPU bound but because we have only 4 CPU and 10 parallel sessions,CPU is not able to do much work and waiting for IO to be completed.Please check OS statistics for reviewing CPU related bottlenecks (% cpu utilization)

Resmgr:cpu quantum:-Indicates it may be problem of allocating sufficient number of CPU cycles.

F.Background wait events IN AWR REPORT

For Case A:-

For Case B:-

For case A:-

The log file parallel write shows LGWR is waiting for blocks to be written to all online redo log members in one group. LGWR will wait until all blocks have been written to all members.   So here we had 80% of background total time spent on log file parallel write.  The parameter of interest here is Avg wait (ms). In our case it is 1ms which is a perfectly respectable figure. Obviously larger average wait times point to slower I/O subsystem or poor I/O configurations.As a rule of thumb, an average time for ‘log file parallel write’ over 20 milliseconds suggests a problem with IO subsystem.
Another thing is log file parallel write is background wait event of log file sync.You need to check if large commit is happening in “user commits” section of “Instance Activity Stats”

G.Host CPU IN AWR REPORT

For Case A:-

For Case B:-

For Case A,

Host CPU

Here you can notice %user is 84%.This really gives you sign that your CPU are highly used.Load average 2.77 (after execution completed) means you are having average 3 sessions running at one time and other 7 sessions are waiting for CPU.But most of the time load average was around 10 (refer oratop and vmstat) as there was 10 sessions running all the time during snapshot period.

Instance CPU

%Total cpu indicates that Database is using 82.6 % cpu of total CPU of the database server. This indicates there are not many other database’s process running currently.If you see less value here it may indicates the database server total cpu may be used by other application or database instances.

H.IO Profile

For case A:-

For case B:-

Here Total requests:=Database requests + Redo requests=1046+322=1381(around) which is actually IOPS.
Total MB=Database (MB) + Redo (MB)=5.9+22.9=30.4 MB which denotes database is generating 30 MB data per second.

I.Memory Statistics

For case A:-

For case B:-

Host Mem(MB):-Total memory in the system
SGA use (MB):-Total SGA used
PGA use(MB):-Total PGA used.
Cache size mainly distributed Buffer cache(where data buffer stored in memory)+Shared pool size.

A low value for the % SQL with exec>1 statistic indicates databasee is not re-using shared SQL statements, usually because the SQL is not using bind variables.

J.OS Statistics

For A:-

For B:-

This is like vmstat report.Almost 83% cpu was used during execution which is quite high.%busy high is good means cpu are not using simultaneous multi threading.

 

4 Replies to “AWR report analysis in depth-part 1”

  1. Deba, simply awesome. …!!!!
    You know oratop had saved me a lot of time during critical performance issues .

  2. Really this is good explanation. Thank you for your time spent to prepare the excellent article it will useful for many DBA’s.

Leave a Reply

Your email address will not be published. Required fields are marked *