How to investigate a real-time overall performance bottleneck (enq: TX – row lock contention) using OS tools like top/sar/vmstat, ASH, AWR, oratop and OEM

I will simulate “row lock contention” using the following procedure (I will reuse the table from Test case 4 of https://clouddba.co/test-case-when-index-size-can-grow-than-column-size-and-effect-of-rebuild-index/ and run the update in 10 parallel sessions).

CREATE OR REPLACE procedure TEST.manyupdates as
v_m number;
begin
-- table TEST.T (column ID) comes from the test case referenced above
for i in 1..100000 loop
select round(dbms_random.value() * 44444444444) + 1 into v_m from dual;
update t set id=v_m/15*i where rownum<10;
commit;
end loop;
end;
/

CREATE OR REPLACE procedure TEST.manysessions_update as
v_jobno number:=0;
begin
-- submit 10 concurrent jobs, each running the same update loop
FOR i in 1..10 LOOP
dbms_job.submit(v_jobno,'manyupdates;', sysdate);
END LOOP;
commit; -- the jobs start running once the commit is issued
end;
/

Exec TEST.manysessions_update;

As we are running this update in 10 parallel sessions, it will create row lock contention within a very short time.
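Before reaching for the formal tools, a quick sanity check can confirm the symptom. This is a minimal sketch, assuming you can query V$SESSION on this database (note the event name must match exactly, with a plain hyphen):

-- Sessions currently waiting on row lock contention, with their blockers
SELECT sid, serial#, sql_id, blocking_session, seconds_in_wait
FROM   v$session
WHERE  event = 'enq: TX - row lock contention';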

Now the application team raises a concern that the system is very slow, and you as the DBA start to investigate.

A. Real-time performance tuning using OS tools

 

Tool 1: top utility

 

You should check the %CPU column to identify the top CPU-consuming processes.

You will notice the load average is almost 5, which means roughly 5-6 processes are running or waiting to run at a time. You can check the S column in top, where ‘R’ means running and ‘S’ means sleeping.

Also note roughly 72% idle and 22% user CPU. This is an indication that the system is moderately busy with CPU load.

%Cpu(s): 22.6 us, 4.6 sy, 0.0 ni, 72.3 id, 0.1 wa, 0.0 hi, 0.5 si, 0.0 st

If you want to investigate which process consumes the most memory, sort the output by the RES column.
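If one of the top CPU or memory consumers is an Oracle server process, you can map the OS PID back to the database session. A rough sketch, assuming dedicated server connections; &os_pid is just a placeholder for the value of the PID column taken from top:

-- Map an OS PID from top back to the Oracle session that owns it
SELECT s.sid, s.serial#, s.username, s.sql_id, s.event
FROM   v$process p
       JOIN v$session s ON s.paddr = p.addr
WHERE  p.spid = '&os_pid';  -- substitute the PID seen in top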

Tool 2: sar utility with the -u option to check CPU and I/O bottlenecks

sar -u 10 8

%user: the percentage of time the CPU spends running user-level (application) processes.

%iowait: as the name suggests, the percentage of time the CPU was idle while waiting for outstanding device I/O.

%nice: a user can change the priority of a Linux process; this column shows the CPU time spent on processes whose nice value has been changed.

%steal: the percentage of time a virtual CPU had to wait for the physical CPU while the hypervisor was servicing another virtual machine.

%idle: the percentage of time the CPU was idle.

A low %idle time points to a CPU-bound system; use the ps or top command to find the CPU-intensive jobs. A poorly written query requiring a large amount of disk access can also cause high CPU usage.
In the above sar output, the cause for concern is the large value returned for %user, which means the application is consuming most of the CPU. %iowait (waiting for block I/O) is low here, so we do not have disk contention.

We need to keep the following points in mind while checking the sar CPU report:

  • Low CPU idle times.
  • A high percentage of time spent waiting on I/O, i.e. %iowait > 10.
  • %system > 15, which could indicate that swapping, paging, or backups are causing a bottleneck.
  • Abnormally high %user, which could be due to applications not being tuned properly or CPU over-utilization. This is our case.
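You can also cross-check the sar CPU figures from inside the database. A small sketch, assuming 10g or later where V$OSSTAT is populated (IOWAIT_TIME is not exposed on every platform):

-- CPU-related OS statistics as seen by the instance (times in centiseconds)
SELECT stat_name, value
FROM   v$osstat
WHERE  stat_name IN ('NUM_CPUS', 'BUSY_TIME', 'USER_TIME',
                     'SYS_TIME', 'IDLE_TIME', 'IOWAIT_TIME');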

Tool 3: vmstat report

vmstat 1 10

The r column shows runnable processes, so 7 processes are currently running or waiting for CPU. The idle percentage is around 70% overall.

Any consistently non-zero value in the so column (swap-out) means the system is swapping memory to disk, which needs to be analyzed immediately.

For more info about vmstat, see:

Linux vmstat Command – Tool to Report Virtual Memory Statistics

Tool 4: sar -d to identify disk bottlenecks

 

sar -d 5 2

tps: transfers (I/O requests) per second issued to that particular device.

rd_sec/s: the number of sectors read from the device per second.

wr_sec/s: the number of sectors written to the device per second.

await: the average time (in milliseconds) that I/O requests to the device spent waiting, including time in the queue and time being serviced.

%util greater than 50 percent on a device indicates a potential disk bottleneck. In our case the values are normal; no disk is excessively utilized.
If await is significantly greater than svctm, requests are queuing at the device, which can point to an unbalanced disk I/O load.

Please note that disk dev129-160 is the busiest disk, with tps close to 100.
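To relate a busy device back to the database, you can check which datafiles are doing the most physical I/O. A sketch only, assuming the datafiles of interest reside on that device; the V$FILESTAT counters are cumulative since instance startup:

-- Datafiles ranked by cumulative physical reads and writes
SELECT df.file_name, fs.phyrds, fs.phywrts
FROM   v$filestat fs
       JOIN dba_data_files df ON df.file_id = fs.file#
ORDER  BY fs.phyrds + fs.phywrts DESC;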

Tool 5: sar -b to report disk usage

 

  • tps – Total I/O transfers per second (this includes both reads and writes)
  • rtps – Read requests per second
  • wtps – Write requests per second
  • bread/s – Blocks read per second
  • bwrtn/s – Blocks written per second

The sar -b command reports on the system’s buffer cache activities (not the Oracle buffer cache). It gives you the number of transfers per second between system buffers and block devices.

Tool 6: sar -q to find processes waiting in the queue. We need to look at the blocked column.

 

Tool 7: sar -r to identify memory usage

 

In the above output most of the columns are self-explanatory (and most of the values are in KB).
kbmemfree: the amount of free memory.
kbmemused: the amount of memory used.
%memused: the percentage of memory used.
kbbuffers: buffer memory used by the kernel.
kbcached: cached memory used by the kernel.
All the remaining columns report swap usage (free, used, percentage, etc.).
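To see how much of the used memory belongs to the instance itself, the sar -r figures can be compared with the SGA and PGA reported from inside the database. A minimal sketch, assuming 10g or later:

-- SGA components in MB
SELECT name, ROUND(bytes / 1024 / 1024) AS size_mb
FROM   v$sgainfo;

-- Current and maximum PGA allocation (values in bytes)
SELECT name, value
FROM   v$pgastat
WHERE  name IN ('total PGA allocated', 'maximum PGA allocated');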

B. Real-time performance tuning using the oratop report

 

./oratop -f -d -i 10 / as sysdba

 

ASW: Active Sessions Waiting

EVENT: enq: TX – row lock contention is the top wait event. AVG_MS is almost 0 here; it would be a big concern if AVG_MS were very large, which would mean sessions are being blocked for a long time.

SQLID/BLOCKER: shows which session is blocking and which sessions are blocked.
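The same blocker/waiter picture can be pulled directly from V$LOCK as a cross-check of what oratop shows. A sketch of the classic holder-versus-requester join (it lists every enqueue being waited on, not just TX):

-- Holders (block = 1) joined to sessions waiting on the same resource (request > 0)
SELECT l1.sid AS holder_sid, l2.sid AS waiter_sid,
       l1.type, l1.id1, l1.id2
FROM   v$lock l1
       JOIN v$lock l2 ON l1.id1 = l2.id1 AND l1.id2 = l2.id2
WHERE  l1.block = 1
AND    l2.request > 0;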

The other statistics are also useful for further investigation.

For more info on oratop, please refer to:

AWR report analysis in depth-part 1

C. Real-time performance tuning using OEM monitoring

 

From the OEM console you can check how many active sessions are waiting; the red band indicates concurrency waits. You can also find the SQL_ID responsible.
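What the OEM performance page charts can be approximated with a quick snapshot of active sessions grouped by wait class. A rough sketch; unlike OEM’s ASH-based chart, this is a single point in time:

-- Active, non-idle user sessions grouped by wait class
SELECT wait_class, COUNT(*) AS active_sessions
FROM   v$session
WHERE  status = 'ACTIVE'
AND    type = 'USER'
AND    wait_class <> 'Idle'
GROUP  BY wait_class;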

D. Real-time performance tuning using the AWR report

 

You may look at the section below from the AWR report. The most important one is Top 10 Foreground Events: 42% of DB time has been contributed by enq: TX – row lock contention.

The Segments by Row Lock Waits section shows which objects are getting locked.
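The AWR report itself is generated with the standard script shipped under $ORACLE_HOME/rdbms/admin; when prompted, choose the snapshot interval that covers the slow period:

-- Run from SQL*Plus as a DBA user
@?/rdbms/admin/awrrpt.sql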

E. Real-time performance tuning using the ASH report

The ASH report is very handy for identifying which SQL statements are responsible for the locking.

Please check the Top Sessions section, with SID and SERIAL#, for deeper details. Also check the % Event column to see which wait events are most prominent.
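The ASH report is generated the same way as the AWR report; when prompted, supply the begin and end time of the window in which the application reported the slowness:

-- Run from SQL*Plus as a DBA user
@?/rdbms/admin/ashrpt.sql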

F. Real-time performance tuning using SQL Developer. Please refer below:

 

Real time performance tuning using Oracle SQL Developer