Translate into your own language

Wednesday, May 11, 2016

Step by step - How to Analyze AWR Report in Oracle

As you have Generated AWR Report in Oracle, Next task is to Analyze AWR Report in Oracle. By Reading AWR Report  you can easily solve issues like Slow database, high wait events, Slow query and many more. Though It's a lengthy report but Analyzing or Reading relevant part of AWR Report can help to troubleshoot issues in easy and fast manner.

AWR stands for Automatically workload repository, Though there could be many types of database performance issues, but when whole database is slow, then there are two possibilities.

1. Issue with Database Machine. OS Watcher is the best tool to start.

2. If Database performance issue, Then AWR Report is the place to look at.

In case if a particular query is not performing well, i would suggest to look at execution plan of the query, stats of underlying table etc. In this case AWR won't help much.

Recommendations before getting an AWR Report.

1. Collect Multiple AWR Reports: It's always good to have two AWR Reports, one for good time (when database was performing well), second when performance is poor. This way Remote DBA can easily compare good and bad report to find out the culprit.

2. Stick to Particular Time: "Database is performing slow" will not help anymore to resolve performace issues. We have to have a specific time like Database was slow yesterday at 1 Pm and continue till 4Pm. Here, DBA will get a report for these three hours.

3. Split Large AWR Report into Smaller Reports: Instead of having one report for long time like one report for 4hrs. it's is better to have four reports each for one hour. This will help to isolate the problem.

In case of RAC env. generate one report for each instance. Once, you have generated AWR report. Now, it's time of analyze the report. Since, AWR report is a huge report and area to look into AWR is also depends on problem to problem. Here, I am listing most common area for a DBA to look into which will give a clear picture of the issue.

Steps to Analyze AWR Report

1. Report Header

Database Details:

After getting an AWR Report This is first and Top part of the report. In this part cross check for database and instance and and database version with the Database having performance issue.This report also show RAC=YES if it's an RAC database.

Host Configuration:

This will give you name, platform CUP, socket and RAM etc. Important thing to notice is number of cores into the system. In this example there are 12 CUP's in Cores.

Snap Shot Detail:

This are the detail about snap shot taken, Snap start time and end time. Difference between them is as "Elapsed". Here is a new term "DB Time"

DB Time= session time spent in database.
DB Time= CPU Time + Non IDLE wait time.
You can find, DB time is very large as compared to Elapse time, which is not a concern. Check if you have taken a report for the time having performance problem. If yes fine, other wise take a report for performance problem time.

Next is Cache Sizes, which is just detail about SGA components.

2. Load Profile:

Here are few important stats for a DBA to look into. Fist is "DB CPU(s)" per second. Before that let's understand how DB CUP's work. Suppose you have 12 cores into the system. So, per wall clock second you have 12 seconds to work on CPU.



So, if "DB CPU(s)" per second in this report > cores in (Host Configuration (#2)).
means env is CPU bound and either need more CPU's or need to further check is this happening all the time or just for a fraction of time. As per my experience there are very few cases, when system is CPU bound.

In this case, machine has 12 cores and DB CPU(s) per second is 6.8. So, this is not a CPU bound case.

Next stat to look at are Parses and Hard parses. If the ratio of hard parse to parse is high, this means Database is performing more hard parse. So, needs to look at parameters like cursor_sharing and application level for bind variables etc.

DB Time(s):
Its the amount of time oracle has spent performing database user calls. Note it does not include background processes.
DB CPU(s):
Its the amount of CPU time spent on user calls. Same as DB time it does not include background process. The value is in microseconds
Redo size:
 For example, the table below shows that an average transaction generates about 19,000 of redo data along with around 48,000 redo per second.
Logical reads:
Consistent Gets+ DB blocks Gets = Logical reads
Block Changes:
The number of block modified during the sample interval
Physical reads:
No of block request causing I/O operation
Physical writes:
Number of physical writes performed
User calls:
Number of user queries generated
Parses:
The total of all parses; both hard and soft.
Hard Parses:
The parses requiring a completely new parse of the SQL statement. These consume both latches and shared pool area.
Soft Parses:
Soft parses are not listed but derived by subtracting the hard parses from parses. A soft parse reuses a previous hard parse; hence it consumes far fewer resources.
Sorts:
No of sorts performed
Logons:
No of logons during the interval
Executes:
No of SQL Executes
Transactions:
No of transactions per second

The load profile provides an at-a-glance look at some specific operational statistics. You can compare these statistics with a baseline snapshot report to determine if database activity is different. Values for these statistics are presented in two formats. The first is the value per second (for example, how much redo was generated per second) and the second is the value per transaction (for example, 1,024 bytes of redo were generated per transaction).
Statistics presented in the load profile include such things as:
Redo size - An indication of the amount of DML activity the database is experiencing.

Logical and physical reads - A measure of how many IO's (Physical and logical) that the database is performing.

User calls - Indicates how many user calls have occurred during the snapshot period. This value can give you some indication if usage has increased.

Parses and hard parses - Provides an indication of the efficiency of SQL re-usage.

Sorts - This number gives you an indication of how much sorting is occurring in the database.

Logons - Indicates how many logons occurred during the snapshot period.

Executes - Indicates how many SQL statements were executed during the snapshot period.

Transactions - Indicates how many transactions occurred during the snapshot period.
Additionally, the load profile section provides the percentage of blocks that were changed per read, the percentage of recursive calls that occurred, the percentage of transactions that were rolled back and the number of rows sorted per sort operation.

3. Instance Efficiency Percentages:

In these statistics, you have to look at "% Non-Parse CPU". If this value is near 100% means most of the CPU resources are used into operations other than parsing, which is good for database health.


Execute to Parse % and Parse CPU to Parse Elapsd %:

If the the value are low like in the above case of 3.40 and 0.01 means that there could be a parsing problem. You may need to look at bind variable issues or shared pool sizing issue.

Redo NoWait%:

Usually this stats is 99 or greater

In-memory Sort %:

This can tell you how efficient is you sort_area_size, hash_area_size or pga_aggrigate_target are. If you dont have adequate sizes of sort,hash and pga parameters, then you in-memory sort per cent will go down

Soft parse %:

with 98.20 % for the soft parse meaning that about 1.72 % (100 -soft parse) is happening for hard parsing. You might want to look at you bind variables issues.

Latch Hit %:
should be close to 100.

% Non-Parse CPU:

Most of our statements were already parsed so we weren't doing a lot of re parsing. Re parsing is high on CPU and should be avoided.

4. Top 5 Timed Foreground Events:

This is another most important stats to consider while looking at AWR Report for any database performance related issue. This has a list of top 5 foreground wait events.



Here, first of all check for wait class if wait class is  User I/O , System I/O,  Others etc this could be fine but if wait class has value "Concurrency" then there could be some serious problem. Next to look at is Time (s) which show how many times DB was waiting in this class and then Avg Wait (ms). If Time(s) are high but  Avg Wait (ms) is low then you can ignore this. If both are high or Avg Wait (ms) is high then this has to further investigate.

In the above screen shot, most of the resource are taken by DB CPU = 64% DB time. Taking resource by DB CUP is a normal situation.

Let's take an example,  In which event is "log file switch (checkpoint incomplete) " which has high waits, huge Time (s) and large values in Avg Wait (ms) and wait class is configuration. So, here you have to investigate and resolve log file switch (checkpoint incomplete).

This section provides insight into what events the Oracle database is spending most of it's time on (see wait events). Each wait event is listed, along with the number of waits, the time waited (in seconds), the average wait per event (in microseconds) and the associated wait class.

its critical to look into this section. If you turn off the statistic parameter, then the Time(s) wont appear. Wait analysis should be done with respect to Time(s) as there could be million of waits but if that happens for a second or so then who cares. Therefore, time is very important component.

So you have several different types of waits. So you may see the different waits on your AWR report. So lets discuss the most common waits.

df file type waits:

db file sequential read:

Is the wait that comes from the physical side of the database. it related to memory starvation and non selective index use. sequential read is an index read followed by table read because it is doing index lookups which tells exactly which block to go to

db file scattered read:

caused due to full table scans may be because of insufficient indexes or un-avilablity of updated statistics

direct Path writes:

You wont see them unless you are doing some appends or data loads

direct Path reads:

could happen if you are doing a lot of parallel query activity

db file parallel writes / read: 

if you are doing a lot of partition activity then expect to see that wait even. it could be a table or index partition

db file single write:

if you see this event than probably you have a lot of data files in your database.

direct path read temp or direct path write temp:

this wait event shows Temp file activity (sort,hashes,temp tables, bitmap)
check pga parameter or sort area or hash area parameters. You might want to increase them
Host CPU, Instance CPU and Memory Statistics are self explanatory.

5. Time Model Statistics:

This is a detailed explanations of system resource consumptions. Stats are order by Time (s) and % of DB Time.


A noticeable result Sum of all  % of DB time is > 100%. why is this ?
Because this is cumulative time i.e. In this case SQL execute elapsed time is taking 89% of DB time, which includes it sub parts like parse time elapsed, hard parse elapsed time etc. So, if you find Hard parse time elapsed is taking more %. So investigate further so on and so forth.
DBA has to look for stat which is taking abnormal % of DB time.

6. Operating System Statistics - Detail:

This is the information related to OS, what is the load status on System shown here.



This report shows, system is 62 and 70% idle at time of report taken, So, there is no resource crunch at system level. But if, you found very high busy, user or sys % and indeed this will led to low idle %. Investigate what is causing this. OS Watcher is the tool which can help in this direction.
Next, very crucial part of AWR report for a DBA is SQL Statistics. Which has all sql query details executed during report time interval.


We will explore few of them, To understand, how to analyzed these reports. Let's start with

7. SQL Ordered by Elapsed Time:

As explained by name itself, this lists SQL queries ordered by Elapsed time into reported time interval.



In this report, look for query has low executions and high Elapsed time per Exec (s) and this query could be a candidate for troubleshooting or optimizations. In above report, you can see first query has maximum Elapsed time but no execution. So you have to investigate this.
In Important point, if executions is 0, it doesn't means query is not executing, this might be the case when query was still executing and you took AWR report. That's why query completion was not covered in Report.

10. SQL Statistics:





In this report, SQL queries are listed on the basis of CPU taken by the query i.e. queries causing high load on the system. The top few queries could be the candidate query for optimization.



stat, look for queries using highest CPU Times, If a query shows executions 0, this doesn't means query is not executing. It might be same case as in SQL queries ordered by Elapsed time. The query is still executing and you have taken the snapshot.

However, There are so many other stats in AWR Report which a DBA needs to consider, I have listed only ten of them but these are the most commonly used stats for any performance related information.

2 comments:

  1. Dbakeeda: Step By Step - How To Analyze Awr Report In Oracle >>>>> Download Now

    >>>>> Download Full

    Dbakeeda: Step By Step - How To Analyze Awr Report In Oracle >>>>> Download LINK

    >>>>> Download Now

    Dbakeeda: Step By Step - How To Analyze Awr Report In Oracle >>>>> Download Full

    >>>>> Download LINK LK

    ReplyDelete