It’s the middle of the night, and you get a frantic call from someone complaining that the database is slow. The caller demands to know why—and what you’re going to do about it. Sound familiar? If it does, you are not alone. High performance is a common expectation of database system users: they get very unhappy when they don’t get it, and they are usually not shy about letting you know. What should you do next? In this article, you will learn some techniques for troubleshooting Oracle Database performance issues.
To use the scripts in this article, you need to create some tables in a test schema and access some dynamic performance views. The database user SYS has all privileges to access the views, so you need the password for the SYS user. The script for setting up the example tables is available in the side.
To use the scripts in this article, you need to create some tables in a test schema and access some dynamic performance views. The database user SYS has all privileges to access the views, so you need the password for the SYS user. The script for setting up the example tables is available in the side.
Session State
Setup
Connect as SYS, and execute the following:
connect sys/<password> as sysdba create user arup identified by arup default tablespace users / alter user arup quota unlimited on users / -- now connect as arup connect arup/arup create table t1 ( col1 number, col2 varchar2(1) ) / insert into t1 values (1,’z’) / commit / |
Before you start troubleshooting why a database is slow, you have to first understand that the database itself is never slow or fast—it has a constant speed. The sessions connected to the database, however, slow down when they hit a bump in the road. To resolve a session performance issue, you need to identify the bump and remove it. Fortunately, it’s very easy to do this in most cases.
The first step to resolving a session performance issue is to ascertain what that database session is doing now. An Oracle Database session is always in one of three states.
- Idle. Not doing anything—just waiting to be given some work.
- Processing. Doing something useful—running on the CPU.
- Waiting. Waiting for something, such as a block to come from disk or a lock to be released.
If a session is waiting to be given work (idle), it’s really not slow at all—it just has nothing to do. If a session is waiting for some resource, such as a block or a lock, it has stopped processing. Until it gets that resource, the session will continue to wait. When it gets that resource, it does some processing and then moves on to the next resource it needs, waits for that to be available, and starts processing . . . and the cycle goes on until the session has nothing else to do. If it waits for resources often, the session will appear slow. But it’s not really slow—it’s just following a pattern of go, stop, go again, stop again, and so on. Your mission is to find and eliminate the “stop” issues in the session.
How difficult is it to get information about what’s causing the session to stop? It’s actually very easy: Oracle Database is instrumented to talk about what the database sessions are doing. All you need to do is to listen attentively or, more precisely, look for that information in the right place, and that place is a view called V$SESSION. Everything you need for your analysis is in this view.
To explain how to use the V$SESSION view, I will use a very common scenario—row locking—as an example. To follow along, first set up the previously mentioned tables as described in the online version of this article. Then connect as user ARUP from two different sessions. From the first session, issue the following SQL statement:
update t1 set col2 = 'x' where col1 = 1;
The output will show “1 row updated,” indicating that the row was updated. Do not issue a COMMIT after the statement. By not committing, you will force the session to get and hold a lock on the first row of the T1 table. Now go to the second session and issue the following SQL statement:
update t1 set col2 = 'y' where col1 = 1;
This statement will hang. Why? The answer is simple: the first session holds a lock on the row, which causes the second session to hang and the user to complain that the session is slow. To know what the second session is doing, the first thing you need to check is the STATE column in V$SESSION:
select sid, state from v$session where username = 'ARUP'; SID STATE ———— —————————————————— 3346 WAITING 2832 WAITED KNOWN TIME
Study the output carefully. Session 3346 (in the SID column) indicates that it is waiting for something—and therefore not working. That should be your first clue that the session is experiencing one of those performance bumps in the road. But before you can determine what the session is waiting for, let’s study the state of session 2832 in the output, which shows that it waited for some known amount of time earlier. The important point is that session 2832 is not waiting for anything right now, meaning that it’s working productively.
Next, let’s see what the second session (3346) is waiting for. That information is readily available in the EVENT column in the same V$SESSION view. The EVENT column not only shows an event a session is waiting for currently, but also shows an event a session has waited for earlier. The query against V$SESSION in Listing 1 displays information from the EVENT column for both sessions.
Code Listing 1: Query for displaying sessions, session state, and events
select sid, state, event from v$session where username = 'ARUP'; SID STATE EVENT ————— ————————————————— ———————————————————————————— 2832 WAITED KNOWN TIME SQL*Net message from client 3346 WAITING enq: TX - row lock contention
The output in Listing 1 shows that session 3346 is waiting right now for an event: “enq: TX – row lock contention”—short for “enqueue for transaction-level lock on row” or, in plain English, a row-level lock. The session is waiting because it wants to lock one or more rows, but another session has already placed locks on the row or rows. Unless that other session commits or rolls back its transaction, session 3346 will not get the lock it needs and will have no choice but to wait. On the other hand, the state of session 2832, “WAITED KNOWN TIME,” means that it is working—not waiting—right now. It was, however, waiting earlier for an event called “SQL*Net message from client” (I will discuss this specific event later). There is one very important lesson in these results: you cannot look at the EVENT column alone to find out what the session is waiting for. You must look at the STATE column first to determine whether the session is waiting or working and then inspect the EVENT column.After you determine that a session is waiting for something, the next thing you need to find out is how long the session has been waiting. A very long wait usually indicates some sort of bottleneck. Where can you get information on the length of the waiting period? The answer is right there in the V$SESSION view, in the SECONDS_IN_WAIT column.Getting the amount of time a session has been waiting makes sense for sessions that are waiting right now, but what about the sessions that are working now? Recall that the EVENT column shows not only the event a session is experiencing now but also the last wait event the session has experienced. Another column—WAIT_TIME—in the same V$SESSION view shows how long that wait lasted. (Note that WAIT_TIME is shown in centiseconds [hundredths of a second].)Now that you know how to get information on the sessions waiting and working, let’s put all the information together in a single query, shown in Listing 2. It clearly shows the state of the sessions: whether they are working or waiting; if they are working, what they were waiting for earlier and for how long; and if they are waiting, what for and for how long.Code Listing 2: Query for displaying sessions, session state, and wait detailscol "Description" format a50 select sid, decode(state, 'WAITING','Waiting', 'Working') state, decode(state, 'WAITING', 'So far '||seconds_in_wait, 'Last waited '|| wait_time/100)|| ' secs for '||event "Description" from v$session where username = 'ARUP';Output:SID STATE Description ————— —————————— ——————————————————————————————————————————————————————— 2832 Working Last waited 2029 secs for SQL*Net message from client 3346 Waiting So far 743 secs for enq: TX - row lock contention 4208 Waiting So far 5498 secs for SQL*Net message from clientIdle Event
Note the details of session 4208 in Listing 2; it’s currently waiting 5,498 seconds for a “SQL*Net message from client” event. Recall from the previous section that an Oracle Database session can be in one of the three states: working, waiting for a resource, or waiting for work. But how can a session determine whether it is idle? It will expect to be given work by clients via SQL*Net, but there is no way for it to know in advance if any work is coming from the clients. All it can do is wait for some instruction coming through SQL*Net. Until then, it will have nothing else to do but eagerly stare at the SQL*Net interface, and this condition is reported as “SQL*Net message from client” in the V$SESSION view’s EVENT column, which is practically the same thing as just being idle.You can disregard another EVENT column value, “rdbms ipc message,” because it is an event status for sessions that are idle. Note that an idle session does not show IDLE as the STATE column value; it still shows “Waiting.” You have to check the EVENT column to determine whether the session is truly idle.You may be tempted to modify the query in Listing 2 to filter sessions that include the “SQL*Net message from client” and “rdbms ipc message” idle events. Although you can do that, I highly discourage doing that, for multiple reasons. First, not all instances of the “SQL*Net message from client” event indicate that the session is idle. Consider the possibility that the network might be truly slow, in which case the session will also wait for these events. Remember, the session doesn’t have the ability to determine whether the client is truly idle or is sending instructions that are slow or stuck in the network. All it can do is wait, and it will wait with the “SQL*Net message from client” event. Second, idle events may provide some clues to Oracle Support about what else is going on inside a session. So I recommend displaying these “idle” EVENT values.Diagnosis of Locking
The output of Listing 2 provides enough information to enable you to make a diagnosis about the performance of these three sessions. Session 4208 is idle, so any complaints that session 4208 is slow just aren’t related to the database. Any performance issues related to this session could be related to a bug in the code that’s going through an infinite loop or high CPU consumption on the application server. You can redirect the performance troubleshooting focus toward the application client.The story of session 3346 is different. This session is truly a bottleneck to the application. Now that you know why this session appears slow—it is waiting for a row lock—the next logical question is which session holds that lock. The answer is also found in—I hope you guessed it—the V$SESSION view, in, more specifically, the BLOCKING_SESSION column. (Note that in an Oracle Real Application Clusters [Oracle RAC] environment, the blocking session may exist in a different instance. In such a case, the blocking instance is displayed in the V$SESSION view’s BLOCKING_INSTANCE column.)You can find out the blocking session and instance by issuing the following SQL statement:select blocking_session B_SID, blocking_instance B_Inst from v$session where sid = 3346; B_SID B_INST —————— ——————— 2832 1The output shows clearly that SID 2832 is holding the lock that SID 3346 is waiting for. Now you can follow a cause/effect relationship between the session in which an update to a row is being blocked and the session that holds the lock on that row.You can find the specific row that is locked by first finding the table containing that row. To find that table, use the same V$SESSION view; in this case, the information is in the ROW_WAIT_OBJ# column, which shows the object number of the table whose row is being locked. You can then get the name of the table from the DBA_OBJECTS view, using this object number, as shown in Listing 3.Code Listing 3: Getting row lock informationselect row_wait_obj#, row_wait_file#, row_wait_block#, row_wait_row# from v$session where sid = 3346; ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# ————————————— —————————————— ———————————————— —————————————— 241876 1024 2307623 0To get the object information:select owner, object_type, object_name, data_object_id from dba_objects where object_id = 241876; OWNER OBJECT_TYPE OBJECT_NAME DATA_OBJECT_ID ————— ———————————— ———————————— —————————————— ARUP TABLE T1 241877The output shows that some row in the T1 table is the point of the row lock contention. But which specific row is locked? That data is available in three V$SESSION view columns—ROW_WAIT_FILE#, ROW_WAIT_BLOCK#, and ROW_WAIT_ROW#—which show the relative file ID, the block ID in that file, and the row’s slot number inside that block, respectively, for that specific row. Using this information, you can identify the ROWID of the row. The ROWID, the physical address of every row in an Oracle Database instance, can be used to uniquely identify a row.Listing 4 shows a SQL script that enables you to select the specific blocking row from the table with the information gathered so far. Save this script in a file named rowinfo.sql. The script expects the input in the following order: owner, table name, object#, file#, block#, and row#. You can call this script and pass all the requested parameters by copying and pasting the corresponding output from Listing 3.Code Listing 4: Finding the row informationREM Filename: rowinfo.sql REM This shows the row from the table when the REM components of ROWID are passed. Pass the REM following in this exact order REM 1. owner REM 2. table name REM 3. data_object_id REM 4. relative file ID REM 5. block ID REM 6. row Number REM select * from &1..&2 where rowid = dbms_rowid.rowid_create ( rowid_type => 1, object_number => &3, relative_fno => &4, block_number => &5, row_number => &6 ) / SQL> @rowinfo ARUP T1 241877 1024 2307623 0 COL1 C ————— — 1 xThe output in Listing 4 shows the specific row on which a lock is being requested but that is locked by another session. So far you have identified not only the source session of the locking but the specific row being locked as well.Is it possible that the session holding the lock (SID 2832) is somehow disconnected from the client? That can occur in connection pools or when users access the database with thick-client tools such as Oracle SQL Developer. After you identify the session holding the lock, you may want to wait until it commits or rolls back the transaction. Either action releases the lock.In the case of a dead connection, you may alternatively decide to kill the session, which will force a rollback releasing the locks held by the blocking session and enabling the waiting sessions to continue. Occasionally the problem can be pretty simple: for instance, someone issued an UPDATE statement from a thick-client tool but forgot to commit and thus caused every session to wait for those updated rows. Identifying that blocking session enables you to send a gentle reminder to rectify that situation immediately.More on the Session
In many troubleshooting situations, just knowing the SID of each session is not enough. You may need to know other details, such as the client machine the session is connecting from, the user (of both the database and the operating system), and the service name. All of this information is also readily available in the same V$SESSION view you have been using. Let’s briefly examine the columns that provide that information, by running the script shown in Listing 5.Code Listing 5: Sessions from a specific userselect SID, osuser, machine, terminal, service_name, logon_time, last_call_et from v$session where username = 'ARUP'; SID OSUSER MACHINE TERMINAL SERVICE_NAME LOGON_TIME LAST_CALL_ET ————— —————— ——————— ———————— ———————————— —————————— ———————————— 3346 oradb prodb1 pts/5 SYS$USERS 05-FEB-12 6848 2832 oradb prodb1 pts/6 SERV1 05-FEB-12 7616 4408 ANANDA ANLAP ANLAP ADHOC 05-FEB-12 0OSUSER. The operating system user as which the client is connected. The output indicates that session 4408 is connected from the ANLAP machine, where a Windows user, ANANDA, has logged in.MACHINE. The name of the machine where the client is running. This could be the database server itself. For two of the sessions, the machine name shows up as “prodb1.” Session 4408 runs on a different machine—ANLAP—presumably a laptop.TERMINAL. If the session is connected from a UNIX server, this is the terminal where it runs.LOGON_TIME. This shows when the session was first connected to the Oracle Database instance.Using the columns shown in Listing 5, you can get very detailed information on a user’s sessions.Suppose you receive a complaint that the applications running on the application server named appsvr1 are experiencing performance issues. Listing 6 shows a query against the V$SESSION view—including columns you’ve used in previous queries in this article—for the sessions connected from that machine and the output.Code Listing 6: Session waits for a specific machinecol username format a5 col program format a10 col state format a10 col last_call_et head 'Called|secs ago' format 999999 col seconds_in_wait head 'Waiting|for secs' format 999999 col event format a50 select sid, username, program, decode(state, 'WAITING', 'Waiting', 'Working') state, last_call_et, seconds_in_wait, event from v$session where machine = 'appsvr1' / Called Waiting SID USERNAME PROGRAM STATE secs ago for secs EVENT ————— ——————— ——————————— ——————— ————————— ———————— —————————————————— 2832 ARUP sqlplus.exe Waiting 152 151 SQL*Net message from client 3089 ARUP sqlplus.exe Waiting 146 146 enq: TX - row lock contention 3346 ARUP sqlplus.exe Working 18 49 SQL*Net message from clientFrom the output, you can easily see that three sessions are connected from the appsvr1 application server. All of them are running SQL*Plus (as shown in the PROGRAM column). SID 3346 is the only one that is working (indicated by “Working” in the STATE column). Because it’s working, the EVENT column shows the last time the session waited. The wait time in this case is meaningless, because the session is not waiting but actually working. The “Called secs ago” column (representing the “last_call_et” column in V$SESSION) displays 18, which means that the session made a SQL call 18 seconds ago.The other sessions are waiting. SID 3089 is waiting for a row lock. From the output, you can see that the session has been waiting for 146 seconds and that it also made its last SQL call 146 seconds ago. This indicates that the session has been waiting for that particular lock ever since it made that SQL call.Finally, session 2832 is also waiting; in this case, it is waiting with a “SQL*Net message from client” event, which means it is idle, waiting to be given some work. The session issued its last SQL statement 152 seconds ago and has been idle for 151 seconds.Armed with this information, you can diagnose performance issues very accurately. You can tell the complaining user that of the three sessions connected from the appsvr1 application server, one session is idle, one is working, and one is waiting for a lock. The user is probably referring to the slowness of this last session. Now you know the reason and how you can rectify it.Getting the SQL
Another key piece of performance tuning information is the SQL statement a session is executing, which will provide more insights into the workings of the session. The same V$SESSION view also shows the SQL statement information. The SQL_ID column in the V$SESSION view shows the ID of the last SQL statement executed. You can get the text of that SQL statement from the V$SQL view, using the SQL_ID value. Here is an example of how I have identified the SQL statement executed by the session that appears slow to the user.select sql_id from v$session where sid = 3089; SQL_ID ————————————————— g0uubmuvk4uax set long 99999 select sql_fulltext from v$sql where sql_id = 'g0uubmuvk4uax'; SQL_FULLTEXT ———————————————————————————————————————— update t1 set col2 = 'y' where col1 = 1Data Access Issues
I have used row-level locking as the cause of a slowdown in this article. Although locking-related contention is a very common cause, it is not the only cause of performance problems. Another major cause of contention is disk I/O. When a session retrieves data from the database data files on disk to the buffer cache, it has to wait until the disk sends the data. This wait shows up for that session as “db file sequential read” (for index scans) or “db file scattered read” (for full-table scans) in the EVENT column, as shown below:select event from v$session where sid = 3011; EVENT ————————————————————————— db file sequential readWhen you see this event, you know that the session is waiting for I/O from the disk to complete. To make the session go faster, you have to reduce that waiting period. There are several ways to reduce the wait:
- Reduce the number of blocks retrieved by the SQL statement. Examine the SQL statement to see if it is doing a full-table scan when it should be using an index, if it is using a wrong index, or if it can be rewritten to reduce the amount of data it retrieves.
- Place the tables used in the SQL statement on a faster part of the disk.
- Consider increasing the buffer cache to see if the expanded size will accommodate the additional blocks, therefore reducing the I/O and the wait.
- Tune the I/O subsystem to return data faster.
There are other options as well, but the preceding ones are the most common remediation techniques. The exact activity you undertake depends on your specific situation, but the first technique—reducing the number of blocks retrieved by a SQL statement—almost always works.
When you think about tuning to reduce the number of blocks, you can look at the SQL statement to see which table is being selected from. But what if you see two or more tables in the statement? How do you determine which table is causing the wait?
To find the table causing a wait, you will again use the V$SESSION view. The view’s P1 and P2 columns provide information about the segment the session is waiting for. Listing 7 shows a query of P1 and P2, and the output.
Code Listing 7: Checking data access waits
select SID, state, event, p1, p2 from v$session where username = 'ARUP'; SID STATE EVENT P1 P2 ———— ——————— ——————————————————————— —— ———— 2201 WAITING db file sequential read 5 3011
The P1 column shows the file ID, and the P2 column shows the block ID. From that information in the result in Listing 7, you can get the segment name from the extent information in DBA_EXTENTS, as shown below:
select owner, segment_name from dba_extents where file_id = 5 and 3011 between block_id and block_id + blocks; OWNER SEGMENT_NAME —————— ————————————— ARUP T1
This shows that the T1 table, owned by ARUP, is being selected from by the disk in the session. You should direct your attention to this table for tuning. You can move the table to a high-speed disk for faster I/O, or, alternatively, you can focus on making I/O in this table faster by making changes that affect this table, such as creating new indexes, creating materialized views, or building a result cache.
No comments:
Post a Comment