Monday, August 1, 2016

CRSCTL commands in Oracle 11g Release 2

How to shut down and disable CRS as the root user (run on each node):
----------------------------------------------------------------------
#crsctl stop crs
#crsctl disable crs

How to enable and restart CRS as the root user (run on each node):
-------------------------------------------------------------------
#crsctl enable crs
#crsctl start crs

How to check whether the VIP status is ONLINE / OFFLINE:
--------------------------------------------------------
$crs_stat                  (10g and 11gR1; deprecated in 11gR2)
$crsctl stat res -t        (11gR2)

How to Check current Version of Clusterware:
-------------------------------------------
$crsctl query crs activeversion

$crsctl query crs softwareversion [node_name]

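How to Start & Stop CRS and CSS (as the root user):
---------------------------------------------------
#crsctl start crs
#crsctl stop crs

In 10g and 11gR1 the init scripts can also be used (11gR2 replaces them with init.ohasd):

#/etc/init.d/init.crs start
#/etc/init.d/init.crs stop

#/etc/init.d/init.cssd stop
#/etc/init.d/init.cssd start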

How to Enable & Disable CRS (as the root user):
-----------------------------------------------
#crsctl enable crs
#crsctl disable crs

#/etc/init.d/init.crs enable      (10g and 11gR1 only)
#/etc/init.d/init.crs disable     (10g and 11gR1 only)

How to Check current status of CRS:
----------------------------------
$crsctl check crs

$crsctl check cluster [-node node_name]

How to Check CSS, CRS and EVMD:
------------------------------
(10g and 11gR1 syntax; 11gR2 uses crsctl check css, crsctl check crs and crsctl check evm)

$crsctl check cssd

$crsctl check crsd

$crsctl check evmd

How to List the Voting disks currently used by CSS:
--------------------------------------------------
$crsctl check css votedisk

$crsctl query css votedisk

How to Add and Delete any voting disk:
-------------------------------------
$crsctl add css votedisk <PATH>

$crsctl delete css votedisk <PATH>

How to start and stop clusterware resources:
--------------------------------------------
$crsctl start resources

$crsctl stop resources

Oracle RAC Load balancing and Failover

LOAD BALANCING in RAC:-
Oracle RAC can distribute the load across multiple nodes. This feature is called load balancing.

There are two methods of load balancing
1.Client load balancing
2.Server load balancing

1.Client Load Balancing
Client load balancing distributes new connections among the Oracle RAC nodes so that no single server is overloaded with connection requests. It is configured at the net service name level by providing multiple descriptions in a description list or multiple addresses in an address list. For example, when connections fail over after a node failure, client load balancing spreads the redirected connections across the surviving nodes in the cluster.

Configure client-side connect-time load balancing by setting LOAD_BALANCE=ON in the corresponding client-side TNS entry.

TESTRAC =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC1-VIP)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC2-VIP)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = testdb.selectstarfrom.com))
  )

2.Server Load Balancing
Server load balancing distributes the processing workload among the Oracle RAC nodes. It divides the connection load evenly between all available listeners and directs each new session connection request to the least loaded listener, based on the total number of sessions already connected. The listeners learn the load of each instance from that instance's PMON process, which registers with every listener.

Configure server-side connect-time load balancing by setting the REMOTE_LISTENER initialization parameter of each instance to a TNS name that describes the list of all available listeners.

TESTRAC_LISTENERS =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC2)(PORT = 1521))
    )
  )

Set the initialization parameter *.remote_listener='TESTRAC_LISTENERS' in the database's shared SPFILE and add the TESTRAC_LISTENERS entry to the TNSNAMES.ORA file in the Oracle Home of each node in the cluster.

Once server-side connect-time load balancing is configured, each instance's PMON process automatically registers the instance with its local listener and cross-registers it with the listeners on all other nodes in the cluster. The listeners can then decide which node is least busy and connect the client to that node.
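
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name pick_listener and the session counts are hypothetical, and a real listener works from the load information that PMON reports rather than from a plain dictionary.

```python
def pick_listener(session_counts):
    """Return the listener with the fewest connected sessions.

    session_counts maps a listener/host name to its current session count.
    Ties are broken alphabetically so the result is deterministic.
    """
    if not session_counts:
        raise ValueError("no listeners registered")
    # Sorting the names first makes min() return the alphabetically
    # first name among listeners with equal load.
    return min(sorted(session_counts), key=lambda name: session_counts[name])


# Hypothetical load figures for the two listeners in TESTRAC_LISTENERS.
loads = {"TESTRAC1": 42, "TESTRAC2": 17}
print(pick_listener(loads))  # TESTRAC2 has fewer sessions
```

A new session would be directed to TESTRAC2 here, and its count would then rise until the two nodes even out.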

-------------------------------------------------------------------------------------------------------------------------------------------------------------

FAILOVER in RAC:-
Oracle RAC can protect against failures caused by O/S crashes, server crashes, or hardware failures. When a node fails in a RAC system, connection attempts can fail over to the surviving nodes in the cluster. This feature is called failover.

There are two methods of failover
1. Connection Failover
2. Transparent Application Failover (TAF)

1. Connection Failover
If a connection failure occurs at connect time, the connection fails over to another active node in the cluster. This feature enables the client to connect to another listener if the initial connection to the first listener fails.

Enable client-side connect-time failover by setting FAILOVER=ON in the corresponding client-side TNS entry.

TESTRAC =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC1-VIP)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC2-VIP)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = testdb.selectstarfrom.com))
  )

If LOAD_BALANCE is set to ON, clients pick addresses from the list at random. If a client attempts to connect to a node that is down, it must wait until it learns that the node is not accessible before trying an alternate address in the ADDRESS_LIST.
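
To make the interplay of LOAD_BALANCE and FAILOVER concrete, here is a small Python simulation. It is a sketch, not Oracle Net's actual implementation: the functions connection_order and connect and the is_up callback are hypothetical stand-ins for the client's address selection and for a real connection attempt.

```python
import random


def connection_order(addresses, load_balance=True, rng=random):
    """Return the order in which the client tries the addresses."""
    order = list(addresses)
    if load_balance:
        rng.shuffle(order)  # LOAD_BALANCE=ON: random order
    return order            # LOAD_BALANCE=OFF: order as listed


def connect(addresses, is_up, load_balance=True, failover=True, rng=random):
    """Try addresses in turn and return the first host that answers.

    is_up is a stand-in for a real connection attempt: it takes a host
    name and returns True when a listener answers on that host.
    """
    order = connection_order(addresses, load_balance, rng)
    if not failover:
        order = order[:1]   # FAILOVER=OFF: only one address is tried
    for host in order:
        if is_up(host):
            return host
    raise ConnectionError("no listener reachable")


hosts = ["TESTRAC1-VIP", "TESTRAC2-VIP"]
alive = {"TESTRAC2-VIP"}    # pretend node 1 is down
print(connect(hosts, lambda h: h in alive))  # always TESTRAC2-VIP
```

With failover disabled and the first address down, the same call raises ConnectionError instead of moving on to the next address.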

2. Transparent Application Failover (TAF)
If a connection failure occurs after the connection is established, the connection fails over to one of the surviving nodes. Any uncommitted transactions are rolled back, and server-side program variables and session properties are lost. In some cases (TYPE = SELECT), open SELECT statements are automatically re-executed on the new connection, with the cursor positioned on the row where it was before the failover.

TESTRAC =
  (DESCRIPTION =
    (LOAD_BALANCE = ON)
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC1-VIP)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC2-VIP)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = testdb.selectstarfrom.com)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 180)(DELAY = 5))
    )
  )
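
As a rough worked example of what RETRIES = 180 and DELAY = 5 mean together, the snippet below multiplies the two values. It assumes DELAY is the wait in seconds between reconnect attempts and ignores the time each attempt itself takes, so the result is only an approximate upper bound on how long the client keeps retrying. The function name is hypothetical.

```python
def taf_retry_window_seconds(retries, delay):
    """Approximate time spent waiting between reconnect attempts."""
    return retries * delay


window = taf_retry_window_seconds(retries=180, delay=5)
print(window, "seconds, i.e.", window // 60, "minutes")
```

With the values from the entry above the client keeps trying for about 15 minutes before giving up.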


Saturday, June 18, 2016

Oracle 11gR2 Clusterware Startup Sequence

In Oracle 10g RAC and 11gR1 RAC, Oracle Clusterware and ASM are installed in different Oracle homes, and the Clusterware has to be up before the ASM instance can be started, because the ASM instance uses the Clusterware to access the shared storage. Oracle 11g R2 introduced the Grid Infrastructure home, which combines Oracle Clusterware and ASM. The OCR and voting disk of 11g R2 Clusterware can be stored in ASM. So ASM needs the Clusterware up first to access the shared storage, and the Clusterware needs ASM up first before it can access its key data structures, the OCR and voting disk. Which one has to be up first, and which one has to wait for the other? This looks like a chicken-and-egg problem.

Oracle's solution is to combine the Clusterware and ASM into a single Grid Infrastructure home and to use a complex startup sequence that interleaves the components of the Clusterware and the ASM instance in a fixed order. The full description, the hard-to-read diagram, and any updates can be found in the MOS (Metalink) note on 11gR2 Clusterware and Grid Home, Document 1053147.1.



This description is also from the MOS note stated above

Short summary of the startup sequence: INIT spawns init.ohasd (with respawn) which in turn starts the OHASD process (Oracle High Availability Services Daemon).  This daemon spawns 4 processes.

Level 1: OHASD Spawns:

    cssdagent         - Agent responsible for spawning CSSD.
    orarootagent     - Agent responsible for managing all root owned ohasd resources.
    oraagent         - Agent responsible for managing all oracle owned ohasd resources.
    cssdmonitor        - Monitors CSSD and node health (along with the cssdagent).
 

Level 2: OHASD rootagent spawns:

    CSSD (ora.cssd)     - Cluster Synchronization Services
    CRSD(ora.crsd)     - Primary daemon responsible for managing cluster resources.
    CTSSD(ora.ctssd)     - Cluster Time Synchronization Services Daemon
    Diskmon(ora.diskmon)
    ACFS (ASM Cluster File System) Drivers

Level 2: OHASD oraagent spawns:

    MDNSD(ora.mdnsd)     - Used for DNS lookup
    GIPCD(ora.gipcd)     - Used for inter-process and inter-node communication
    GPNPD(ora.gpnpd)     - Grid Plug & Play Profile Daemon
    EVMD(ora.evmd)     - Event Monitor Daemon
    ASM(ora.asm)     - Resource for monitoring ASM instances

Level 3: CRSD spawns:

    orarootagent     - Agent responsible for managing all root owned crsd resources.
    oraagent         - Agent responsible for managing all oracle owned crsd resources.

Level 4: CRSD rootagent spawns:

    Network resource     - To monitor the public network
    SCAN VIP(s)     - Single Client Access Name Virtual IPs
    Node VIPs         - One per node
    ACFS Registry     - For mounting ASM Cluster File System
    GNS VIP (optional)     - VIP for GNS

Level 4: CRSD oraagent spawns:

    ASM Resource     - ASM Instance(s) resource
    Diskgroup         - Used for managing/monitoring ASM diskgroups.
    DB Resource     - Used for monitoring and managing the DB and instances
    SCAN Listener     - Listener for single client access name, listening on SCAN VIP
    Listener         - Node listener listening on the Node VIP
    Services         - Used for monitoring and managing services
    ONS         - Oracle Notification Service
    eONS         - Enhanced Oracle Notification Service
    GSD         - For 9i backward compatibility
    GNS (optional)     - Grid Naming Service - Performs name resolution

Tuesday, May 24, 2016

What is Virtual IP and how it works in Oracle RAC

In a Real Application Clusters environment, the following IPs are required per node.

1. Private IP: This IP is used for the node interconnect. Systems cannot be accessed through this IP from the outside world.

2. Public IP: This IP is used for accessing the system for day-to-day tasks, monitoring, etc.

3. Virtual IP (VIP): This IP is required for failover when a node is down. It moves to a surviving node.

Oracle 11g R2 introduced one more concept, the Single Client Access Name (SCAN). The whole Real Application Cluster is given one name, called the SCAN name. This name corresponds to at least one and preferably three IPs, which are called SCAN VIPs.

Let's see how VIP works 

Suppose we have a two-node Real Application Cluster setup with the following IPs:

NODE        Static IP address            Virtual IP address
=======================================================================
racnode1    192.168.1.100 (racnode1)     192.168.1.200 (racnode1_vip1)
racnode2    192.168.1.101 (racnode2)     192.168.1.201 (racnode2_vip2)

In Oracle 10g:

Let's first see how this works in Oracle 10g. Suppose listener.ora on both nodes uses the static IP in its configuration, like this:

LISTENER=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=(PROTOCOL=tcp)(HOST=racnode1)(PORT=1521))
      (ADDRESS=(PROTOCOL=ipc)(KEY=extproc))))
SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=sales.us.example.com)
      (ORACLE_HOME=/oracle10g)
      (SID_NAME=Service1))
    (SID_DESC=
      (SID_NAME=plsextproc)
      (ORACLE_HOME=/oracle10g)
      (PROGRAM=extproc)))

Hence, tnsnames.ora on the client system will look like this:

Service1 =
(DESCRIPTION =
  (ADDRESS=(PROTOCOL=TCP)(HOST=racnode1)(PORT=1521))
  (ADDRESS=(PROTOCOL=TCP)(HOST=racnode2)(PORT=1521))
    (CONNECT_DATA =
      (SERVICE_NAME = Service1)
     )
  )

Now a new connection to the database first goes to racnode1. If this node is alive and working, the connection is established and the user can continue to work.

What if racnode1 is not available? Even then, the client first tries to establish a connection with racnode1, because it is first in its address list. Since the node (racnode1) is not available, the client then tries the next address in the list (i.e. racnode2). So there is a delay in moving from one node to the other. This is called Connect-Time Failover.

The problem is the time (TCP timeout) it takes to fail over, which ranges from a few seconds to a few minutes. For highly critical systems this is not acceptable.

To resolve this problem, Oracle introduced the Virtual IP (VIP).

Let's see how it works with VIP

Now listener.ora on both nodes uses the VIP in its configuration, like this:

LISTENER=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=(PROTOCOL=tcp)(HOST=racnode1_vip1)(PORT=1521))
      (ADDRESS=(PROTOCOL=ipc)(KEY=extproc))))
SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=sales.us.example.com)
      (ORACLE_HOME=/oracle10g)
      (SID_NAME=Service1))
    (SID_DESC=
      (SID_NAME=plsextproc)
      (ORACLE_HOME=/oracle10g)
      (PROGRAM=extproc)))

Hence, tnsnames.ora on the client system will look like this:

Service1 =
(DESCRIPTION =
  (ADDRESS=(PROTOCOL=TCP)(HOST=racnode1_vip1)(PORT=1521))
  (ADDRESS=(PROTOCOL=TCP)(HOST=racnode2_vip2)(PORT=1521))
    (CONNECT_DATA =
      (SERVICE_NAME = Service1)
     )
  )

Now a new connection to the database first goes to racnode1_vip1. If this node is alive and working, the connection is established and the user can continue to work.

What if racnode1 is not available? The client still tries racnode1_vip1 first, because it is first in its address list. But since the node (racnode1) is not available, CRS comes into the picture and moves the failed node's VIP to one of the surviving nodes of the cluster.

Any connection attempt to the failed node through its VIP is then handled by that VIP, which now resides on one of the surviving nodes.

The relocated VIP responds immediately to the client with an error indicating that there is no listener. On receiving this error, the client immediately retries the connection using the next address in the list. This reduces the time to fail over.

In Oracle 11g R2:

Since Oracle 11g R2 has SCAN VIPs, the following question comes to mind:

Do we still need VIP in Oracle 11g?

Yes, we still need VIPs. They play the same role as discussed for Oracle 10g.

What is the difference between SCAN VIP and VIP?

The IP addresses that the SCAN name resolves to are called SCAN VIPs. Each SCAN VIP runs on one of the DB nodes together with a SCAN listener.

Let's see how VIPs work in 11g R2. In Oracle 11g R2, tnsnames.ora needs only one address entry, the SCAN name of the cluster, like this:

Service1 =
(DESCRIPTION =
  (ADDRESS=(PROTOCOL=TCP)(HOST=scan_racnode1_vip1)(PORT=1521))
    (CONNECT_DATA =
      (SERVICE_NAME = Service1)
     )
  )

This SCAN name resolves to one of the SCAN VIPs, and every SCAN VIP has an associated listener, known as a SCAN listener, running on a node. In the example below, two SCAN listeners are running, on odain1 and odain2.

[grid@bin]$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node odain1
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node odain2

All databases are registered with each SCAN listener in the cluster, and PMON reports its instance's load to each SCAN listener. Each request that uses SCAN_NAME resolves to a SCAN VIP, i.e. a SCAN listener. The SCAN listener then redirects it to a node VIP chosen by load balancing.

SCAN_NAME ===============> SCAN VIP ==============> VIP

Monday, May 2, 2016

Oracle DBA Interview Questions and Answers - RAC

What is RAC?

RAC stands for Real Application Clusters.

It is a clustering solution from Oracle Corporation that ensures high availability of databases by providing instance failover and media failover features.

Oracle RAC is a cluster database with a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide a highly scalable and available database solution for all the business applications.

Oracle RAC provides the foundation for enterprise grid computing.

Why do we have to create an odd number of voting disks?

A node must be able to access strictly more than half of the voting disks at any time. So to tolerate the failure of n voting disks, you must configure at least 2n+1 of them (n=1 means 3 voting disks). You can configure up to 32 voting disks, providing protection against 15 simultaneous disk failures.
Oracle recommends that customers use 3 or more voting disks in Oracle RAC 10g Release 2. Note: For best availability, the 3 voting files should be on physically separate disks. An odd number is recommended because 4 disks are no more highly available than 3: with 3 disks a node needs 2 (more than 1.5), and with 4 disks it needs 3 (more than 2). Either way the cluster fails once 2 disks are lost.
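
The majority rule can be checked with a short calculation. The following Python sketch uses hypothetical helper names and only restates the "strictly more than half" rule from the answer above.

```python
def required_disks(total):
    """Smallest number of voting disks that is strictly more than half."""
    return total // 2 + 1


def tolerated_failures(total):
    """How many voting disks can be lost while a majority remains."""
    return total - required_disks(total)


for total in (1, 2, 3, 4, 5, 32):
    print(total, "disks: need", required_disks(total),
          "- can lose", tolerated_failures(total))
```

The loop shows that 3 and 4 disks both tolerate only one failure, which is why an even count buys nothing, and that 32 disks tolerate 15 failures as stated above.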

Does the cluster actually check the vote count before node eviction? If yes, could you explain this process briefly?

Yes. If you lose half or more of your voting disks, nodes get evicted from the cluster, or nodes remove themselves from the cluster.


How does OCSSD start first if the voting disk and OCR reside in ASM disk groups?

You might wonder how CSSD, which is required to start the clustered ASM instance, can be started if voting disks are stored in ASM?

This sounds like a chicken-and-egg problem:
without access to the voting disks there is no CSS, hence the node cannot join the cluster.
But without being part of the cluster, CSSD cannot start the ASM instance.
To solve this problem the ASM disk headers have new metadata in 11.2:
you can use kfed to read the header of an ASM disk containing a voting disk.
The kfdhdb.vfstart and kfdhdb.vfend fields tell CSS where to find the voting file. This does not require the ASM instance to be up.
Once the voting disks are located, CSS can access them and joins the cluster.

What is gsdctl in RAC? List the gsdctl commands in Oracle RAC.

GSDCTL stands for Global Service Daemon Control. We can use gsdctl commands to start, stop, and obtain the status of the GSD service on any platform.

The options for gsdctl are:-
$ gsdctl start -- To start the GSD service
$ gsdctl stop  -- To stop the GSD service
$ gsdctl stat  -- To obtain the status of the GSD service

Log file location for gsdctl:
$ORACLE_HOME/srvm/log/gsdaemon_node_name.log

What is Oracle RAC One Node?

Oracle RAC One Node is a single instance running on one node of the cluster while the second node is in cold standby mode. If the instance fails, RAC One Node detects it and restarts the instance on the same node, or relocates the instance to the second node if the first node has failed. The benefit of this feature is that it provides a cold failover solution and automates instance relocation without manual intervention. Oracle introduced this feature in 11gR2 (available with Enterprise Edition).

What is RAC and how is it different from non RAC databases?

Oracle Real Application Clusters allows multiple instances, running on multiple nodes, to access a single database.
In Real Application Clusters environments, all nodes concurrently execute transactions against the same database.
Real Application Clusters coordinates each node's access to the shared data to provide consistency and integrity.

What are the advantages of RAC (Real Application Clusters)?

Reliability - if one node fails, the database won't fail
Availability - nodes can be added or replaced without having to shut down the database
Scalability - more nodes can be added to the cluster as the workload increases

What is Cache Fusion?

Oracle RAC is composed of two or more instances. When an instance has read a block of data from a datafile and another instance in the cluster needs the same block, it is faster to ship the block image from the SGA of the instance that holds it than to read it from disk. Oracle RAC uses the interconnect for this inter-instance communication. The Global Cache Service (GCS) and the Global Enqueue Service (GES) enable Cache Fusion.

What command would you use to check the availability of the RAC system?

crs_stat -t -v (-t -v are optional)

How do we verify that RAC instances are running?

SQL>select * from V$ACTIVE_INSTANCES;
The query gives the instance number in the INST_NUMBER column and host:instance_name in the INST_NAME column.

How can you connect to a specific node in a RAC environment?

In tnsnames.ora, ensure that INSTANCE_NAME is specified in the CONNECT_DATA section of the entry you connect with.

Which is the "MASTER NODE" in RAC?

The node with the lowest node number becomes the master node, and dynamic remastering of the resources takes place.

To find the master node for a particular resource, query the MASTER_NODE column of v$ges_resource.

To find out which node is the master, search the ocssd.log file for "master node number".
When the first master node fails, the surviving node with the lowest node number becomes the master node.

What components in RAC must reside in shared storage?

All datafiles, control files, SPFILEs, and redo log files must reside on cluster-aware shared storage.

Give a few examples of solutions that support cluster storage.

·ASM (Automatic Storage Management),
·Raw disk devices,
·Network File System (NFS),
·OCFS2 and
·OCFS (Oracle Cluster File System).

What are Oracle Cluster Components?

1.Cluster Interconnect (HAIP)
2.Shared Storage (OCR/Voting Disk)
3.Clusterware software
4.Oracle Kernel Components

What are Oracle RAC Components?

VIP, Node apps etc.

What are Oracle Kernel Components?

The Oracle kernel needs to be relinked with the RAC option switched on when you convert to RAC. That is the difference, as it enables the RAC background processes such as LMON, LCK, LMD and LMS.

How to turn on RAC?

# link the oracle libraries
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_on
# rebuild oracle
$ cd $ORACLE_HOME/bin
$ relink oracle

Disk architecture in RAC?

SAN (Storage Area Networks) - generally using fibre to connect to the SAN
NAS (Network Attached Storage) - generally using a network to connect to the NAS, using either NFS or iSCSI

What is Oracle Clusterware?

The Clusterware software allows nodes to communicate with each other and forms the cluster that makes the nodes work as a single logical server.
The software is run by the Cluster Ready Services (CRS), using the Oracle Cluster Registry (OCR), which records and maintains the cluster and node membership information, and the voting disk, which acts as a tiebreaker during communication failures. Heartbeat information travels continuously across the interconnect and to the voting disk while the cluster is running.

Real Application Clusters
Oracle RAC is a cluster database with a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide a highly scalable and available database solution for all your business applications. Oracle RAC provides the foundation for enterprise grid computing.

Oracle’s Real Application Clusters (RAC) option supports the transparent deployment of a single database across a cluster of servers, providing fault tolerance from hardware failures or planned outages. Oracle RAC running on clusters provides Oracle’s highest level of capability in terms of availability, scalability, and low-cost computing.

One database is opened by multiple instances, so the database remains highly available if an instance crashes.
Cluster software: Oracle Clusterware or products like Veritas Volume Manager are required to provide the cluster support and allow each node to know which nodes belong to the cluster and are available. Oracle Clusterware also detects which nodes have failed and ejects them from the cluster, so that errors on those nodes can be cleared.

Oracle Clusterware has two key components: the Oracle Cluster Registry (OCR) and the voting disk.

The cluster registry holds all information about nodes, instances, services and ASM storage (if used). It also contains state information, i.e. whether they are available and up.

The voting disk is used to determine whether a node has failed, i.e. become separated from the majority. If a node is deemed to no longer belong to the majority, it is forcibly rebooted and after the reboot adds itself back to the surviving cluster nodes.

What are the Oracle Clusterware key components?

Oracle Clusterware has two key components: the Oracle Cluster Registry (OCR) and the voting disk.

What is Voting Disk and OCR?

Voting Disk:
Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.
A node must be able to access more than half of the voting disks at any time.
For example, if you have 3 voting disks configured, then a node must be able to access at least two of the voting disks at any time. If a node cannot access the minimum required number of voting disks it is evicted, or removed, from the cluster.

Oracle Cluster Registry (OCR) 
The cluster registry holds all information about nodes, instances, services and ASM storage (if used). It also contains state information, i.e. whether they are available and up.
The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.

What are the administrative tasks involved with voting disk?

Following administrative tasks are performed with the voting disk :
1) Backing up voting disks
2) Recovering Voting disks
3) Adding voting disks
4) Deleting voting disks
5) Moving voting disks

Can you add voting disk online? Do you need voting disk backup?

Yes. As per the documentation, if you have multiple voting disks you can add one online. If you have only one voting disk and it is lost, the cluster will be down; you then need to start CRS in exclusive mode and add the voting disk using
crsctl add css votedisk <path>

What is the Oracle Recommendation for backing up voting disk?

Oracle recommends us to use the dd command to backup the voting disk with a minimum block size of 4KB.

How do we backup voting disks?

1) Oracle recommends that you back up your voting disk after the initial cluster creation and after completing any node addition or deletion procedure.
2) First, as the root user, stop Oracle Clusterware (with the crsctl stop crs command) on all nodes. Then determine the current voting disk by issuing the following command:
crsctl query css votedisk
3) Then issue the dd or ocopy command to back up a voting disk, as appropriate.
Syntax for backing up voting disks:
On Linux or UNIX systems:
dd if=voting_disk_name of=backup_file_name
where
voting_disk_name is the name of the active voting disk
backup_file_name is the name of the file to which we want to back up the voting disk contents
On Windows systems, use the ocopy command:
ocopy voting_disk_name backup_file_name

How do we verify an existing current backup of OCR?

We can verify the current backup of OCR using the following command : ocrconfig -showbackup

You have lost OCR disk, what is your next step?

In 10g, the cluster stack will be down because cssd is unable to maintain integrity. From 11gR2 onwards, the crsd stack will be down while OHASD is still up and running. You can bring the OCR back by restoring the automatic backup or importing the manual backup.

What are the major RAC wait events?

In a RAC environment the buffer cache is global across all instances in the cluster, so block access is processed differently. The most common wait events related to this are gc cr request and gc buffer busy.

GC CR request: the time it takes to retrieve the data from the remote cache.
Reason: RAC traffic using a slow connection, or inefficient queries (poorly tuned queries increase the number of data blocks requested by an Oracle session; the more blocks requested, the more often a block must be read from a remote instance via the interconnect).

GC BUFFER BUSY: the time the remote instance spends locally accessing the requested data block.

What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report? 

This is most likely due to a fault in the interconnect network.
Check netstat -s.
If you see "fragments dropped" or "packet reassemblies failed", work with your system administrator to find the fault in the network.

How do you troubleshoot node reboot?

Please check metalink ...
Note 265769.1 Troubleshooting CRS Reboots
Note 559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.

Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215, however sqlplus can start it on both nodes? How do you identify the problem?

Set the environment variable SRVM_TRACE to true and start the instance with srvctl. You will now get a detailed error stack.

What are Oracle Clusterware processes for 10g on Unix and Linux?

Cluster Synchronization Services (ocssd): Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.

Cluster Ready Services (crsd): The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user.

Event manager daemon (evmd): A background process that publishes events that crs creates.

Process Monitor Daemon (OPROCD): This process monitors the cluster and provides I/O fencing. OPROCD performs its check and then sleeps; if it wakes up later than expected, OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.

RACG (racgmain, racgimon): Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.

What are Oracle database background processes specific to RAC?

Oracle RAC is composed of two or more database instances. Like a single-instance database, each is composed of memory structures and background processes. Oracle RAC instances use two services, the GES (Global Enqueue Service) and the GCS (Global Cache Service), that enable Cache Fusion. Oracle RAC instances include the following background processes:
ACMS—Atomic Controlfile to Memory Service (ACMS)
GTX0-j—Global Transaction Process
LMON—Global Enqueue Service Monitor
LMD—Global Enqueue Service Daemon
LMS—Global Cache Service Process
LCK0—Instance Enqueue Process
RMSn—Oracle RAC Management Processes (RMSn)
RSMN—Remote Slave Monitor
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.

What is GRD?

GRD stands for Global Resource Directory. The GES and GCS maintain records of the status of each data file and each cached block using the Global Resource Directory. The GRD supports Cache Fusion and helps preserve data integrity.

What is ACMS?

ACMS stands for Atomic Controlfile to Memory Service. In an Oracle RAC environment, ACMS is an agent that ensures a distributed SGA memory update, i.e. SGA updates are globally committed on success or globally aborted in the event of a failure.

What is SCAN listener?

A SCAN listener is an additional listener, separate from the node listeners, that accepts incoming database connection requests arriving from clients through the SCAN IP. The node listeners are configured as its endpoints, and it routes each connection request to a particular node listener.

The SCAN IP can be disabled if it is not required. However, the SCAN IP is mandatory during the RAC installation. Enabling/disabling the SCAN IP is mostly done in Oracle Apps environments for the concurrent manager (a kind of job scheduler in Oracle Apps).
Steps to disable the SCAN IP:
i.  Do not use the SCAN IP at the client end.
ii. Stop the SCAN listener
    srvctl stop scan_listener
iii.Stop the SCAN
    srvctl stop scan (this will stop the SCAN VIPs)
iv. Disable the SCAN and disable the SCAN listener
    srvctl disable scan
    srvctl disable scan_listener

What are the different network components in 10g RAC?

Public, private, and VIP components.
Private interfaces are for inter-node communication.
The VIP is all about application availability. When a node fails, its VIP fails over to another node. This is why all applications should connect through the VIPs, meaning the TNS entries should have the VIP entries in the host list.

What is an interconnect network?

An interconnect network is a private network that connects all of the servers in a cluster. The interconnect network uses a switch/multiple switches that only the nodes in the cluster can access.

What is the use of cluster interconnect?
Cluster interconnect is used by the Cache fusion for inter instance communication.

How can we configure the cluster interconnect?

· Configure User Datagram Protocol (UDP) on Gigabit Ethernet for cluster interconnects.
· On UNIX and Linux systems, Oracle Clusterware uses the UDP and RDS (Reliable Datagram Sockets) protocols.
· Windows clusters use the TCP protocol.

What is the purpose of Private Interconnect?

Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.

What is a virtual IP address or VIP?

A virtual IP address or VIP is an alternate IP address that the client connections use instead of the standard public IP address. To configure VIP address, we need to reserve a spare IP address for each node, and the IP addresses must use the same subnet as the public network.

What is the use of VIP?

If a node fails, then the node's VIP address fails over to another node on which the VIP address can accept TCP connections but it cannot accept Oracle connections.

Why do we have a Virtual IP (VIP) in Oracle RAC?

Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.

When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

Give situations under which VIP address failover happens?

VIP address failover happens when the node on which the VIP address runs fails, when all interfaces for the VIP address fail, or when all interfaces for the VIP address are disconnected from the network.

What is the significance of VIP address failover?

When a VIP address failover happens, clients that attempt to connect to the VIP address receive a rapid "connection refused" error. They don't have to wait for TCP connection timeout messages.

What is the use of a service in Oracle RAC environment?

Applications should use the services feature to connect to the Oracle database. Services enable us to define rules and characteristics to control how users and applications connect to database instances.

What are the characteristics controlled by Oracle services feature?

The characteristics include a unique name, workload balancing, failover options, and high availability.

What enables the load balancing of applications in RAC?

Oracle Net Services enable the load balancing of application connections across all of the instances in an Oracle RAC database.

What are the types of connection load-balancing?

Connection workload management is a key aspect of running RAC instances, because you want to distribute connections to specific nodes or instances, or to those with less load.
There are two types of connection load-balancing:
1.Client Side load balancing (also called as connect time load balancing)
2.Server side load balancing (also called as Listener connection load balancing)

What is the difference between server-side and client-side connection load balancing?

Client-side load balancing happens at the client, which chooses among the listener addresses in its connect descriptor. With server-side load balancing, the listener uses the load balancing advisory to redirect connections to the instance providing the best service.

Client Side load balancing:- The Oracle client-side load balancing feature enables clients to randomize connection requests among all the available listeners. The choice is random and does not take listener or instance load into account.

A TNS entry that contains all node addresses and sets load_balance=on uses connect-time (client-side) load balancing. Set the parameter explicitly, because it is on by default only for a DESCRIPTION_LIST.

Sample Client Side TNS Entry:-

    finance =
    (DESCRIPTION =
         (ADDRESS = (PROTOCOL = TCP)(HOST = myrac2-vip)(PORT = 2042))
         (ADDRESS = (PROTOCOL = TCP)(HOST = myrac1-vip)(PORT = 2042))
         (ADDRESS = (PROTOCOL = TCP)(HOST = myrac3-vip)(PORT = 2042))
    (LOAD_BALANCE = yes)
    (FAILOVER = on)
    (CONNECT_DATA =
         (SERVER = DEDICATED)
         (SERVICE_NAME = FINANCE)
         (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5))
    )
    )

Server side load balancing:- This improves connection performance by balancing the number of active connections among multiple instances and dispatchers. In a single-instance environment (shared servers), the listener selects the least loaded dispatcher to handle the incoming client requests. In a RAC environment, PMON registers the load of each instance and its dispatchers with the listeners, and depending on that load information the listener redirects the connection to the least loaded node.

In a RAC environment, the *.remote_listener parameter, which refers to a TNS entry containing all node addresses, needs to be set so that PMON can send its load updates to the listeners on every node.

Sample listener parameters to set in each instance of the RAC cluster:

    local_listener=LISTENER_MYRAC1
    remote_listener = LISTENERS_MYRACDB

What are the administrative tools used for Oracle RAC environments?

Oracle RAC cluster can be administered as a single image using the below
·       OEM (Enterprise Manager),
·       SQL*PLUS,
·       Server control (SRVCTL),
·       Cluster Verification Utility (CLUVFY),
·       DBCA,
·       NETCA

Name some Oracle Clusterware tools and their uses?

·OIFCFG - allocating and deallocating network interfaces.
·OCRCONFIG - Command-line tool for managing Oracle Cluster Registry.
·OCRDUMP - Dumps the contents of the OCR to a file, for example to identify the interconnect being used.
·CVU - Cluster Verification Utility (cluvfy) to verify the cluster configuration and components.

What is the difference between CRSCTL and SRVCTL?

crsctl manages clusterware-related operations:
    Starting and stopping Oracle Clusterware
    Enabling and disabling Oracle Clusterware daemons
    Registering cluster resources

srvctl manages Oracle resource–related operations:
    Starting and stopping database instances and services
    Also from 11gR2 manages the cluster resources like network,vip,disks etc

How do we remove ASM from a Oracle RAC environment?

We need to stop and delete the instance on the node first, in interactive or silent mode. After that, ASM can be removed using the srvctl tool as follows:
srvctl stop asm -n node_name
srvctl remove asm -n node_name
We can verify if ASM has been removed by issuing the following command:
srvctl config asm -n node_name

How do we verify that an instance has been removed from OCR after deleting an instance?

Issue the following srvctl command:
srvctl config database -d database_name
cd CRS_HOME/bin
./crs_stat

What are the modes of deleting instances from Oracle Real Application Clusters databases?

We can delete instances in silent mode or interactive mode using DBCA (Database Configuration Assistant).

What are the background processes that exist in 11gR2, and what do they do?

Process Name     Functionality
crsd     •The CRS daemon (crsd) manages cluster resources based on configuration information that is stored in Oracle Cluster Registry (OCR) for each resource. This includes start, stop, monitor, and failover operations. The crsd process generates events when the status of a resource changes.
cssd     •Cluster Synchronization Service (CSS): Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. If you are using certified third-party clusterware, then CSS processes interfaces with your clusterware to manage node membership information. CSS has three separate processes: the CSS daemon (ocssd), the CSS Agent (cssdagent), and the CSS Monitor (cssdmonitor). The cssdagent process monitors the cluster and provides input/output fencing. This service formerly was provided by Oracle Process Monitor daemon (oprocd), also known as OraFenceService on Windows. A cssdagent failure results in Oracle Clusterware restarting the node.
diskmon     •Disk Monitor daemon (diskmon): Monitors and performs input/output fencing for Oracle Exadata Storage Server. As Exadata storage can be added to any Oracle RAC node at any point in time, the diskmon daemon is always started when ocssd is started.
evmd     •Event Manager (EVM): Is a background process that publishes Oracle Clusterware events
mdnsd     •Multicast domain name service (mDNS): Allows DNS requests. The mDNS process is a background process on Linux and UNIX, and a service on Windows.
gnsd     •Oracle Grid Naming Service (GNS): Is a gateway between the cluster mDNS and external DNS servers. The GNS process performs name resolution within the cluster.
ons     •Oracle Notification Service (ONS): Is a publish-and-subscribe service for communicating Fast Application Notification (FAN) events
oraagent     •oraagent: Extends clusterware to support Oracle-specific requirements and complex resources. It runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1).
orarootagent     •Oracle root agent (orarootagent): Is a specialized oraagent process that helps CRSD manage resources owned by root, such as the network, and the Grid virtual IP address
oclskd     •Cluster kill daemon (oclskd): Handles instance/node evictions requests that have been escalated to CSS
gipcd     •Grid IPC daemon (gipcd): Is a helper daemon for the communications infrastructure
ctssd     •Cluster Time Synchronization daemon (ctssd): Manages time synchronization between nodes, rather than depending on NTP

Under which user or owner the process will start?

Component                                      Name of the Process               Owner
Oracle High Availability Service               ohasd                             init, root
Cluster Ready Services (CRS)                   crsd                              root
Cluster Synchronization Service (CSS)          ocssd, cssdmonitor, cssdagent     grid owner
Event Manager (EVM)                            evmd, evmlogger                   grid owner
Cluster Time Synchronization Service (CTSS)    octssd                            root
Oracle Notification Service (ONS)              ons, eons                         grid owner
Oracle Agent                                   oraagent                          grid owner
Oracle Root Agent                              orarootagent                      root
Grid Naming Service (GNS)                      gnsd                              root
Grid Plug and Play (GPnP)                      gpnpd                             grid owner
Multicast domain name service (mDNS)           mdnsd                             grid owner

What is the major difference between 10g and 11g RAC?

There is not much difference between 10g and 11gR1 RAC, but there is a significant difference in 11gR2.

Up to and including 11gR1, the following were managed by Oracle CRS:
    Databases
    Instances
    Applications
    Node Monitoring
    Event Services
    High Availability

From 11gR2 onwards, it is a complete HA stack that manages and provides the following resources, like other cluster software such as VCS:
    Databases
    Instances
    Applications
    Cluster Management
    Node Management
    Event Services
    High Availability
    Network Management (provides DNS/GNS/MDNSD services on behalf of other traditional services), SCAN (Single Client Access Name) and HAIP
    Storage Management (with help of ASM and other new ACFS filesystem)
    Time synchronization (rather depending upon traditional NTP)
    Removed OS dependent hang checker etc, manages with own additional monitor process

What is hangcheck timer? 

The hangcheck timer regularly checks the health of the system. If the system hangs or stops, the node is restarted automatically.
There are 2 key parameters for this module:
-> hangcheck_tick: this parameter defines the period of time between checks of system health. The default value is 60 seconds; Oracle recommends setting it to 30 seconds.
-> hangcheck_margin: this defines the maximum hang delay that should be tolerated before hangcheck-timer resets the RAC node.
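
How the two parameters combine can be sketched in a few lines of shell. This is a minimal illustration, assuming the commonly documented rule that the node is reset once the hang time exceeds the tick plus the margin; the function name and the 180-second margin are illustrative values, not taken from this post.

```shell
#!/bin/sh
# Sketch: longest hang (in seconds) tolerated before hangcheck-timer resets
# the node. Assumption: a reset occurs once the hang time exceeds
# hangcheck_tick + hangcheck_margin.
hang_reset_threshold() {
    tick=$1      # hangcheck_tick, in seconds
    margin=$2    # hangcheck_margin, in seconds
    echo $(( tick + margin ))
}

# Example: the recommended tick of 30 s with an illustrative margin of 180 s
hang_reset_threshold 30 180
```

With these example values the node would be reset after a hang of more than 210 seconds.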

State the initialization parameters that must have same value for every instance in an Oracle RAC database?

Some initialization parameters are critical at database creation time and must have the same values. Their values must be specified in the SPFILE or PFILE for every instance. The parameters that must be identical on every instance are listed below:
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
COMPATIBLE
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCES
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
INSTANCE_TYPE (RDBMS or ASM)
PARALLEL_MAX_SERVERS
REMOTE_LOGIN_PASSWORDFILE
UNDO_MANAGEMENT

What is RAC? What is the benefit of RAC over single instance database?

In Real Application Clusters environments, all nodes concurrently execute transactions against the same database. Real Application Clusters coordinates each node's access to the shared data to provide consistency and integrity.
Benefits:
Improve response time
Improve throughput
High availability
Transparency


Advantages of RAC (Real Application Clusters)

Reliability - if one node fails, the database won't fail
Availability - nodes can be added or replaced without having to shutdown the database
Scalability - more nodes can be added to the cluster as the workload increases


What is the use of VIP?

If a node fails, then the node's VIP address fails over to another node on which the VIP address can accept TCP connections but it cannot accept Oracle connections.
Situations under which VIP address failover happens:-
VIP address failover happens when the node on which the VIP address runs fails, when all interfaces for the VIP address fail, or when all interfaces for the VIP address are disconnected from the network.
Using a virtual IP avoids the TCP/IP timeout problem because Oracle Notification Service maintains communication between the nodes and the listeners.

What is voting disk?

The voting disk is a file that sits in the shared storage area and must be accessible by all nodes in the cluster. All nodes in the cluster register their heartbeat information in the voting disk to confirm that they are operational. If the heartbeat information of a node is not available in the voting disk, that node is evicted from the cluster. The CSS (Cluster Synchronization Service) daemon in the clusterware maintains the heartbeat of all nodes to the voting disk. When a node is not able to send its heartbeat to the voting disk, it reboots itself, which helps avoid the split-brain syndrome.

For high availability, Oracle recommends an odd number of voting disks, with a minimum of three.

Voting Disk - is file that resides on shared storage and Manages cluster members.  Voting disk reassigns cluster ownership between the nodes in case of failure.

The Voting Disk Files are used by Oracle Clusterware to determine which nodes are currently members of the cluster. The voting disk files are also used in concert with other Cluster components such as CRS to maintain the clusters integrity.

Oracle Database 11g Release 2 provides the ability to store the voting disks in ASM along with the OCR. Oracle Clusterware can access the OCR and the voting disks present in ASM even if the ASM instance is down. As a result CSS can continue to maintain the Oracle cluster even if the ASM instance has failed.

How many voting disks are you maintaining ?

http://www.toadworld.com/KNOWLEDGE/KnowledgeXpertforOracle/tabid/648/TopicID/RACR2ARC6/Default.aspx

By default Oracle will create 3 voting disk files in ASM.

Oracle expects that you will configure at least 3 voting disks for redundancy purposes. You should always configure an odd number of voting disks >= 3. This is because each node must be able to access more than half of the voting disks, so losing half or more of them will cause the entire cluster to fail.

You should plan on allocating 280MB for each voting disk file. For example, if you are using ASM and external redundancy then you will need to allocate 280MB of disk for the voting disk. If you are using ASM and normal redundancy you will need 560MB.

Why do we need to keep an odd number of voting disks?

Each node must be able to access a strict majority (more than half) of the voting disks to stay in the cluster. An even number adds no extra tolerance: with 3 disks the cluster survives the loss of 1 disk, and with 4 disks it still survives the loss of only 1, because losing 2 leaves exactly half. Oracle therefore expects an odd number of voting disks >= 3.
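
The majority rule behind the odd-number recommendation can be checked with simple arithmetic. The sketch below assumes only that a node needs access to more than half of the configured voting disks; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: number of voting disk failures that can be survived, assuming a
# node must still reach a strict majority (more than half) of the disks.
tolerated_votedisk_failures() {
    total=$1
    echo $(( (total - 1) / 2 ))
}

for n in 1 2 3 4 5; do
    echo "$n voting disk(s): tolerates $(tolerated_votedisk_failures "$n") failure(s)"
done
```

Going from 3 to 4 disks does not raise the number of tolerated failures, which is why an even count buys nothing.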


What is TAF?

TAF (Transparent Application Failover) is a configuration that allows session fail-over between different nodes of a RAC database cluster.
If a communication link failure occurs after a connection is established, TAF fails the connection over to another active node. Any disrupted transactions are rolled back, and session properties and server-side program variables are lost. In some cases, if the statement executing at the time of the failover is a SELECT statement, that statement may be automatically re-executed on the new connection with the cursor positioned on the row on which it was positioned prior to the failover.

After an Oracle RAC node crashes—usually from a hardware failure—all new application transactions are automatically rerouted to a specified backup node. The challenge in rerouting is to not lose transactions that were "in flight" at the exact moment of the crash. One of the requirements of continuous availability is the ability to restart in-flight application transactions, allowing a failed node to resume processing on another server without interruption. Oracle's answer to application failover is a new Oracle Net mechanism dubbed Transparent Application Failover. TAF allows the DBA to configure the type and method of failover for each Oracle Net client.
TAF architecture offers the ability to restart transactions at either the transaction (SELECT) or session level.

What are the requirements for Oracle Clusterware?

1. External shared disk to store the Oracle Clusterware files (Voting Disk and Oracle Cluster Registry - OCR)
2. Two network cards on each clusterware node (and three sets of IP addresses) -
Network Card 1 (with IP address set 1) for public network
Network Card 2 (with IP address set 2) for private network (for inter node communication between rac nodes used by clusterware and rac database)
IP address set 3 for Virtual IP (VIP) (used as Virtual IP address for client connection and for connection failover)
3. Storage Option for OCR and Voting Disk - RAW, OCFS2 (Oracle Cluster File System), NFS, …..

How to find location of OCR file when CRS is down?

When the CRS is down:
Look in the "ocr.loc" file to find the location of the OCR (Oracle Cluster Registry). The location of this file depends on the OS:
On Linux: /etc/oracle/ocr.loc
On Solaris: /var/opt/oracle/ocr.loc
When CRS is UP:
Set ASM environment or CRS environment then run the below command:
ocrcheck
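
Reading the location out of ocr.loc can be scripted. The sketch below assumes the file contains a line of the form ocrconfig_loc=<location>; the function name is hypothetical, and the default path is the Linux one given above.

```shell
#!/bin/sh
# Sketch: print the OCR location recorded in an ocr.loc file.
# Assumption: the file holds a line of the form  ocrconfig_loc=<location>
ocr_location() {
    loc_file=${1:-/etc/oracle/ocr.loc}   # pass the Solaris path explicitly
    sed -n 's/^ocrconfig_loc=//p' "$loc_file"
}
```

On Linux, calling ocr_location with no argument reads /etc/oracle/ocr.loc; on Solaris, pass /var/opt/oracle/ocr.loc as the argument.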

In a 2-node RAC, how many NICs are you using?

2 network cards on each clusterware node
Network Card 1 (with IP address set 1) for public network
Network Card 2 (with IP address set 2) for private network (for inter node communication between rac nodes used by clusterware and rac database)

In a 2-node RAC, how many IPs are you using?

6 (3 sets of IP addresses, one address per node in each set)
## eth1-Public:  2
## eth0-Private: 2
## VIP: 2

How to find IP’s information in RAC ?

Look in the /etc/hosts file, which is set up as shown below:
# Do not remove the following line, or various programs
# that requires network functionality will fail.
127.0.0.1               localhost.localdomain localhost
## Public Node names
192.168.10.11           node1-pub.hingu.net     node1-pub
192.168.10.22           node2-pub.hingu.net     node2-pub
## Private Network (Interconnect)
192.168.0.11            node1-prv               node1-prv
192.168.0.22            node2-prv               node2-prv
## Private Network (Network Area storage)
192.168.1.11            node1-nas               node1-nas
192.168.1.22            node2-nas               node2-nas
192.168.1.33            nas-server              nas-server
## Virtual IPs
192.168.10.111          node1-vip.hingu.net     node1-vip
192.168.10.222          node2-vip.hingu.net     node2-vip
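
To pull one group of addresses out of a hosts file like the sample above, a small filter is enough. This is a sketch that assumes the naming convention of that sample (-pub, -prv, -vip as the last name on each line); the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: list "IP name" pairs from a hosts file whose short name ends in
# the given suffix. Assumption: names follow the -pub / -prv / -vip
# convention and the short name is the last field on the line.
hosts_by_suffix() {
    suffix=$1
    hosts_file=${2:-/etc/hosts}
    awk -v sfx="$suffix" '
        /^[[:space:]]*#/ { next }                      # skip comment lines
        NF >= 2 && $NF ~ (sfx "$") { print $1, $NF }   # match on last field
    ' "$hosts_file"
}
```

For example, hosts_by_suffix -vip lists the virtual IPs and hosts_by_suffix -prv lists the interconnect addresses.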

What is the difference between the RAC IP addresses?

The public IP address is the normal IP address typically used by the DBA and SA to manage storage, the system and the database. It sits on the public network.
The private IP address is used only for internal cluster processing (Cache Fusion), also known as the interconnect. It sits on a private network.
The VIP is used by database applications to enable failover when one cluster node fails. The purpose of the VIP is to let client connections fail over to the surviving nodes when there is a failure.


Can an application developer access the private IP?
No. The private IP address is used only for internal cluster processing (Cache Fusion), also known as the interconnect.

Thursday, April 14, 2016

Oracle Local Registry (OLR) in RAC

In 11gR2, in addition to the OCR, there is another component called the OLR, installed on each node in the cluster. It is a local registry for node-specific purposes. The OLR is not shared with other nodes in the cluster. It is installed and configured when the clusterware is installed.

Why is the OLR used and why was it introduced?
In 10g, the OCR could not be stored in ASM, so the clusterware could read it directly at startup. In 11g the OCR can be stored in ASM, which raises a problem.
The OCR must be accessible to find out which resources need to be started. But if the OCR is in ASM, it cannot be read until ASM is up, and ASM is itself a resource of the node whose information is stored in the OCR.

 To answer this, Oracle introduced a component called OLR.
Ø  It is the first file used to startup the clusterware when OCR is stored on ASM.
Ø  Information about the resources that needs to be started on a node is stored in an OS file called ORACLE LOCAL REGISTRY (OLR).
Ø  Since OLR is an OS file, it can be accessed by various processes on the node for read/write irrespective of the status of cluster (up/down).
Ø  When a node joins the cluster, OLR on that node is read, various resources, including ASM are started on the node.
Ø  Once ASM is up, OCR is accessible and is used henceforth to manage all the cluster nodes. If OLR is missing or corrupted, clusterware can’t be started on that node.

Where is the OLR located?
It is located at $GRID_HOME/cdata/<hostname>.olr. The location of the OLR is stored in /etc/oracle/olr.loc and used by OHASD.
What does the OLR contain?
The OLR stores
·         Clusterware version info.
·         Clusterware configuration
·         Configuration of various resources which needs to be started on the node,etc.


To see the contents in the OLR file, we can use following commands and see the resources.

[root@rac1 ~]# ocrconfig -local -manualbackup
host01     2014/03/16 01:20:27     /u01/app/grid/11.2.0.3/product/grid_1/cdata/rac1/backup_20140316_012027.olr

[root@rac1 ~]# strings /u01/app/grid/11.2.0.3/product/grid_1/cdata/rac1/backup_20140316_012027.olr |grep -v type |grep ora!

ora!drivers!acfs
ora!crsd
ora!asm
ora!evmd
ora!ctssd
ora!cssd
ora!cssdmonitor
ora!diskmon
ora!gpnpd
ora!gipcd
ora!mdnsd


OLR administration
Check the status of the OLR file on each node:
$ ocrcheck -local

OCRDUMP is used to dump the contents of the OLR to the terminal:
$ocrdump -local -stdout

We can export and import the OLR file using OCRCONFIG:

$ocrconfig -local -export <export file name>
$ocrconfig -local -import <file_name>

We can even repair the OLR file if it is corrupted:

$ocrconfig -local -repair olr <filename>

The OLR is backed up at the end of an installation or an upgrade. After that we need to back up the OLR manually. Automatic backups are not supported for the OLR.

$ocrconfig -local -manualbackup

Viewing the contents of a backup file:

$ocrdump -local -backupfile <olr_backup_file_name>

To change the OLR backup location:
$ocrconfig -local -backuploc <new_backup_location>

To restore the OLR:
$crsctl stop crs
$ocrconfig -local -restore <file_name>
$ocrcheck -local
$crsctl start crs
$cluvfy comp olr   -- to check the integrity of the OLR file which was restored.
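
The restore steps above can be wrapped in a small script so that they always run in the same order and stop at the first failure. This is only a sketch: the function names and the DRY_RUN switch are hypothetical, it assumes the restore flag takes the backup file name as its argument, and it assumes the commands are on the PATH of a suitably privileged user.

```shell
#!/bin/sh
# Sketch: OLR restore sequence. With DRY_RUN=1 the commands are only
# printed, which is a safe way to review them first.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"          # print instead of executing
    else
        "$@"
    fi
}

restore_olr() {
    backup_file=$1           # path to the OLR backup to restore
    run crsctl stop crs &&
    run ocrconfig -local -restore "$backup_file" &&
    run ocrcheck -local &&
    run crsctl start crs &&
    run cluvfy comp olr      # integrity check of the restored OLR
}
```

Setting DRY_RUN=1 before calling restore_olr with a backup path prints the five commands without touching the cluster.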

Oracle Cluster Registry (OCR) in RAC

Oracle Cluster Registry (OCR) is the critical component in Oracle RAC.

Ø  OCR records cluster configuration information.  If it fails, the entire clustered environment of Oracle RAC is affected and a possible outage is a result.
Ø  It is the central repository for CRS, which stores its metadata, configuration and state information for all clusters defined in the clusterware.
Ø  It is the cluster registry that maintains application resources and their availability within the RAC environment.
Ø  It also stores information of CRS daemons and cluster managed applications.


What is stored in OCR

Having introduced the OCR, we now look at what is stored in the OCR file.

Ø  Node membership information, i.e, which nodes are part of the cluster.
Ø  Software version
Ø  Location of 11g voting disk
Ø  Server pools
Ø  Status of cluster resources such as RAC databases, listeners, instances and services.

·         Server up/down
·         Network up/down
·         Database up/down
·         Instance up/down
·         Listener up/down
Ø  Configuration of the cluster resources like RAC databases, listeners, instances and services.
·         Dependencies,
·         Management Policies (automatic/manual)
·         Callout scripts
·         Retries
Ø  Cluster database instance to node mapping
Ø  ASM instance, disk groups, etc
Ø  CRS application resource profiles such as VIP address, service, etc.
Ø  Database services characteristics eg., preferred/available nodes, TAF policy, load balancing goal, etc
Ø  Information about clusterware processes
Ø  Information about interaction and management of 3rd party applications controlled by CRS
Ø  Communication settings where the clusterware daemons or background process listen.
Ø  Information about OCR backups.



Let us see the contents in OCR file.
[root@rac1 ~]# ocrconfig -manualbackup

rac2     2014/01/18 01:03:40     /u01/app/grid/product/11.2.0.3/grid_1/cdata/cluster01/backup_20140118_010340.ocr

[root@rac2 ~]#  strings /u01/app/grid/product/11.2.0.3/grid_1/cdata/cluster01/backup_20140118_010340.ocr| grep -v type |grep ora!

ora!LISTENER!lsnr
ora!host02!vip
rora!rac1!vip
;ora!oc4j
6ora!LISTENER_SCAN3!lsnr
ora!LISTENER_SCAN2!lsnr
ora!LISTENER_SCAN1!lsnr
ora!scan3!vip
ora!scan2!vip
ora!scan1!vip
ora!gns
ora!gns!vip
ora!registry!acfs
ora!DATA!dg
dora!asm
_ora!eons
ora!ons
ora!gsd
ora!net1!network
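
The raw strings output above carries stray leading characters (for example "rora!rac1!vip"). A sketch of one way to clean it up is shown below: it keeps only the ora!... tokens and prints them in the dotted form used for resource names such as ora.LISTENER.lsnr. The function name is made up, and the character set allowed in names is an assumption based on the listing.

```shell
#!/bin/sh
# Sketch: extract resource names of the form ora!<name>!<type> from a
# backup file and print them as ora.<name>.<type>, sorted and de-duplicated.
# Assumption: names contain only letters, digits and underscores.
ocr_resource_names() {
    backup_file=$1
    grep -a -o 'ora![A-Za-z0-9_!]*' "$backup_file" | tr '!' '.' | LC_ALL=C sort -u
}
```

Pass the path of a backup created with ocrconfig -manualbackup as the only argument.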


Who is updating OCR

It is updated and maintained by many client applications.

Ø  CSSd during startup of cluster – to update the status of the servers.
Ø  CSSd during node addition/deletion – to add/delete nodes
Ø  CRSd about status of nodes during failure/reconfiguration
Ø  OUI  - Oracle universal Installer during installation/upgradation/deletion/addition
Ø  Srvctl – ( to manage clusters and rac databases)
Ø  Cluster control utility – crsctl (to manage cluster /local resources)
Ø  Enterprise Manager (EM)
Ø  Database Configuration Assistant (DBCA)
Ø  Database upgrade Assistant (DBUA)
Ø  Network Configuration Assistant (NETCA)
Ø  ASM configuration assistant (ASMCA)

How the update is performed in OCR.

1.       Each node in the cluster will have the copy of OCR in memory for better performance and each node is responsible for updating the OCR if required.
2.       CRSd process is responsible for reading and writing to the OCR files as well as refreshing the local OCR cache and caches on the other nodes in the cluster.
3.       Oracle uses distributed shared cache architecture during cluster management to optimize queries against the cluster repository and at the same time, each node maintains a copy of the OCR in memory.
4.       Oracle clusterware uses the background process to access the OCR cache.
5.       Only one CRSd process (designated as master) in the cluster performs any read/write activity. When the CRSd master process reads any new information, it refreshes the local OCR cache and the OCR caches on the other nodes of the cluster.
6.       As the OCR cache is distributed across all nodes in the cluster, OCR clients like srvctl,crsctl,etc will communicate directly with the OCR process on the node to get required information.
7.       Clients communicate via the local CRSd process for any updates on the physical OCR binary file.

Note that the ocrconfig command cannot modify the OCR configuration information for nodes that are shut down or on which Oracle Clusterware is not running. Avoid shutting down nodes while modifying the OCR with ocrconfig. If a node was down during such a change, its OCR configuration must be repaired before the node can be brought online to rejoin the cluster.

The ocrconfig -repair command changes the OCR configuration only on the node from which it is executed.
Example: if the OCR mirror was relocated to /dev/raw/raw2 from rac1 while rac2 was down, run ocrconfig -repair ocrmirror /dev/raw/raw2 on rac2, while the CRS stack is still down on that node, to repair its OCR configuration. (This is the pre-11.2 form of the command; in 11gR2 the repair options are -add, -delete, and -replace ... -replacement.)


Purpose of OCR

- Oracle Clusterware reads ocr.loc to find the location of the registry, and reads the registry to determine which application resources need to be started and on which nodes.
- It is used to bootstrap CSS with port information, the list of nodes in the cluster, and similar information.
- CRSd and the other Clusterware daemons define and manage the resources under Clusterware control. Each resource has a profile that defines metadata about it, and this metadata is stored in the OCR. CRS reads the OCR and manages the application resources: it starts, stops, and monitors them and manages their failover.
- Maintains and tracks information about the definition, availability, and current state of services.
- Implements the workload balancing and continuous availability features of services.
- Generates events during state changes.
- Maintains configuration profiles of resources in the OCR.
- Records the currently known state of the cluster at regular intervals and provides it when requested by client applications such as srvctl and crsctl.




How the information is stored in OCR

- The OCR uses a file-based repository to store configuration information as a series of key-value pairs, arranged in a directory tree-like structure.
- It contains information pertaining to all tiers of the clustered database.
- The parameters are stored as name-value pairs that are used and maintained at different levels of the architecture.

Each tier is managed and administered by daemon processes at different levels, with appropriate privileges.
For example, all system-level resources or application definitions require root (superuser) to start or stop.
Those defined at the database level require dba privileges.
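
The tree-like key structure described above can be inspected with ocrdump, which prints each key on its own line in square brackets (for example [SYSTEM.css]) in its default text format. The sketch below is only an illustration: the helper name ocr_key_tree is made up here, and it assumes dotted key names in that bracketed layout. It turns such a dump into an indented outline.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oracle's tooling): read ocrdump-style text
# on stdin and print the key hierarchy as an indented outline.
# Assumption: each key appears alone on a line as [PARENT.child.grandchild].
ocr_key_tree() {
    # Keep only the bracketed key lines and strip the brackets.
    sed -n 's/^\[\(.*\)\]$/\1/p' |
    # Indent each key by two spaces per level and print its last component.
    awk -F. '{
        indent = ""
        for (i = 1; i < NF; i++) indent = indent "  "
        print indent $NF
    }'
}

# Typical use as root on a cluster node (requires Oracle Clusterware):
#   ocrdump -stdout -keyname SYSTEM.css | ocr_key_tree
```

Because the helper only reads text, it can also be run against a saved dump file on a machine without Clusterware installed.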



Where and how should OCR be stored?

- The location of the OCR is recorded in a file on each node of the cluster. The path varies with the flavor of Unix; on Linux it is /etc/oracle/ocr.loc.

- The OCR must reside on shared storage that is accessible by all nodes in the cluster. Raw devices, which were used in 10g, are deprecated in 11gR2 and are no longer an option for new installations. That leaves a cluster file system (CFS) or ASM. A CFS can be expensive and may not be an option, so ASM is the better choice. The OCR and voting disks can be placed in any disk group, including one dedicated to them.

- In ASM, the OCR is striped and mirrored (if the disk group uses normal or high redundancy rather than external redundancy), like other database files.

- The OCR is replicated across the underlying disks of the disk group, so the failure of a single disk does not cause the disk group to fail.

- In 11gR2, up to five OCR locations can be configured.

- Because it is in a shared location, the OCR can be administered from any node, regardless of the node on which the registry was created.

- A small disk of around 300 MB to 500 MB is a good choice.
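
To see which OCR locations a node has recorded, the ocr.loc file can be read directly. The following is a minimal sketch, assuming the file holds lines such as ocrconfig_loc=+DATA and ocrmirrorconfig_loc=+FRA; the function name ocr_locations is made up for this example. On a live cluster, ocrcheck reports the same locations along with their integrity status.

```shell
#!/bin/sh
# Hypothetical helper: print the OCR locations listed in an ocr.loc-style file.
# Assumption: locations are stored as key=value lines whose keys look like
# ocrconfig_loc, ocrmirrorconfig_loc, ocrconfig_loc3, and so on.
ocr_locations() {
    file=${1:-/etc/oracle/ocr.loc}
    # Select the location lines, then keep everything after the first "=".
    grep -E '^ocr(mirror)?config_loc[0-9]*=' "$file" | cut -d= -f2-
}

# Example:
#   ocr_locations                  # reads /etc/oracle/ocr.loc
#   ocr_locations /tmp/ocr.loc     # reads a copy of the file
```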



Utilities to manage OCR

Add an OCR file
---------------
Add an OCR file to an ASM disk group called +DATA:
ocrconfig -add +DATA

Moving the OCR
--------------
Move an existing OCR file to another location:
ocrconfig -replace /u01/app/oracle/ocr -replacement +DATA

Removing an OCR location
------------------------
At least one other OCR file must remain online.
ocrconfig -delete +DATA
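
The rule that one OCR location must always remain can be checked before calling ocrconfig -delete. The wrapper below is only a sketch: delete_ocr_location and the OCR_LOC_FILE and OCRCONFIG variables are names invented for this example, and it counts locations configured in ocr.loc, which is not the same as confirming that they are online. ocrconfig performs its own checks as well, so this guard simply fails earlier with a clearer message.

```shell
#!/bin/sh
# Hypothetical wrapper around "ocrconfig -delete" (run as root on a cluster node).
# OCRCONFIG and OCR_LOC_FILE are overridable so the logic can be tried safely,
# for example with OCRCONFIG=echo to print the command instead of running it.
OCRCONFIG=${OCRCONFIG:-ocrconfig}
OCR_LOC_FILE=${OCR_LOC_FILE:-/etc/oracle/ocr.loc}

delete_ocr_location() {
    location=$1
    # Count the OCR locations listed in ocr.loc (assumed key=value format).
    count=$(grep -c -E '^ocr(mirror)?config_loc[0-9]*=' "$OCR_LOC_FILE" 2>/dev/null) || count=0
    if [ "$count" -lt 2 ]; then
        echo "refusing to delete $location: fewer than two OCR locations are configured" >&2
        return 1
    fi
    "$OCRCONFIG" -delete "$location"
}

# Example dry run:
#   OCRCONFIG=echo; delete_ocr_location +DATA
```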



OCR Backups
- Oracle Clusterware 11g Release 2 backs up the OCR automatically every four hours, on a schedule that depends on when the node started (not on clock time).
- OCR backups are stored in the $GRID_HOME/cdata/<cluster name> directory on the node that performs the backups.
- One node, known as the master node, is dedicated to these backups. If the master node is down, another node may become the master, so backups can end up spread across nodes after outages.
These backups are named as follows:
  - 4-hour backups (3 max): backup00.ocr, backup01.ocr, and backup02.ocr
  - Daily backups (2 max): day.ocr and day_.ocr
  - Weekly backups (2 max): week.ocr and week_.ocr

- It is recommended to place OCR backups in a shared location, which can be configured with the ocrconfig -backuploc <new location> command.
- Oracle Clusterware keeps the last three 4-hour backups and overwrites the older ones. You therefore have three 4-hour backups: the current one, one that is four hours old, and one that is eight hours old.
- No additional clean-up tasks are therefore required of the DBA.
- Oracle Clusterware also takes a backup at the end of each day. The last two of these backups are retained.
- Finally, at the end of each week Oracle Clusterware takes another backup, and again the last two of these are retained. Make sure that your routine file system backups include the OCR backup location.
- Note that RMAN does not back up the OCR.
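
Because the automatic backups live on whichever node is currently the master, it can help to copy the newest one to shared storage. This is a small sketch rather than an Oracle-provided tool: latest_ocr_backup is a made-up helper, and the directory names in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified *.ocr file in a
# backup directory, or nothing if the directory holds no such file.
latest_ocr_backup() {
    dir=$1
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/*.ocr 2>/dev/null | head -n 1
}

# Example (mycluster and /shared/ocr_backups are placeholders):
#   cp "$(latest_ocr_backup "$GRID_HOME/cdata/mycluster")" /shared/ocr_backups/
```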

You can use the ocrconfig command to view the current OCR backups:

#ocrconfig -showbackup auto