DbaKeeda: How to use PARALLEL parameter in Datapump

Many a times, we may observe that datapump is running slow even after using PARALLEL option. But, we don’t know that there is a method to calculate value for PARALLEL parameter. Below information will helps us to do that…

DATAPUMP will use two methods to export/import data

a. DIRECT PATH

2. EXTERNAL TABLES

It is upto datapump to decide which path it can work. Means, it may happen that some tables are exported/imported through direct path and some other using external tables in the same datapump job.

1. Value for PARALLEL parameter in datapump can be set to more than one only in Enterprise Edition.

2. PARALLEL will not work effectively on Jobs with more metadata

3. As we know, in datapump, a master process will control entire process of export/import through worker processes. These worker process will start parallel execution (PX) processes to do that actual work. We can increase or decrease parallelism at any moment using interactive prompt.

4. If the worker processes are active, then even though we decrease the value for parallel, it will not be affective till the work reaches a completion point. But, increase in parallel processes will take affect immediately.

5. For Data Pump Export, the value that is specified for the parallel parameter should be less than or equal to the number of files in the dump file set (if you specified dump files exclusively and not used %U option)

HOW EXPORT WITH PARALLEL WORKS

1. Master process will start multiple worker processes. Atleast 2 worker processes will be created in case of export, one is for metadata and other is for table data.

2. You might have observed that datapump export will give estimated data size (even if we don’t mention ESTIMATE option). This is to calculate the number of parallel execution (PX) processes.

3. The columns of the tables are observed next, to decide whether to go for direct path or external tables method

Note: Direct path doesn’t support PARALLEL more than one

4. If the external tables method is chosen, Data Pump will determine the maximum number of PX processes that can work on a table data. It does this by dividing the estimated size of the table data by 250 MB and rounding the result down. If the result is zero or one, then PX processes are not used

Example : If the data size is 1000MB, it will be divided by 250MB which results in 4 i.e one process is for metadata and 1 process for data. again the worker process of data will have 4 corresponding PX processes. So, the total number of processes are 6.

Note : PX proceeses information will not be shown when we check through STATUS command

So, in the above example, we will see only 2 processes instead of 6 if we use STATUS command

5. Even though we give more PARALLEL value, Oracle will calculate worker and PX processes as above only.

Example : if we give PARALLEL=10 in above example, still Oracle uses 6 only

Note : We need to remember that Oracle will not scale up the processes if we mention less value to PARALLEL. So, we need to give more value any time in order to get maximum benefit

HOW IMPORT WITH PARALLEL WORKS

The PARALLEL parameter works a bit differently in Import because there are various dependencies and everything must be done in order.

Data Pump Import processes the database objects in the following order:

1. The first worker begins to load all the metadata: the tablespaces, schemas, etc., until all the tables are created.

2. Once the tables are created, the first worker starts loading data and the rest of the workers start loading data

3. Once the table data is loaded, the first worker returns to loading metadata again i.e for Indexes or other objects. The rest of the workers are idle until the first worker loads all the metadata

Note: One worker creates all the indexes but uses PX processes up to the PARALLEL value so indexes get created faster.

Thus, an import job can be started with a PARALLEL = 10, and the user will only see one worker being utilized at certain points during job execution. No other workers or Parallel Execution Processes will be working until all the tables are created. When the tables are created, a burst of workers and possibly PX processes will execute in parallel until the data is loaded, then the worker processes will become idle.

Before starting any export/import, it is better to use ESTIMATE_ONLY parameter. Divide the output by 250MB and based on the result decide on PARALLEL value

Finally when using PARALLEL option, do keep below points in mind

a. Set the degree of parallelism to two times the number of CPUs, then tune from there.

b. For Data Pump Export, the PARALLEL parameter value should be less than or equal to the number of dump files.

c. For Data Pump Import, the PARALLEL parameter value should not be much larger than the number of files in the dump file set.

For more details, you can refer to MOS doc 365459.1

Hope this post will help you & me when we perform Parallel datapump operations next time…

DbaKeeda

Translate into your own language

Sunday, March 20, 2016

How to use PARALLEL parameter in Datapump

No comments:

Post a Comment