Bulk Updating a PostgreSQL Table Using a CSV File and a Temporary Table
Introduction
In this document, we walk through a practical approach to bulk updating records in a PostgreSQL table using a CSV file. This method is particularly useful when dealing with large datasets (millions of rows) where direct updates may be inefficient or error-prone. The scenario involves updating columns in a target table using data provided in a CSV file.
Step 1: Create a Temporary Staging Table
We first create a temporary table to load the CSV data into. Temporary tables are session-specific and automatically dropped when the session ends, making them safe for intermediate operations.
CREATE TEMP TABLE tmp_target_table (
    id bigint,
    column_a varchar(45),
    column_b bigint
);
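For illustration, a CSV file compatible with this staging table might look like the following (the header row and values are hypothetical, chosen only to match the column definitions above):
id,column_a,column_b
101,ABC-1,5001
102,ABC-2,5002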
Step 2: Load CSV Data Using \copy Command
Since the CSV file resides on the client machine, we use the \copy command in psql, which reads the file from the client side. Note that \copy is a psql meta-command, so the entire command must be written on a single line.
\copy tmp_target_table FROM 'C:/path/to/your/file.csv' DELIMITER ',' CSV HEADER
Step 3: Validate Loaded Data
Always verify the row count after loading; it should match the number of data rows in the CSV file.
SELECT count(*) FROM tmp_target_table;
Step 4: Verify Matching Records with Target Table
Before performing updates, ensure that all IDs in the staging table exist in the target table. The count returned below should equal the staging table's row count.
SELECT count(*)
FROM tmp_target_table s
JOIN schema_name.target_table t
ON t.id = s.id;
If needed, check for any missing records:
SELECT s.*
FROM tmp_target_table s
LEFT JOIN schema_name.target_table t
ON t.id = s.id
WHERE t.id IS NULL;
Step 5: Perform Bulk Update
Now update the target table using the staging table.
UPDATE schema_name.target_table t
SET
    column_a = s.column_a,
    column_b = s.column_b
FROM tmp_target_table s
WHERE t.id = s.id;
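If many staging rows carry the same values the target already holds, an optional refinement (a sketch, not part of the steps above) is to skip no-op updates so PostgreSQL rewrites fewer rows:
UPDATE schema_name.target_table t
SET
    column_a = s.column_a,
    column_b = s.column_b
FROM tmp_target_table s
WHERE t.id = s.id
  AND (t.column_a IS DISTINCT FROM s.column_a
       OR t.column_b IS DISTINCT FROM s.column_b);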
Step 6: Post-Update Validation
Finally, confirm that the updates were applied correctly. Counting the matched rows verifies that every staging row still joins to a target row.
SELECT count(*)
FROM schema_name.target_table t
JOIN tmp_target_table s
ON t.id = s.id;
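For a stricter check than a row count, one option (a sketch using the same placeholder column names) is to look for rows where the target still differs from the staging data; this query should return 0:
SELECT count(*)
FROM schema_name.target_table t
JOIN tmp_target_table s
ON t.id = s.id
WHERE t.column_a IS DISTINCT FROM s.column_a
   OR t.column_b IS DISTINCT FROM s.column_b;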
Key Benefits of This Approach
• Efficient for millions of records
• Minimal locking compared to row-by-row updates
• Easy validation before applying changes
• Safe rollback if wrapped inside a transaction
Best Practices
Always test in a lower environment first.
Wrap the update inside a transaction for safety:
BEGIN;
-- update statement
COMMIT;
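Putting the pieces together, the whole update might be wrapped like this (a sketch using the table and column names from the steps above); reviewing the reported row count before committing gives a natural point to roll back if something looks wrong:
BEGIN;
UPDATE schema_name.target_table t
SET
    column_a = s.column_a,
    column_b = s.column_b
FROM tmp_target_table s
WHERE t.id = s.id;
-- psql reports "UPDATE <n>"; if the count is unexpected, run ROLLBACK; instead of COMMIT;
COMMIT;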
Consider indexing the staging table if join performance is slow:
CREATE INDEX idx_tmp_target_table_id ON tmp_target_table(id);
Run VACUUM ANALYZE after large updates if needed.
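For example (substitute the actual schema and table name; note that VACUUM cannot run inside a transaction block):
VACUUM ANALYZE schema_name.target_table;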
Conclusion
Using a temporary table with the \copy command is one of the most reliable and performant ways to perform bulk updates in PostgreSQL. This method ensures data integrity while handling large volumes efficiently.