SQL Server to PostgreSQL Query Converter

Tarja Hempton

Aug 4, 2024, 6:31:07 PM

to immiccelu

The Debezium PostgreSQL connector captures row-level changes in the schemas of a PostgreSQL database. For information about the PostgreSQL versions that are compatible with the connector, see the Debezium release overview.

The first time it connects to a PostgreSQL server or cluster, the connector takes a consistent snapshot of all schemas. After that snapshot is complete, the connector continuously captures row-level insert, update, and delete changes that are committed to a PostgreSQL database. The connector generates data change event records and streams them to Kafka topics. By default, the connector streams all events generated for a table to a separate Kafka topic dedicated to that table. Applications and services consume data change event records from that topic.

The connector relies on a logical decoding output plug-in. You might need to install the output plug-in that you choose to use, and you must configure a replication slot that uses your chosen output plug-in before running the PostgreSQL server. The plug-in can be one of the following:

pgoutput is the standard logical decoding output plug-in in PostgreSQL 10 and later. It is maintained by the PostgreSQL community and used by PostgreSQL itself for logical replication. Because this plug-in is always present, no additional libraries need to be installed. The Debezium connector interprets the raw replication event stream directly into change events.
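For orientation, here is a minimal sketch of a Kafka Connect registration payload that selects pgoutput, written as a Python dictionary. The connector class and the property names (plugin.name, slot.name, database.*, topic.prefix) are Debezium PostgreSQL connector properties as of the 2.x releases; the connector name, host, credentials, database, slot name, and prefix are hypothetical placeholders.

```python
import json

# Hypothetical values throughout; only the property names follow Debezium 2.x.
config = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",          # built-in logical decoding plug-in
        "slot.name": "debezium",            # replication slot the connector reads from
        "database.hostname": "localhost",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "secret",
        "database.dbname": "inventory",
        "topic.prefix": "inventory",        # first segment of every topic name
    },
}

# JSON body that could be POSTed to the Kafka Connect REST API.
payload = json.dumps(config, indent=2)
print(payload)
```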

The connector produces a change event for every row-level insert, update, and delete operation that was captured and sends change event records for each table in a separate Kafka topic. Client applications read the Kafka topics that correspond to the database tables of interest, and can react to every row-level event they receive from those topics.
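The one-topic-per-table routing can be sketched as follows. It assumes Debezium's default naming pattern, topic.prefix, then schema name, then table name, joined with dots; the events are hypothetical and carry only the fields needed for routing (op uses Debezium's c/u/d codes for create, update, and delete).

```python
def topic_name(topic_prefix: str, schema: str, table: str) -> str:
    """Default topic name: <topic.prefix>.<schema>.<table>."""
    return f"{topic_prefix}.{schema}.{table}"


def route(events, topic_prefix):
    """Group change events by destination topic, one topic per table."""
    topics = {}
    for event in events:
        name = topic_name(topic_prefix, event["schema"], event["table"])
        topics.setdefault(name, []).append(event)
    return topics


# Hypothetical change events from two tables.
events = [
    {"schema": "public", "table": "orders", "op": "c"},
    {"schema": "public", "table": "customers", "op": "u"},
    {"schema": "public", "table": "orders", "op": "d"},
]

for topic, topic_events in route(events, "fulfillment").items():
    print(topic, [e["op"] for e in topic_events])
# fulfillment.public.orders ['c', 'd']
# fulfillment.public.customers ['u']
```

A consumer interested only in orders would subscribe to fulfillment.public.orders and never see customer events.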

PostgreSQL normally purges write-ahead log (WAL) segments after some period of time. This means that the connector does not have the complete history of all changes that have been made to the database. Therefore, when the PostgreSQL connector first connects to a particular PostgreSQL database, it starts by performing a consistent snapshot of each of the database schemas. After the connector completes the snapshot, it continues streaming changes from the exact point at which the snapshot was made. This way, the connector starts with a consistent view of all of the data, and does not omit any changes that were made while the snapshot was being taken.

The connector is tolerant of failures. As the connector reads changes and produces events, it records the WAL position for each event. If the connector stops for any reason (including communication failures, network problems, or crashes), upon restart the connector continues reading the WAL where it last left off. This includes snapshots. If the connector stops during a snapshot, the connector begins a new snapshot when it restarts.
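The resume-from-last-position behavior can be illustrated with a toy consumer. parse_lsn converts PostgreSQL's textual LSN format (two hexadecimal halves separated by a slash) into a comparable integer; OffsetTracker and consume are hypothetical stand-ins for the connector's offset storage and WAL reader. This sketch records the position after every event. A real deployment flushes offsets periodically, so a few events may be delivered again after a restart.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


class OffsetTracker:
    """Hypothetical offset store: remembers the last processed LSN."""

    def __init__(self):
        self.last_lsn = None

    def record(self, lsn: int) -> None:
        self.last_lsn = lsn


def consume(wal, tracker, fail_after=None):
    """Emit changes newer than the recorded LSN; optionally 'crash' after N events."""
    emitted = []
    for lsn_text, change in wal:
        lsn = parse_lsn(lsn_text)
        if tracker.last_lsn is not None and lsn <= tracker.last_lsn:
            continue  # already processed before the restart
        if fail_after is not None and len(emitted) == fail_after:
            raise RuntimeError("simulated crash")
        emitted.append(change)
        tracker.record(lsn)
    return emitted


wal = [("0/16B3748", "insert 1"), ("0/16B3790", "update 1"), ("0/16B37D8", "delete 1")]
tracker = OffsetTracker()
try:
    consume(wal, tracker, fail_after=2)  # processes two events, then fails
except RuntimeError:
    pass
print(consume(wal, tracker))  # ['delete 1']
```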

Logical decoding replication slots are supported only on primary servers. When there is a cluster of PostgreSQL servers, the connector can run only on the active primary server. It cannot run on hot or warm standby replicas. If the primary server fails or is demoted, the connector stops. After the primary server has recovered, you can restart the connector. If a different PostgreSQL server has been promoted to primary, adjust the connector configuration before restarting the connector.

Debezium currently supports only databases with UTF-8 character encoding. With a single-byte character encoding, it is not possible to correctly process strings that contain extended ASCII code characters.
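To see why single-byte encodings cause trouble, the sketch below shows that a character such as é, stored as the single byte 0xE9 in LATIN1, is not valid UTF-8 on its own. It only illustrates the encoding mismatch; it does not reproduce how Debezium itself decodes values, and is_valid_utf8 is a hypothetical helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "café"
latin1_bytes = text.encode("latin-1")  # single-byte encoding: b'caf\xe9'
utf8_bytes = text.encode("utf-8")      # multibyte encoding:   b'caf\xc3\xa9'

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False: 0xE9 alone is not a complete UTF-8 sequence
```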

To optimally configure and run a Debezium PostgreSQL connector, it is helpful to understand how the connector performs snapshots, streams change events, determines Kafka topic names, and uses metadata.

To use the Debezium connector to stream changes from a PostgreSQL database, the connector must operate with specific privileges in the database. Although one way to grant the necessary privileges is to provide the user with superuser privileges, doing so potentially exposes your PostgreSQL data to unauthorized access. Rather than granting excessive privileges to the Debezium user, it is best to create a dedicated Debezium replication user to which you grant specific privileges.
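A minimal sketch of the statements for such a dedicated user, generated in Python so that the role and schema names are validated before being interpolated into SQL. The role and schema names are hypothetical, and the grants are an assumption that covers only replication and snapshot reads of one schema; depending on how publications are created for pgoutput, the user might need further privileges (see Setting up permissions). The sketch also omits authentication: the role still needs a password or another method allowed by pg_hba.conf.

```python
import re

# Conservative pattern for unquoted identifiers (lowercase only, by assumption).
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def replication_user_sql(role: str, schema: str) -> list[str]:
    """Return statements creating a non-superuser replication role with read access."""
    for name in (role, schema):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"refusing unsafe identifier: {name!r}")
    return [
        # May open replication connections and log in, but is not a superuser.
        f"CREATE ROLE {role} WITH REPLICATION LOGIN;",
        # Allows looking up objects in the schema.
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
        # Allows reading existing tables during snapshots (tables created later are not covered).
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};",
    ]


for statement in replication_user_sql("debezium", "inventory"):
    print(statement)
```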

For more information about configuring privileges for the Debezium PostgreSQL user, see Setting up permissions. For more information about PostgreSQL logical replication security, see the PostgreSQL documentation.

Most PostgreSQL servers are configured not to retain the complete history of the database in the WAL segments. This means that the PostgreSQL connector would be unable to see the entire history of the database by reading only the WAL. Consequently, the first time that the connector starts, it performs an initial consistent snapshot of the database.

The default behavior for performing a snapshot consists of the following steps. You can change this behavior by setting the snapshot.mode connector configuration property to a value other than initial.

Start a transaction at the SERIALIZABLE isolation level, in READ ONLY and DEFERRABLE mode, so that all subsequent reads in this transaction see a single consistent version of the data. Changes that other clients make through subsequent INSERT, UPDATE, and DELETE operations are not visible to this transaction.

If the connector fails, is rebalanced, or stops after Step 1 begins but before Step 5 completes, upon restart the connector begins a new snapshot. After the connector completes its initial snapshot, the PostgreSQL connector continues streaming from the position that it read in Step 2. This ensures that the connector does not miss any updates. If the connector stops again for any reason, upon restart, the connector continues streaming changes from where it previously left off.
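Why a snapshot plus streaming from the recorded position loses nothing can be shown with a toy model: the snapshot is paired with the log position it corresponds to, and streaming replays only log entries after that position. ToyDatabase and stream_after are hypothetical; the model ignores deletes, multiple tables, and transactions, and it is not how the connector reads PostgreSQL.

```python
class ToyDatabase:
    """Hypothetical single table plus a write-ahead log of (lsn, key, value) entries."""

    def __init__(self):
        self.rows = {}
        self.wal = []
        self.next_lsn = 1

    def write(self, key, value):
        self.rows[key] = value
        self.wal.append((self.next_lsn, key, value))
        self.next_lsn += 1

    def snapshot(self):
        """Return a consistent copy of the rows and the LSN of the last change it includes."""
        return dict(self.rows), self.next_lsn - 1


def stream_after(db, lsn):
    """Replay only log entries newer than the snapshot position."""
    return [(key, value) for (entry_lsn, key, value) in db.wal if entry_lsn > lsn]


db = ToyDatabase()
db.write("a", 1)
db.write("b", 2)
rows, snapshot_lsn = db.snapshot()        # {'a': 1, 'b': 2} at LSN 2
db.write("a", 10)                         # a change committed after the snapshot
changes = stream_after(db, snapshot_lsn)  # [('a', 10)]

# Snapshot plus streamed changes reproduces the current table: no gaps, no duplicates.
replica = dict(rows)
for key, value in changes:
    replica[key] = value
print(replica == db.rows)  # True
```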

In the always snapshot mode, the connector performs a snapshot every time it starts. After the snapshot completes, the connector continues streaming changes from step 3 in the above sequence. This mode is useful in situations such as the following:

After a cluster failure, a new primary has been promoted. The always snapshot mode ensures that the connector does not miss any changes that were made after the new primary had been promoted but before the connector was restarted on the new primary.

In the default initial mode, the connector performs a database snapshot when no Kafka offsets topic exists. After the database snapshot completes, the connector writes the Kafka offsets topic. If a previously stored LSN exists in the Kafka offsets topic, the connector continues streaming changes from that position.

The connector performs a database snapshot and stops before streaming any change event records. If the connector had started but did not complete a snapshot before stopping, the connector restarts the snapshot process and stops when the snapshot completes.

If there is a previously stored LSN in the Kafka offsets topic, the connector continues streaming changes from that position. If no LSN is stored, the connector starts streaming changes from the point at which the PostgreSQL logical replication slot was created on the server. Use this snapshot mode only when you know that all data of interest is still reflected in the WAL.
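The startup behavior described for these modes can be condensed into a small decision function. This is a sketch, not Debezium's implementation: plan_startup and Plan are hypothetical, and the mode strings "initial_only" and "no_data" are assumed to be the snapshot.mode values that match the third and fourth descriptions above; verify them against the documentation for your release.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    snapshot: bool  # take a snapshot on this start?
    stream: bool    # stream WAL changes afterwards?


def plan_startup(mode: str, stored_lsn: Optional[str], snapshot_completed: bool) -> Plan:
    """Decide what to do on start-up for a few snapshot modes (simplified)."""
    if mode == "always":
        return Plan(snapshot=True, stream=True)
    if mode == "initial":
        # Snapshot when nothing was stored yet or an earlier snapshot was interrupted;
        # otherwise resume streaming from the stored LSN.
        return Plan(snapshot=stored_lsn is None or not snapshot_completed, stream=True)
    if mode == "initial_only":
        # Snapshot (again, if interrupted) and then stop without streaming.
        return Plan(snapshot=not snapshot_completed, stream=False)
    if mode == "no_data":
        # Stream from the stored LSN, or from the slot's creation point if none is stored.
        return Plan(snapshot=False, stream=True)
    raise ValueError(f"mode not covered by this sketch: {mode!r}")


print(plan_startup("initial", None, False))         # Plan(snapshot=True, stream=True)
print(plan_startup("initial", "0/16B3748", True))   # Plan(snapshot=False, stream=True)
```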

The custom snapshot mode lets you inject your own implementation of the io.debezium.spi.snapshot.Snapshotter interface. Set the snapshot.mode.custom.name configuration property to the name returned by the name() method of your implementation. The implementation must be available on the classpath of your Kafka Connect cluster. If you use the EmbeddedEngine, include it in the connector JAR file. For more information, see custom snapshotter SPI.

By default, a connector runs an initial snapshot operation only after it starts for the first time. Following this initial snapshot, under normal circumstances, the connector does not repeat the snapshot process. Any future change event data that the connector captures comes in through the streaming process only.

However, in some situations the data that the connector obtained during the initial snapshot might become stale, lost, or incomplete. To provide a mechanism for recapturing table data, Debezium includes an option to perform ad hoc snapshots. You might want to perform an ad hoc snapshot after any of the following changes occur in your Debezium environment:

You can re-run a snapshot for a table that you previously captured by initiating an ad hoc snapshot. Ad hoc snapshots require the use of signaling tables. You initiate an ad hoc snapshot by sending a signal request to the Debezium signaling table.

When you initiate an ad hoc snapshot of an existing table, the connector appends content to the topic that already exists for the table. If a previously existing topic was removed, Debezium can create a topic automatically if automatic topic creation is enabled.

Ad hoc snapshot signals specify the tables to include in the snapshot. The snapshot can capture the entire contents of the database, or capture only a subset of the tables in the database. Also, the snapshot can capture a subset of the contents of the table(s) in the database.

You specify the tables to capture by sending an execute-snapshot message to the signaling table. Set the type of the execute-snapshot signal to incremental or blocking, and provide the names of the tables to include in the snapshot, as described in the following table:

An array that contains regular expressions matching the fully qualified names of the tables to be snapshotted.

The format of the names is the same as for the signal.data.collection configuration option.

An optional array that specifies a set of additional conditions that the connector evaluates to determine the subset of records to include in a snapshot.

Each additional condition is an object that specifies the criteria for filtering the data that an ad hoc snapshot captures. You can set the following parameters for each additional condition:

Specifies column values that must be present in a database record for the snapshot to include it, for example, "color='blue'".

The values that you assign to the filter parameter are the same types of values that you might specify in the WHERE clause of SELECT statements when you set the snapshot.select.statement.overrides property for a blocking snapshot. In earlier Debezium releases, snapshot signals did not define an explicit filter parameter; instead, filter criteria were implied by the values specified for the now-deprecated additional-condition parameter.
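As a sketch of such a signal, the Python helper below builds an INSERT statement for the signaling table. It assumes the signaling table has the id, type, and data columns that Debezium reads, and that the JSON payload uses the data-collections, type, and additional-conditions keys, with data-collection and filter inside each condition. The table name debezium.signal and the id ad-hoc-1 are hypothetical, and the table name is interpolated unquoted, so pass only trusted values.

```python
import json


def sql_literal(value: str) -> str:
    """Quote a string as a PostgreSQL literal by doubling single quotes.

    Assumes standard_conforming_strings = on (the PostgreSQL default),
    so backslashes need no escaping.
    """
    return "'" + value.replace("'", "''") + "'"


def snapshot_signal_sql(signal_table, signal_id, collections,
                        snapshot_type="incremental", conditions=None):
    """Build an INSERT that asks the connector for an ad hoc snapshot."""
    data = {"data-collections": collections, "type": snapshot_type}
    if conditions:
        data["additional-conditions"] = conditions
    return (f"INSERT INTO {signal_table} (id, type, data) VALUES "
            f"({sql_literal(signal_id)}, 'execute-snapshot', {sql_literal(json.dumps(data))});")


print(snapshot_signal_sql(
    "debezium.signal",    # hypothetical signaling table
    "ad-hoc-1",           # hypothetical signal id
    ["public.products"],  # regular expressions for the tables to snapshot
    conditions=[{"data-collection": "public.products", "filter": "color='blue'"}],
))
```

The filter value is written exactly as it would appear in a WHERE clause; the helper doubles its single quotes only when embedding the whole JSON document in the SQL string literal.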