
Troubleshooting

Overview

Not all changes to the source schema can be detected automatically, in which case data corruption may occur. The following schema changes may cause data corruption or a failure to process the events downstream:

  • Dropping columns

  • Adding columns to the middle of a table

  • Changing the data type of a column

  • Reordering columns

  • Dropping tables (relevant if the same table is then recreated with new data added)

  • Truncating tables

Detail

The destination table is not transferred after the schema has changed

Context:

Changed the schema -> paused the stream, deleted the data in GCS and BigQuery, initiated a backfill -> the target table does not exist

Reason:

Guess [not confirmed]: there are no records in GCS for Dataflow to handle, so we fake the process with a TRUNCATE (not captured by the stream) followed by a snapshot INSERT INTO (captured by the stream)

Handle:

[0] Preparation:

  • The Datastream stream is in the PAUSED state (see the sketch below)
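A minimal sketch for pausing the stream from the CLI, reusing the STREAMING_ID and PROJECT_LOCATION variables that the RUNNING-state check further down also uses; gcloud can patch just the stream's state field:

declare STREAMING_ID=dawnbreaker2magnus
declare PROJECT_LOCATION=<project-region>
# Patch only the state field so the rest of the stream configuration is untouched
gcloud datastream streams update $STREAMING_ID \
  --location=$PROJECT_LOCATION \
  --state=PAUSED \
  --update-mask=state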

In the source database:

[1] Create a duplicate table in a temp schema or staging zone

CREATE TABLE <temp_schema>.<table> AS SELECT * FROM <schema>.<table>;

[2] Truncate the current table (per the limitations above, the TRUNCATE itself is not picked up by the streaming process), then re-insert the snapshot

TRUNCATE <schema>.<table>;
INSERT INTO <schema>.<table> SELECT * FROM <temp_schema>.<table>;

Then check the process and the table in the destination database

SELECT * FROM <destination_schema>.<table_mapping> LIMIT 1;

If the table exists with the same schema, it worked.

Turn the Datastream stream back to the RUNNING state (see the sketch below)
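A sketch using the same assumed variables as in the pause step; patching the state back to RUNNING resumes the stream:

gcloud datastream streams update $STREAMING_ID \
  --location=$PROJECT_LOCATION \
  --state=RUNNING \
  --update-mask=state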

Clean up the resources afterwards

DROP TABLE <temp_schema>.<table>;

Note:

  • Make sure the source and destination tables have the same schema

  • Check that at least one record exists (if the source has any) to confirm the transfer, as sketched after this list

  • The dataset should not be big (< 10M rows), since the INSERT re-streams every row
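A hedged way to run the record check from the CLI, assuming the destination is BigQuery and using <project-id> and the mapped dataset/table names as placeholders; the bq tool counts the rows that arrived:

# Expect row_count >= 1 and roughly the source table's row count
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count FROM `<project-id>.<destination_schema>.<table_mapping>`'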

Reference documentation:

https://www.devart.com/dbforge/mysql/studio/mysql-copy-table.html

Check that the stream is RUNNING

Context:

Check whether a stream is in the RUNNING state

Handle:

You can use two methods: [1] the gcloud CLI and [2] the stream state shown in the UI

[1] Using the gcloud CLI

declare STREAMING_ID=dawnbreaker2magnus
declare PROJECT_LOCATION=<project-region>
gcloud datastream streams describe $STREAMING_ID --location=$PROJECT_LOCATION --format=json | yq '.state'
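An alternative sketch that avoids the yq dependency; gcloud's built-in format projections can print the state field directly:

gcloud datastream streams describe $STREAMING_ID \
  --location=$PROJECT_LOCATION \
  --format="value(state)"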

[2] Using the UI

Go to the URL constructed as follows

declare PROJECT_ID=<project-id>
declare STREAMING_URI="https://console.cloud.google.com/datastream/streams/locations/$PROJECT_LOCATION/instances/$STREAMING_ID;tab=overview?project=$PROJECT_ID"

(Screenshot: ui-streaming-is-in-running-state)

Reference documentation:

gcloud SDK: datastream streams describe

Failed to start a stream due to missing permissions

Datastream doesn't have MySQL's REPLICATION SLAVE permission on the source database, so Datastream failed to check the binary log files.
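A hedged fix sketch, assuming a dedicated MySQL user named datastream_user (hypothetical) and an admin connection to the source; the grants below follow the usual Datastream MySQL source setup:

mysql -h <source-host> -u <admin-user> -p <<'SQL'
-- Privileges Datastream needs to read table data and the binary log
GRANT REPLICATION SLAVE, SELECT, REPLICATION CLIENT ON *.* TO 'datastream_user'@'%';
FLUSH PRIVILEGES;
SQL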

Failed writing to the destination bucket “streaming-magnus” because Datastream's service account is missing the necessary permissions. See Destination setup for more details.
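A hedged sketch of one way to grant that access, assuming the standard Datastream service agent naming and that object admin on the bucket suffices; check Destination setup for the exact roles:

# <project-number> is the numeric project number, not the project ID
gcloud storage buckets add-iam-policy-binding gs://streaming-magnus \
  --member="serviceAccount:service-<project-number>@gcp-sa-datastream.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"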

You can use the stream.update() API and add &force=true to the request to skip validation when starting the stream.
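A sketch of that request with curl, assuming the v1 REST endpoint and reusing the variables from above (<project-id> is a placeholder):

# Patch the state to RUNNING, skipping validation with force=true
curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"state": "RUNNING"}' \
  "https://datastream.googleapis.com/v1/projects/<project-id>/locations/$PROJECT_LOCATION/streams/$STREAMING_ID?updateMask=state&force=true"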

Source Reference