Change Data Capture (CDC) – pattern or antipattern?

Montserrat, Catalonia, picture by the author

Intro. What is Change Data Capture and Why?

No enterprise system is a lone island. 

Integrating various systems is a key element in building new digital experiences and maintaining IT capabilities for enterprise application ecosystems.

The preferred way of integrating applications, local and cloud, owned and foreign, is to call the APIs they expose, often in the form of a dialogue managed by an orchestration engine or by microservices choreography.

Sometimes it’s not possible to do any of that. For instance, a legacy system may not expose the APIs we need, and adding them now would be hard: the documentation is outdated, the last developers are long gone, and touching the system is considered too risky.


Change Data Capture (CDC) is a technique for such cases. It is also used when the plan is to convert a system from a monolithic to a modular or microservices architecture; CDC then acts as a stopgap solution for detecting changes in other ‘services’.


 

The promise of “happily ever after”

Field of poppies, Poland, picture by the author.

The idea of Change Data Capture (CDC) is to observe what is happening in the data layer of the applications, i.e., which data is changing as a result of business operations, and to capture that set of changes in a message.

The message is then sent to another system, and an action is performed there based on the contents of the message. Replicating the changes in another database is the most common usage scenario.

Many modern database systems support Change Data Capture by monitoring the database log for changes, which minimizes the performance impact of the technique.
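As a rough illustration of the idea, the sketch below models a change message and replays it on a replica. The `ChangeEvent` shape (`op`, `table`, `key`, `after`) and the in-memory dictionary standing in for the target database are assumptions made for this example, not the format of any particular CDC tool.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change message; real CDC tools define their own formats."""
    op: str                          # "insert", "update" or "delete"
    table: str
    key: Any                         # primary key of the changed row
    after: Optional[Dict[str, Any]]  # row state after the change, None for deletes


# In-memory stand-in for the target database: (table, key) -> row
Replica = Dict[Tuple[str, Any], Dict[str, Any]]


def apply_change(replica: Replica, event: ChangeEvent) -> None:
    """Replay one captured change on the replica."""
    if event.op in ("insert", "update"):
        if event.after is None:
            raise ValueError(f"{event.op} event carries no row data")
        replica[(event.table, event.key)] = dict(event.after)
    elif event.op == "delete":
        replica.pop((event.table, event.key), None)
    else:
        raise ValueError(f"unknown operation: {event.op}")


replica: Replica = {}
apply_change(replica, ChangeEvent("insert", "customer", 1, {"name": "Ann"}))
apply_change(replica, ChangeEvent("update", "customer", 1, {"name": "Ann B."}))
```

Note that the replica only ever sees row images. It learns that a row changed, not why it changed.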

Looks so good, let’s just use it and be happy.


 

Reality of CDC: Not-So-Bright Light

Showers, Italy, picture by the author.

After using CDC, many developers and architects started to notice its drawbacks.


 

CDC per data source

Each type of data source (a relational database, a log file, or another kind of database) requires a different approach to CDC. Each source has its own mechanisms, and they cannot be unified easily across sources with different characteristics.

CDC also works best when the target data store is the same as the source, for instance two MS SQL databases or two Oracle databases running the same engine version.

 

Data meaning

Relational database sources are often denormalized and modified to allow fast read or write operations in concrete usage scenarios.

Capturing this kind of data may produce a data set that is hard to understand. It is far from the concept of a Data Transfer Object (and similar), it is very technical, and its low-level format is almost unsuitable for anything other than another identical database.
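To make the gap concrete, here is a minimal sketch of the translation a receiver ends up writing. The column names (`CUST_ID`, `FNAME`, `LNAME`, `STAT_CD`, `ROW_VER`) and the meaning of the status code are invented for illustration; a real schema will differ.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class CustomerDto:
    """Readable object the receiving side would actually like to work with."""
    customer_id: int
    full_name: str
    active: bool


def to_customer_dto(row: Dict[str, Any]) -> CustomerDto:
    """Translate a hypothetical low-level row image into a CustomerDto."""
    return CustomerDto(
        customer_id=int(row["CUST_ID"]),
        full_name=f'{row["FNAME"]} {row["LNAME"]}'.strip(),
        active=row["STAT_CD"] == "A",  # assumed: "A" means an active customer
    )


# A captured row as it might arrive, including a purely technical column.
raw_row = {"CUST_ID": 42, "FNAME": "Ann", "LNAME": "Smith", "STAT_CD": "A", "ROW_VER": 7}
dto = to_customer_dto(raw_row)
```

Every consumer of the raw change stream needs this kind of knowledge about the source tables, which is exactly the coupling an API would have hidden.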


 

Noise

Many changes in the database tables may not be related to any actual business logic, but to maintenance operations or reporting activities. This creates noise.
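One possible way to reduce the noise is to drop changes that touch only bookkeeping columns. The sketch below assumes the receiver gets both the old and the new row image, and the list of technical columns is made up for the example.

```python
from typing import Any, Dict, FrozenSet, Set

# Assumed bookkeeping columns; which ones count as noise depends on the schema.
TECHNICAL_COLUMNS: FrozenSet[str] = frozenset(
    {"last_modified", "row_version", "report_flag"}
)


def changed_columns(before: Dict[str, Any], after: Dict[str, Any]) -> Set[str]:
    """Return the names of columns whose values differ between the two row images."""
    return {c for c in set(before) | set(after) if before.get(c) != after.get(c)}


def is_business_change(
    before: Dict[str, Any],
    after: Dict[str, Any],
    ignored: FrozenSet[str] = TECHNICAL_COLUMNS,
) -> bool:
    """True when at least one changed column is not a bookkeeping column."""
    return bool(changed_columns(before, after) - ignored)
```

A filter like this is itself a piece of knowledge about the source schema that has to be maintained on the receiving side.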

 

Schema changes

Database schemas change over time, and CDC just picks up whatever was changed. The receiving side may get a nasty surprise without any opportunity to control it.
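A receiver can at least fail loudly instead of silently processing rows it does not understand. This is a minimal sketch; the expected column set and the `SchemaDriftError` name are assumptions for the example.

```python
from typing import Any, Dict, Set

# Columns the receiver was built to understand (an assumption for this example).
EXPECTED_COLUMNS: Set[str] = {"id", "name", "email"}


class SchemaDriftError(Exception):
    """Raised when a captured row does not match the expected columns."""


def check_schema(row: Dict[str, Any], expected: Set[str] = EXPECTED_COLUMNS) -> None:
    """Raise SchemaDriftError if the row has missing or unexpected columns."""
    actual = set(row)
    missing = expected - actual
    unexpected = actual - expected
    if missing or unexpected:
        raise SchemaDriftError(
            f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
```

The check only detects the drift. It does not give the receiving team a say in when or how the source schema changes.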

 

Encapsulation puncture

A business system is much more than just data storage; its entire logic lives outside the data store. From the architectural perspective, CDC punctures the encapsulation by accessing the data layer directly and creating ad hoc contracts between services.

 

Future of Change Data Capture. A pragmatic view.

Corfu island, Greece, picture by the author.

 

Domain events

CDC is often a very simple way of tracking changes without writing the code for creating and sending events. What a simplification! It looks like the change events are so close to the business domain events that everybody should be happy.

In complex business applications this is often a trap. True events can contain contextual information that is relevant from the business logic perspective, and they are generated exactly when they should be; for instance, after successfully committing to all the permanent data stores and calling an external API.

In other words, there’s so much going on outside the data store that the assumption that events can be replaced by simple tracking of changes in the database tables is almost always misleading and dangerous.

The developers of business logic should be in control. When something interesting happens (a new customer, a change to an account configuration), an event is generated with the right data and sent to the right channel.
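A minimal sketch of what "being in control" can look like: the business code itself publishes the event, after the write has succeeded, and includes context that never reaches the table. The `CustomerRegistered` event, the `EventBus` class and the dictionary used as a database are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CustomerRegistered:
    """Hypothetical domain event carrying business context, not raw columns."""
    customer_id: int
    name: str
    sales_channel: str  # context that is not stored in the customer table here


class EventBus:
    """Minimal in-process publisher standing in for a real message channel."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[object], None]] = []

    def subscribe(self, handler: Callable[[object], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: object) -> None:
        for handler in self._handlers:
            handler(event)


def register_customer(
    db: Dict[int, str], bus: EventBus, customer_id: int, name: str, sales_channel: str
) -> None:
    """Store the customer, then announce the business fact."""
    db[customer_id] = name  # stands in for a committed transaction
    # Published only after the write above has succeeded.
    bus.publish(CustomerRegistered(customer_id, name, sales_channel))


received: List[object] = []
bus = EventBus()
bus.subscribe(received.append)
db: Dict[int, str] = {}
register_customer(db, bus, 7, "Ann", "web")
```

The subscriber receives a named business fact with its context, and the table layout behind it can change without breaking anyone.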

 

Event sourcing

The future certainly lies with event-driven systems and the event-driven philosophy, in which multiple autonomous systems and their internal microservices communicate with each other and the external world using asynchronous messages that developers generate with the proper structure and at the right moment.

Domain event sourcing is the modern way to address the problem of detecting the changes in the business domain and enabling other actors to react accordingly.

It’s not a silver bullet either, but that’s probably a topic for another article.

Even though CDC (Change Data Capture) has started to be considered by many as an antipattern, here at Avenga we understand that it may still be a valid choice when other methods cannot be used. While others frantically avoid it at all costs, we, the pragmatists, can certainly do it properly when we have to.
