
All cloud native databases need a strongly consistent data transport

A review of cloud native databases first requires a definition of the term ‘cloud native’. Per the Cloud Native Computing Foundation (CNCF), cloud native technologies have these characteristics:

Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach. These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.

By this definition, cloud native databases must themselves leverage cloud native technologies and exhibit the same key characteristics: scalability, resilience, manageability, and observability.

The tradeoffs

Throughout their development, relational databases focused squarely on transactional performance. They carefully tuned the code paths providing the traditional ACID properties and added a few non-standard SQL features. As a result, they achieved the highest performance among centralized data stores.

In 1999, Professor Eric Brewer of the University of California at Berkeley put forward the ground-breaking CAP Theorem for distributed data stores; it was formally proved by Gilbert and Lynch in 2002. The CAP Theorem states that transactional consistency, data availability, and partition tolerance cannot all be guaranteed simultaneously by a distributed data store. In practice, when a network partition occurs, a system must either reject requests (sacrificing availability) or serve possibly stale data (sacrificing consistency).

The CAP Theorem immediately became the blueprint for cloud native databases. In response, numerous NoSQL databases emerged that relaxed transactional consistency. Google started this trend with Bigtable, whose development began in 2004 (its design was published in 2006).

Sacrificing consistency

Several cloud native databases emerged offering eventually consistent architectures, such as DynamoDB, CosmosDB, and Cassandra. Other database providers leveraged specific access patterns such as graph traversals (Neo4j) or textual search (Elasticsearch, Solr, Lucene), and cloud native versions of these engines emerged, such as the Amazon Elasticsearch Service from AWS. Now, nearly 20 years after the CAP Theorem was formulated, there are nearly 100 cloud native databases popular for specific workloads.

Map-reduce architectures posed an additional problem: sharing data across hundreds of compute nodes. Map-reduce relaxed consistency again by keeping files as immutable blocks. A new type of database built on append-only files organized in a clever data structure, the log-structured merge tree (LSM tree), became common. Several cloud native databases for analytic workloads grew out of this ecosystem: Hive, HBase, Impala, and BigQuery (based on Dremel).
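
To make the LSM idea concrete, here is a minimal, illustrative sketch (not the implementation used by any of the databases above): writes land in an in-memory table that is periodically flushed into immutable, sorted, append-only segments, and reads check the memtable first and then the segments from newest to oldest.

```python
# Minimal, illustrative LSM-tree sketch: an in-memory memtable plus
# immutable sorted segments. Real systems add write-ahead logs,
# bloom filters, and background compaction.

class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}           # mutable, in-memory writes
        self.segments = []           # immutable sorted segments (newest last)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into an immutable, sorted segment (an SSTable-like file).
        segment = tuple(sorted(self.memtable.items()))
        self.segments.append(segment)
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then segments newest-first.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None

db = TinyLSM()
for i in range(10):
    db.put(f"user:{i}", {"clicks": i})
db.put("user:3", {"clicks": 99})   # newer value shadows the flushed one
print(db.get("user:3"))            # {'clicks': 99}
```

Because flushed segments are never modified, they can be replicated across nodes as immutable blocks, which is exactly the property map-reduce file systems rely on.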

Some traditional relational database replication technologies followed the trend by compromising consistency in favor of other goals. For example, standard asynchronous MySQL replication provides only eventual consistency on its read replicas, yet some of the largest MySQL clusters drive sites like Facebook and Uber.
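
A toy model (not MySQL code) shows why asynchronous replication is only eventually consistent: the primary acknowledges a write before the replica has applied it, so a read routed to the replica can return stale data until the replication lag catches up.

```python
# Toy model of asynchronous (eventually consistent) replication.
# The primary acknowledges writes immediately; the replica applies
# them later, so reads from the replica can be stale for a while.

class AsyncReplicatedStore:
    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.replication_queue = []   # changes not yet applied on the replica

    def write(self, key, value):
        self.primary[key] = value                 # acknowledged right away
        self.replication_queue.append((key, value))

    def replicate_one(self):
        # Simulates the replica catching up by one change.
        if self.replication_queue:
            key, value = self.replication_queue.pop(0)
            self.replica[key] = value

    def read_from_replica(self, key):
        return self.replica.get(key)

store = AsyncReplicatedStore()
store.write("balance:alice", 100)
print(store.read_from_replica("balance:alice"))   # None -> stale read
store.replicate_one()
print(store.read_from_replica("balance:alice"))   # 100 -> eventually consistent
```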

With all these consistency tradeoffs by cloud native databases, why not also compromise transport consistency between cloud native databases? Kafka, for example, guarantees ordering only within a partition, not across partitions, and is quite successful at moving data between cloud native databases. However, the key factor in choosing the appropriate transport is the type of the source database. When the source database honors transactions, data applied out of order on the target can produce misleading results. This is true even for reporting or analytics cloud native databases.
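
The ordering problem is easy to reproduce conceptually. The sketch below (plain Python, no Kafka client) models a transport that, like a keyed multi-partition topic, preserves order only per key: the two halves of one transfer transaction land in different partitions, and a target that applies whichever partition delivers first can momentarily show money that was never there, exactly the kind of misleading intermediate state a transactional source never exposes.

```python
# Conceptual model of a partitioned transport: ordering is preserved
# within a partition (per key) but not across partitions, so the two
# changes of a single transaction can reach the target out of order.

# One transaction: move 50 from account A to account B.
debit  = {"key": "account:A", "op": "debit",  "amount": 50}
credit = {"key": "account:B", "op": "credit", "amount": 50}

# Keyed routing puts the two changes on different partitions.
partitions = {"p0": [debit], "p1": [credit]}

target = {"account:A": 100, "account:B": 100}
expected_total = sum(target.values())   # 200 before and after the transfer

def apply_change(change):
    delta = -change["amount"] if change["op"] == "debit" else change["amount"]
    target[change["key"]] += delta

# Nothing forces the consumer to drain p0 before p1; assume p1 arrives first.
for pid in ("p1", "p0"):
    for change in partitions[pid]:
        apply_change(change)
    # A report that runs between partition deliveries sees 250, then 200.
    print(pid, sum(target.values()), "expected", expected_total)
```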

Future-proofing data

Based on historical trends, workload-oriented databases will continue to proliferate. That proliferation comes at the price of synchronization and integration across them.

The challenge for modern IT: how do you future-proof data when databases keep changing? Griddable separates the data from the database and governs the consumption of data with user-defined policies. These policies select the exact data to synchronize to each database through a JSON-like policy language. Users write their own policies with a graphical Griddable schema browser or directly in JSON.
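
As a purely hypothetical illustration of the idea (the field names below are invented for this sketch and are not Griddable's actual policy grammar), a policy might declare which schema, tables, and columns a given target is allowed to consume, expressed here as a Python dictionary that could be serialized to JSON:

```python
import json

# Hypothetical, illustrative policy only: the structure and field names
# are invented for this sketch, not Griddable's real policy syntax.
reporting_policy = {
    "target": "reporting-warehouse",
    "select": [
        {
            "schema": "sales",
            "table": "orders",
            "columns": ["order_id", "customer_id", "total", "created_at"],
            # Example of trimming sensitive data before it reaches the target.
            "exclude_columns": ["credit_card_number"],
        }
    ],
}

print(json.dumps(reporting_policy, indent=2))
```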

The core Griddable framework supports full timeline-based transactional consistency. Therefore, Griddable is compatible with source and target databases whether or not they are strongly consistent. To preserve consistency, Griddable maintains the order of transactions throughout its data pipeline. The pipeline starts with the relay, which extracts changes; continues through the Change History Server, which persists a change record; and ends with the consumer, which applies the changes to the target database. Because this data pipeline is modular and shared-nothing, Griddable also supports high availability and tolerates network partitions.
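
The following is a minimal conceptual sketch of an order-preserving change pipeline (again, not Griddable's implementation): each stage hands changes downstream tagged with the source's commit sequence number, and the consumer applies a change only after everything before it has been applied, so the target always reflects a transaction-consistent point on the source timeline.

```python
# Conceptual order-preserving pipeline: relay -> change history -> consumer.
# Changes carry the source commit sequence number (scn); the consumer
# applies them strictly in scn order.

from collections import deque

class Relay:
    """Extracts committed changes from the source in commit order."""
    def __init__(self, source_log):
        self.source_log = deque(source_log)

    def poll(self):
        return self.source_log.popleft() if self.source_log else None

class ChangeHistory:
    """Persists every change so consumers can replay from any point."""
    def __init__(self):
        self.log = []

    def append(self, change):
        self.log.append(change)

    def read_from(self, scn):
        return [c for c in self.log if c["scn"] >= scn]

class Consumer:
    """Applies changes to the target strictly in scn order."""
    def __init__(self):
        self.target = {}
        self.next_scn = 1

    def apply(self, change):
        if change["scn"] != self.next_scn:
            return  # out-of-order or duplicate change: wait for an in-order replay
        self.target[change["key"]] = change["value"]
        self.next_scn += 1

# Source commit log: scn increases with each committed transaction.
source_log = [
    {"scn": 1, "key": "account:A", "value": 50},
    {"scn": 2, "key": "account:B", "value": 150},
    {"scn": 3, "key": "account:A", "value": 40},
]

relay, history, consumer = Relay(source_log), ChangeHistory(), Consumer()
while (change := relay.poll()) is not None:
    history.append(change)

for change in history.read_from(consumer.next_scn):
    consumer.apply(change)

print(consumer.target)   # {'account:A': 40, 'account:B': 150}
```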

Does the strong consistency guarantee from Griddable mean that the CAP Theorem is untrue? Certainly not. Griddable preserves the consistency provided by the source database, whether that source is a cloud native database or a traditional one.

Next step

To see Griddable for yourself, click the “Demo Now” button for a 10-minute, no-obligation tour.
