Cloud database architecture has focused on scalability and workload-specific features that exploit the elasticity of the cloud. Programmers and users have come to depend on database APIs to reach these scalability and workload features, so it’s common for those APIs to provide direct visibility into database state. Newer cloud database APIs go further, exposing bulk and change data, timestamps, and transaction identifiers as well.
Over the last five years, artificial intelligence and machine learning (AI/ML) have advanced rapidly. In tasks such as facial and voice recognition, machine learning now rivals or surpasses human performance in many use cases. As a result, the models available today offer sophisticated insights for accurate and agile decision making.
Could machine learning lead to a new generation of cloud database architecture?
Models are only as good as the data fed to them, so the availability of high-quality data limits the decisions modern machine learning can support. Unfortunately, the data available today comes from multiple disparate, siloed systems, and correlating data across these silos is a prerequisite for running ML models successfully. That correlation, in turn, requires cloud database architecture to evolve with a focus on data integration. Fortunately, the newer APIs many systems now offer (bulk and change data, timestamps, transaction identifiers) provide the foundation this integration requires.
Data enrichment needed
Multidimensional data organization has proven superior for OLAP systems. Append-only stores have proven more scalable and sustain higher write rates. Reverse (inverted) indexes built on LSM trees have proven to be the right architecture for search. A modern cloud database architecture includes all these attributes. What is missing?
Data enrichment is a very common outcome of running machine learning models. Enrichment adds useful tags and other information to existing data so the data can be used more effectively. Just as analytics reorders data multidimensionally, machine learning requires its own data reordering for enrichment, aggregation, and anomaly detection.
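To make the enrichment step concrete, here is a minimal Python sketch that adds an "anomaly" tag to records. A simple z-score rule stands in for a trained model; that substitution, the function name `enrich_with_anomaly_tag`, and the `amount` field are all assumptions for illustration. The point is that enrichment returns the original data plus new tags, rather than replacing it.

```python
from statistics import mean, pstdev


def enrich_with_anomaly_tag(records, field_name="amount", threshold=2.0):
    """Return copies of `records`, each with an added "tags" list.

    Hypothetical sketch: a z-score rule stands in for a real ML model.
    """
    values = [r[field_name] for r in records]
    mu, sigma = mean(values), pstdev(values)
    enriched = []
    for r in records:
        z = (r[field_name] - mu) / sigma if sigma else 0.0
        tags = ["anomaly"] if abs(z) > threshold else []
        # {**r, ...} copies the record, so the source data is left untouched.
        enriched.append({**r, "tags": tags, "z_score": round(z, 2)})
    return enriched


records = [{"id": i, "amount": a}
           for i, a in enumerate([10, 12, 11, 9, 10, 11, 10, 12, 9, 200])]
enriched = enrich_with_anomaly_tag(records)
```

Only the record with amount 200 receives the tag. Every record gains a `tags` field, which downstream consumers can filter or aggregate on.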
Cloud database architecture therefore requires a new kind of reordering. In its simplest form, this reordering needs a transactionally consistent snapshot of all related data, in a form machine learning can consume. Without it, the data will be uncorrelated in both the time and relevance dimensions.
Time dimension correlation
First, consider correlation in the time dimension. Most databases that support cursors also support cursor stability: a paginated read is served from a consistent, versioned state of the query as of the moment the cursor was opened. This provides both stability and data isolation.
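The following toy Python sketch shows the versioned-read idea behind a stable cursor. It is not any particular database's cursor implementation. The classes `VersionedStore` and `StableCursor` are hypothetical: each write is stamped with an increasing transaction id, and a cursor pins the latest id when it opens. Every later page then reads the newest value at or before that pinned id, so concurrent writes do not leak into the pages.

```python
from bisect import bisect_right


class VersionedStore:
    """Toy multiversion key-value store (hypothetical, no deletes)."""

    def __init__(self):
        self.txn_id = 0      # id of the last committed write
        self.versions = {}   # key -> list of (txn_id, value), ascending by txn_id

    def write(self, key, value):
        self.txn_id += 1
        self.versions.setdefault(key, []).append((self.txn_id, value))
        return self.txn_id

    def read_as_of(self, key, txn_id):
        """Return the newest value of `key` written at or before `txn_id`."""
        history = self.versions.get(key, [])
        i = bisect_right([t for t, _ in history], txn_id)
        return history[i - 1][1] if i else None


class StableCursor:
    """Pages through the store as it looked when the cursor was opened."""

    def __init__(self, store, page_size):
        self.store = store
        self.as_of = store.txn_id            # pin the version at open time
        self.keys = sorted(store.versions)   # keys that exist at open time
        self.page_size = page_size
        self.pos = 0

    def next_page(self):
        page = self.keys[self.pos:self.pos + self.page_size]
        self.pos += len(page)
        return [(k, self.store.read_as_of(k, self.as_of)) for k in page]


store = VersionedStore()
for key in "abcd":
    store.write(key, 1)

cursor = StableCursor(store, page_size=2)
first = cursor.next_page()
store.write("c", 99)          # concurrent update after the cursor opened
store.write("e", 5)           # concurrent insert after the cursor opened
second = cursor.next_page()   # still sees c=1 and no "e"
```

The second page still reports `c` as 1 and omits `e`, even though both writes committed before that page was read.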
Extending this idea, applications need to present a stable ‘cursor’ that reads their API up to a given transaction identifier. The concept closely resembles a materialized view, which presents transactionally consistent data. Griddable calls this feature snapshot materialization, a key enhancement to cloud database architectures. A Griddable snapshot provides the state of all API objects up to a given transaction identifier or timestamp, and Griddable continuously updates the snapshot with each additional transaction.
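Griddable's actual implementation and API are not described here. Purely as an illustration of the idea, the hypothetical Python sketch below materializes a snapshot from an ordered change log. It assumes each change carries a transaction id, a key, and an upsert or delete operation. `Snapshot.advance_to` folds in every change up to a target transaction id, so the state is consistent as of that id. Calling it again with a later id updates the snapshot incrementally instead of rebuilding it.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Change:
    txn_id: int        # hypothetical: monotonically increasing transaction id
    key: str
    op: str            # "upsert" or "delete"
    value: Any = None


@dataclass
class Snapshot:
    as_of: int = 0                            # last transaction folded in
    state: dict = field(default_factory=dict)

    def advance_to(self, log, target_txn):
        """Apply every change with as_of < txn_id <= target_txn (forward only)."""
        # sorted() is stable, so changes within one transaction keep their order.
        for ch in sorted(log, key=lambda c: c.txn_id):
            if self.as_of < ch.txn_id <= target_txn:
                if ch.op == "upsert":
                    self.state[ch.key] = ch.value
                elif ch.op == "delete":
                    self.state.pop(ch.key, None)
                else:
                    raise ValueError(f"unknown op {ch.op!r}")
        self.as_of = max(self.as_of, target_txn)
        return self


log = [
    Change(1, "cust:1", "upsert", {"name": "Ada"}),
    Change(1, "cust:2", "upsert", {"name": "Lin"}),
    Change(2, "cust:1", "upsert", {"name": "Ada", "tier": "gold"}),
    Change(3, "cust:2", "delete"),
]

at_txn_1 = Snapshot().advance_to(log, 1)   # consistent state as of transaction 1
snap = Snapshot().advance_to(log, 2)       # consistent state as of transaction 2
snap.advance_to(log, 3)                    # incrementally fold in transaction 3
```

Grouping by transaction id, rather than advancing one change at a time, keeps a multi-row transaction from being half-applied in any snapshot.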
Relevance dimension correlation
In a multidimensional data architecture, applying a machine learning model to one customer and that customer's interactions means every customer must become a dimension. This is possible, but it does not scale. Instead, collating all the data pertinent to one customer into a single object folder yields very high-fidelity insights.
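A minimal Python sketch of collating siloed data into one folder per customer follows. The source names (`crm`, `orders`, `support`) and the shared `customer_id` field are assumptions for illustration. In practice, matching identities across silos is often the hard part. Once the folders exist, a model can run over one customer's folder at a time instead of treating each customer as a dimension.

```python
from collections import defaultdict


def collate_by_customer(sources):
    """Group records from several siloed sources into one folder per customer.

    `sources` maps a source name to an iterable of dict records; every record
    is assumed (hypothetically) to carry a shared "customer_id" field.
    """
    folders = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for rec in records:
            folders[rec["customer_id"]][source_name].append(rec)
    # Convert to plain dicts so the result is easy to serialize or inspect.
    return {cid: dict(by_source) for cid, by_source in folders.items()}


sources = {
    "crm": [{"customer_id": "c1", "name": "Ada"},
            {"customer_id": "c2", "name": "Lin"}],
    "orders": [{"customer_id": "c1", "order": 101},
               {"customer_id": "c1", "order": 102}],
    "support": [{"customer_id": "c2", "ticket": "T-9"}],
}
folders = collate_by_customer(sources)
```

Each folder keeps records grouped by their source, so provenance survives the reorganization.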
Griddable moves and reorganizes data for this analysis through its intelligent policy engine, which renames, reorganizes, and selectively filters data as it moves across the grid. Because the grid confines reorganization to the target, existing applications that access the source data are unaffected, which protects the investment in them.
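Griddable's policy language is not shown here. The Python sketch below is a hypothetical stand-in for the three operations named above: renaming fields, filtering rows, and projecting the target shape, all applied to records in transit. The `Policy` class, its parameters, and the sample field names are illustrative assumptions. Each output record is a new dict, so the source records are never modified, mirroring the idea that reorganization is confined to the target.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Policy:
    """Hypothetical transform policy applied to records in transit."""
    rename: dict = field(default_factory=dict)          # source field -> target field
    keep_row: Callable[[dict], bool] = lambda r: True   # selective row filtering
    columns: tuple = ()                                 # target fields to keep (empty = all)

    def apply(self, records):
        out = []
        for rec in records:
            if not self.keep_row(rec):    # filter on source field names
                continue
            # Build a new dict so the source record is never mutated.
            renamed = {self.rename.get(k, k): v for k, v in rec.items()}
            if self.columns:
                renamed = {k: renamed[k] for k in self.columns if k in renamed}
            out.append(renamed)
        return out


source = [
    {"cust_no": "c1", "fname": "Ada", "region": "EU", "internal_notes": "x"},
    {"cust_no": "c2", "fname": "Lin", "region": "US", "internal_notes": "y"},
]
policy = Policy(
    rename={"cust_no": "customer_id", "fname": "name"},
    keep_row=lambda r: r["region"] == "EU",
    columns=("customer_id", "name", "region"),
)
target = policy.apply(source)
```

The target receives only the EU customer, with renamed fields and without `internal_notes`, while `source` keeps its original shape for existing applications.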
Griddable provides a key enhancement to cloud database architectures to enable both insights and decisions.