Deepthi Sigireddi on Distributed Database Architecture in the Cloud Native Era

03 Oct 2024 (6 months ago)

Deepthi Sigireddi and Vitess

Deepthi Sigireddi is the technical lead for Vitess, a cloud-native, open-source, distributed database project, and also the Vitess engineering lead at PlanetScale, a database-as-a-service company built on Vitess (1m11s).
Deepthi's career started as an application developer working with databases, including Oracle, DB2, and MySQL, and later transitioned to working on supply chain planning solutions for the retail industry, which required handling massive amounts of data (1m36s).
To handle large amounts of data, Deepthi and her team developed parallelizable computing solutions to work with monolithic databases, and later worked on cloud security at a startup that was acquired by IBM, where they implemented custom sharding for a multi-tenant system (1m57s).
Deepthi joined PlanetScale and started working on Vitess, a massively scalable distributed database system built around MySQL, which solves the limitations of monolithic MySQL and provides a distributed data management capability (3m14s).

Vitess Origins and Capabilities

Vitess was created at YouTube in 2010 to handle the large amount of traffic and video metadata stored in MySQL, which was causing the site to go down daily (4m6s).
Vitess brings a distributed data management capability to MySQL, allowing it to scale beyond the limits of a single server and providing a high level of performance, availability, scalability, and resilience (3m52s).
YouTube's MySQL database was unable to handle the increasing load, leading to the development of Vitess, a distributed database architecture, to solve this problem in a fundamental way (4m30s).
Vitess allows for vertical sharding, where a set of tables is split across multiple MySQL servers, and horizontal sharding, where a single table is distributed across multiple servers, making it transparent to the application (5m22s).
In the Vitess system, every MySQL instance has a sidecar process called VTTablet, which manages the MySQL instance, and applications interact with the system through a gateway called VTGate, which accepts requests and routes queries to the backing VTTablets (6m1s).
The VTGate gateway can accept MySQL protocol or gRPC calls, decide how to route queries, and aggregate results if necessary before sending them back to the client application (6m18s).

Choosing a Cloud-Native Database

When looking for a cloud-native database, developers should consider usability, compatibility with their application needs, and uptime, as these factors are crucial for a cloud database (7m12s).
Almost every commercially available database has a cloud offering, and developers should evaluate these options based on their specific needs, considering factors such as configuration, tuning, and support for their application (7m30s).
Compatibility is essential, as there are many cloud databases available, including Amazon RDS, Google Cloud SQL, and Oracle Cloud database offerings, and developers should choose a database that supports their application needs (8m11s).
Uptime is also critical, as the database is often critical to operations, and developers should evaluate the historical uptime of the database before making a decision (8m36s).
Cloud providers handle baseline requirements such as data safety, database availability, and data loss prevention, allowing users to focus on other aspects of their applications (8m51s).
Not every application requires a massively parallel distributed database, and users should consider their specific needs before choosing a database solution (9m21s).
In general, the only reason to not use a cloud database is due to legal restrictions on storing highly sensitive data with a third-party provider (9m52s).
Cloud databases are a good choice for most applications, but not all applications require a distributed database, and a standalone database in the cloud can be sufficient (10m27s).

Vitess Sharding

When choosing a database, users should consider factors such as usability, compatibility with existing applications and frameworks, and reliability (10m44s).
Distributed databases like Vitess use sharding to provide scalability and replication to provide high availability (11m12s).
In Vitess, sharding is customizable, allowing users to choose the column and function used for sharding, and also provides a public interface for building custom sharding functions (11m36s).
Vitess's sharding process involves computing which shard to write data to when inserting a row and computing which shard to read data from when querying a row (12m0s).
For queries that cannot be computed to a specific shard, Vitess uses a scatter query to send the query to all shards, gather the results, and bring them back (12m44s).
As a database grows, it eventually reaches a point where everything starts slowing down, and managing things becomes difficult, prompting the need to shard the database (13m9s).
Horizontal sharding is a method of sharding where data is split across multiple servers, and tools can be used to define the sharding scheme, choose the tables to shard, and copy data onto new shards (13m22s).
The sharding process can be done in the background while the system is still running, and when ready, the database can be switched to the new sharded configuration, with the original database being stopped for a short period, typically less than 30 seconds (14m2s).

Replication and High Availability in Vitess

An optional reverse replication can be set up to keep the old databases in sync with the new shards, allowing for rollbacks in case something goes wrong (14m32s).
Replication is a crucial aspect of modern software applications, as downtime is not acceptable, and services need to be available 24/7 (15m14s).
Traditionally, high availability is achieved through replication, where a primary database is written to, and replicas follow and apply changes to their local databases, providing an always-available copy of the data (16m17s).
Vitess uses replication to provide high availability and has tooling to guarantee very high availability, allowing for planned maintenance without downtime by transitioning leadership from the primary to a replica (16m44s).
This process enables maintenance to be performed on the primary database without affecting the availability of the service (17m6s).
Vitess uses replication for higher availability, and around that replication, features have been built to handle planned maintenance and unplanned failures, such as a primary MySQL instance going down due to memory issues, disk errors, or network problems (17m56s).
In Vitess, a monitoring component called VTOR (Vitess Orchestrator) monitors the cluster and elects a new primary instance if the current one is unreachable, ensuring that data is not lost (18m10s).
VTOR also monitors replication on replica instances and ensures they are replicating correctly from the cluster primary, as well as monitoring for other error conditions and fixing them (18m33s).
When a node in the cluster is not available or needs to be down for maintenance, a replica node will take over leadership, but the data needs to be properly sharded, partitioned, and replicated (18m59s).
The process of switching between nodes for planned maintenance is not fully automated by Vitess, but users can automate it using the VTOR and command-line interface, typically during a rolling upgrade of the cluster (19m39s).
For unplanned failures, VTOR handles the monitoring and failover without human intervention, and the timing for this process can be as quick as 5-10 seconds (20m20s).

Distributed Transactions in Vitess

During planned failures, applications may see a slight delay in response time, but not errors, due to request buffering, while unplanned failures may result in errors, but typically within a 30-second timeframe (20m38s).
Distributed databases come with distributed transactions, which involve balancing data consistency and data availability, as stated by the CAP theorem, which says that if you want one, you can't have the other (21m52s).
In a distributed system, consistency issues arise when reading data from replicas, as they may be caught up to different extents, resulting in an inconsistent view of the data (22m57s).
Historically, Vitess has dealt with consistency issues by reading from the primary instance if consistency is important, and from replicas if it's not, but people have used tricks like always reading from the primary in user sessions that involve writing (23m24s).
A desired feature in Vitess is to provide a way to specify read-after-write consistency, which would guarantee up-to-date data whether reading from the primary or a replica (24m13s).
In a distributed system, there is always a possibility of a distributed transaction, but most transactions go to one shard, in which case MySQL itself provides transactional guarantees (24m49s).
Distributed transaction problems arise when updating rows in different shards, such as in a bank transaction where both the sender and recipient's balances need to be updated (25m2s).
To ensure data consistency, distributed transactions are necessary, but true distributed transaction support can be challenging to implement, so creative solutions like writing to a ledger and reconciling after the fact can be used to solve this problem (25m20s).
Vitess uses a best-effort distributed transaction approach, where writes are executed one at a time, and if a failure occurs, all previous writes are rolled back, reducing the universe of possible failures (25m55s).
Once all writes are successful, the commits are issued in parallel, and the probability of a commit being rejected is low, making this approach effective for most use cases (26m37s).
Truly atomic distributed transactions are on the roadmap for Vitess, but the current best-effort approach has proven to be reliable (27m2s).

Schema Management in Vitess

Schema management and data contracts are gaining attention, and Vitess has a tool called VT Admin, which provides a structured gRPC API and a UI for managing the cluster (27m21s).
VT Admin allows users to view schemas, but schema management is not yet available through the UI, although the APIs are available for building custom tools (28m58s).
Best practices for managing schemas and schema versions include using tools and frameworks that provide versioning and change management capabilities, and Vitess plans to improve its schema management capabilities in the future (29m20s).
Storing schema in a version control system is recommended, as schema changes are a crucial topic in the MySQL community due to historical performance issues with large tables under load (29m38s).
MySQL has improved its schema change capabilities, making certain changes instant to prevent locking and blocking, but the community had to develop its own tools due to the slow pace of progress (30m20s).
The principles for online schema changes include being non-blocking, happening in the background, and allowing for reversion without data loss (30m49s).
Vitess has an online schema change system that follows these principles, allowing for non-blocking changes that can be initiated and cut over manually or automatically (30m56s).
Vitess's online schema changes are designed to be non-blocking, allowing the system to continue running with minimal additional load (31m17s).

Vitess Roadmap and Future Improvements

New features on the Vitess roadmap include improved MySQL compatibility, with recent additions such as Common Table Expressions and window functions (32m16s).
Performance improvements are also ongoing, including a new connection pooling implementation that reduces latency and memory utilization (33m8s).
The new connection pooling implementation makes the connection pool more efficient, reducing the need to cycle through the entire pool and minimizing the number of open connections required (33m19s).
Benchmark query performance is run every night and the results are published on a dedicated website called benchmark.us.doio, with an ongoing effort to continuously improve the benchmarks (33m42s).
Recent functionality improvements include the second iteration of point-in-time recovery in Vitess, with plans to make further improvements based on user feedback (34m4s).

Learning More about Vitess

To learn more about distributed databases in general or Vitess in particular, online resources include the Vitess website, which has documentation, examples, quick start guides, and links to videos (34m31s).
Vitess is part of the Cloud Native Computing Foundation, and was donated by Google in 2018, graduating from the foundation in 2019 (34m35s).
The Vitess website provides resources for running Vitess on a laptop, within Kubernetes, and has links to videos from maintainers, community members, and users (34m49s).
Searching for "Vitess" on YouTube yields plenty of talks and introductions to Vitess features, architecture, and diagrams (35m26s).
Working on open-source projects like Vitess has been a positive experience, allowing for new interactions, experiences, and personal growth as a software developer (35m54s).
Listeners can learn more about data engineering topics by checking out the AIML and data engineering community web pages on infoq.com, and listening to recent podcasts (36m41s).