The Rise of the Serverless Data Architectures

19 Feb 2024

Serverless Databases

Serverless databases have gained popularity due to advancements in distributed systems and the rise of serverless functions.
A key challenge in building serverless databases is elasticity: capacity has to scale, and resources have to be provisioned, automatically as load changes.
Different architectural decisions shape serverless database design, such as choosing between multi-tenancy and local storage.

ELTIS (Extract, Load, Transform, Integrate, Serve) is a data processing architecture that ensures independence and scalability by moving data between nodes.
Data modeling is crucial for ELTIS to function effectively.
Scaling in ELTIS involves adding or removing nodes, moving partitions, and adding query routers and metadata nodes.
Partition splitting and tearing are techniques used to keep partitions small for efficient movement.
Rebalancing algorithms continuously monitor load and adjust partition placement to maintain balance.
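
The rebalancing loop described above can be sketched roughly as follows. This is a minimal illustration, not the algorithm of any particular database: the names Node, rebalance_step and rebalance, the load units, and the 10% tolerance are all assumptions made for the example. Each step moves one partition from the busiest node to the idlest one, which is also why small partitions matter: only a partition lighter than the gap between the two nodes can reduce the imbalance.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    # partition id -> observed load (hypothetical units, e.g. requests/sec)
    partitions: Dict[str, float] = field(default_factory=dict)

    @property
    def load(self) -> float:
        return sum(self.partitions.values())


def rebalance_step(nodes: List[Node], tolerance: float = 0.1) -> Optional[Tuple[str, str, str]]:
    """Move one partition from the busiest node to the idlest node.

    Returns (partition, source, target), or None when no move would help.
    """
    busiest = max(nodes, key=lambda n: n.load)
    idlest = min(nodes, key=lambda n: n.load)
    gap = busiest.load - idlest.load
    mean = sum(n.load for n in nodes) / len(nodes)
    # Stop when the cluster is idle or the imbalance is within tolerance.
    if mean == 0 or gap <= tolerance * mean:
        return None
    # Only a partition lighter than the gap shrinks the imbalance.
    candidates = {p: l for p, l in busiest.partitions.items() if 0 < l < gap}
    if not candidates:
        return None
    # Moving a load of gap/2 would equalize the two nodes, so pick the closest.
    pid = min(candidates, key=lambda p: abs(candidates[p] - gap / 2))
    idlest.partitions[pid] = busiest.partitions.pop(pid)
    return pid, busiest.name, idlest.name


def rebalance(nodes: List[Node], tolerance: float = 0.1, max_moves: int = 100) -> List[Tuple[str, str, str]]:
    """Repeat single moves until the cluster is balanced or max_moves is hit."""
    moves = []
    for _ in range(max_moves):
        move = rebalance_step(nodes, tolerance)
        if move is None:
            break
        moves.append(move)
    return moves


# Scaling out: node "b" has just been added and is still empty.
nodes = [Node("a", {"p1": 40, "p2": 30, "p3": 20, "p4": 10}), Node("b")]
print(rebalance(nodes))  # [('p1', 'a', 'b'), ('p4', 'a', 'b')]
print([(n.name, n.load) for n in nodes])  # [('a', 50), ('b', 50)]
```

A real rebalancer would also have to account for the cost of moving data and avoid moving the same partition back and forth, which this sketch ignores.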

Compute-storage separation is another cloud-like architecture, in which storage and compute run as separate clusters.
Storage clusters are easy to scale out by adding nodes, while compute nodes scale up by increasing the size of individual nodes.
Compute-storage separation enables features like copy-on-write and simplifies database management.
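
To illustrate why separating storage from compute makes copy-on-write cheap, here is a minimal sketch. It assumes a shared, append-only page store and a per-branch page table; PageStore, Branch and fork are hypothetical names for this example, not the API of any real system. Forking a branch copies only the page table, and a write adds a new page instead of changing an existing one.

```python
class PageStore:
    """Shared storage layer: pages are appended, never modified in place."""

    def __init__(self):
        self._pages = []

    def put(self, data: bytes) -> int:
        self._pages.append(data)
        return len(self._pages) - 1  # reference to the new page

    def get(self, ref: int) -> bytes:
        return self._pages[ref]

    def __len__(self) -> int:
        return len(self._pages)


class Branch:
    """Compute-side view of the data: a table mapping page numbers to page refs."""

    def __init__(self, store: PageStore, page_table=None):
        self.store = store
        self.page_table = dict(page_table or {})

    def write(self, page_no: int, data: bytes) -> None:
        # Copy-on-write: store a new page and repoint only this branch.
        self.page_table[page_no] = self.store.put(data)

    def read(self, page_no: int) -> bytes:
        return self.store.get(self.page_table[page_no])

    def fork(self) -> "Branch":
        # Copies the page table only; no page data is duplicated.
        return Branch(self.store, self.page_table)


store = PageStore()
main = Branch(store)
main.write(0, b"users v1")
main.write(1, b"orders v1")

dev = main.fork()            # instant copy of the database
dev.write(1, b"orders v2")   # only the changed page is added to storage

print(main.read(1))  # b'orders v1'
print(dev.read(1))   # b'orders v2'
print(len(store))    # 3
```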

Serverless databases offer scalability but come with trade-offs such as potential slowdowns for global transactions, cold start issues, and minimum payment requirements.
Different database systems have different latency trade-offs, and performance requirements should be carefully considered to ensure they are realistic and cost-effective.
Testing is crucial for understanding the latency and consistency behavior that a database system actually delivers.
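
A latency test of this kind can be as simple as timing the same operation many times and looking at percentiles rather than the average. The sketch below assumes the operation is passed in as a plain Python callable; measure_latency and percentile are hypothetical helper names, and in a real test the lambda would be replaced by an actual query against the database under evaluation. The first sample is reported separately because that is where a cold start would show up.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sequence of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latency(operation, runs=1000):
    """Call operation() `runs` times and summarize latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000)
    return {
        "first": samples[0],  # a cold start, if any, lands here
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


# Stand-in for a real query; replace with a call to the database being tested.
print(measure_latency(lambda: sum(range(1000)), runs=200))
```

Running the same measurement from each region and at different times of day gives a more realistic picture than a single run.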

Serverless databases offer less benefit to small companies with stable workloads, but become more advantageous for larger companies with multiple workloads, high variability, or global operations.
Serverless databases can provide cost savings and reduce the need for capacity planning, especially for highly variable workloads.
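
The cost argument can be made concrete with some simple arithmetic. All numbers below are invented for illustration: a provisioned database is assumed to be sized for peak load and billed for every hour, while a serverless one is assumed to bill only for capacity actually used, at a higher unit price and with an optional minimum charge. Prices are in integer cents to keep the arithmetic exact.

```python
def provisioned_cost(hourly_load, cents_per_unit_hour):
    """Capacity is sized for the peak and paid for every hour."""
    return max(hourly_load) * len(hourly_load) * cents_per_unit_hour


def serverless_cost(hourly_load, cents_per_unit_hour, minimum_units=0):
    """Pay per unit used each hour, but never less than the minimum."""
    billed_units = sum(max(load, minimum_units) for load in hourly_load)
    return billed_units * cents_per_unit_hour


# Hypothetical prices: serverless costs 2.5x more per unit-hour.
PROVISIONED_PRICE = 10  # cents per capacity unit per hour
SERVERLESS_PRICE = 25   # cents per capacity unit per hour

spiky = [1] * 22 + [20] * 2   # quiet all day, two busy hours
stable = [10] * 24            # flat load

print(provisioned_cost(spiky, PROVISIONED_PRICE))   # 4800
print(serverless_cost(spiky, SERVERLESS_PRICE))     # 1550
print(provisioned_cost(stable, PROVISIONED_PRICE))  # 2400
print(serverless_cost(stable, SERVERLESS_PRICE))    # 6000
```

Under these assumed prices the spiky workload is about three times cheaper on serverless, while the stable workload is cheaper on provisioned capacity.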

Serverless functions can generate highly variable load, which makes them a risk to the databases behind them.
Serverless databases are designed to absorb that variability and can reduce the cost and effort of capacity planning.
When using serverless functions, it's important to consider the architecture and trade-offs involved.
A simple architecture involves having all functions connect directly to the database, but this may not be suitable for all situations.
A more robust architecture involves having a backend or proxy between the functions and the database, which can provide stability and caching.
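
The proxy idea can be sketched as a small class that sits between the functions and the database. This is an assumption-laden illustration, not a production design: DatabaseProxy and its parameters are hypothetical, run_query stands in for whatever client call reaches the real database, and the cache is only appropriate for read queries that tolerate slightly stale results. The semaphore caps how many queries reach the database at once, and the cache answers repeated reads without touching it.

```python
import threading
import time


class DatabaseProxy:
    def __init__(self, run_query, max_connections=10, ttl_seconds=5.0,
                 clock=time.monotonic):
        self._run_query = run_query  # hypothetical: callable that hits the database
        self._slots = threading.BoundedSemaphore(max_connections)
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # sql -> (expires_at, result)
        self._lock = threading.Lock()

    def query(self, sql):
        now = self._clock()
        with self._lock:
            hit = self._cache.get(sql)
            if hit is not None and hit[0] > now:
                return hit[1]  # served from cache, database untouched
        # Stability: at most max_connections queries run concurrently.
        with self._slots:
            result = self._run_query(sql)
        with self._lock:
            self._cache[sql] = (now + self._ttl, result)
        return result


calls = []


def fake_database(sql):
    # Stand-in for a real database client.
    calls.append(sql)
    return f"result #{len(calls)}"


proxy = DatabaseProxy(fake_database, max_connections=2, ttl_seconds=60)
print(proxy.query("SELECT 1"))  # result #1
print(proxy.query("SELECT 1"))  # result #1 (cached)
print(len(calls))               # 1
```

In the simpler architecture every function invocation would open its own connection, so a burst of invocations translates directly into a burst of database connections.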

Data locality is a complex issue that is not solved by using a serverless database.
There are similarities between shared-nothing and storage-compute separation architectures, but the choice between them depends on specific requirements.
Hybrid models that combine elements of both shared-nothing and storage-compute separation architectures can be beneficial in certain situations.

The speaker discusses the trade-off between control and performance in software development.
Loading data onto a local machine provides more control but may require more manual effort.
Letting a vendor handle this automatically is more convenient, but gives up some control and can cause performance issues that stem from the vendor's caching policies.