Optimizing Java Applications on Kubernetes: Beyond the Basics
06 Jan 2025
Introduction and Overview of Java at Microsoft
- The speaker works on Microsoft's Java Engineering Team, previously worked at Oracle, and is a Java developer who uses a Mac. (38s)
- Microsoft has its own JDK and helps internal teams optimize Java workloads, with examples including Azure, Minecraft, and LinkedIn. (57s)
- Java is widely used within Microsoft, including in Azure's control plane, Minecraft servers, and LinkedIn's production environment, with hundreds of thousands of JVM instances running. (1m22s)
- The talk covers four main topics: container size and startup time, JVM defaults and ergonomics, Kubernetes' impact on Java workloads, and a concept called A/B performance testing in production. (3m5s)
- The talk is not an advanced JVM tuning class, and attendees are encouraged to seek out other resources for learning about JVM tuning, garbage collection logs, and other related topics. (3m50s)
- The talk aims to provide pointers and opportunities for attendees to learn more about optimizing Java workloads on Kubernetes. (4m28s)
Optimizing Container Image Size
- Reducing the size of container images is a topic of interest, with the goal of improving startup time and overall performance. (4m41s)
- Reducing the size of a container image is important, but it is not the most critical factor, especially when running in a data center with high-speed networking, where image size has little impact on download time (4m52s).
- However, reducing the size of an image is crucial for security reasons, as it minimizes the attack surface area by reducing the number of components shipped in the image, making it easier to patch and update, and decreasing dependencies in production (5m40s).
- Removing unnecessary dependencies can also make the image easier to audit, which is essential for supply chain security and component governance (6m16s).
- To reduce the size of an image, there are three main areas to focus on: the base image layer, the application layer, and the JVM runtime (6m33s).
- For the base image layer, using slim versions of Linux distributions, such as Alpine, or building a custom Linux base image can be effective (6m52s).
- For the application layer, only including necessary dependencies and breaking down the layer into separate layers for dependencies and application code can help reduce the image size (7m12s).
- Using caching can also help, as it allows for reusing the dependencies layer if it hasn't changed, and only rebuilding the application layer (7m43s).
- Running as a non-root user and using a JVM runtime that can shrink down to only include necessary components can also help reduce the image size (7m58s).
- The JDK's module system, together with the jlink tool, can be used to assemble a custom runtime containing only the modules the application needs (8m16s).
- Building a native image can also be an option for further reducing the image size (8m27s).
- Examples of size differences between base images include Ubuntu, Debian, and Alpine, with Alpine being the smallest option, but it uses the musl libc library, which requires consideration of potential compatibility issues (8m34s).
- The JDK is compatible with Alpine, but there may be issues, so it's essential to test and ensure compatibility, and also note that commercial support from cloud vendors for Alpine might be limited (9m6s).
- A classic Docker file for a Spring application can be improved by using a custom user instead of running as root, and by separating dependencies into different layers to optimize build and image download (9m30s).
- Using the Spring Boot Maven plugin can automate the process of building an optimized Docker image (10m22s).
- Creating a custom Java runtime with only the necessary bits can significantly reduce the JVM size, from 334 megabytes to 57 megabytes, and using GraalVM native image can reduce it further to less than 10 megabytes (10m27s).
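Several of the points above (a slim base image, a custom jlink runtime, a non-root user, and a separate dependency layer) can be combined in a multi-stage Dockerfile. The sketch below is illustrative only: the module list, paths, image tags, and main class are assumptions, and in practice the module list would come from running `jdeps` against the real application.

```dockerfile
# Stage 1: build a trimmed JRE containing only the modules the app needs.
# The module list here is a placeholder; derive the real one with jdeps.
FROM eclipse-temurin:21 AS jre-build
RUN jlink \
      --add-modules java.base,java.logging,java.naming,java.net.http \
      --strip-debug --no-man-pages --no-header-files \
      --compress=2 \
      --output /custom-jre

# Stage 2: a slim distro base instead of a full JDK image
FROM debian:bookworm-slim
ENV JAVA_HOME=/opt/jre
COPY --from=jre-build /custom-jre $JAVA_HOME
# Run as a non-root user, as recommended above
RUN useradd --system --uid 1001 appuser
USER appuser
# Dependencies in their own layer, so the layer is cached and reused
# when only the application code changes
COPY --chown=appuser libs/ /app/libs/
COPY --chown=appuser app.jar /app/
ENTRYPOINT ["/opt/jre/bin/java", "-cp", "/app/app.jar:/app/libs/*", "com.example.Main"]
```

The two-stage split means the build tools and full JDK never reach the final image, which is what drives the size reduction described in the talk.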
Improving Startup Time
- The JDK's Class Data Sharing (CDS) feature can roughly halve startup time, and projects like Project Leyden and Project CRaC aim to improve startup time further (11m4s).
- Project Leyden, led by Oracle, and Project CRaC (Coordinated Restore at Checkpoint), led by Azul Systems, are working on ahead-of-time and checkpoint/restore technology that can dramatically improve startup time, but checkpoint/restore requires framework, library, and runtime support (11m34s).
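Class Data Sharing can be wired into a container build by doing a short training run at image-build time and dumping an archive that later starts read from. The Dockerfile below is a sketch under assumptions: the image tag and jar path are hypothetical, and `-Dspring.context.exit=onRefresh` is a Spring Framework property (available in recent versions) used here only to make the training run exit cleanly; a non-Spring app would need its own way to exit after warm-up.

```dockerfile
# Assumed base image and jar path; adjust for the real application
FROM mcr.microsoft.com/openjdk/jdk:21-ubuntu
WORKDIR /app
COPY target/app.jar app.jar
# Training run at build time: record loaded classes and dump an AppCDS archive.
# The Spring property makes the app exit right after context refresh.
RUN java -XX:ArchiveClassesAtExit=app.jsa -Dspring.context.exit=onRefresh -jar app.jar
# Start from the shared archive to cut class-loading work on every startup
ENTRYPOINT ["java", "-XX:SharedArchiveFile=app.jsa", "-jar", "app.jar"]
```

Because the archive is baked into the image, every replica benefits from it without any per-pod warm-up step.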
JVM Defaults and Ergonomics
- The JDK has default settings that can be optimized for better performance, and understanding these defaults is essential for optimizing Java applications (13m1s).
- Java runtime stack has defaults that tend to be conservative and work for most applications, but these defaults can be optimized for specific use cases (13m9s).
- JVM ergonomics can affect how the JVM runs, and environment settings can impact the JVM's behavior, such as the number of processors available (14m12s).
- When running a Java application in Docker, the JVM may not see all available processors, depending on the Docker configuration (14m29s).
- If a container is set to use a non-integer number of CPUs (e.g., 1.2 CPUs or 1200m in Kubernetes), the JVM will round up to the nearest whole number of processors (15m27s).
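The rounding behavior described above can be sketched as simple arithmetic: the JVM derives its processor count from the cgroup CPU quota and period, rounding up. This is an approximation of HotSpot's container-awareness logic for illustration, not the actual JVM source.

```java
public class CpuRounding {
    // Approximation of how the JVM maps a cgroup CPU quota to a processor
    // count: quota divided by period, rounded up to a whole number.
    static int effectiveProcessors(long quotaMicros, long periodMicros) {
        return (int) Math.ceil((double) quotaMicros / periodMicros);
    }

    public static void main(String[] args) {
        // A Kubernetes limit of 1200m = a quota of 120000us per 100000us period
        System.out.println(effectiveProcessors(120_000, 100_000)); // 2
        // 500m still rounds up to 1 available processor
        System.out.println(effectiveProcessors(50_000, 100_000));  // 1
    }
}
```

So a pod limited to 1.2 CPUs looks like a 2-processor machine to the JVM, which affects GC thread counts, JIT threads, and default thread pools.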
- Memory settings can be tricky, and the JVM's garbage collector and heap size can be affected by the amount of memory available (15m58s).
- The JVM's garbage collector can be selected based on the amount of memory available, and the heap size can be set to a percentage of the total memory (16m30s).
- If the JVM is not properly tuned, it can result in a bad heap configuration, leading to performance issues (17m7s).
- The G1 garbage collector is selected only when the JVM sees at least 2 CPUs and at least 1792 megabytes of memory; below those thresholds, the Serial collector is chosen by default (18m24s).
- The JVM's default maximum heap is 50% of memory for environments with less than 256 megabytes, a fixed ~127 megabytes between 256 and 512 megabytes, and 25% of available memory above 512 megabytes (19m40s).
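The default heap rule described above can be encoded as a small function. This is an approximation of the ergonomics the talk describes, not the actual HotSpot source, and the 127 MB plateau is the rounded figure quoted in the talk.

```java
public class DefaultHeap {
    // Default max heap as described in the talk:
    //   < 256 MB available  -> 50% of memory
    //   256-512 MB          -> fixed at ~127 MB
    //   > 512 MB            -> 25% of memory
    static long defaultMaxHeapMB(long availableMB) {
        if (availableMB < 256) return availableMB / 2;
        if (availableMB <= 512) return 127;
        return availableMB / 4;
    }

    public static void main(String[] args) {
        System.out.println(defaultMaxHeapMB(128));  // 64
        System.out.println(defaultMaxHeapMB(384));  // 127
        System.out.println(defaultMaxHeapMB(2048)); // 512
    }
}
```

The practical takeaway: a container with a 2 GB memory limit gets only a ~512 MB heap by default, which is why the talk recommends setting the heap explicitly (for example via `-XX:MaxRAMPercentage`).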
- The JVM's default settings were designed for shared environments, but in the container world, the JVM needs to be informed manually about the available resources (20m20s).
- There are ongoing projects, involving companies like Microsoft, Google, and Oracle, to enhance the ergonomics and defaults of the JVM for container environments (20m40s).
- Simply packaging a Java application as a JAR can result in wasted resources, and proper configuration is necessary to optimize performance (20m51s).
- There are various garbage collectors in the JVM, including the ZGC, Shenandoah, and G1, and understanding their differences is important for optimization (21m0s).
- Epsilon GC is a no-op garbage collector that never reclaims memory, making it useful for benchmarking applications but not suitable for production environments (21m8s).
- When running applications in the cloud, it's essential to consider that certain areas of memory, such as Metaspace or the code cache, will consume the same amount of memory regardless of the heap size (21m40s).
- Configuring the JVM involves setting the heap size to 75% of the memory limit, and using memory calculators can simplify the process (22m20s).
- Buildpacks, such as those provided by the Paketo project, can offer optimizations for containers and include memory calculators for building Java workloads (22m29s).
Optimizing Java Workloads on Kubernetes
- Java applications on Kubernetes can be optimized by understanding JVM tuning; setting -Xmx by hand is not always necessary, as it can be calculated automatically with tools like Paketo Buildpacks (22m50s).
- Horizontal Pod Autoscaler (HPA) is a common scaling solution, but it can be expensive and may not be the most effective approach, as it involves adding more computing power rather than optimizing resource usage (23m30s).
- Vertical Pod Autoscaler (VPA) is an alternative scaling solution that allows pods to increase their resource allocation without restarting the container, but it requires the runtime to understand and take advantage of the additional resources (26m3s).
- The JVM currently lacks the capability to effectively utilize VPA, but this is being worked on, and other runtimes like Rust may offer better performance in some cases (26m35s).
- A company in Latin America migrated from Java to Rust to improve performance, but this could have been achieved through JVM tuning and resource redistribution, highlighting the importance of understanding JVM behavior (24m56s).
- Google's kube-startup-cpu-boost is a feature that allows containers to access additional resources for a limited time, and can be used on Azure, offering a potential solution for optimizing JVM startup (26m51s).
- Java applications on Kubernetes can initially require more CPU and memory to start up, but CPU usage can be reduced and stabilized after the JVM has completed its initial work, such as JIT compilation and optimization, allowing for more efficient resource allocation (27m16s).
- The main issues people face when running Java applications on Kubernetes include limited memory and CPU throttling, which can cause latency and impact application performance (28m1s).
- CPU throttling occurs when an application is given a limited amount of CPU time, but the JVM and other runtime components require additional CPU resources, leading to delays and increased latency (28m50s).
- Setting a CPU limit can impact the JVM's performance, as it only allows the application to access a certain amount of CPU time within a specified period, and any additional CPU requests will be delayed until the next period (29m45s).
- The JVM's garbage collector also requires CPU time to perform its tasks, which can further reduce the available CPU time for the application and increase latency (30m56s).
- Understanding CPU throttling and its impact on the JVM is crucial to optimizing Java application performance on Kubernetes and ensuring that the application has sufficient resources to perform its tasks efficiently (29m28s).
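The throttling mechanics above reduce to CFS quota arithmetic: a CPU limit of 500m translates to 50,000 microseconds of CPU time per 100,000-microsecond period, and once the application (plus its GC and JIT threads) has consumed the quota, it waits for the next period. The sketch below illustrates that arithmetic; it models the accounting, it does not interact with cgroups.

```java
public class CfsQuota {
    static final long PERIOD_MICROS = 100_000; // default cgroup CFS period

    // A Kubernetes CPU limit in millicores mapped to a CFS quota per period
    static long quotaMicros(long limitMillicores) {
        return limitMillicores * PERIOD_MICROS / 1000;
    }

    // Time spent throttled within one period when the container wants
    // demandMicros of CPU time but only has the quota available
    static long throttledMicros(long demandMicros, long limitMillicores) {
        return Math.max(0, demandMicros - quotaMicros(limitMillicores));
    }

    public static void main(String[] args) {
        System.out.println(quotaMicros(500));             // 50000
        System.out.println(throttledMicros(80_000, 500)); // 30000
    }
}
```

A request burst needing 80 ms of CPU under a 500m limit therefore stalls for 30 ms in that period, which shows up directly as tail latency.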
- The JVM's -XX:ActiveProcessorCount flag can be used to specify the number of processors the JVM should assume, which can be different from the actual CPU limit, and this can be useful for IO-bound applications (32m0s).
- Most microservices on Kubernetes are IO-bound, as they involve network requests and responses, and the active processor count flag can be used to optimize the JVM's thread pool size for such applications (32m40s).
- Microsoft has provided recommendations for JVM settings on Kubernetes based on CPU limits, and these can be used as a starting point for optimizing JVM performance (32m59s).
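Putting the CPU and memory points together, a deployment can pass its JVM settings through `JAVA_TOOL_OPTIONS`. The snippet below is illustrative only: the name, image, and flag values are assumptions meant as starting points in the spirit of the recommendations above, not universal settings.

```yaml
# Illustrative deployment fragment; names and values are hypothetical
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-service
spec:
  replicas: 2
  selector:
    matchLabels: { app: orders-service }
  template:
    metadata:
      labels: { app: orders-service }
    spec:
      containers:
        - name: app
          image: example.azurecr.io/orders-service:1.0
          resources:
            requests: { cpu: "1", memory: 1Gi }
            limits:   { cpu: "2", memory: 1Gi }
          env:
            - name: JAVA_TOOL_OPTIONS
              # Heap at 75% of the memory limit; a processor count above the
              # CPU limit can help IO-bound services size their thread pools
              value: "-XX:MaxRAMPercentage=75 -XX:ActiveProcessorCount=4"
```

Using `JAVA_TOOL_OPTIONS` keeps the tuning in the deployment manifest, so A/B variants can differ only in this one environment variable.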
- When optimizing JVM performance, it's essential to have a clear goal in mind, such as throughput, latency, or cost, and to use the appropriate garbage collector for that goal (33m31s).
- Resource distribution can significantly impact JVM performance, and reducing the number of replicas while increasing CPU and memory allocation can lead to improved performance and cost savings (34m14s).
- A benchmark study showed that reducing replicas from six to two while increasing CPU and memory allocation improved throughput and latency while reducing costs (35m41s).
- Resource redistribution can be applied to any language and is a viable strategy for optimizing performance and reducing costs in Kubernetes clusters (36m18s).
- To optimize Java applications on Kubernetes, one approach is to merge a few pods to improve performance on those nodes, and then apply the rollout to more nodes with only one replica per node, which can be achieved by writing a Kubernetes operator (36m38s).
- Another approach is to increase the node pool to taller VMs, increase the resource limits of the pods, and have only three replicas, which can provide spare resources for more workloads while keeping the cost the same (37m20s).
- This approach allows for resiliency and provides the ability to do more with the same resources, making it beneficial from a cloud vendor perspective (37m48s).
- The concept of A/B performance testing can be applied to production performance, where a load balancer routes loads to different instances of the application configured differently, such as using different garbage collectors or resource limits (38m13s).
- This approach can be used to test different configurations, such as smaller JVMs with more replicas or taller JVMs with lesser replicas, and can be easily implemented on Kubernetes (39m4s).
- An example of A/B performance testing is using NGINX to route traffic to different instances of the application, and using round-robin or least connection patterns to distribute the load (39m11s).
- Another scenario is to use different garbage collector configurations and tuning, such as using ergonomics default JVM, G1GC, and parallel GC, and configuring the deployment to use least connection or round-robin (39m41s).
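The GC-variant scenario above maps naturally onto two deployments that differ only in their JVM flags, fronted by one Service whose selector matches both, so Kubernetes spreads traffic across the variants. The manifest below is a sketch; the names, image, and label scheme are hypothetical.

```yaml
# Variant A: G1 collector
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-g1
spec:
  replicas: 2
  selector:
    matchLabels: { app: demo, variant: g1 }
  template:
    metadata:
      labels: { app: demo, variant: g1 }
    spec:
      containers:
        - name: app
          image: example.azurecr.io/demo:1.0
          env:
            - { name: JAVA_TOOL_OPTIONS, value: "-XX:+UseG1GC" }
---
# Variant B: Parallel collector, otherwise identical
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-parallel
spec:
  replicas: 2
  selector:
    matchLabels: { app: demo, variant: parallel }
  template:
    metadata:
      labels: { app: demo, variant: parallel }
    spec:
      containers:
        - name: app
          image: example.azurecr.io/demo:1.0
          env:
            - { name: JAVA_TOOL_OPTIONS, value: "-XX:+UseParallelGC" }
---
# One Service selecting only the shared label, so both variants receive traffic
apiVersion: v1
kind: Service
metadata:
  name: demo
spec:
  selector: { app: demo }
  ports:
    - { port: 80, targetPort: 8080 }
```

Because the `variant` label survives into the pods, dashboards can split metrics per variant while the Service keeps load distribution even.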
- By combining these approaches, it is possible to achieve optimal performance and resource utilization on Kubernetes, as shown in the Azure dashboard (40m7s).
- A demonstration is shown using a CPU-bound emulation, specifically a prime factor test, to compare the performance of different deployments in a Kubernetes cluster (40m34s).
- The test is run on different topologies, including 2x2 and 2x4, and the results are displayed in a dashboard, showing live metrics such as request rate and CPU usage (42m30s).
- The demonstration highlights the ability to compare the performance of different deployments in production, including different garbage collectors, JVM tuning flags, and parameters, without affecting the application's functionality (43m43s).
- The test is run on a container in a cluster, and the results are displayed in a dashboard, allowing for real-time comparison of the performance of different deployments (41m41s).
- The demonstration also shows how to define roles for different deployments using environment variable names, which can be used to differentiate between deployments in the dashboard (43m4s).
- The importance of testing in production is emphasized, as it allows for a more accurate evaluation of performance under real-world conditions, which may not be replicable in a lab environment (44m6s).
- The demonstration is run on a live cluster, and the cost of running the demo is mentioned, highlighting the importance of efficient resource usage (44m18s).
Key Takeaways and Future Directions
- The main takeaways for optimizing Java applications on Kubernetes include reducing the size of container images, focusing primarily on security rather than size, and addressing potential issues with loading images into nodes. (45m37s)
- Startup time can be optimized by utilizing JVM features such as Class Data Sharing, which is available in Java 11, 17, and 21, and by evaluating Project CRaC and Project Leyden for further improvements. (46m18s)
- Understanding the runtime defaults and capabilities of the JVM is crucial, and observing memory, CPU, garbage collection, and JIT compilation in production can help identify areas for improvement. (46m39s)
- It is essential to understand the impact of resource constraints on the runtime stack and ensure sufficient resources are allocated for proper behavior. (46m59s)
- Horizontal scaling is not a silver bullet, and vertical scaling should also be considered to optimize performance. (47m16s)
- Performance testing in production is expected to be a significant focus area, and utilizing staging environments for testing can also be beneficial. (47m22s)
- Microsoft is researching the addition of CRaC support in its OpenJDK distribution, and while there are some complexities, frameworks like Spring have implemented checkpoint/restore flows. (48m3s)
- Framework teams are required to build capabilities into the application framework for certain features, and efforts are being made to explore this possibility (49m10s).
- JSR (Java Specification Request) does not specifically focus on enhancing the JVM for performance issues, but there are ongoing projects by Google, Microsoft, and Oracle to make the JVM heap dynamic, allowing it to grow and shrink as needed (49m29s).
- A dynamic JVM heap would enable in-place scaling and vertical scaling capabilities to be taken advantage of by the JVM (49m51s).
- Oracle is working on the ZGC (Z Garbage Collector) to address how garbage collectors reserve memory areas and manage objects (50m5s).
- Oracle is also working on adaptive heap sizing for ZGC, Google has done similar work on the G1 GC, and Microsoft is exploring dynamic heap management for the Serial GC (50m17s).