Introduction and Context
TeamSystem Pay is a fintech company under the TeamSystem Group, specializing in digital payments and open banking services. Originally developed to meet internal demands, TeamSystem Pay has since evolved, confronting the complexities of a cloud-native setting and the strategic considerations inherent in that transition. This piece, written together with Davide Pellizzato, Head of R&D at TeamSystem Payments, delves into the company’s innovative strategy for overcoming communication hurdles within Kubernetes clusters using gRPC.
1. The gRPC Dilemma
Choosing a communication protocol is a strategic decision in cloud-native ecosystems. TeamSystem Pay’s journey began with a meticulous evaluation of available options, leading to the adoption of gRPC.
The advantages of using gRPC include:
- Lower serialization overhead
- Automatic type checking
- Formalized APIs
- Reduced TCP management overhead
As the team explored the intricacies of communication protocols, it became evident that traditional REST services were no longer sufficient for their expanding cloud-native ecosystem. gRPC, with its streamlined features and efficient data serialization, held clear appeal: the situation called for a protocol that not only ensured reliable communication but also optimized performance in a dynamic, microservices-driven environment.
But of course, each decision comes with a specific set of trade-offs that have to be addressed.
2. The Unforeseen Challenge: Kubernetes and gRPC Communication
Embracing gRPC introduced a delicate dance between Kubernetes’ service networking and the protocol itself. gRPC relies on HTTP/2, which multiplexes all requests over a single long-lived TCP connection. While this is beneficial for reducing connection management overhead, it significantly disrupts Kubernetes’ standard load balancing rhythm, which operates at the connection level: kube-proxy selects a pod when the TCP connection is first established, and every subsequent request rides that same connection.
“When you scale up the server, no client will automatically connect to the new server instances as clients would simply maintain existing connections,” the TeamSystem Pay team explains.
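To make the failure mode concrete, here is a minimal Go sketch of the default client behaviour (the Service name and port are hypothetical, and the snippet assumes a recent grpc-go):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Dialing a regular ClusterIP Service. kube-proxy picks one pod
	// when the TCP connection is established; HTTP/2 then multiplexes
	// every RPC over that single long-lived connection, so pods added
	// by a scale-up never receive traffic from this client.
	conn, err := grpc.NewClient(
		"checkout.payments.svc.cluster.local:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
	// ... create service stubs from conn and issue RPCs ...
}
```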
So, how to maintain harmony between Kubernetes’ connection-level load balancing and gRPC’s unique demands? The team proceeded to examine the most promising strategies for tackling the challenge.
3. The Possible Solutions
a) Navigating with Manual Balancing Pools
One idea was to manually maintain “balancing pools” of connections in the Kubernetes environment.
When working with Kubernetes, the notion of “balancing pools” refers to deliberately orchestrating load balancing so that network traffic is distributed among multiple instances or pods of a service. The underlying objective is a harmonious distribution of workload that optimizes resource utilization while bolstering high availability.
In e-commerce platforms, this usually works as follows: as traffic surges during peak shopping hours, Kubernetes scales up the pod instances behind the checkout service, and the balancing pool must grow in step so that requests are spread across every new pod. Done well, this keeps the shopping experience fluid and responsive for users.
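What would such a manually maintained pool look like in practice? A minimal sketch in Go, assuming the pod addresses are known up front (the IPs and port below are hypothetical):

```go
package main

import (
	"log"
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// balancingPool holds one gRPC connection per known pod and hands them
// out round-robin. Keeping the address list in sync as pods come and go
// is precisely the operational burden discussed above.
type balancingPool struct {
	conns []*grpc.ClientConn
	next  uint64
}

func newBalancingPool(podAddrs []string) (*balancingPool, error) {
	p := &balancingPool{}
	for _, addr := range podAddrs {
		conn, err := grpc.NewClient(addr,
			grpc.WithTransportCredentials(insecure.NewCredentials()))
		if err != nil {
			return nil, err
		}
		p.conns = append(p.conns, conn)
	}
	return p, nil
}

// pick returns the next connection in round-robin order.
func (p *balancingPool) pick() *grpc.ClientConn {
	i := atomic.AddUint64(&p.next, 1)
	return p.conns[i%uint64(len(p.conns))]
}

func main() {
	// Hypothetical pod IPs; these would have to be refreshed on every
	// scale-up, scale-down, or pod restart.
	pool, err := newBalancingPool([]string{"10.0.0.11:50051", "10.0.0.12:50051"})
	if err != nil {
		log.Fatalf("building pool: %v", err)
	}
	conn := pool.pick() // use conn to create a service stub
	_ = conn
}
```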
In the end, the team considered this to be an excessively complex road, particularly in a setting where Kubernetes’ dynamic nature demands more adaptive and automated solutions.
b) Into the DNS Enigma: Headless Services
DNS and headless services in Kubernetes also emerged as an alternative solution for managing the gRPC load balancing pool.
This approach relies on headless services, whose DNS entry dynamically contains multiple A records, one per pod backing the service. Advanced gRPC clients can use DNS to discover the IP addresses of all pods associated with the service, autonomously manage their load balancing pool, and distribute requests accordingly. This mechanism is particularly valuable for scenarios where direct communication with specific pods is crucial.
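In grpc-go, for instance, this amounts to combining the dns resolver with the round_robin load balancing policy; a minimal sketch, with a hypothetical headless Service name:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// The dns:/// scheme makes the client resolve the headless Service
	// name to the full set of pod IPs, and round_robin then spreads
	// RPCs across all of them.
	conn, err := grpc.NewClient(
		"dns:///checkout-headless.payments.svc.cluster.local:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
}
```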
However, this strategy introduces its own set of challenges. Because headless services rely heavily on the capabilities of the gRPC client, every client, in every programming language used across the stack, must be able to manage its connection pool consistently.
While headless services offer a unique and dynamic way of handling communication within a Kubernetes cluster, they place a greater burden on the gRPC client to navigate and manage the intricacies of load balancing.
c) Lightweight Proxy Emergence
Last but not least, the team examined another option based on a Lightweight Proxy. This strategy introduces an “intermediary” that establishes connections to each destination service and manages the distribution of requests across those connections.
Unlike headless services relying on DNS or manually maintained balancing pools, the Lightweight Proxy takes an active role in connection management, offering a centralized point for load balancing decisions and, with it, a great deal of flexibility.
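To make the pattern concrete, here is a minimal, illustrative Go sketch of such a proxy: a reverse proxy speaking h2c (HTTP/2 without TLS) that picks a backend per request rather than per connection. The backend addresses are hypothetical, and this is a sketch of the idea, not TeamSystem Pay’s actual implementation:

```go
package main

import (
	"crypto/tls"
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"sync/atomic"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

// Hypothetical backend pods; a production proxy would watch the
// Kubernetes Endpoints API (or DNS) and update this list live.
var backends = []string{"10.0.0.11:50051", "10.0.0.12:50051"}

var next uint64

func main() {
	proxy := &httputil.ReverseProxy{
		// The Director runs once per request, so each gRPC call can
		// land on a different pod: exactly what connection-level
		// balancing cannot achieve.
		Director: func(req *http.Request) {
			i := atomic.AddUint64(&next, 1)
			req.URL.Scheme = "http"
			req.URL.Host = backends[i%uint64(len(backends))]
		},
		// gRPC requires end-to-end HTTP/2; this transport speaks h2c
		// toward the backend pods.
		Transport: &http2.Transport{
			AllowHTTP: true,
			DialTLS: func(network, addr string, _ *tls.Config) (net.Conn, error) {
				return net.Dial(network, addr)
			},
		},
		FlushInterval: -1, // flush immediately, needed for gRPC streaming
	}

	// Accept h2c from clients too, so plain gRPC clients can connect.
	srv := &http.Server{
		Addr:    ":8080",
		Handler: h2c.NewHandler(proxy, &http2.Server{}),
	}
	log.Println("lightweight gRPC proxy listening on :8080")
	log.Fatal(srv.ListenAndServe())
}
```

Because the backend is chosen per request, scaled-up pods start receiving traffic as soon as they join the backend list, without any change to the clients.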
This solution seemed to be the right answer for TeamSystem Pay’s needs: because the proxy is independent of any particular programming language, it provides a bridge between services implemented on different stacks.
In the next section, we’ll explain the main strengths and steps taken to implement it.
4. Enabling The Lightweight Proxy Solution
So, how did TeamSystem Pay bring the Lightweight Proxy solution to life? Let’s dive into the technical benefits it brings to the table and how seamlessly it integrated with the project’s requirements:
- Independence from Programming Languages and Clients:
One of the key strengths of the Lightweight Proxy approach is its independence from programming languages and clients. This means that it can integrate into existing codebases regardless of the language used.
- Easy Integration into Existing Codebases:
The Lightweight Proxy’s design prioritizes ease of integration. This approach ensures that existing codebases can leverage the benefits of load balancing without undergoing significant modifications. This adaptability simplifies the implementation process, making it more accessible for teams with diverse technology stacks.
- Flexibility for Additional Logic:
Perhaps one of the most notable advantages is the flexibility introduced by the Lightweight Proxy. Beyond basic load balancing, it allows additional logic to be incorporated within the proxy itself. This enables advanced features, such as request routing based on pod performance and observability capabilities. The proxy becomes an intelligent intermediary capable of making informed decisions about where to route requests based on various performance metrics, as in the least-loaded picker sketched below.
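For illustration, a hypothetical extension of the proxy sketch above: instead of round-robin, route each request to the pod with the fewest requests currently in flight (the counters would be incremented and decremented by the proxy as requests start and finish):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// backend pairs a pod address with its in-flight request count.
// The addresses below are hypothetical.
type backend struct {
	addr     string
	inFlight int64
}

// pickLeastLoaded returns the pod currently serving the fewest
// requests, a simple performance-aware alternative to round-robin.
func pickLeastLoaded(backends []*backend) *backend {
	best := backends[0]
	for _, b := range backends[1:] {
		if atomic.LoadInt64(&b.inFlight) < atomic.LoadInt64(&best.inFlight) {
			best = b
		}
	}
	return best
}

func main() {
	pool := []*backend{
		{addr: "10.0.0.11:50051", inFlight: 3},
		{addr: "10.0.0.12:50051", inFlight: 1},
	}
	fmt.Println("routing to", pickLeastLoaded(pool).addr) // 10.0.0.12:50051
}
```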
5. Main Takeaways
In conclusion, TeamSystem Pay’s journey offers valuable insights for teams grappling with similar complexities in cloud-native environments. Solutions in this space are rarely one-size-fits-all: a proficient team must meticulously assess the unique requirements and obstacles of each project to determine and implement the most suitable approach, a process that inherently involves making decisions and acknowledging inevitable trade-offs.
In this case, the Lightweight Proxy approach offered a comprehensive solution to Kubernetes load balancing challenges when working with gRPC, as it provided a centralized and adaptable mechanism for managing connections, ensuring that the distribution of requests is not only language-agnostic but also customizable and extensible. The flexibility to incorporate additional logic within the proxy also enhanced the overall control and optimization of inter-service communication.