
    A deep dive into Service-to-Service communications in Kubernetes

    Nikolay Sivko
    June 12, 2023 · 10 min read

    When you deploy an application to a Kubernetes cluster, one of the essential steps is creating a Service, which enables other apps within the cluster, as well as external clients, to access the app over the network. A Service in Kubernetes is a straightforward abstraction, but like any abstraction, it adds complexity to the system and can make troubleshooting more challenging.

    Motivation

    The motivation for this article stems from a specific problem we encountered while developing Coroot, an open-source observability tool. Coroot leverages eBPF to build a Service Map that covers 100% of your system without requiring any changes to your application code.

    To build a service map, we need to discover how containers in a cluster communicate with each other. Coroot’s agent captures every outbound TCP connection of every container. However, when a container connects to another app through a Kubernetes Service, it becomes challenging to accurately determine the destination container of such a connection.

    In this article, we’ll look under the hood at how Kubernetes load balancing works using this seemingly simple task as an example.

    Built-in Kubernetes load balancing based on iptables

    Let’s deploy an app (nginx) with 2 replicas and a service:

    $ kubectl create deployment nginx --image=nginx --replicas=2
    deployment.apps/nginx created
    $ kubectl expose deployment nginx --port=80
    service/nginx exposed
    $ kubectl get pods -l app=nginx -o wide
    NAME                     READY   STATUS    RESTARTS   AGE   IP           NODE   NOMINATED NODE   READINESS GATES
    nginx-748c667d99-pdppx   1/1     Running   0          50s   10.42.0.12   lab    <none>           <none>
    nginx-748c667d99-9h6gr   1/1     Running   0          50s   10.42.0.11   lab    <none>           <none>
    $ kubectl get services -l app=nginx
    NAME    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
    nginx   ClusterIP   10.43.209.15   <none>        80/TCP    17s
    $ kubectl get endpoints -l app=nginx
    NAME    ENDPOINTS                     AGE
    nginx   10.42.0.11:80,10.42.0.12:80   114s
    

    Now, let’s run another pod and connect to an nginx instance through the service:

    $ kubectl run --rm client -it --image arunvelsriram/utils sh
    $ telnet nginx 80
    Trying 10.43.209.15... ← Service IP
    Connected to nginx.default.svc.cluster.local.
    Escape character is '^]'.
    

    From the client pod’s perspective, it connected to 10.43.209.15:80

    $ netstat -an |grep EST
    tcp        0      0 10.42.0.19:50856        10.43.209.15:80         ESTABLISHED
    

    We know that the client is in fact connected to one of our nginx pods. But how did that happen, and how can we determine which nginx pod the client is connected to?

    When the service was created, Kubernetes (specifically kube-proxy) established iptables rules to distribute traffic randomly among the available nginx pods. These rules change the destination IP address of incoming traffic to the IP address of one of the nginx pods.
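    Under the hood, kube-proxy chains these DNAT rules using the iptables `statistic` match: for n endpoints, rule i matches with probability 1/(n−i) (1/n, then 1/(n−1), and so on, with the last rule matching unconditionally), which works out to a uniform split. Here is a minimal Python sketch of that selection logic, using the pod IPs from above — an illustration of the scheme, not kube-proxy’s actual code:

```python
import random

# The two nginx pod endpoints from the example above.
ENDPOINTS = ["10.42.0.11:80", "10.42.0.12:80"]

def pick_endpoint(endpoints, rng=random):
    """Mimic kube-proxy's 'statistic --mode random' rule chain:
    rule i matches with probability 1/(n-i); the last rule always matches."""
    n = len(endpoints)
    for i, ep in enumerate(endpoints):
        if rng.random() < 1.0 / (n - i):
            return ep
    return endpoints[-1]  # unreachable; kept for clarity

def marginal_probabilities(n):
    """Overall probability of each endpoint being chosen -- uniform by design."""
    probs, remaining = [], 1.0
    for i in range(n):
        p = remaining * (1.0 / (n - i))
        probs.append(p)
        remaining -= p
    return probs

print(marginal_probabilities(3))  # each endpoint ends up with ~1/3
```

    The cascading probabilities look uneven at first glance, but because each rule only sees the traffic that earlier rules let through, every endpoint receives an equal share.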


    This load balancing approach relies on the conntrack table (a part of Linux network stack) to keep track of the connection states. The conntrack table maintains information about established connections, including the source and destination IP addresses and ports.

    Hence, we can identify the translated address of the connection in the conntrack table within the root network namespace on the node:

    root@lab:~# conntrack -L |grep 50856
    tcp 6 86392 ESTABLISHED src=10.42.0.19 dst=10.43.209.15 sport=50856 dport=80 src=10.42.0.12 dst=10.42.0.19 sport=80 dport=50856 [ASSURED] use=1
    

    When a packet is transmitted from the server to the client, the Linux kernel performs a lookup in the conntrack table to identify the corresponding connection. This is the reason why the second IP:PORT pair in the table entry appears in reverse order.

    As you can see, in this particular scenario, the connection was established to 10.42.0.12:80 (nginx-pod-2).
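    This reply-direction trick is easy to apply programmatically: the second src=/sport= pair in a conntrack entry is the reply tuple, so its source is the real backend. A minimal Python sketch over the entry captured above (an illustration, not Coroot’s actual implementation):

```python
import re

# The conntrack entry from above: the first tuple is the original direction
# (client -> service IP), the second is the reply direction, whose source
# is the real backend pod.
ENTRY = ("tcp 6 86392 ESTABLISHED src=10.42.0.19 dst=10.43.209.15 "
         "sport=50856 dport=80 src=10.42.0.12 dst=10.42.0.19 "
         "sport=80 dport=50856 [ASSURED] use=1")

def real_destination(entry):
    """Return (ip, port) of the reply-direction source -- the actual backend."""
    srcs = re.findall(r"src=([\d.]+)", entry)
    sports = re.findall(r"sport=(\d+)", entry)
    return srcs[1], int(sports[1])

print(real_destination(ENTRY))  # ('10.42.0.12', 80)
```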

    Istio Service Mesh

    Now let’s see how the same scenario works with Istio Service Mesh.

    A service mesh achieves better control over service-to-service communications by implementing a dedicated infrastructure layer that intercepts and manages network traffic between services. It does this by using sidecar proxies, such as Envoy, that are deployed alongside each application instance.

    As Istio is already installed in the cluster, let’s proceed by enabling automatic sidecar injection within our cluster:

    $ kubectl label namespace default istio-injection=enabled --overwrite
    namespace/default labeled
    

    Now let’s run a client pod and connect to the nginx service:

    $ kubectl run --rm client -it --image arunvelsriram/utils sh
    $ telnet nginx 80
    Trying 10.43.209.15... ← Service IP
    Connected to nginx.default.svc.cluster.local.
    Escape character is '^]'.
    

    From the client’s perspective, nothing has changed. It is connected to the service IP as before:

    $ netstat -an|grep EST
    tcp        0      0 10.42.0.19:32840        10.43.209.15:80         ESTABLISHED
    

    The iptables rules still exist in the root network namespace; however, there is no longer a relevant record in the root conntrack table. This is because the outbound packets from the client are now intercepted directly within the pod’s own network namespace.

    When Istio injects a sidecar proxy into a pod, its component, pilot-agent, configures iptables to redirect all outbound traffic to the Envoy proxy within the same network namespace for further processing and control.
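    Inside the pod’s network namespace, these rules look roughly like the following trimmed `iptables-save` output (the chain names and Envoy’s port 15001 are standard for Istio, but the exact exclusion rules vary by version):

```
*nat
-A OUTPUT -p tcp -j ISTIO_OUTPUT
# ... exclusions for Envoy's own traffic, loopback, etc. ...
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
```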


    Knowing this, we can locate the relevant conntrack table entry within the network namespace of the client pod:

    root@lab:~# nsenter -t <telnet_pid> -n conntrack -L |grep 32840
    tcp 6 428676 ESTABLISHED src=10.42.0.19 dst=10.43.209.15 sport=32840 dport=80 src=127.0.0.1 dst=10.42.0.19 sport=15001 dport=32840 [ASSURED] use=1
    

    Now we can see that the actual connection destination is 127.0.0.1:15001 (Envoy). Since Envoy establishes new connections to the service endpoints, it’s not feasible to trace the initial connection directly to the final destination.

    However, from Coroot’s perspective, this is not a problem. Its agent captures and traces all outbound connections, including those from sidecar proxies. As a result, there is no difference on service maps between connections proxied by Envoy and those that are not.

    Cilium as a kube-proxy/iptables replacement

    Cilium is recognized as one of the most powerful network plugins for Kubernetes. It not only provides basic network and security capabilities but also offers an eBPF-based alternative to Kubernetes’ default iptables-based load balancing mechanism.

    To set up a k3s cluster with Cilium, I used the following commands:

    # curl -sfL https://get.k3s.io | \
      INSTALL_K3S_EXEC='--flannel-backend=none --disable-network-policy --disable traefik --disable-kube-proxy' \
      sh -
    $ helm repo add cilium https://helm.cilium.io/
    $ helm repo update
    $ helm install cilium cilium/cilium \
      --set k8sServiceHost=<api_server_ip> \
      --set k8sServicePort=6443 \
      --set global.containerRuntime.integration="containerd" \
      --set global.containerRuntime.socketPath="/var/run/k3s/containerd/containerd.sock" \
      --set global.kubeProxyReplacement="strict" \
      --namespace kube-system
    

    Now, let’s perform our experiment with the nginx service and telnet again:

    $ kubectl run --rm client -it --image arunvelsriram/utils sh
    $ telnet nginx 80
    Trying 10.43.49.44... ← Service IP
    Connected to nginx.default.svc.cluster.local.
    Escape character is '^]'.
    

    From the client’s perspective, nothing has changed. It is connected to the service IP as before:

    $ netstat -an|grep EST
    tcp        0      0 10.0.0.65:42014         10.43.49.44:80          ESTABLISHED
    

    As we have completely disabled kube-proxy, there is no entry related to this connection in the conntrack table.


    Cilium intercepts all traffic directed towards service IPs and distributes it among the corresponding pods at the eBPF level. In order to perform reverse network address translation, Cilium maintains its own connection tracking table on top of eBPF maps. To access and examine this table, we can use the cilium CLI tool within the cilium pod:

    $ kubectl exec -ti <cilium_pod> -n kube-system -- cilium bpf ct list global|grep 42014
    TCP OUT 10.43.49.44:80 -> 10.0.0.65:42014 service expires=30289 RxPackets=0 RxBytes=7 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x13 LastTxReport=8689 Flags=0x0012 [ TxClosing SeenNonSyn ] RevNAT=6 SourceSecurityID=0 IfIndex=0
    TCP OUT 10.0.0.65:42014 -> 10.0.0.80:80 expires=8699 RxPackets=3 RxBytes=206 RxFlagsSeen=0x13 LastRxReport=8689 TxPackets=3 TxBytes=206 TxFlagsSeen=0x13 LastTxReport=8689 Flags=0x0013 [ RxClosing TxClosing SeenNonSyn ] RevNAT=6 SourceSecurityID=17722 IfIndex=0
    TCP IN 10.0.0.65:42014 -> 10.0.0.80:80 expires=8699 RxPackets=3 RxBytes=206 RxFlagsSeen=0x13 LastRxReport=8689 TxPackets=3 TxBytes=206 TxFlagsSeen=0x13 LastTxReport=8689 Flags=0x0013 [ RxClosing TxClosing SeenNonSyn ] RevNAT=0 SourceSecurityID=17722 IfIndex=0
    

    As seen, in this particular case, our client pod is connected to 10.0.0.80:80 (nginx-pod-2).
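    The same lookup can be scripted. A minimal Python sketch over a trimmed version of the output above — the `TCP OUT` entry whose destination is no longer the service IP carries the translated (post-NAT) address (again, an illustration rather than Coroot’s actual implementation):

```python
import re

SERVICE_IP = "10.43.49.44"

# Trimmed `cilium bpf ct list global` output from above.
CT_OUTPUT = """\
TCP OUT 10.43.49.44:80 -> 10.0.0.65:42014 service expires=30289 RevNAT=6
TCP OUT 10.0.0.65:42014 -> 10.0.0.80:80 expires=8699 RevNAT=6
TCP IN 10.0.0.65:42014 -> 10.0.0.80:80 expires=8699 RevNAT=0
"""

def translated_destination(ct_output, client, service_ip):
    """Find the outbound entry for `client` whose destination already
    points past the service IP, i.e. the real backend pod."""
    for line in ct_output.splitlines():
        m = re.match(r"TCP OUT (\S+) -> ([\d.]+):(\d+)", line)
        if m and m.group(1) == client and m.group(2) != service_ip:
            return m.group(2), int(m.group(3))
    return None

print(translated_destination(CT_OUTPUT, "10.0.0.65:42014", SERVICE_IP))
# ('10.0.0.80', 80)
```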

    Returning to Coroot’s agent, it automatically detects the presence of Cilium in the cluster and leverages its conntrack table to accurately determine the actual destination of every connection.

    Bonus track: Docker Swarm

    Kubernetes: My network topology can get pretty complicated!
    Docker Swarm: Hold my beer!

    Let’s explore how Service-to-Service communications work in a Docker Swarm cluster just for fun!

    version: "3.8"
    services:
      nginx:
        image: nginx
        ports:
          - target: 80
            published: 80
            protocol: tcp
        deploy:
          mode: replicated
          replicas: 2
      client:
        image: arunvelsriram/utils
        command: ['sleep', '100500']
        deploy:
          mode: replicated
          replicas: 1
    

    By applying this configuration, Docker creates a group of containers and establishes an overlay network that incorporates a load balancer.

    $ docker stack deploy -c demo.yaml demo
    Creating network demo_default
    Creating service demo_nginx
    Creating service demo_client
    $ docker ps
    CONTAINER ID   IMAGE                        COMMAND                  CREATED        STATUS        PORTS     NAMES
    97bf67fc9925   arunvelsriram/utils:latest   "sleep 100500"           10 hours ago   Up 10 hours             demo_client.1.62e6nufzfd4t1g8kde0r40r0h
    f0bd5fc75c23   nginx:latest                 "/docker-entrypoint.…"   2 days ago     Up 2 days     80/tcp    demo_nginx.2.c2q5ukp66jawe1z0cuyvcfthr
    ed101e5d06f7   nginx:latest                 "/docker-entrypoint.…"   2 days ago     Up 2 days     80/tcp    demo_nginx.1.uqz8vgp8b7lywoyi1f6io56w7
    $ docker network inspect demo_default
    [
        {
            "Name": "demo_default",
            "Id": "trjfagvgjgu7iwbzp6ivuvcub",
            "Containers": {
                "97bf67fc9925ae55c28b16230defad8afe13e54af7bdc24f5b7e05eb8a8eef14": {
                    "Name": "demo_client.1.62e6nufzfd4t1g8kde0r40r0h",
                    "EndpointID": "c4af5eb9852e4f2d13086b4ea496bc52cc88e26d5f447f4bf6a7cd1126629d48",
                    "MacAddress": "02:42:0a:00:02:09",
                    "IPv4Address": "10.0.2.9/24",
                    "IPv6Address": ""
                },
                "ed101e5d06f7c8757187f23b84db415627860f346ea6c585a8e2455e102f024b": {
                    "Name": "demo_nginx.1.uqz8vgp8b7lywoyi1f6io56w7",
                    "EndpointID": "81bb68161d02c55f1315a1b3462805ca21f10892a00c0a494d39fa5ceba81e5b",
                    "MacAddress": "02:42:0a:00:02:03",
                    "IPv4Address": "10.0.2.3/24",
                    "IPv6Address": ""
                },
                "f0bd5fc75c2390dcda9efb9a36f099d668583260b05c48933eca5bd4f0a567b9": {
                    "Name": "demo_nginx.2.c2q5ukp66jawe1z0cuyvcfthr",
                    "EndpointID": "f5a50e587f467fc02fb22855e298867a0c86f89da4a78f3608ea0324df708190",
                    "MacAddress": "02:42:0a:00:02:04",
                    "IPv4Address": "10.0.2.4/24",
                    "IPv6Address": ""
                },
                "lb-demo_default": {
                    "Name": "demo_default-endpoint",
                    "EndpointID": "490a5b5bc526ee4aaca34c5a0d443da324d834b20611127769ee3bcff578e0a7",
                    "MacAddress": "02:42:0a:00:02:05",
                    "IPv4Address": "10.0.2.5/24",
                    "IPv6Address": ""
                }
            },
            ...
        }
    ]
    

    The load balancer (lb-demo_default) is a dedicated network namespace that has been configured to distribute traffic among containers using IPVS.


    Now let’s run telnet from the client container to the nginx service:

    $ docker exec -ti demo_client.1.62e6nufzfd4t1g8kde0r40r0h telnet nginx 80
    Trying 10.0.2.2... ← Service IP
    Connected to nginx.
    Escape character is '^]'.
    
    # nsenter -t <telnet_pid> -n netstat -an |grep EST
    tcp        0      0 10.0.2.9:51504          10.0.2.2:80            ESTABLISHED
    

    The IP address 10.0.2.2 is assigned to the load balancer network namespace:

    # nsenter --net=/run/docker/netns/lb_trjfagvgj ip a l
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    47: eth0@if48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
        link/ether 02:42:0a:00:02:05 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.0.2.5/24 brd 10.0.2.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet 10.0.2.2/32 scope global eth0
           valid_lft forever preferred_lft forever
        inet 10.0.2.6/32 scope global eth0
           valid_lft forever preferred_lft forever
    

    As IPVS also maintains the connection states in the conntrack table, we can locate the actual destination of our connection there:

    # nsenter --net=/run/docker/netns/lb_trjfagvgj conntrack -L |grep 51504
    tcp 6 431940 ESTABLISHED src=10.0.2.9 dst=10.0.2.2 sport=51504 dport=80 src=10.0.2.4 dst=10.0.2.5 sport=80 dport=51504 [ASSURED] use=1
    

    In this case telnet is connected to 10.0.2.4:80 (nginx-2).

    Coroot’s agent operates in a similar way. It first identifies the appropriate overlay network for each container and then locates the load balancer network namespace. Afterwards, it performs a conntrack table lookup to determine the actual destination of a given connection.

    Conclusion

    As we have seen, there are multiple ways to organize Service-to-Service communications in Kubernetes, each with its own benefits and drawbacks. Whichever approach you choose, it is essential to understand how it works and to maintain observability of your system.

    Coroot seamlessly integrates with all of these methods, providing you with a comprehensive map of your services within minutes after installation.

    Follow the instructions on our Getting Started page to try Coroot now. Not ready yet? Check out our live demo.

    If you like Coroot, give us a ⭐ on GitHub or share your experience on G2.

    Any questions or feedback? Reach out to us on Slack.
