The most effective way to evaluate an observability tool is to intentionally introduce a failure into a reasonably complex system and then observe how quickly the tool identifies the root cause.
We’ve built Coroot based on the belief that high-quality telemetry data enables us to automatically pinpoint the root cause of over 80% of outages. But you don’t have to take our word for it: put it to the test yourself!
To simplify this task for you, we created a Helm chart that deploys the OpenTelemetry Demo application, Coroot, and Chaos Mesh.
Add Coroot’s Helm charts repo:

helm repo add coroot https://coroot.github.io/helm-charts
helm repo update coroot
If your cluster uses Docker as the container runtime, the chart’s defaults are enough:

helm upgrade --install demo coroot/otel-demo

For containerd-based clusters, point Chaos Mesh at the containerd socket:

helm upgrade --install demo \
  --set chaos-mesh.chaosDaemon.runtime=containerd \
  --set chaos-mesh.chaosDaemon.socketPath=/run/containerd/containerd.sock \
  coroot/otel-demo

For k3s, the containerd socket lives at a different path:

helm upgrade --install demo \
  --set chaos-mesh.chaosDaemon.runtime=containerd \
  --set chaos-mesh.chaosDaemon.socketPath=/run/k3s/containerd/containerd.sock \
  coroot/otel-demo

For CRI-O-based clusters:

helm upgrade --install demo \
  --set chaos-mesh.chaosDaemon.runtime=crio \
  --set chaos-mesh.chaosDaemon.socketPath=/var/run/crio/crio.sock \
  coroot/otel-demo
⚠️ Currently, Coroot doesn’t support Docker-in-Docker environments like Minikube due to eBPF limitations.
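Once the release is installed, it’s worth waiting until all components are up before moving on. A quick sanity check (assuming the chart was installed into the default namespace, as in the commands above):

# Watch until the demo, Coroot, and Chaos Mesh pods report Running/Ready
kubectl get pods -n default --watch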
The chart sets up fixed NodePort services, enabling access to applications from outside the cluster using the IP address of any cluster node.
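To figure out which address to use, you can grab the IP of any node and list the NodePort services exposed by the chart (a rough sketch; the service names depend on the release name, but the node ports are fixed):

# Internal/external IPs of the cluster nodes
kubectl get nodes -o wide

# NodePort services created by the chart (Coroot UI, demo frontend, Chaos Mesh dashboard, ...)
kubectl get svc -n default | grep NodePort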
Here’s what the OpenTelemetry Demo looks like in Coroot:
So far, everything is going well in the cluster and no problems have occurred:
Most posts on testing observability tools stop at this step. However, we will take it a step further by introducing failures and observing how the system behaves “in the middle of an outage”.
Now, let’s set up a chaos experiment in Chaos Mesh. We’ll simulate a scenario where the cart service loses connectivity with the Redis database.
You can create a chaos experiment using the Chaos Mesh Dashboard at http://<YOUR_NODE_IP>:30003.
Here is the specification of my experiment:
kind: NetworkChaos
apiVersion: chaos-mesh.org/v1alpha1
metadata:
  namespace: default
  name: redis-net-partition
spec:
  selector:
    namespaces:
      - default
    labelSelectors:
      app.kubernetes.io/component: redis
  mode: all
  action: partition
  duration: 10m
  direction: to
  target:
    selector:
      namespaces:
        - default
      labelSelectors:
        app.kubernetes.io/component: cartservice
    mode: all
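If you prefer the CLI over the dashboard, the same experiment can be applied with kubectl (assuming the spec above is saved as redis-net-partition.yaml):

kubectl apply -f redis-net-partition.yaml

# Verify that the partition experiment is active
kubectl get networkchaos redis-net-partition -n default
kubectl describe networkchaos redis-net-partition -n default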
If we navigate to the Demo app (http://<YOUR_NODE_IP>:30005), we’ll notice that we are no longer able to add an item to the cart. Now, let’s examine how this failure appears in Coroot.
The Application Health Summary indicates that cartservice is experiencing network connectivity issues with one of its upstream services. By clicking on this issue, we can delve into the details.
As seen here, Coroot has pinpointed the specific Pods encountering the network connectivity issue.
Additionally, it indicates that there are errors in the frontend service logs.
Thanks to Coroot’s capability to extract repeated patterns from logs, we can quickly identify that there are three unique errors in the logs, all of which are related to the Redis database.
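You can cross-check these errors with kubectl as well. The label selector below is an assumption based on the component labels used in the chaos spec above:

# Tail the frontend logs and filter for Redis-related errors
kubectl logs -l app.kubernetes.io/component=frontend -n default --tail=200 | grep -i redis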
After gathering statistics over a longer period of time, Coroot detected that emailservice has a memory leak and is being killed by the OOM Killer every 1.5 hours.
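Such OOM kills are also visible from kubectl: the restart count keeps growing and the last container state shows OOMKilled. A quick check (the label selector is an assumption and the pod name is a placeholder):

# Restart count grows roughly every 1.5 hours
kubectl get pods -l app.kubernetes.io/component=emailservice -n default

# The last terminated state should show reason: OOMKilled
kubectl describe pod <emailservice-pod-name> -n default | grep -A5 "Last State"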
In this post, I’ve only shown one way things can go wrong, but Chaos Mesh lets you test for many more issues. If you try this with Coroot, please let us know what you find. Together, we can make Coroot better and help many engineering teams investigate outages faster!
Try Coroot Cloud now (14-day free trial is available) or follow the instructions on our Getting started page to install Coroot Community Edition.
If you like Coroot, give us a ⭐ on GitHub.
Any questions or feedback? Reach out to us on Slack.