While working on Coroot’s Distributed Tracing capability, we set up an environment with numerous apps instrumented with OpenTelemetry to generate tracing data for testing purposes.
Coroot uses the OpenTelemetry Collector and its ClickHouse exporter to store traces and logs in ClickHouse. Recently, I explored the compression ratio of tracing data in ClickHouse and stumbled upon something quite interesting. If I were into conspiracy theories, I might think it’s a plot by observability vendors 🙂
Let’s start from the very beginning and look at the otel_traces table in ClickHouse:
CREATE TABLE default.otel_traces
(
    `Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `TraceId` String CODEC(ZSTD(1)),
    `SpanId` String CODEC(ZSTD(1)),
    `ParentSpanId` String CODEC(ZSTD(1)),
    `TraceState` String CODEC(ZSTD(1)),
    `SpanName` LowCardinality(String) CODEC(ZSTD(1)),
    `SpanKind` LowCardinality(String) CODEC(ZSTD(1)),
    `ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
    `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `SpanAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `Duration` Int64 CODEC(ZSTD(1)),
    `StatusCode` LowCardinality(String) CODEC(ZSTD(1)),
    `StatusMessage` String CODEC(ZSTD(1)),
    `Events.Timestamp` Array(DateTime64(9)) CODEC(ZSTD(1)),
    `Events.Name` Array(LowCardinality(String)) CODEC(ZSTD(1)),
    `Events.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
    `Links.TraceId` Array(String) CODEC(ZSTD(1)),
    `Links.SpanId` Array(String) CODEC(ZSTD(1)),
    `Links.TraceState` Array(String) CODEC(ZSTD(1)),
    `Links.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
    INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
    INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_span_attr_key mapKeys(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_span_attr_value mapValues(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_duration Duration TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
TTL toDateTime(Timestamp) + toIntervalDay(7)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
SELECT
    formatReadableSize(sum(data_compressed_bytes) AS csize) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / csize, 1) AS compression_ratio,
    sum(rows) AS rows
FROM system.parts
WHERE table = 'otel_traces'

Query id: 003a01c0-6c62-4284-8261-15b1fa5388c8

┌─compressed─┬─uncompressed─┬─compression_ratio─┬──────rows─┐
│ 20.26 GiB  │ 118.71 GiB   │               5.9 │ 410405258 │
└────────────┴──────────────┴───────────────────┴───────────┘
5.9 is not bad, but let’s break down the ratio for each column individually:
SELECT
    column,
    formatReadableSize(sum(column_data_compressed_bytes) AS csize) AS compressed,
    formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / csize, 2) AS compression_ratio,
    sum(rows) AS rows,
    round(usize / rows, 2) AS avg_row_size
FROM system.parts_columns
WHERE table = 'otel_traces'
GROUP BY column
ORDER BY usize DESC

Query id: d0d1d09a-247b-4f57-9402-0a94e3f583fa

┌─column─────────────┬─compressed─┬─uncompressed─┬─compression_ratio─┬──────rows─┬─avg_row_size─┐
│ SpanAttributes     │ 5.17 GiB   │ 61.34 GiB    │             11.86 │ 409743258 │       160.75 │
│ ResourceAttributes │ 626.61 MiB │ 20.43 GiB    │             33.39 │ 409743258 │        53.55 │
│ TraceId            │ 6.37 GiB   │ 12.58 GiB    │              1.97 │ 409743258 │        32.96 │
│ SpanId             │ 3.33 GiB   │ 6.48 GiB     │              1.94 │ 409743258 │        16.98 │
│ Events.Timestamp   │ 38.14 MiB  │ 3.11 GiB     │             83.51 │ 409743258 │         8.15 │
│ Links.TraceId      │ 15.75 MiB  │ 3.07 GiB     │            199.87 │ 409743258 │         8.06 │
│ Duration           │ 1.39 GiB   │ 3.05 GiB     │              2.19 │ 409743258 │         7.99 │
│ Timestamp          │ 1.95 GiB   │ 3.05 GiB     │              1.56 │ 409743258 │         7.99 │
│ ParentSpanId       │ 1.30 GiB   │ 2.78 GiB     │              2.14 │ 409743258 │         7.28 │
│ StatusMessage      │ 707.93 KiB │ 392.67 MiB   │            567.98 │ 409743258 │            1 │
│ SpanKind           │ 1.06 MiB   │ 391.77 MiB   │            370.24 │ 409743258 │            1 │
│ StatusCode         │ 4.72 MiB   │ 391.77 MiB   │             82.95 │ 409743258 │            1 │
│ SpanName           │ 487.27 KiB │ 391.18 MiB   │            822.06 │ 409743258 │            1 │
│ ServiceName        │ 406.06 KiB │ 391.10 MiB   │            986.27 │ 409743258 │            1 │
│ TraceState         │ 274.75 KiB │ 390.31 MiB   │           1454.67 │ 409743258 │            1 │
│ Events.Attributes  │ 8.53 MiB   │ 157.04 MiB   │             18.41 │ 409743258 │          0.4 │
│ Links.SpanId       │ 6.87 MiB   │ 13.37 MiB    │              1.95 │ 409743258 │         0.03 │
│ Events.Name        │ 722.79 KiB │ 8.10 MiB     │             11.47 │ 409743258 │         0.02 │
│ Links.Attributes   │ 8.59 KiB   │ 6.29 MiB     │            750.15 │ 409743258 │         0.02 │
│ Links.TraceState   │ 2.38 KiB   │ 805.57 KiB   │            338.91 │ 409743258 │            0 │
└────────────────────┴────────────┴──────────────┴───────────────────┴───────────┴──────────────┘
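As a quick sanity check, the share of SpanAttributes in the total uncompressed size can be recomputed from the query output (the values are hardcoded from the tables above):

```python
# Uncompressed sizes from the system.parts / system.parts_columns output above (GiB)
total_uncompressed = 118.71
span_attributes = 61.34

share = span_attributes / total_uncompressed
print(f"SpanAttributes accounts for {share:.1%} of the uncompressed data")
# → SpanAttributes accounts for 51.7% of the uncompressed data
```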
Half of the uncompressed size (61.34 GiB out of 118.71 GiB) comes from the SpanAttributes column. Let’s take a closer look at the specific span attributes. For a quick estimate of the distribution, let’s use the last hour’s data instead of the entire column.
SELECT
    arrayJoin(mapKeys(SpanAttributes)) AS key,
    formatReadableSize(sum(length(SpanAttributes[key]) + length(key)) AS size) AS uncompressed
FROM otel_traces
WHERE Timestamp > (now() - toIntervalHour(1))
GROUP BY key
ORDER BY size DESC
LIMIT 10

Query id: e56a538f-b36a-48c9-aff8-a9cff6828a3d

┌─key────────────────┬─uncompressed─┐
│ otel.scope.name    │ 70.62 MiB    │
│ http.url           │ 43.27 MiB    │
│ otel.scope.version │ 33.00 MiB    │
│ net.peer.name      │ 19.58 MiB    │
│ http.status_code   │ 18.63 MiB    │
│ http.method        │ 14.18 MiB    │
│ db.statement       │ 13.86 MiB    │
│ net.peer.port      │ 12.29 MiB    │
│ http.user_agent    │ 8.70 MiB     │
│ net.sock.peer.addr │ 7.41 MiB     │
└────────────────────┴──────────────┘
That’s odd. I was expecting attributes like http.url or http.user_agent to occupy most of the table space, but it turns out it’s OpenTelemetry’s auxiliary metadata. But why do we even need these attributes, and are they really valuable? According to the OpenTelemetry documentation, otel.scope.name and otel.scope.version simply identify the instrumentation scope: the name and version of the instrumentation library that produced the span.
OK, looks harmless to me. Let’s go ahead and calculate the total size of these attributes across the entire dataset:
SELECT formatReadableSize(sum(
    length(SpanAttributes['otel.scope.name']) + length('otel.scope.name') +
    length(SpanAttributes['otel.scope.version']) + length('otel.scope.version')
) AS size) AS uncompressed
FROM otel_traces

Query id: c5dfbd59-fe6f-458b-98a5-f523d90ca803

┌─uncompressed─┐
│ 35.87 GiB    │
└──────────────┘
SAY WHAT?!?! This makes up 30% of the total uncompressed table size. From my point of view, these attributes aren’t doing anyone any favors, and it’s a bit crazy to store them for EVERY tracing span and log record.
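The 30% figure is a back-of-the-envelope calculation from the two measurements above (both numbers are copied from the query results):

```python
# Uncompressed sizes from the queries above (GiB)
scope_attributes = 35.87   # otel.scope.name + otel.scope.version
total_table = 118.71       # entire otel_traces table, uncompressed

print(f"{scope_attributes / total_table:.0%}")  # → 30%
```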
Now, you might say these attributes hardly ever change and don’t take up much space when compressed. That’s true! However, the catch is that most cloud observability platforms charge based on the amount of data you ingest, not what you store.
In simpler terms, you’re wasting about 30% of your tracing and logging budget on ingesting these near-useless attributes. Here’s another intriguing fact: observability vendors incur almost no storage cost for these attributes, thanks to their near-perfect compression ratio. It’s quite a clever scheme, isn’t it? 🙂
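To illustrate why storing such attributes costs vendors next to nothing, here’s a minimal sketch using Python’s built-in zlib (the scope name and version below are made-up examples): a value that repeats identically for every span compresses by orders of magnitude.

```python
import zlib

# Hypothetical scope attributes, repeated for every span as in a real trace stream
attrs = b"otel.scope.name=io.opentelemetry.tomcat-7.0,otel.scope.version=1.24.0-alpha\n"
stream = attrs * 100_000  # ~7.7 MB of uncompressed attribute data

compressed = zlib.compress(stream, 6)
ratio = len(stream) / len(compressed)
print(f"{len(stream)} -> {len(compressed)} bytes ({ratio:.0f}x)")
```

The ingested (uncompressed) bytes are what you get billed for, while the stored (compressed) bytes are a tiny fraction of that.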
At Coroot, our main goal is to turn telemetry data into useful insights and help you troubleshoot. That’s why in our cloud we only charge for AI-based Root Cause Analysis, not for storage (or ingestion 🙂).
Try Coroot Cloud now (14-day free trial is available) or follow the instructions on our Getting started page to install Coroot Community Edition.
If you like Coroot, give us a ⭐ on GitHub.
Any questions or feedback? Reach out to us on Slack.