While working on Coroot’s Distributed Tracing capability, we set up an environment with numerous apps instrumented with OpenTelemetry to generate tracing data for testing purposes.
Coroot uses the OpenTelemetry Collector and its ClickHouse exporter to store traces and logs in ClickHouse. Recently, I explored the compression ratio of tracing data in ClickHouse and stumbled upon something quite interesting. If I were into conspiracy theories, I might think it’s a plot by observability vendors 🙂
Let’s start from the very beginning and look at the otel_traces table in ClickHouse:
CREATE TABLE default.otel_traces
(
    `Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `TraceId` String CODEC(ZSTD(1)),
    `SpanId` String CODEC(ZSTD(1)),
    `ParentSpanId` String CODEC(ZSTD(1)),
    `TraceState` String CODEC(ZSTD(1)),
    `SpanName` LowCardinality(String) CODEC(ZSTD(1)),
    `SpanKind` LowCardinality(String) CODEC(ZSTD(1)),
    `ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
    `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `SpanAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `Duration` Int64 CODEC(ZSTD(1)),
    `StatusCode` LowCardinality(String) CODEC(ZSTD(1)),
    `StatusMessage` String CODEC(ZSTD(1)),
    `Events.Timestamp` Array(DateTime64(9)) CODEC(ZSTD(1)),
    `Events.Name` Array(LowCardinality(String)) CODEC(ZSTD(1)),
    `Events.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
    `Links.TraceId` Array(String) CODEC(ZSTD(1)),
    `Links.SpanId` Array(String) CODEC(ZSTD(1)),
    `Links.TraceState` Array(String) CODEC(ZSTD(1)),
    `Links.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
    INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
    INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_span_attr_key mapKeys(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_span_attr_value mapValues(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_duration Duration TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
TTL toDateTime(Timestamp) + toIntervalDay(7)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
SELECT
    formatReadableSize(sum(data_compressed_bytes) AS csize) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / csize, 1) AS compression_ratio,
    sum(rows) AS rows
FROM system.parts
WHERE table = 'otel_traces'

Query id: 003a01c0-6c62-4284-8261-15b1fa5388c8

┌─compressed─┬─uncompressed─┬─compression_ratio─┬──────rows─┐
│ 20.26 GiB  │ 118.71 GiB   │               5.9 │ 410405258 │
└────────────┴──────────────┴───────────────────┴───────────┘
5.9 is not bad, but let’s break down the ratio for each column individually:
SELECT
    column,
    formatReadableSize(sum(column_data_compressed_bytes) AS csize) AS compressed,
    formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / csize, 2) AS compression_ratio,
    sum(rows) AS rows,
    round(usize / rows, 2) AS avg_row_size
FROM system.parts_columns
WHERE table = 'otel_traces'
GROUP BY column
ORDER BY usize DESC

Query id: d0d1d09a-247b-4f57-9402-0a94e3f583fa

┌─column─────────────┬─compressed─┬─uncompressed─┬─compression_ratio─┬──────rows─┬─avg_row_size─┐
│ SpanAttributes     │ 5.17 GiB   │ 61.34 GiB    │             11.86 │ 409743258 │       160.75 │
│ ResourceAttributes │ 626.61 MiB │ 20.43 GiB    │             33.39 │ 409743258 │        53.55 │
│ TraceId            │ 6.37 GiB   │ 12.58 GiB    │              1.97 │ 409743258 │        32.96 │
│ SpanId             │ 3.33 GiB   │ 6.48 GiB     │              1.94 │ 409743258 │        16.98 │
│ Events.Timestamp   │ 38.14 MiB  │ 3.11 GiB     │             83.51 │ 409743258 │         8.15 │
│ Links.TraceId      │ 15.75 MiB  │ 3.07 GiB     │            199.87 │ 409743258 │         8.06 │
│ Duration           │ 1.39 GiB   │ 3.05 GiB     │              2.19 │ 409743258 │         7.99 │
│ Timestamp          │ 1.95 GiB   │ 3.05 GiB     │              1.56 │ 409743258 │         7.99 │
│ ParentSpanId       │ 1.30 GiB   │ 2.78 GiB     │              2.14 │ 409743258 │         7.28 │
│ StatusMessage      │ 707.93 KiB │ 392.67 MiB   │            567.98 │ 409743258 │            1 │
│ SpanKind           │ 1.06 MiB   │ 391.77 MiB   │            370.24 │ 409743258 │            1 │
│ StatusCode         │ 4.72 MiB   │ 391.77 MiB   │             82.95 │ 409743258 │            1 │
│ SpanName           │ 487.27 KiB │ 391.18 MiB   │            822.06 │ 409743258 │            1 │
│ ServiceName        │ 406.06 KiB │ 391.10 MiB   │            986.27 │ 409743258 │            1 │
│ TraceState         │ 274.75 KiB │ 390.31 MiB   │           1454.67 │ 409743258 │            1 │
│ Events.Attributes  │ 8.53 MiB   │ 157.04 MiB   │             18.41 │ 409743258 │          0.4 │
│ Links.SpanId       │ 6.87 MiB   │ 13.37 MiB    │              1.95 │ 409743258 │         0.03 │
│ Events.Name        │ 722.79 KiB │ 8.10 MiB     │             11.47 │ 409743258 │         0.02 │
│ Links.Attributes   │ 8.59 KiB   │ 6.29 MiB     │            750.15 │ 409743258 │         0.02 │
│ Links.TraceState   │ 2.38 KiB   │ 805.57 KiB   │            338.91 │ 409743258 │            0 │
└────────────────────┴────────────┴──────────────┴───────────────────┴───────────┴──────────────┘
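As a quick sanity check, the share of SpanAttributes in the total uncompressed size can be recomputed from the query output (the values are hardcoded from the tables above):

```python
# Uncompressed sizes from the system.parts / system.parts_columns output above (GiB)
total_uncompressed = 118.71
span_attributes = 61.34

share = span_attributes / total_uncompressed
print(f"SpanAttributes accounts for {share:.1%} of the uncompressed data")
# → SpanAttributes accounts for 51.7% of the uncompressed data
```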
Half of the uncompressed size (61.34 GiB out of 118.71 GiB) comes from the SpanAttributes column. Let’s take a closer look at the specific span attributes. For a quick estimate of the distribution, let’s use the last hour’s data instead of the entire column.
SELECT
    arrayJoin(mapKeys(SpanAttributes)) AS key,
    formatReadableSize(sum(length(SpanAttributes[key]) + length(key)) AS size) AS uncompressed
FROM otel_traces
WHERE Timestamp > (now() - toIntervalHour(1))
GROUP BY key
ORDER BY size DESC
LIMIT 10

Query id: e56a538f-b36a-48c9-aff8-a9cff6828a3d

┌─key────────────────┬─uncompressed─┐
│ otel.scope.name    │ 70.62 MiB    │
│ http.url           │ 43.27 MiB    │
│ otel.scope.version │ 33.00 MiB    │
│ net.peer.name      │ 19.58 MiB    │
│ http.status_code   │ 18.63 MiB    │
│ http.method        │ 14.18 MiB    │
│ db.statement       │ 13.86 MiB    │
│ net.peer.port      │ 12.29 MiB    │
│ http.user_agent    │ 8.70 MiB     │
│ net.sock.peer.addr │ 7.41 MiB     │
└────────────────────┴──────────────┘
That’s odd. I was expecting attributes like http.url or http.user_agent to occupy most of the table space, but it turns out it’s OpenTelemetry’s auxiliary metadata. But why do we even need these attributes, and are they really valuable? According to the OpenTelemetry documentation, otel.scope.name and otel.scope.version simply identify the instrumentation scope: the name and version of the instrumentation library that produced the span.
OK, looks harmless to me. Let’s go ahead and calculate the total size of these attributes across the entire dataset:
SELECT formatReadableSize(sum(
    length(SpanAttributes['otel.scope.name']) + length('otel.scope.name') +
    length(SpanAttributes['otel.scope.version']) + length('otel.scope.version')
) AS size) AS uncompressed
FROM otel_traces

Query id: c5dfbd59-fe6f-458b-98a5-f523d90ca803

┌─uncompressed─┐
│ 35.87 GiB    │
└──────────────┘
SAY WHAT?!?! This makes up 30% of the total uncompressed table size. From my point of view, these attributes aren’t doing anyone any favors, and it’s a bit crazy to store them for EVERY tracing span and log record.
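The 30% figure is a back-of-the-envelope calculation from the two measurements above (both numbers are copied from the query results):

```python
# Uncompressed sizes from the queries above (GiB)
scope_attributes = 35.87   # otel.scope.name + otel.scope.version
total_table = 118.71       # entire otel_traces table, uncompressed

print(f"{scope_attributes / total_table:.0%}")  # → 30%
```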
Now, you might say these attributes hardly ever change and don’t take up much space when compressed. That’s true! However, the catch is that most cloud observability platforms charge based on the amount of data you ingest, not what you store.
In simpler terms, you’re wasting about 30% of your tracing and logging budget on ingesting these near-useless attributes. Here’s another intriguing fact: observability vendors incur almost no storage cost for these attributes, thanks to their near-perfect compression ratio. It’s quite a clever scheme, isn’t it? 🙂
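To illustrate why storing such attributes costs vendors next to nothing, here’s a minimal sketch using Python’s built-in zlib (the scope name and version below are made-up examples): a value that repeats identically for every span compresses by orders of magnitude.

```python
import zlib

# Hypothetical scope attributes, repeated for every span as in a real trace stream
attrs = b"otel.scope.name=io.opentelemetry.tomcat-7.0,otel.scope.version=1.24.0-alpha\n"
stream = attrs * 100_000  # ~7.7 MB of uncompressed attribute data

compressed = zlib.compress(stream, 6)
ratio = len(stream) / len(compressed)
print(f"{len(stream)} -> {len(compressed)} bytes ({ratio:.0f}x)")
```

The ingested (uncompressed) bytes are what you get billed for, while the stored (compressed) bytes are a tiny fraction of that.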
At Coroot, our main goal is to turn telemetry data into useful insights and help you troubleshoot. That’s why in our cloud we only charge for AI-based Root Cause Analysis, not for storage (or ingestion 🙂).
Try Coroot Cloud now (14-day free trial is available) or follow the instructions on our Getting started page to install Coroot Community Edition.
If you like Coroot, give us a ⭐ on GitHub.
Any questions or feedback? Reach out to us on Slack.