
    You're overpaying for OpenTelemetry's verbosity by at least 30%

    Nikolay Sivko
    October 10, 2023 · 6 min read

    While working on Coroot’s Distributed Tracing capability, we set up an environment with numerous apps instrumented with OpenTelemetry to generate tracing data for testing purposes.

    Coroot uses the OpenTelemetry Collector and its clickhouse-exporter to store traces and logs in ClickHouse. Recently, I explored the compression ratio of tracing data in ClickHouse and stumbled upon something quite interesting. If I were into conspiracy theories, I might think it’s a plot by observability vendors :)

    Let’s start from the very beginning and look at the otel_traces table in ClickHouse:

    CREATE TABLE default.otel_traces
    (
        `Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
        `TraceId` String CODEC(ZSTD(1)),
        `SpanId` String CODEC(ZSTD(1)),
        `ParentSpanId` String CODEC(ZSTD(1)),
        `TraceState` String CODEC(ZSTD(1)),
        `SpanName` LowCardinality(String) CODEC(ZSTD(1)),
        `SpanKind` LowCardinality(String) CODEC(ZSTD(1)),
        `ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
        `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
        `SpanAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
        `Duration` Int64 CODEC(ZSTD(1)),
        `StatusCode` LowCardinality(String) CODEC(ZSTD(1)),
        `StatusMessage` String CODEC(ZSTD(1)),
        `Events.Timestamp` Array(DateTime64(9)) CODEC(ZSTD(1)),
        `Events.Name` Array(LowCardinality(String)) CODEC(ZSTD(1)),
        `Events.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
        `Links.TraceId` Array(String) CODEC(ZSTD(1)),
        `Links.SpanId` Array(String) CODEC(ZSTD(1)),
        `Links.TraceState` Array(String) CODEC(ZSTD(1)),
        `Links.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
        INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
        INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
        INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
        INDEX idx_span_attr_key mapKeys(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
        INDEX idx_span_attr_value mapValues(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
        INDEX idx_duration Duration TYPE minmax GRANULARITY 1
    )
    ENGINE = MergeTree
    PARTITION BY toDate(Timestamp)
    ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
    TTL toDateTime(Timestamp) + toIntervalDay(7)
    SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
    

    This table stores the last 7 days of tracing spans (enforced by the TTL clause). Now, let’s look at its size and compression ratio:

    SELECT
        formatReadableSize(sum(data_compressed_bytes) AS csize) AS compressed,
        formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
        round(usize / csize, 1) AS compression_ratio,
        sum(rows) AS rows
    FROM system.parts
    WHERE table = 'otel_traces'
    Query id: 003a01c0-6c62-4284-8261-15b1fa5388c8
    ┌─compressed─┬─uncompressed─┬─compression_ratio─┬──────rows─┐
    │ 20.26 GiB  │ 118.71 GiB   │              5.94 │ 410405258 │
    └────────────┴──────────────┴───────────────────┴───────────┘
    

    5.9 is not bad, but let’s break down the ratio for each column individually:

    SELECT
        column,
        formatReadableSize(sum(column_data_compressed_bytes) AS csize) AS compressed,
        formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed,
        round(usize / csize, 2) AS compression_ratio,
        sum(rows) AS rows,
        round(usize / rows, 2) AS avg_row_size
    FROM system.parts_columns
    WHERE table = 'otel_traces'
    GROUP BY column
    ORDER BY usize DESC
    Query id: d0d1d09a-247b-4f57-9402-0a94e3f583fa
    ┌─column─────────────┬─compressed─┬─uncompressed─┬─compression_ratio─┬──────rows─┬─avg_row_size─┐
    │ SpanAttributes     │ 5.17 GiB   │ 61.34 GiB    │             11.86 │ 409743258 │       160.75 │
    │ ResourceAttributes │ 626.61 MiB │ 20.43 GiB    │             33.39 │ 409743258 │        53.55 │
    │ TraceId            │ 6.37 GiB   │ 12.58 GiB    │              1.97 │ 409743258 │        32.96 │
    │ SpanId             │ 3.33 GiB   │ 6.48 GiB     │              1.94 │ 409743258 │        16.98 │
    │ Events.Timestamp   │ 38.14 MiB  │ 3.11 GiB     │             83.51 │ 409743258 │         8.15 │
    │ Links.TraceId      │ 15.75 MiB  │ 3.07 GiB     │            199.87 │ 409743258 │         8.06 │
    │ Duration           │ 1.39 GiB   │ 3.05 GiB     │              2.19 │ 409743258 │         7.99 │
    │ Timestamp          │ 1.95 GiB   │ 3.05 GiB     │              1.56 │ 409743258 │         7.99 │
    │ ParentSpanId       │ 1.30 GiB   │ 2.78 GiB     │              2.14 │ 409743258 │         7.28 │
    │ StatusMessage      │ 707.93 KiB │ 392.67 MiB   │            567.98 │ 409743258 │            1 │
    │ SpanKind           │ 1.06 MiB   │ 391.77 MiB   │            370.24 │ 409743258 │            1 │
    │ StatusCode         │ 4.72 MiB   │ 391.77 MiB   │             82.95 │ 409743258 │            1 │
    │ SpanName           │ 487.27 KiB │ 391.18 MiB   │            822.06 │ 409743258 │            1 │
    │ ServiceName        │ 406.06 KiB │ 391.10 MiB   │            986.27 │ 409743258 │            1 │
    │ TraceState         │ 274.75 KiB │ 390.31 MiB   │           1454.67 │ 409743258 │            1 │
    │ Events.Attributes  │ 8.53 MiB   │ 157.04 MiB   │             18.41 │ 409743258 │          0.4 │
    │ Links.SpanId       │ 6.87 MiB   │ 13.37 MiB    │              1.95 │ 409743258 │         0.03 │
    │ Events.Name        │ 722.79 KiB │ 8.10 MiB     │             11.47 │ 409743258 │         0.02 │
    │ Links.Attributes   │ 8.59 KiB   │ 6.29 MiB     │            750.15 │ 409743258 │         0.02 │
    │ Links.TraceState   │ 2.38 KiB   │ 805.57 KiB   │            338.91 │ 409743258 │            0 │
    └────────────────────┴────────────┴──────────────┴───────────────────┴───────────┴──────────────┘
    

    Half of the uncompressed size (61.34 GiB out of 118.71 GiB) comes from the SpanAttributes column. Let’s take a closer look at the specific span attributes. For a quick estimate of the distribution, let’s use the last hour’s data instead of the entire column.

    SELECT
        arrayJoin(mapKeys(SpanAttributes)) AS key,
        formatReadableSize(sum(length(SpanAttributes[key]) + length(key)) AS size) AS uncompressed
    FROM otel_traces
    WHERE Timestamp > (now() - toIntervalHour(1))
    GROUP BY key
    ORDER BY size DESC
    LIMIT 10
    Query id: e56a538f-b36a-48c9-aff8-a9cff6828a3d
    ┌─key────────────────┬─uncompressed─┐
    │ otel.scope.name    │ 70.62 MiB    │
    │ http.url           │ 43.27 MiB    │
    │ otel.scope.version │ 33.00 MiB    │
    │ net.peer.name      │ 19.58 MiB    │
    │ http.status_code   │ 18.63 MiB    │
    │ http.method        │ 14.18 MiB    │
    │ db.statement       │ 13.86 MiB    │
    │ net.peer.port      │ 12.29 MiB    │
    │ http.user_agent    │ 8.70 MiB     │
    │ net.sock.peer.addr │ 7.41 MiB     │
    └────────────────────┴──────────────┘
    

    That’s odd. I was expecting attributes like http.url or http.user_agent to occupy most of the table space, but it turns out the biggest consumers are OpenTelemetry’s own auxiliary attributes. Why do we even need them, and are they really valuable? Here’s what the OpenTelemetry documentation says:

    • otel.scope.name: this is the name of the instrumentation scope, like io.opentelemetry.contrib.mongodb.
    • otel.scope.version: it’s the version of the instrumentation scope, such as 1.0.0.

    OK, looks harmless to me. Let’s go ahead and calculate the total size of these attributes across the entire dataset:

    SELECT
        formatReadableSize(sum(
            length(SpanAttributes['otel.scope.name']) +
            length('otel.scope.name') +
            length(SpanAttributes['otel.scope.version']) +
            length('otel.scope.version')
        ) AS size) AS uncompressed
    FROM otel_traces
    Query id: c5dfbd59-fe6f-458b-98a5-f523d90ca803
    ┌─uncompressed─┐
    │ 35.87 GiB    │
    └──────────────┘
    

    SAY WHAT?!?! These two attributes make up 30% of the total uncompressed table size (35.87 GiB out of 118.71 GiB). From my point of view, they aren’t doing anyone any favors, and it’s a bit crazy to store them for EVERY tracing span and log record.

    Now, you might say these attributes hardly ever change and don’t take up much space when compressed. That’s true! However, the catch is that most cloud observability platforms charge based on the amount of data you ingest, not what you store.
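    You can check how static these values really are by counting the distinct scope name/version pairs in your own data. Here’s a quick sketch against the same otel_traces table — typically this returns only a handful of rows, one per instrumentation library:

    SELECT
        SpanAttributes['otel.scope.name'] AS scope_name,
        SpanAttributes['otel.scope.version'] AS scope_version,
        count() AS spans
    FROM otel_traces
    GROUP BY scope_name, scope_version
    ORDER BY spans DESC
    LIMIT 10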

    In simpler terms, you’re wasting about 30% of your tracing and logging budget on ingesting these useless attributes. Here’s another intriguing fact: observability vendors incur almost no storage costs for these attributes, thanks to their near-perfect compression ratio. It’s quite a clever scheme, isn’t it? 🙂
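    If you’d rather not pay for these attributes at all, one option is to drop them before export. Here’s a minimal sketch using the OpenTelemetry Collector’s transform processor (assuming your collector build includes it; the pipeline wiring and exporter name are illustrative — blanking the scope name and version leaves the exporter nothing to write):

    processors:
      transform:
        trace_statements:
          - context: scope
            statements:
              - set(name, "")
              - set(version, "")

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [transform]
          exporters: [clickhouse]

    The trade-off is that you lose the ability to tell which instrumentation library produced a span, so keep these attributes if you actually query by them.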


    At Coroot, our main goal is to turn telemetry data into useful insights and help you troubleshoot. That’s why in our cloud we only charge for AI-based Root Cause Analysis, not for storage or ingestion :)

    Try Coroot Enterprise now (14-day free trial is available) or follow the instructions on our Getting started page to install Coroot Community Edition.

    If you like Coroot, give us a ⭐ on GitHub.

    Any questions or feedback? Reach out to us on Slack.
