{"id":4178,"date":"2026-05-29T11:06:00","date_gmt":"2026-05-29T11:06:00","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2026\/05\/29\/why-logs-metrics-and-traces-still-dont-give-you-real-observability\/"},"modified":"2026-05-29T11:06:00","modified_gmt":"2026-05-29T11:06:00","slug":"why-logs-metrics-and-traces-still-dont-give-you-real-observability","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2026\/05\/29\/why-logs-metrics-and-traces-still-dont-give-you-real-observability\/","title":{"rendered":"Why Logs, Metrics and Traces Still Don\u2019t Give You Real Observability\u00a0"},"content":{"rendered":"<div><img data-opt-id=624961394  fetchpriority=\"high\" decoding=\"async\" width=\"767\" height=\"330\" src=\"https:\/\/devops.com\/wp-content\/uploads\/2022\/04\/codevulnerability.jpg\" class=\"attachment-large size-large wp-post-image\" alt=\"Jamf, Korea, code, hybrid, ai-powered, observability, insights, DevSecOps Cisco Chronosphere observability, data collection, Observe Google Splunk ServiceNow Logz.io observability Web3 developers CodeSee Survey Surfaces Slow But Steady DevSecOps Progress\" \/><\/div>\n<p><img data-opt-id=881563779  fetchpriority=\"high\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/devops.com\/wp-content\/uploads\/2022\/04\/codevulnerability-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"Jamf, Korea, code, hybrid, ai-powered, observability, insights, DevSecOps Cisco Chronosphere observability, data collection, Observe Google Splunk ServiceNow Logz.io observability Web3 developers CodeSee Survey Surfaces Slow But Steady DevSecOps Progress\" \/><\/p>\n<p><span data-contrast=\"auto\">Several years ago, the\u00a0observability community reached what felt like a consensus:\u00a0The\u00a0three pillars\u00a0\u2014\u00a0logs, metrics and traces. 
Instrument everything, ship it all to a central platform and you will finally understand what your system is doing.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">It\u2019s a tidy framework.\u00a0Yet it\u00a0turns out to be incomplete in ways that only become obvious once you\u2019re actually trying to debug a production incident with it.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">This article isn\u2019t an argument against logs, metrics and traces; you\u00a0need all three.\u00a0However,\u00a0there\u2019s a growing set of failure modes in modern distributed systems that the three-pillar model struggles to explain \u2014 and understanding why is the first step toward building observability that actually works.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">The Promise of the Three Pillars<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">Before we critique the model, let\u2019s be precise about what it promises.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><i><span data-contrast=\"auto\">Logs<\/span><\/i><span data-contrast=\"auto\">\u00a0give you a timestamped record of discrete events:\u00a0A\u00a0function was called, a request came in, an error was thrown. They\u2019re rich in detail and easy to add to code. The challenge is volume \u2014 a high-traffic service can generate millions of log lines per minute, and correlating across services requires discipline and tooling.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><i><span data-contrast=\"auto\">Metrics<\/span><\/i><span data-contrast=\"auto\">\u00a0give you aggregated numerical data over time:\u00a0Request\u00a0rate, error rate, latency percentiles, CPU usage. 
They\u2019re cheap to store, easy to alert on and ideal for dashboards. The tradeoff is that aggregation loses information \u2014 a p99 latency of\u00a0two\u00a0seconds tells you something is slow, but not where or why.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><i><span data-contrast=\"auto\">Traces<\/span><\/i><span data-contrast=\"auto\">\u00a0give you a causal record of how a single request moved through your system \u2014 which services it touched, how long each hop took, where errors occurred. Distributed tracing, <a href=\"https:\/\/devops.com\/why-your-ai-agent-is-a-black-box-and-how-to-fix-it-with-opentelemetry\/\" target=\"_blank\" rel=\"noopener\">using standards like<\/a><\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"none\">OpenTelemetry<\/span><span data-contrast=\"auto\">, has matured considerably and can dramatically accelerate root cause analysis.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Together, these three tools cover a lot of ground. For the canonical failure modes \u2014 a slow database query, a misconfigured cache, a crashing pod \u2014 they work well. The question is what happens when the failure mode is less canonical.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">What the Three Pillars Miss<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/h3>\n<h3><strong>The\u00a0Known Unknowns Problem\u00a0<\/strong><\/h3>\n<p><span data-contrast=\"auto\">Observability built on logs, metrics and traces is fundamentally a system of\u00a0<\/span><i><span data-contrast=\"auto\">known unknowns<\/span><\/i><span data-contrast=\"auto\">. You instrument the things you think might go wrong. You define the metrics that seem important. 
You add trace spans around the code paths you care about.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">However,\u00a0production systems fail in ways you didn\u2019t anticipate. When a new failure mode appears that doesn\u2019t match your existing instrumentation, you\u2019re blind. You\u2019re correlating signals that weren\u2019t designed to explain this kind of problem, often in a rush, in the middle of an incident.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The classic example:\u00a0A\u00a0subtle interaction between two microservices that each look perfectly healthy by their own metrics but produce incorrect outputs when used together.\u00a0No individual metric captures this. The logs from each service look normal. The traces show normal latencies. The failure is in the\u00a0<\/span><i><span data-contrast=\"auto\">relationship<\/span><\/i><span data-contrast=\"auto\">\u00a0between services, not in any one service\u2019s behavior.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><strong>High-Cardinality Blind Spots\u00a0<\/strong><\/h3>\n<p><span data-contrast=\"auto\">Traditional metrics systems have trouble with high-cardinality data. If you want to track latency by user ID, or error rate by a specific combination of API endpoint, region and tenant, most time-series databases either cap the number of distinct series or make it prohibitively expensive.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">This is a real operational gap. Many production incidents affect only a specific cohort of users, a specific region or a specific combination of request parameters, none of which you can efficiently query for with low-cardinality metrics.\u00a0You know something is wrong. You can see the aggregate signal degrading. 
But you can\u2019t slice to the exact population affected.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Tools\u00a0such as\u00a0<\/span><a href=\"https:\/\/www.honeycomb.io\/\"><span data-contrast=\"none\">Honeycomb<\/span><\/a><span data-contrast=\"auto\">\u00a0and\u00a0<\/span><a href=\"https:\/\/lightstep.com\/\"><span data-contrast=\"none\">Lightstep<\/span><\/a><span data-contrast=\"auto\">\u00a0were built specifically to solve this with high-cardinality event data, treating each request as a structured event with arbitrary fields. This is a fundamentally different mental model from metrics-first observability, and it enables queries you simply cannot run with traditional tools.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><span data-contrast=\"none\">The Context Collapse Problem<\/span><span data-ccp-props='{\"134245418\":false,\"134245529\":false,\"335559738\":280,\"335559739\":80}'>\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">Traces give you a picture of how a request flows through your system.\u00a0However,\u00a0that picture is only as good as the context propagated through it. 
In practice, context collapse is endemic:<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<ul>\n<li data-leveltext=\"\u25cf\" data-font=\"\" data-listid=\"1\" data-list-defn-props='{\"335552541\":1,\"335559685\":720,\"335559991\":360,\"469769242\":[8226],\"469777803\":\"left\",\"469777804\":\"\u25cf\",\"469777815\":\"multilevel\"}' data-aria-posinset=\"1\" data-aria-level=\"1\"><span data-contrast=\"auto\">Asynchronous message queues break trace continuity unless you explicitly propagate trace IDs in message payloads.<\/span><span data-ccp-props='{\"335559738\":240}'>\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"\u25cf\" data-font=\"\" data-listid=\"1\" data-list-defn-props='{\"335552541\":1,\"335559685\":720,\"335559991\":360,\"469769242\":[8226],\"469777803\":\"left\",\"469777804\":\"\u25cf\",\"469777815\":\"multilevel\"}' data-aria-posinset=\"2\" data-aria-level=\"1\"><span data-contrast=\"auto\">Third-party services you call don\u2019t participate in your tracing infrastructure.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"\u25cf\" data-font=\"\" data-listid=\"1\" data-list-defn-props='{\"335552541\":1,\"335559685\":720,\"335559991\":360,\"469769242\":[8226],\"469777803\":\"left\",\"469777804\":\"\u25cf\",\"469777815\":\"multilevel\"}' data-aria-posinset=\"3\" data-aria-level=\"1\"><span data-contrast=\"auto\">Batch jobs and background workers often lack the proper instrumentation to connect to the traces that triggered them.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"\u25cf\" data-font=\"\" data-listid=\"1\" data-list-defn-props='{\"335552541\":1,\"335559685\":720,\"335559991\":360,\"469769242\":[8226],\"469777803\":\"left\",\"469777804\":\"\u25cf\",\"469777815\":\"multilevel\"}' data-aria-posinset=\"4\" data-aria-level=\"1\"><span data-contrast=\"auto\">Lambda functions and serverless runtimes have 
historically\u00a0had spotty tracing support.<\/span><span data-ccp-props='{\"335559739\":240}'>\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">The result is that your traces often have gaps right where you need them most. You can see the request arrive and you can see the response go out, but what happened in between \u2014 especially anything involving async work \u2014 is a black box.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><strong>Business Logic Blind Spots\u00a0<\/strong><\/h3>\n<p><span data-contrast=\"auto\">Perhaps the deepest limitation of the three-pillar model is that it\u2019s primarily an\u00a0<\/span><i><span data-contrast=\"auto\">infrastructure<\/span><\/i><span data-contrast=\"auto\">\u00a0observability model.\u00a0It tells you about the technical behavior of your system:\u00a0Latencies, error rates, resource consumption.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">What it doesn\u2019t tell you is whether your system is doing the\u00a0<\/span><i><span data-contrast=\"auto\">right thing<\/span><\/i><span data-contrast=\"auto\">\u00a0from a business perspective. 
A service can have perfect error rates and excellent latency while returning subtly wrong results \u2014 prices calculated incorrectly, recommendations served to the wrong users, inventory counts that don\u2019t match reality.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">This is the gap that the emerging concept of\u00a0<\/span><i><span data-contrast=\"auto\">semantic observability<\/span><\/i><span data-contrast=\"auto\">\u00a0is trying to fill:\u00a0Instrumenting\u00a0the business outcomes, not just the technical behavior, so you can detect\u00a0<\/span><i><span data-contrast=\"auto\">this is producing wrong results<\/span><\/i><span data-contrast=\"auto\">\u00a0rather than just\u00a0<\/span><i><span data-contrast=\"auto\">this is slow or erroring<\/span><\/i><span data-contrast=\"auto\">.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">What Real Observability Looks Like<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">Moving beyond the three-pillar model doesn\u2019t mean abandoning it. It means being honest about its limits and layering additional practices on top.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><span data-contrast=\"none\">Start\u00a0With\u00a0Structured Events<\/span><span data-ccp-props='{\"134245418\":false,\"134245529\":false,\"335559738\":280,\"335559739\":80}'>\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">Wherever possible, emit structured events \u2014 JSON objects with meaningful fields \u2014 rather than unstructured log strings. 
This is the difference between:<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">[ERROR] Failed to process order 12345<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">and:<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">{<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u00a0 \"level\": \"error\",<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u00a0 \"event\": \"order_processing_failed\",<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u00a0 \"order_id\": \"12345\",<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u00a0 \"user_id\": \"u-789\",<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u00a0 \"region\": \"us-east-1\",<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u00a0 \"payment_method\": \"stripe\",<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u00a0 \"error_code\": \"STRIPE_TIMEOUT\",<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u00a0 \"duration_ms\": 4823,<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u00a0 \"trace_id\": \"abc-def-ghi\"<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">}<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The second version is\u00a0queryable\u00a0in ways the first 
is not. You can ask\u00a0\u201chow many order failures happened in us-east-1 in the last 10 minutes, broken down by payment method and error code?\u201d\u00a0You cannot meaningfully answer that question from unstructured log strings at scale.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">OpenTelemetry<\/span><span data-contrast=\"auto\">\u00a0is the obvious standard to adopt here. It gives you a vendor-neutral way to emit logs, metrics and traces in a consistent format that works with tools such as\u00a0<\/span><a href=\"https:\/\/grafana.com\/\"><span data-contrast=\"none\">Grafana<\/span><\/a><span data-contrast=\"auto\">,\u00a0<\/span><a href=\"https:\/\/www.datadoghq.com\/\"><span data-contrast=\"none\">Datadog<\/span><\/a><span data-contrast=\"auto\">, Honeycomb,\u00a0<\/span><a href=\"https:\/\/www.jaegertracing.io\/\"><span data-contrast=\"none\">Jaeger<\/span><\/a><span data-contrast=\"auto\">\u00a0and many others.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><span data-contrast=\"none\">Embrace High-Cardinality Querying<\/span><span data-ccp-props='{\"134245418\":false,\"134245529\":false,\"335559738\":280,\"335559739\":80}'>\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">If your current observability stack can\u2019t answer\u00a0\u201cshow me p95 latency broken down by user plan tier and API endpoint for the last\u00a0five\u00a0minutes,\u201d\u00a0you have a meaningful blind spot. 
Evaluate whether your tooling supports this.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">If it doesn\u2019t, it\u2019s worth looking seriously at event-based observability platforms or, at minimum, adding a tool like\u00a0<\/span><a href=\"https:\/\/clickhouse.com\/\"><span data-contrast=\"none\">ClickHouse<\/span><\/a><span data-contrast=\"auto\">\u00a0as a backend for high-cardinality queries on your structured event data.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><span data-contrast=\"none\">Build Continuous Verification\u00a0Into\u00a0Your Pipelines<\/span><span data-ccp-props='{\"134245418\":false,\"134245529\":false,\"335559738\":280,\"335559739\":80}'>\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">One of the most underused observability practices is continuous verification:\u00a0Running\u00a0lightweight synthetic checks in production that validate business correctness, not just technical health. Can a user complete checkout end\u00a0to\u00a0end? Is the price returned for product X correct? Is the recommendation engine returning results that match our expected logic?<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Tools\u00a0such as\u00a0<\/span><a href=\"https:\/\/steadybit.com\/\"><span data-contrast=\"none\">Steadybit<\/span><\/a><span data-contrast=\"auto\">,\u00a0<\/span><a href=\"https:\/\/www.gremlin.com\/\"><span data-contrast=\"none\">Gremlin<\/span><\/a><span data-contrast=\"auto\">\u00a0and even custom health checks can serve this purpose. 
The goal is to detect\u00a0<\/span><i><span data-contrast=\"auto\">wrong results<\/span><\/i><span data-contrast=\"auto\">\u00a0before your users do.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><span data-contrast=\"none\">Instrument for the Incident\u00a0You\u00a0Haven\u2019t\u00a0Had Yet<\/span><span data-ccp-props='{\"134245418\":false,\"134245529\":false,\"335559738\":280,\"335559739\":80}'>\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">After every production incident, do a deliberate review:\u00a0What\u00a0information would have made this faster to diagnose? Then add that instrumentation. This is a mundane practice with large compounding returns. Teams that do it consistently become visibly better at debugging over 12\u201318 months.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">The Tooling Landscape<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">A few tools worth knowing about beyond the standard trio:<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<table data-tablestyle=\"Custom\" data-tablelook=\"0\">\n<tbody>\n<tr>\n<td data-celllook=\"4369\"><span data-contrast=\"auto\">Tool<\/span><span data-ccp-props='{\"335551550\":2,\"335551620\":2}'>\u00a0<\/span><\/td>\n<td data-celllook=\"4369\"><span data-contrast=\"auto\">What It\u2019s\u00a0Good For<\/span><span data-ccp-props='{\"335551550\":2,\"335551620\":2}'>\u00a0<\/span><\/td>\n<\/tr>\n<tr>\n<td data-celllook=\"4369\"><span data-contrast=\"none\">Honeycomb<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<td data-celllook=\"4369\"><span data-contrast=\"auto\">High-cardinality event exploration, fast arbitrary slicing<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<\/tr>\n<tr>\n<td data-celllook=\"4369\"><span data-contrast=\"none\">Grafana Tempo<\/span><span 
data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<td data-celllook=\"4369\"><span data-contrast=\"auto\">Scalable distributed tracing, integrates well with Loki + Prometheus<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<\/tr>\n<tr>\n<td data-celllook=\"4369\"><a href=\"https:\/\/opentelemetry.io\/docs\/collector\/\" target=\"_blank\" rel=\"noopener\"><span data-contrast=\"none\">OpenTelemetry\u00a0Collector<\/span><\/a><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<td data-celllook=\"4369\"><span data-contrast=\"auto\">Vendor-neutral telemetry pipeline<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<\/tr>\n<tr>\n<td data-celllook=\"4369\"><span data-contrast=\"none\">Jaeger<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<td data-celllook=\"4369\"><span data-contrast=\"auto\">Open-source distributed tracing<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<\/tr>\n<tr>\n<td data-celllook=\"4369\"><span data-contrast=\"none\">ClickHouse<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<td data-celllook=\"4369\"><span data-contrast=\"auto\">Fast analytical queries on high-volume structured events<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<\/tr>\n<tr>\n<td data-celllook=\"4369\"><span data-contrast=\"none\">Robusta<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<td data-celllook=\"4369\"><span data-contrast=\"auto\">Kubernetes-native alert enrichment and runbooks<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span data-contrast=\"auto\">None of these replace logs, metrics and traces. All of them extend what\u2019s possible with them.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">Conclusion<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">The three-pillar model is a good starting point, not an ending point. 
It\u2019s a framework developed at a time when microservice architectures were simpler, cardinality was less of a concern and business logic observability wasn\u2019t part of the conversation.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Modern distributed systems are messier, more dynamic and more business-critical. The observability practices that serve them well need to be messier too \u2014 high-cardinality, event-first, contextually rich and explicitly coupled to business outcomes.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">If your team can answer the question\u00a0\u201cDid the system do the right thing?\u201d\u00a0and not just\u00a0\u201cDid the system stay up?\u201d, you\u2019re getting close to real observability. If you can only answer the second question, there\u2019s work left to do.<\/span><span data-ccp-props='{\"335559738\":240,\"335559739\":240}'>\u00a0<\/span><\/p>\n<p><a href=\"https:\/\/devops.com\/why-logs-metrics-and-traces-still-dont-give-you-real-observability\/\" target=\"_blank\" class=\"feedzy-rss-link-icon\">Read More<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Several years ago, the\u00a0observability community reached what felt like a consensus:\u00a0The\u00a0three pillars\u00a0\u2014\u00a0logs, metrics and traces. 
Instrument everything, ship it all [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4179,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[5],"tags":[],"class_list":["post-4178","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-devops"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/4178","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=4178"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/4178\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media\/4179"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=4178"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=4178"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=4178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}