{"id":3740,"date":"2026-03-31T03:17:10","date_gmt":"2026-03-31T03:17:10","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2026\/03\/31\/iceberg-won-the-format-war-now-comes-the-hard-part\/"},"modified":"2026-03-31T03:17:10","modified_gmt":"2026-03-31T03:17:10","slug":"iceberg-won-the-format-war-now-comes-the-hard-part","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2026\/03\/31\/iceberg-won-the-format-war-now-comes-the-hard-part\/","title":{"rendered":"Iceberg Won the Format War \u2014 Now Comes the Hard Part"},"content":{"rendered":"<div><img data-opt-id=266353389  fetchpriority=\"high\" decoding=\"async\" width=\"770\" height=\"330\" src=\"https:\/\/devops.com\/wp-content\/uploads\/2026\/03\/770-330-2026-03-30T225538.595.png\" class=\"attachment-large size-large wp-post-image\" alt=\"\" \/><\/div>\n<p><img data-opt-id=450371075  fetchpriority=\"high\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/devops.com\/wp-content\/uploads\/2026\/03\/770-330-2026-03-30T225538.595-150x150.png\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" \/><\/p>\n<p class=\"p1\">Apache Iceberg has effectively won the open table format conversation. AWS, Google Cloud, Microsoft, Snowflake, Databricks \u2014 every major platform has thrown its weight behind it. If you work in data engineering or platform operations, the question is no longer whether Iceberg is the right foundation. It\u2019s what it actually takes to run it day to day.<\/p>\n<p class=\"p2\">That second question doesn\u2019t get nearly enough airtime. And it\u2019s the one that determines whether your Iceberg adoption goes well or becomes a slow-motion infrastructure project that nobody budgeted for.<\/p>\n<h3 class=\"p3\"><b>The Gap Nobody Talks About<\/b><\/h3>\n<p class=\"p1\">Here\u2019s what Iceberg gives you: a table format with schema evolution, time travel, partition evolution, and engine independence. 
Here\u2019s what Iceberg does not give you: a way to get data into those tables, a way to model and transform it once it\u2019s there, a way to coordinate when things run, or a way to keep table health in check as data piles up.<\/p>\n<p class=\"p1\">Put differently, Iceberg defines how tables behave, not how to operate the pipelines around them.<\/p>\n<p class=\"p2\">Most teams discover this the hard way. They pick Iceberg for its openness and flexibility, then spend months wiring together ingestion tools, dbt jobs, schedulers, and homegrown maintenance scripts. The individual pieces work. The thing they form is fragile because reliability lives in the gaps between tools rather than in any single place you can point to and say, \u201cThis is responsible.\u201d<\/p>\n<h3 class=\"p3\"><b>Why This Is a DevOps Problem<\/b><\/h3>\n<p class=\"p1\">If you\u2019ve been in DevOps for any length of time, this should ring a bell. It\u2019s the same mess software delivery was in before CI\/CD grew up: too many disconnected steps, no single system of record, and failures that only show up at the seams between tools. Data pipelines have just been slower to hit this wall.<\/p>\n<p class=\"p1\">Here\u2019s a scenario I\u2019ve watched play out multiple times. A schema change is applied to a production database. The ingestion tool picks it up and starts writing new fields into Iceberg. A dbt Core job runs later on its hourly schedule. It either blows up because its assumptions about the schema are now wrong, or \u2014 worse \u2014 it succeeds while quietly producing partial results that nobody catches until a dashboard goes sideways downstream. Meanwhile, table maintenance (compaction, snapshot expiry, orphan file cleanup) is running on its own cadence, completely unaware of what ingestion or transformation just did.<\/p>\n<p class=\"p1\">When you debug this, you\u2019re context-switching across four systems, none of which are broken on their own. The problem is coordination. 
And the coordination layer? That\u2019s you, at 3 a.m., reading runbooks in a Slack thread.<\/p>\n<p class=\"p2\">DevOps figured out years ago that humans make terrible glue between systems. That\u2019s what drove the shift from manual deployments to automated pipelines with feedback loops, observability, and rollback. Data engineering is at an inflection point, and Iceberg adoption is accelerating it.<\/p>\n<h3 class=\"p3\"><b>Schedules Are Holding Your Iceberg Stack Together with Duct Tape<\/b><\/h3>\n<p class=\"p1\">The deeper architectural issue is how most Iceberg stacks coordinate work. Almost everyone uses time-based schedules: ingest every five minutes, run dbt hourly, and compact nightly. Fine when everything is batch and nothing changes. Not fine when ingestion is continuous, schema changes are routine, and table operations have a real impact on query performance.<\/p>\n<p class=\"p1\">Iceberg\u2019s own snapshot mechanism gives us a better primitive. Snapshots capture the exact state of a table at a point in time. A system built around table state can answer questions that schedules can\u2019t: this snapshot invalidated these downstream models; these tables now need compaction; downstream consumers can read a consistent version once these steps finish.<\/p>\n<p class=\"p2\">If you\u2019ve worked with event-driven architectures in application development, this should feel natural \u2014 the state change itself triggers the next action instead of a cron job polling for it. For data platforms, this means the scheduler becomes an implementation detail rather than the load-bearing wall. 
It also means lineage gets real: not just \u201cthis source feeds this table\u201d but \u201cthis version of the source produced this version of the model,\u201d traceable to the snapshot.<\/p>\n<h3 class=\"p3\"><b>You Traded Vendor Lock-In For Internal Platform Lock-In<\/b><\/h3>\n<p class=\"p1\">There\u2019s an irony that keeps coming up in my conversations with data platform teams. They chose Iceberg to avoid vendor lock-in. Then they spent six months building a bespoke pipeline platform \u2014 custom orchestration, monitoring scripts for table health, runbooks that span four tools, tribal knowledge about which manual steps to run when things break \u2014 and now they\u2019re locked into that instead. Different lock-in, same result: you can\u2019t easily change course because too much depends on the specific way you wired it all together.<\/p>\n<p class=\"p1\">This compounds fast. A handful of Iceberg tables can be managed with scripts and good intentions. A hundred tables with interdependencies need a system. Most teams hit this realization somewhere around table thirty, when operating pipelines starts eating more time than building them.<\/p>\n<p class=\"p2\">Anyone who\u2019s been around long enough remembers when companies managed deployments with shell scripts and SSH. It worked until it didn\u2019t. The shift to declarative infrastructure and managed delivery pipelines didn\u2019t happen because the scripts stopped functioning. It happened because the cost of scaling that approach grew faster than the team. We\u2019re at the same crossroads with data pipelines on Iceberg.<\/p>\n<h3 class=\"p3\"><b>What to Look for in an Iceberg Pipeline Layer<\/b><\/h3>\n<p class=\"p1\">If you\u2019re evaluating your Iceberg strategy or about to start one, here\u2019s how I\u2019d think about the pipeline layer you\u2019ll need.<\/p>\n<p class=\"p1\">Ingestion and transformation should live in the same system. 
When they\u2019re separate, schema evolution and data quality become coordination headaches instead of pipeline features. Data quality tests shouldn\u2019t be downstream assertions that catch problems after you\u2019ve already burned the compute. They should be contracts at the boundary where data enters.<\/p>\n<p class=\"p1\">Table operations\u2014compaction, snapshot expiry, metadata cleanup\u2014need to be treated as first-class concerns, not cron jobs you set up once and forget about. They directly affect query performance and storage costs, and they need awareness of what the pipeline is doing. Running compaction during a large ingestion batch is a great way to create problems that are hard to diagnose.<\/p>\n<p class=\"p1\">The system should run inside your VPC. If you picked Iceberg for data sovereignty and security, sending your data through someone else\u2019s infrastructure undermines the whole point. This isn\u2019t hypothetical \u2014 in financial services and healthcare, regulations and company policies often mandate that data never leaves the VPC.<\/p>\n<p class=\"p2\">And the pipeline layer should let you build once and serve many consumers from the same Iceberg tables: analytics, data science, AI workloads, and data sharing. Iceberg makes this architecturally possible. The pipeline layer is what makes it operationally real.<\/p>\n<h3 class=\"p3\"><b>Where These Conversations Are Happening<\/b><\/h3>\n<p class=\"p1\">What I like about the Iceberg ecosystem is that these operational problems are being discussed in the open. The format is open source, governance is through the Apache Software Foundation, and the community skews heavily toward practitioners who are running this stuff for real.<\/p>\n<p class=\"p2\">If any of this resonates, <a href=\"https:\/\/www.icebergsummit.org\/?utm_medium=sponsor&amp;utm_source=etleap&amp;utm_content=social\"><span class=\"s1\">Iceberg Summit 2026<\/span><\/a> is worth your time. 
It\u2019s April 8-9 in San Francisco (Marriott Marquis), run under the Apache Software Foundation, and it\u2019s the one event where you\u2019ll find core maintainers, production users, and platform architects all in the same room. Last year\u2019s edition had serious depth: not vendor keynotes, but real case studies and technical deep dives. I expect this year to go even deeper on the operational side, which is where the hard problems are right now.<\/p>\n<h3 class=\"p3\"><b>The Hard Part Starts Now<\/b><\/h3>\n<p class=\"p1\">Iceberg has crossed the adoption threshold. The format debate is over. What hasn\u2019t been settled is how teams will actually run it without drowning in operational overhead.<\/p>\n<p class=\"p1\">If you\u2019ve spent time in DevOps, you know the tools are only as good as the system they form. A great CI server doesn\u2019t help much if your deployment process is held together with hope and shell scripts. The same is true for data pipelines on Iceberg. The question for 2026 isn\u2019t whether to build on Iceberg. It\u2019s whether your pipeline architecture can keep up.<\/p>\n<p><a href=\"https:\/\/devops.com\/iceberg-won-the-format-war-now-comes-the-hard-part\/\" target=\"_blank\" class=\"feedzy-rss-link-icon\">Read More<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Apache Iceberg has effectively won the open table format conversation. 
AWS, Google Cloud, Microsoft, Snowflake, Databricks \u2014 every major platform [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3741,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[5],"tags":[],"class_list":["post-3740","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-devops"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/3740","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=3740"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/3740\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media\/3741"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=3740"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=3740"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=3740"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}