{"id":2506,"date":"2025-09-19T12:16:39","date_gmt":"2025-09-19T12:16:39","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/09\/19\/beyond-containers-llama-cpp-now-pulls-gguf-models-directly-from-docker-hub\/"},"modified":"2025-09-19T12:16:39","modified_gmt":"2025-09-19T12:16:39","slug":"beyond-containers-llama-cpp-now-pulls-gguf-models-directly-from-docker-hub","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/09\/19\/beyond-containers-llama-cpp-now-pulls-gguf-models-directly-from-docker-hub\/","title":{"rendered":"Beyond Containers: llama.cpp Now Pulls GGUF Models Directly from Docker Hub"},"content":{"rendered":"<p>The world of local AI is moving at an incredible pace, and at the heart of this revolution is llama.cpp\u2014the powerhouse C++ inference engine that brings Large Language Models (LLMs) to everyday hardware (and it\u2019s also the inference engine that powers <a href=\"https:\/\/docs.docker.com\/ai\/model-runner\/\" target=\"_blank\">Docker Model Runner<\/a>). Developers love llama.cpp for its performance and simplicity. And we at Docker are obsessed with making developer workflows simpler.<\/p>\n<p>That\u2019s why we\u2019re thrilled to announce a game-changing new feature in llama.cpp: native support for pulling and running GGUF models directly from Docker Hub.<\/p>\n<p>This isn\u2019t about running llama.cpp <em>in<\/em> a Docker container. This is about using Docker Hub as a powerful, versioned, and centralized repository for your AI models, just like you do for your container images.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Why Docker Hub for AI Models?<\/strong><\/h2>\n<p>Managing AI models can be cumbersome. You\u2019re often dealing with direct download links, manual version tracking, and scattered files. By integrating with Docker Hub, llama.cpp leverages a mature and robust ecosystem to solve these problems.<\/p>\n<p><strong>Rock-Solid Versioning<\/strong>: The familiar repository:tag syntax you use for images now applies to models. Easily switch between gemma3 and smollm2:135M-Q4_0 with complete confidence.<\/p>\n<p><strong>Centralized &amp; Discoverable<\/strong>: Docker Hub can become the canonical source for your team\u2019s models. No more hunting for the \u201clatest\u201d version on a shared drive or in a chat history.<\/p>\n<p><strong>Simplified Workflow<\/strong>: Forget curl, wget or manually downloading from web UIs. A single command-line flag now handles discovery, download, and caching.<\/p>\n<p><strong>Reproducibility<\/strong>: By referencing a model with its immutable digest or tag, you ensure that your development, testing, and production environments are all using the exact same artifact, leading to more consistent and reproducible results.<\/p>\n<h2 class=\"wp-block-heading\"><strong>How It Works Under the Hood\u00a0<\/strong><\/h2>\n<p>This new feature cleverly uses the Open Container Initiative (OCI) specification, which is the foundation of Docker images. The GGUF model file is treated as a layer within an OCI manifest, identified by a special media type like application\/vnd.docker.ai.gguf.v3. 
<h2>Get Started in Seconds</h2>

<p>Ready to try it? If you have a <a href="https://github.com/ggml-org/llama.cpp/pull/15790" target="_blank">recent build</a> of llama.cpp, you can serve a model from Docker Hub with one simple command. The new flag is --docker-repo (or -dr).</p>

<p>Let’s run <a href="https://hub.docker.com/r/ai/gemma3" target="_blank">gemma3</a>, a model available from Docker Hub.</p>

<pre><code># Now, serve a model from Docker Hub!
llama-server -dr gemma3
</code></pre>

<p>The first time you execute this, you’ll see llama.cpp log the download progress. After that, it will use the cached version. It’s that easy! The default organization is ai/, so gemma3 resolves to ai/gemma3. The default tag is :latest, but a tag can be specified, such as :1B-Q4_K_M.</p>

<p>For a complete Docker-integrated experience with OCI pushing and pulling support, try out Docker Model Runner. The Docker Model Runner equivalent for chatting is:</p>

<pre><code># Pull, serve, and chat with a model from Docker Hub!
docker model run ai/gemma3
</code></pre>
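<p>As a quick usage note, you can also pin an explicit organization and tag rather than relying on the ai/ and :latest defaults. The commands below are only examples: the tags shown are the ones mentioned above, what is actually available depends on what has been published for each model on Docker Hub, and the docker model form is assumed to accept the same repository:tag reference.</p>

<pre><code># Pin an explicit organization and tag with llama.cpp
llama-server -dr ai/gemma3:1B-Q4_K_M
llama-server -dr ai/smollm2:135M-Q4_0

# Assumed equivalent with Docker Model Runner
docker model run ai/gemma3:1B-Q4_K_M
</code></pre>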
<h2>The Future of AI Model Distribution</h2>

<p>This integration represents a powerful shift in how we think about distributing and managing AI artifacts. By using OCI-compliant registries like Docker Hub, the AI community can build more robust, reproducible, and scalable MLOps pipelines.</p>

<p>This is just the beginning. We envision a future where models, datasets, and the code that runs them are all managed through the same streamlined, developer-friendly workflow that has made Docker an essential tool for millions.</p>

<p>Check out the latest llama.cpp to try it out, and explore the growing collection of models on <a href="https://hub.docker.com/u/ai" target="_blank">Docker Hub</a> today!</p>

<h3>Learn more</h3>

<p>Read our quickstart guide to <a href="https://www.docker.com/blog/run-llms-locally/">Docker Model Runner</a>.</p>

<p>Visit our <a href="https://github.com/docker/model-runner" target="_blank">Model Runner GitHub repo</a>! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!</p>

<p>Discover curated models on <a href="https://hub.docker.com/u/ai" target="_blank">Docker Hub</a>.</p>
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-2506","post","type-post","status-publish","format-standard","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2506","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=2506"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2506\/revisions"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=2506"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=2506"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=2506"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}