{"id":2122,"date":"2025-06-11T12:38:46","date_gmt":"2025-06-11T12:38:46","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/06\/11\/how-to-build-run-and-package-ai-models-locally-with-docker-model-runner\/"},"modified":"2025-06-11T12:38:46","modified_gmt":"2025-06-11T12:38:46","slug":"how-to-build-run-and-package-ai-models-locally-with-docker-model-runner","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/06\/11\/how-to-build-run-and-package-ai-models-locally-with-docker-model-runner\/","title":{"rendered":"How to Build, Run, and Package AI Models Locally with Docker Model Runner"},"content":{"rendered":"<h2 class=\"wp-block-heading\">Introduction<\/h2>\n<p>As a Senior DevOps Engineer and Docker Captain, I\u2019ve helped build AI systems for everything from retail personalization to medical imaging. One truth stands out: AI capabilities are core to modern infrastructure.<\/p>\n<p>This guide will show you how to run and package local AI models with <a href=\"https:\/\/www.docker.com\/blog\/introducing-docker-model-runner\/\">Docker Model Runner<\/a> \u2014 a lightweight, developer-friendly tool for working with AI models pulled from Docker Hub or Hugging Face. 
You\u2019ll learn how to run models in the CLI or via API, publish your own model artifacts, and do it all without setting up Python environments or web servers.<\/p>\n<h2 class=\"wp-block-heading\">What is AI in Development?<\/h2>\n<p>Artificial Intelligence (AI) refers to systems that mimic human intelligence, including:<\/p>\n<p>Making decisions via machine learning<\/p>\n<p>Understanding language through NLP<\/p>\n<p>Recognizing images with computer vision<\/p>\n<p>Learning from new data automatically<\/p>\n<h3 class=\"wp-block-heading\">Common Types of AI in Development:<\/h3>\n<p><strong>Machine Learning (ML)<\/strong>: Learns from structured and unstructured data<\/p>\n<p><strong>Deep Learning<\/strong>: Neural networks for pattern recognition<\/p>\n<p><strong>Natural Language Processing (NLP)<\/strong>: Understands\/generates human language<\/p>\n<p><strong>Computer Vision<\/strong>: Recognizes and interprets images<\/p>\n<h2 class=\"wp-block-heading\">Why Package and Run Your Own AI Model?<\/h2>\n<p>Local model packaging and execution offer full control over your AI workflows. Instead of relying on external APIs, you can run models directly on your machine \u2014 unlocking:<\/p>\n<p>Faster inference with local compute (no latency from API calls)<\/p>\n<p>Greater privacy by keeping data and prompts on your own hardware<\/p>\n<p>Customization through packaging and versioning your own models<\/p>\n<p>Seamless CI\/CD integration with tools like Docker and GitHub Actions<\/p>\n<p>Offline capabilities for edge use cases or constrained environments<\/p>\n<p>Platforms like Docker and Hugging Face make cutting-edge AI models instantly accessible without building from scratch. 
Running them locally means lower latency, better privacy, and faster iteration.<\/p>\n<h2 class=\"wp-block-heading\">Real-World Use Cases for AI<\/h2>\n<p><strong>Chatbots &amp; Virtual Assistants<\/strong>: Automate support (e.g., ChatGPT, Alexa)<\/p>\n<p><strong>Generative AI<\/strong>: Create text, art, music (e.g., Midjourney, Lensa)<\/p>\n<p><strong>Dev Tools<\/strong>: Autocomplete and debug code (e.g., GitHub Copilot)<\/p>\n<p><strong>Retail Intelligence<\/strong>: Recommend products based on behavior<\/p>\n<p><strong>Medical Imaging<\/strong>: Analyze scans for faster diagnosis<\/p>\n<h2 class=\"wp-block-heading\">How to Package and Run AI Models Locally with Docker Model Runner<\/h2>\n<h3 class=\"wp-block-heading\">Prerequisites:<\/h3>\n<p><a href=\"https:\/\/www.docker.com\/products\/docker-desktop\/\">Docker Desktop<\/a> 4.40+ installed<\/p>\n<p>Experimental features and <a href=\"https:\/\/docs.docker.com\/model-runner\/#enable-docker-model-runner\" target=\"_blank\">Model Runner enabled<\/a> in Docker Desktop settings<\/p>\n<p>(Recommended) Windows 11 with NVIDIA GPU or Mac with Apple Silicon<\/p>\n<p>Internet access for <a href=\"https:\/\/hub.docker.com\/catalogs\/models\" target=\"_blank\">downloading models from Docker Hub<\/a> or Hugging Face<\/p>\n<h3 class=\"wp-block-heading\">Step 0 \u2014 Enable Docker Model Runner<\/h3>\n<p>Open Docker Desktop<\/p>\n<p>Go to Settings \u2192 Features in development<\/p>\n<p>Under the Experimental features tab, enable Access experimental features<\/p>\n<p>Click Apply and restart<\/p>\n<p>Quit and reopen Docker Desktop to ensure changes take effect<\/p>\n<p>Reopen Settings \u2192 Features in development<\/p>\n<p>Switch to the Beta tab and check <strong>Enable Docker Model Runner<\/strong><\/p>\n<p><em>(Optional)<\/em> Enable host-side TCP support to access the API from localhost<\/p>\n<p><strong>Once enabled, you can use the docker model CLI and manage models in the Models tab.<\/strong><\/p>\n<div 
class=\"wp-block-ponyo-image\">\n<\/div>\n<p class=\"has-small-font-size\"><em>Screenshot of Docker Desktop\u2019s Features in development tab with Docker Model Runner and Dev Environments enabled.<\/em><\/p>\n\n<h3 class=\"wp-block-heading\">Step 1: Pull a Model<\/h3>\n<p>From Docker Hub:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker model pull ai\/smollm2\n<\/div>\n<p>Or from Hugging Face (GGUF format):<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker model pull hf.co\/bartowski\/Llama-3.2-1B-Instruct-GGUF\n<\/div>\n<p><strong>Note:<\/strong> Only <strong>GGUF models<\/strong> are supported. GGUF (GPT-Generated Unified Format) is a lightweight binary file format designed for efficient local inference, especially with CPU-optimized runtimes like llama.cpp. It bundles the model weights, tokenizer, and metadata in a single file, making it ideal for packaging and distributing LLMs in containerized environments.<\/p>\n<h3 class=\"wp-block-heading\">Step 2: Tag and Push to a Local Registry (Optional)<\/h3>\n<p>If you want to push models to a private or local registry:<\/p>\n<p>Run a local Docker registry:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker run -d -p 5000:5000 --name registry registry:2\n<\/div>\n\n<p>Tag the model with your registry\u2019s address:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker model tag hf.co\/bartowski\/Llama-3.2-1B-Instruct-GGUF localhost:5000\/foobar\n<\/div>\n\n<p>Push the model to the local registry:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker model push localhost:5000\/foobar\n<\/div>\n\n<p>Check your local models with:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker model list\n<\/div>\n\n<h3 class=\"wp-block-heading\">Step 3: Run the Model<\/h3>\n<p>Run a prompt (one-shot):<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker model run ai\/smollm2 \"What is Docker?\"\n<\/div>\n\n<p>Interactive chat mode:<\/p>\n<div 
class=\"wp-block-syntaxhighlighter-code \">\ndocker model run ai\/smollm2\n<\/div>\n\n<p><strong>Note: <\/strong>Models are loaded into memory on demand and unloaded after 5 minutes of inactivity.<\/p>\n<h3 class=\"wp-block-heading\">Step 4: Test via the OpenAI-Compatible API<\/h3>\n<p>To call the model from the host, first enable <strong>TCP host access<\/strong> for Model Runner (via the Docker Desktop GUI or the CLI):<\/p>\n<div class=\"wp-block-ponyo-image\">\n<\/div>\n<p class=\"has-small-font-size\"><em>Screenshot of Docker Desktop\u2019s Features in development tab showing host-side TCP support enabled for Docker Model Runner.<\/em><\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker desktop enable model-runner --tcp 12434\n<\/div>\n<p>Then send a prompt using the OpenAI-compatible chat completions endpoint:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ncurl http:\/\/localhost:12434\/engines\/llama.cpp\/v1\/chat\/completions \\\\<br \/>\n  -H \"Content-Type: application\/json\" \\\\<br \/>\n  -d '{<br \/>\n    \"model\": \"ai\/smollm2\",<br \/>\n    \"messages\": [<br \/>\n      {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},<br \/>\n      {\"role\": \"user\", \"content\": \"Tell me about the fall of Rome.\"}<br \/>\n    ]<br \/>\n  }'\n<\/div>\n<p><strong>Note:<\/strong> No API key is required \u2014 this runs locally and securely on your machine.<\/p>\n<h3 class=\"wp-block-heading\">Step 5: Package Your Own Model<\/h3>\n<p>You can package your own <strong>pre-trained GGUF model<\/strong> as a Docker-compatible artifact if you already have a .gguf file \u2014 such as one downloaded from Hugging Face or converted using tools like llama.cpp.<\/p>\n<p><strong>Note: <\/strong><em>This guide assumes you already have a .gguf model file. 
It does not cover how to train or convert models to GGUF.<\/em><\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker model package \\\\<br \/>\n  --gguf \"$(pwd)\/model.gguf\" \\\\<br \/>\n  --license \"$(pwd)\/LICENSE.txt\" \\\\<br \/>\n  --push registry.example.com\/ai\/custom-llm:v1\n<\/div>\n<p>This is ideal for custom-trained or private models. You can now pull it like any other model:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker model pull registry.example.com\/ai\/custom-llm:v1\n<\/div>\n\n<h3 class=\"wp-block-heading\">Step 6: Optimize &amp; Iterate<\/h3>\n<p>Use docker model logs to monitor model usage and debug issues<\/p>\n<p>Set up CI\/CD to automate pulls, scans, and packaging<\/p>\n<p>Track model lineage and training versions to ensure consistency<\/p>\n<p>Use semantic version tags (:v1, :2025-05, etc.) instead of latest when packaging custom models<\/p>\n<p>Note that only one model is loaded at a time; requesting a different model unloads the previous one.<\/p>\n<h3 class=\"wp-block-heading\">Compose Integration (Optional)<\/h3>\n<p>Docker Compose v2.35+ (included in Docker Desktop 4.41+) introduces support for AI model services using the new provider.type: model. You can define models directly in your compose.yml and reference them from app services using depends_on.<\/p>\n<p>During docker compose up, Docker Model Runner automatically pulls the model and starts it on the host system, then injects connection details into dependent services through environment variables such as MY_MODEL_URL and MY_MODEL_MODEL, where the MY_MODEL prefix matches the name of the model service.<\/p>\n<p>This enables seamless multi-container AI applications \u2014 with zero extra glue code. 
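<\/p>
<p>The pattern described above can be sketched in a minimal compose.yml. This is only an illustration based on the behavior described in this section; the app image name (my-chat-app) and service names are hypothetical:<\/p>
<div class=\"wp-block-syntaxhighlighter-code \">

```yaml
# compose.yml sketch: one app container plus one model service that is
# handled by Docker Model Runner rather than run as a container.
# Requires Docker Compose v2.35+ (Docker Desktop 4.41+).
services:
  chat-app:
    image: my-chat-app:latest   # hypothetical application image
    depends_on:
      - my_model                # make the model available before the app starts

  my_model:
    provider:
      type: model               # provisioned by Docker Model Runner on the host
      options:
        model: ai/smollm2       # any Model Runner-compatible model reference
```

<\/div>
<p>Because the model service is named my_model, Compose injects MY_MODEL_URL and MY_MODEL_MODEL into chat-app at startup, as described above.<\/p>
<p>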
<a href=\"https:\/\/docs.docker.com\/compose\/how-tos\/model-runner\/\" target=\"_blank\">Learn more<\/a>.<\/p>\n<h2 class=\"wp-block-heading\">Navigating AI Development Challenges<\/h2>\n<p><strong>Latency<\/strong>: Use quantized GGUF models<\/p>\n<p><strong>Security<\/strong>: Never run unknown models; validate sources and attach licenses<\/p>\n<p><strong>Compliance<\/strong>: Mask PII, respect data consent<\/p>\n<p><strong>Costs<\/strong>: Run locally to avoid cloud compute bills<\/p>\n<h2 class=\"wp-block-heading\">Best Practices<\/h2>\n<p>Prefer GGUF models for optimal CPU inference<\/p>\n<p>Use the --license flag when packaging custom models to ensure compliance<\/p>\n<p>Use versioned tags (e.g., :v1, :2025-05) instead of latest<\/p>\n<p>Monitor model usage with docker model logs<\/p>\n<p>Validate model sources before pulling or packaging<\/p>\n<p>Only pull models from trusted sources (e.g., Docker Hub\u2019s ai\/ namespace or verified Hugging Face repos).<\/p>\n<p>Review the license and usage terms for each model before packaging or deploying.<\/p>\n<h2 class=\"wp-block-heading\">The Road Ahead<\/h2>\n<p>Support for Retrieval-Augmented Generation (RAG)<\/p>\n<p>Expanded multimodal support (text + images, video, audio)<\/p>\n<p>LLMs as services in Docker Compose <em>(requires Docker Compose v2.35+)<\/em><\/p>\n<p>More granular Model Dashboard features in Docker Desktop<\/p>\n<p>Secure packaging and deployment pipelines for private AI models<\/p>\n<p>Docker Model Runner lets DevOps teams treat models like any other artifact \u2014 pulled, tagged, versioned, tested, and deployed.<\/p>\n\n<h2 class=\"wp-block-heading\">Final Thoughts<\/h2>\n<p>You don\u2019t need a GPU cluster or external API to build AI apps. 
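<\/p>
<p>To tie the pieces together, a short script using only the Python standard library can drive a locally running model through the OpenAI-compatible endpoint from Step 4. This is a sketch, not official tooling: it assumes TCP access is enabled on port 12434 as shown earlier, and the build_request and ask helper names are illustrative:<\/p>
<div class=\"wp-block-syntaxhighlighter-code \">

```python
# Sketch: calling Docker Model Runner's OpenAI-compatible endpoint with the
# Python standard library only (no SDK required). The endpoint and model name
# match the curl example in Step 4.
import json
import urllib.request

ENDPOINT = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"


def build_request(prompt: str, model: str = "ai/smollm2") -> dict:
    """Assemble the chat-completions JSON body for a single user prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }


def ask(prompt: str) -> str:
    """POST the prompt to the local model and return the reply text."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires Model Runner listening on the port above):
# print(ask("What is Docker?"))
```

<\/div>
<p>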
Learn more and explore everything you can do with <a href=\"https:\/\/docs.docker.com\/model-runner\/\" target=\"_blank\">Docker Model Runner<\/a>:<\/p>\n<p>Pull prebuilt models from Docker Hub or Hugging Face<\/p>\n<p>Run them locally using the CLI, API, or <strong>Docker Desktop\u2019s Models tab<\/strong><\/p>\n<p>Package and push your own models as OCI artifacts<\/p>\n<p>Integrate with your CI\/CD pipelines securely<\/p>\n<p>You can also find other helpful information to get started at:<\/p>\n<p><a href=\"https:\/\/hub.docker.com\/u\/ai\" target=\"_blank\">Docker Hub \u2013 AI Namespace<\/a><\/p>\n<p><a href=\"https:\/\/docs.docker.com\/model-runner\/\" target=\"_blank\">Docker Model Runner Docs<\/a><\/p>\n<p><a href=\"https:\/\/platform.openai.com\/docs\" target=\"_blank\">OpenAI-Compatible API Guide<\/a><\/p>\n<p><a href=\"https:\/\/github.com\/docker\/hello-genai\" target=\"_blank\">hello-genai Sample App<\/a><\/p>\n<p><strong>You\u2019re not just deploying containers \u2014 you\u2019re delivering intelligence.<\/strong><\/p>\n\n<h3 class=\"wp-block-heading\">Learn more<\/h3>\n<p>Read our quickstart guide to <a href=\"https:\/\/www.docker.com\/blog\/run-llms-locally\/\">Docker Model Runner<\/a>.<\/p>\n<p>Find documentation for <a href=\"https:\/\/docs.docker.com\/model-runner\/\" target=\"_blank\">Model Runner<\/a>.<\/p>\n<p>Subscribe to the <a href=\"https:\/\/www.docker.com\/newsletter-subscription\/\">Docker Navigator Newsletter<\/a>.<\/p>\n<p>New to Docker? <a href=\"https:\/\/hub.docker.com\/signup\" target=\"_blank\">Create an account<\/a>.<\/p>\n<p>Have questions? 
The <a href=\"https:\/\/www.docker.com\/community\/\">Docker community is here to help<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Introduction As a Senior DevOps Engineer and Docker Captain, I\u2019ve helped build AI systems for everything from retail personalization to [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-2122","post","type-post","status-publish","format-standard","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=2122"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2122\/revisions"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=2122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=2122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=2122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}