{"id":3031,"date":"2025-12-11T16:16:51","date_gmt":"2025-12-11T16:16:51","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/12\/11\/docker-model-runner-now-supports-vllm-on-windows\/"},"modified":"2025-12-11T16:16:51","modified_gmt":"2025-12-11T16:16:51","slug":"docker-model-runner-now-supports-vllm-on-windows","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/12\/11\/docker-model-runner-now-supports-vllm-on-windows\/","title":{"rendered":"Docker Model Runner now supports vLLM on Windows"},"content":{"rendered":"<p>Great news for Windows developers working with AI models: Docker Model Runner now supports vLLM on Docker Desktop for Windows with WSL2 and NVIDIA GPUs!<\/p>\n<p>Until now, <a href=\"https:\/\/www.docker.com\/blog\/docker-model-runner-integrates-vllm\/\">vLLM support<\/a> in Docker Model Runner was limited to Docker Engine on Linux. With this update, Windows developers can take advantage of vLLM\u2019s high-throughput inference capabilities directly through Docker Desktop, leveraging their NVIDIA GPUs for accelerated local AI development.<\/p>\n<h2 class=\"wp-block-heading\"><strong>What is Docker Model Runner?<\/strong><\/h2>\n<p>For those who haven\u2019t tried it yet, Docker Model Runner is our new \u201cit just works\u201d experience for running generative AI models.<\/p>\n<p>Our goal is to make running a model as simple as running a container.<\/p>\n<p>Here\u2019s what makes it great:<\/p>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.docker.com\/blog\/docker-model-run-prompt\/\"><strong>Simple UX<\/strong><\/a><strong>:<\/strong> We\u2019ve streamlined the process down to a single, intuitive command: docker model run &lt;model-name&gt;.<\/li>\n<li><strong>Broad GPU Support:<\/strong> While we started with NVIDIA, we\u2019ve recently added <a href=\"https:\/\/www.docker.com\/blog\/docker-model-runner-vulkan-gpu-support\/\"><strong>Vulkan 
support<\/strong><\/a>. This is a big deal\u2014it means Model Runner works on pretty much <strong>any modern GPU<\/strong>, including AMD and Intel, making AI accessible to more developers than ever.<\/li>\n<li><a href=\"https:\/\/www.docker.com\/blog\/docker-model-runner-integrates-vllm\/\"><strong>vLLM<\/strong><\/a><strong>:<\/strong> Perform high-throughput inference with an NVIDIA GPU<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>What is vLLM?<\/strong><\/h2>\n<p>vLLM is a high-throughput inference engine for large language models. It\u2019s designed for efficient memory management of the KV cache and excels at handling concurrent requests with impressive performance. If you\u2019re building AI applications that need to serve multiple requests or require high-throughput inference, vLLM is an excellent choice. Learn more <a href=\"https:\/\/github.com\/vllm-project\/vllm\" rel=\"nofollow noopener\" target=\"_blank\">here<\/a>.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Prerequisites<\/strong><\/h2>\n<p>Before getting started, make sure you have the <a href=\"https:\/\/docs.docker.com\/desktop\/features\/gpu\/\" rel=\"nofollow noopener\" target=\"_blank\">prerequisites for GPU support<\/a>:<\/p>\n<ul class=\"wp-block-list\">\n<li>Docker Desktop for Windows (starting with Docker Desktop 4.54)<\/li>\n<li>WSL2 backend enabled in Docker Desktop<\/li>\n<li>NVIDIA GPU with updated drivers<\/li>\n<li>GPU support configured in Docker Desktop<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>Getting Started<\/strong><\/h2>\n<p><strong>Step 1: Enable Docker Model Runner<\/strong><\/p>\n<p>First, ensure Docker Model Runner is enabled in Docker Desktop. 
You can do this through the Docker Desktop settings or via the command line:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: bash; gutter: false; title: ; notranslate\">\ndocker desktop enable model-runner --tcp 12434\n<\/pre>\n<\/div>\n<p><strong>Step 2: Install the vLLM Backend<\/strong><\/p>\n<p>To use vLLM, install the vLLM runner with CUDA support:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: bash; gutter: false; title: ; notranslate\">\ndocker model install-runner --backend vllm --gpu cuda\n<\/pre>\n<\/div>\n<div class=\"wp-block-ponyo-image\">\n            <img data-opt-id=1251471023  fetchpriority=\"high\" decoding=\"async\" width=\"1000\" height=\"514\" src=\"https:\/\/www.docker.com\/app\/uploads\/2025\/12\/vLLM-Windows-image-1.png\" class=\"fade-in attachment-full size-full\" alt=\"vLLM Windows image 1\" title=\"- vLLM Windows image 1\" \/>\n    <\/div>\n<p><strong>Step 3: Verify the Installation<\/strong><\/p>\n<p>Check that both inference engines are running:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: bash; gutter: false; title: ; notranslate\">\ndocker model status\n<\/pre>\n<\/div>\n<p>You should see output similar to:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; gutter: false; title: ; notranslate\">\nDocker Model Runner is running\n\nStatus:\nllama.cpp: running llama.cpp version: c22473b\nvllm: running vllm version: 0.12.0\n<\/pre>\n<\/div>\n<p><strong>Step 4: Run a Model with vLLM<\/strong><\/p>\n<p>Now you can pull and run models optimized for vLLM. 
Models with the <strong>-vllm<\/strong> suffix on Docker Hub are packaged for vLLM:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: bash; gutter: false; title: ; notranslate\">\ndocker model run ai\/smollm2-vllm \"Tell me about Docker.\"\n<\/pre>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Troubleshooting Tips<\/strong><\/h2>\n<h3 class=\"wp-block-heading\"><strong>GPU Memory Issues<\/strong><\/h3>\n<p>If you encounter an error like:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; gutter: false; title: ; notranslate\">\nValueError: Free memory on device (6.96\/8.0 GiB) on startup is less than desired GPU memory utilization (0.9, 7.2 GiB).\n\n<\/pre>\n<\/div>\n<p>You can configure the GPU memory utilization for a specific model:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: bash; gutter: false; title: ; notranslate\">\ndocker model configure --gpu-memory-utilization 0.7 ai\/smollm2-vllm\n<\/pre>\n<\/div>\n<p>This reduces the memory footprint, allowing the model to run alongside other GPU workloads.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Why This Matters<\/strong><\/h2>\n<p>This update brings several benefits for Windows developers:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Production parity<\/strong>: Test with the same inference engine you\u2019ll use in production<\/li>\n<li><strong>Unified workflow<\/strong>: Stay within the Docker ecosystem you already know<\/li>\n<li><strong>Local development<\/strong>: Keep your data private and reduce API costs during development<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>How You Can Get Involved<\/strong><\/h2>\n<p>The strength of Docker Model Runner lies in its community, and there\u2019s always room to grow. We need your help to make this project the best it can be. 
To get involved, you can:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Star the repository:<\/strong> Show your support and help us gain visibility by starring the <a href=\"https:\/\/github.com\/docker\/model-runner\" rel=\"nofollow noopener\" target=\"_blank\">Docker Model Runner repo<\/a>.<\/li>\n<li><strong>Contribute your ideas:<\/strong> Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We\u2019re excited to see what ideas you have!<\/li>\n<li><strong>Spread the word:<\/strong> Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.<\/li>\n<\/ul>\n<p>We\u2019re incredibly excited about this new chapter for Docker Model Runner, and we can\u2019t wait to see what we can build together. Let\u2019s get to work!<\/p>","protected":false},"excerpt":{"rendered":"<p>Great news for Windows developers working with AI models: Docker Model Runner now supports vLLM on Docker Desktop for Windows 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3032,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-3031","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/3031","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=3031"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/3031\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media\/3032"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=3031"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=3031"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=3031"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}