{"id":2549,"date":"2025-10-02T12:13:26","date_gmt":"2025-10-02T12:13:26","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/10\/02\/fine-tuning-local-models-with-docker-offload-and-unsloth\/"},"modified":"2025-10-02T12:13:26","modified_gmt":"2025-10-02T12:13:26","slug":"fine-tuning-local-models-with-docker-offload-and-unsloth","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/10\/02\/fine-tuning-local-models-with-docker-offload-and-unsloth\/","title":{"rendered":"Fine-Tuning Local Models with Docker Offload and Unsloth"},"content":{"rendered":"<p>I\u2019ve been experimenting with local models for a while now, and the progress in making them accessible has been exciting. Initial experiences are often fantastic: many models, like <strong>Gemma 3 270M<\/strong>, are lightweight enough to run on common hardware. This potential for broad deployment is a major draw.<\/p>\n<p>However, as I\u2019ve tried to build meaningful, specialized applications with these smaller models, I\u2019ve consistently encountered challenges in achieving the necessary performance for complex tasks. For instance, in a recent <a href=\"https:\/\/www.docker.com\/blog\/local-llm-tool-calling-a-practical-evaluation\/\">experiment testing the tool-calling efficiency of various models<\/a>, we observed that many local models (and even several remote ones) struggled to meet the required performance benchmarks. This realization prompted a shift in my strategy.<\/p>\n<p>I\u2019ve come to appreciate that simply relying on small, general-purpose models is often insufficient for achieving truly effective results on specific, demanding tasks. Even larger models can require significant effort to reach acceptable levels of performance and efficiency.<\/p>\n<p>And yet, the potential of local models is too compelling to set aside. 
The advantages are significant:<\/p>\n<ul>\n<li>Privacy<\/li>\n<li>Offline capabilities<\/li>\n<li>No token usage costs<\/li>\n<li>No more \u201coverloaded\u201d error messages<\/li>\n<\/ul>\n<p>So I started looking for alternatives, and that\u2019s when I came across<a href=\"https:\/\/github.com\/unslothai\/unsloth\" target=\"_blank\"> Unsloth<\/a>, a project designed to make fine-tuning models much faster and more accessible. Its growing popularity (<a href=\"https:\/\/www.star-history.com\/#unslothai\/unsloth&amp;Date\" target=\"_blank\">star history<\/a>) made me curious enough to give it a try.<\/p>\n<p>In this post, I\u2019ll walk you through fine-tuning a sub-1GB model to redact sensitive info without breaking your Python setup. With <a href=\"https:\/\/www.docker.com\/products\/docker-offload\/\">Docker Offload<\/a> and Unsloth, you can go from a baseline model to a portable, shareable GGUF artifact on Docker Hub in less than 30 minutes. In part 2 of this post, I will share the detailed steps of fine-tuning the model.<\/p>\n<h2 class=\"wp-block-heading\">Challenges of fine-tuning models<\/h2>\n<p>Setting up the right environment to fine-tune models can be\u2026 painful. It\u2019s fragile, error-prone, and honestly a little scary at times. I always seem to break my Python environment one way or another, and I lose hours just wrestling with dependencies and runtime versions before I can even start training.<\/p>\n<p>Fortunately, the folks at <strong>Unsloth<\/strong> solved this with <a href=\"https:\/\/docs.unsloth.ai\/get-started\/install-and-update\/docker\" target=\"_blank\">a ready-to-use <strong>Docker image<\/strong><\/a>. Instead of wasting time (and patience) setting everything up, I can just run a container and get started immediately.<\/p>\n<p>Of course, there\u2019s still the hardware requirement. 
I work on a MacBook Pro, and Unsloth doesn\u2019t support MacBooks natively, so normally, that would be a deal-breaker.<\/p>\n<p>But here\u2019s where <strong>Docker Offload<\/strong> comes in. With Offload, I can <a href=\"https:\/\/docs.docker.com\/offload\/\" target=\"_blank\">spin up GPU-backed resources in the cloud<\/a> and tap into NVIDIA acceleration, all while keeping my local workflow. That means I now have everything I need to fine-tune models, without fighting my laptop.<\/p>\n<p>Let\u2019s go for it.<\/p>\n<h2 class=\"wp-block-heading\">How to fine-tune models locally with Unsloth and Docker<\/h2>\n<p>Can a model smaller than 1GB reliably mask personally identifiable information (PII)?<\/p>\n<p>Here\u2019s the <strong>test input<\/strong>:<\/p>\n<p>This is an example of text that contains some data. <br \/>The author of this text is Ignacio L\u00f3pez Luna, but everybody calls him Ignasi. <br \/>His ID number is 123456789. <br \/>He has a son named Arnau L\u00f3pez, who was born on 21-07-2021.<\/p>\n<p>Desired <strong>output<\/strong>:<\/p>\n<p>This is an example of text that contains some data. <br \/>The author of this text is [MASKED] [MASKED], but everybody calls him [MASKED]. <br \/>His ID number is [MASKED]. <br \/>He has a son named [MASKED], who was born on [MASKED].<\/p>\n<p>When tested with <strong>Gemma 3 270M<\/strong> using<a href=\"https:\/\/docs.docker.com\/ai\/model-runner\/\" target=\"_blank\"> Docker Model Runner<\/a>, the output was:<\/p>\n<p>[PERSON]<\/p>\n<p>Clearly, not usable. 
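<\/p>
<p>To make this kind of comparison repeatable, it helps to script the check instead of eyeballing one-off outputs. The sketch below assumes Docker Model Runner exposes an OpenAI-compatible chat endpoint; the URL, port, and prompt wording are illustrative assumptions, not details taken from this post:<\/p>

```python
import json
from urllib import request

# Assumed endpoint for Docker Model Runner's OpenAI-compatible API.
# The host, port, and path are illustrative and may differ on your setup.
DMR_CHAT_URL = "http://localhost:12434/engines/v1/chat/completions"

def build_masking_request(model: str, text: str) -> dict:
    """Build an OpenAI-style chat payload asking `model` to redact PII."""
    prompt = ("Mask all PII in the following text. "
              "Return ONLY the redacted text. Text: " + text)
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # deterministic output keeps comparisons fair
    }

def mask(model: str, text: str) -> str:
    """POST the payload to the (assumed) local endpoint; return the reply text."""
    data = json.dumps(build_masking_request(model, text)).encode()
    req = request.Request(DMR_CHAT_URL, data=data,
                         headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

<p>Calling mask() once with the baseline model name and once with a fine-tuned one lets you diff the two outputs on the same input.<\/p>
<p>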
Time to fine-tune.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Step 1: Clone the example project<\/strong><\/h3>\n<div class=\"wp-block-syntaxhighlighter-code \">\ngit clone https:\/\/github.com\/ilopezluna\/fine-tuning-examples.git<br \/>\ncd fine-tuning-examples\/pii-masking\n<\/div>\n<p>The project contains a ready-to-use Python script to fine-tune Gemma 3 using the <a href=\"https:\/\/huggingface.co\/datasets\/ai4privacy\/pii-masking-400k\" target=\"_blank\">pii-masking-400k dataset<\/a> from <a href=\"https:\/\/www.ai4privacy.com\/\" target=\"_blank\">ai4privacy<\/a>.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Step 2: Start Docker Offload (with GPU)<\/strong><\/h3>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker offload start\n<\/div>\n<p>Select your account.<\/p>\n<p>Answer <strong>Yes<\/strong> when asked about GPU support (you\u2019ll get an NVIDIA L4-backed instance).<\/p>\n<p>Check status:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker offload status\n<\/div>\n<p>See the<a href=\"https:\/\/docs.docker.com\/offload\/quickstart\/\" target=\"_blank\"> Docker Offload Quickstart guide<\/a>.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Step 3: Run the Unsloth container<\/strong><\/h3>\n<p>The official Unsloth image includes Jupyter and some example notebooks. 
You can start it like this:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker run -d -e JUPYTER_PORT=8000 \\<br \/>\n  -e JUPYTER_PASSWORD=\"mypassword\" \\<br \/>\n  -e USER_PASSWORD=\"unsloth2024\" \\<br \/>\n  -p 8000:8000 \\<br \/>\n  -v $(pwd):\/workspace\/work \\<br \/>\n  --gpus all \\<br \/>\n  unsloth\/unsloth\n<\/div>\n<p>Now, let\u2019s attach a shell to the container:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker exec -it $(docker ps -q) bash\n<\/div>\n<p>Useful paths inside the container:<\/p>\n<ul>\n<li>\/workspace\/unsloth-notebooks\/ \u2192 example fine-tuning notebooks<\/li>\n<li>\/workspace\/work\/ \u2192 your mounted working directory<\/li>\n<\/ul>\n<p>Thanks to <strong>Docker Offload<\/strong> (with Mutagen under the hood), the folder \/workspace\/work\/ stays in sync between the cloud GPU instance and your local dev machine.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Step 4: Fine-tune<\/strong><\/h3>\n<p>The script <em>finetune.py<\/em> is a small training loop built around Unsloth. Its purpose is to take a base language model and adapt it to a new task using supervised fine-tuning with LoRA. In this example, the model is trained on a dataset that teaches it how to mask personally identifiable information (PII) in text.<\/p>\n<p>LoRA makes the process lightweight: instead of updating all of the model\u2019s parameters, it adds small adapter layers and only trains those. 
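<\/p>
<p>As a back-of-the-envelope illustration of why this is cheap, the toy numbers below show what fraction of the weights LoRA actually trains (the dimensions, rank, and alpha here are made-up values, not the configuration used by finetune.py):<\/p>

```python
import numpy as np

# Toy LoRA arithmetic (illustrative only; not Unsloth's implementation).
# Instead of updating a d_out x d_in weight matrix W, LoRA trains two thin
# factors B (d_out x r) and A (r x d_in) with rank r << d, then merges the
# product back into the base weight: W' = W + (alpha / r) * B @ A.
d_out, d_in, r, alpha = 640, 640, 16, 32

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trainable adapter factor
B = np.zeros((d_out, r))                # starts at zero: no change at step 0

trained_fraction = (A.size + B.size) / W.size
print(trained_fraction)  # -> 0.05: only 5% as many parameters as W itself

W_merged = W + (alpha / r) * B @ A      # merging is plain matrix arithmetic
```

<p>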
That means the fine-tune runs quickly, fits on a single GPU, and produces a compact set of weights you can later merge back into the base model.<\/p>\n<p>When you run:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\nunsloth@46b6d7d46c1a:\/workspace$ cd work<br \/>\nunsloth@46b6d7d46c1a:\/workspace\/work$ python finetune.py<br \/>\nUnsloth: Will patch your computer to enable 2x faster free finetuning.<br \/>\n[...]\n<\/div>\n<p>The script loads the base model, prepares the dataset, runs a short supervised fine-tuning pass, and saves the resulting LoRA weights into your mounted \/workspace\/work\/ folder. Thanks to Docker Offload, those results are also synced back to your local machine automatically.<\/p>\n<p>The whole training run is designed to complete in under 20 minutes on a modern GPU, leaving you with a model that has \u201clearned\u201d the new masking behavior and is ready for conversion in the next step.<\/p>\n<p>For a deeper walkthrough of how the dataset is built, why it\u2019s important, and how LoRA is configured, stay tuned for part 2 of this blog!<\/p>\n<h3 class=\"wp-block-heading\"><strong>Step 5: Convert to GGUF<\/strong><\/h3>\n<p>At this point you\u2019ll have the fine-tuned model artifacts sitting under \/workspace\/work\/.<\/p>\n<p>To package the model for Docker Hub and Docker Model Runner usage, it must be in <strong>GGUF format<\/strong>. 
(Unsloth will support this directly soon, but for now we convert manually.)<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\nunsloth@1b9b5b5cfd49:\/workspace\/work$ cd ..<br \/>\nunsloth@1b9b5b5cfd49:\/workspace$ git clone https:\/\/github.com\/ggml-org\/llama.cpp<br \/>\nCloning into 'llama.cpp'...<br \/>\n[...]<br \/>\nResolving deltas: 100% (45613\/45613), done.<br \/>\nunsloth@1b9b5b5cfd49:\/workspace$ python .\/llama.cpp\/convert_hf_to_gguf.py work\/result\/ --outfile work\/result.gguf<br \/>\n[...]<br \/>\nINFO:hf-to-gguf:Model successfully exported to work\/result.gguf\n<\/div>\n<p>Next, check that the file exists locally (this confirms that the automatic Mutagen-powered file sync has finished):<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\nunsloth@46b6d7d46c1a:\/workspace$ exit<br \/>\nexit<br \/>\n((.env3.12) ) ilopezluna@localhost pii-masking % ls -alh result.gguf<br \/>\n-rw-r--r--@ 1 ilopezluna  staff   518M Sep 23 15:58 result.gguf\n<\/div>\n<p>At this point, you can stop Docker Offload:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker offload stop\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Step 6: Package and share on Docker Hub<\/strong><\/h3>\n<p>Now let\u2019s package the fine-tuned model and push it to Docker Hub:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n((.env3.12) ) ilopezluna@localhost pii-masking % docker model package --gguf \/Users\/ilopezluna\/Projects\/fine-tuning-examples\/pii-masking\/result.gguf ignaciolopezluna020\/my-awesome-model:version1 --push<br \/>\nAdding GGUF file from \"\/Users\/ilopezluna\/Projects\/fine-tuning-examples\/pii-masking\/result.gguf\"<br \/>\nPushing model to registry...<br \/>\nUploaded: 517.69 MB<br \/>\nModel pushed successfully\n<\/div>\n<p>You can find more details on distributing models in the<a href=\"https:\/\/www.docker.com\/blog\/publish-ai-models-on-docker-hub\/\"> Docker blog on 
packaging models<\/a>.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Step 7: Try the results!<\/strong><\/h3>\n<p>Finally, run the fine-tuned model using Docker Model Runner:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker model run ignaciolopezluna020\/my-awesome-model:version1 \"Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, ' ' and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio L\u00f3pez Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau L\u00f3pez, who was born on 21-07-2021\"<br \/>\nThis is an example of text that contains some data. The author of this text is [GIVENNAME_1] [SURNAME_1], but everybody calls him [GIVENNAME_1]. His ID number is [IDCARDNUM_1]. He has a son named [GIVENNAME_1] [SURNAME_1], who was born on [DATEOFBIRTH_1]\n<\/div>\n<p>Just compare with the original Gemma 3 270M output:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n((.env3.12) ) ilopezluna@F2D5QD4D6C pii-masking % docker model run ai\/gemma3:270M-F16 \"Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, ' ' and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio L\u00f3pez Luna, but everybody calls him Ignasi. His ID number is 123456789. 
He has a son named Arnau L\u00f3pez, who was born on 21-07-2021\"<br \/>\n[PERSON]\n<\/div>\n<p>The fine-tuned model is <strong>far more useful<\/strong>, and now it\u2019s already published on Docker Hub for anyone to try.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Why fine-tuning models with Docker matters<\/strong><\/h2>\n<p>This experiment shows that small local models don\u2019t have to stay as \u201ctoys\u201d or curiosities. With the right tooling, they can become <strong>practical, specialized assistants<\/strong> for real-world problems.<\/p>\n<p><strong>Speed<\/strong>: Fine-tuning a sub-1GB model took under 20 minutes with Unsloth and Docker Offload. That\u2019s fast enough for iteration and experimentation.<\/p>\n<p><strong>Accessibility<\/strong>: Even on a machine without a GPU, Docker Offload unlocked GPU-backed training without extra hardware.<\/p>\n<p><strong>Portability<\/strong>: Once packaged, the model is easy to share, and runs anywhere thanks to Docker.<\/p>\n<p><strong>Utility<\/strong>: Instead of producing vague or useless answers, the fine-tuned model reliably performs one job (masking PII), something that could be immediately valuable in many workflows.<\/p>\n<p>This is the power of fine-tuning models: turning small, general-purpose models into <strong>focused, reliable tools<\/strong>. And with Docker\u2019s ecosystem, you don\u2019t need to be an ML researcher with a huge workstation to make it happen. 
You can train, test, package, and share, all with familiar Docker workflows.<\/p>\n<p>So next time you think <em>\u201csmall models aren\u2019t useful\u201d<\/em>, remember: with a bit of fine-tuning, they absolutely can be.<\/p>\n<p>This takes small local models from \u201cinteresting demo\u201d to <strong>practical, usable tools<\/strong>.<\/p>\n<h2 class=\"wp-block-heading\">We\u2019re building this together!<\/h2>\n<p>Docker Model Runner is a community-friendly project at its core, and its future is shaped by contributors like you. If you find this tool useful, please head over to our <a href=\"https:\/\/github.com\/docker\/model-runner\" target=\"_blank\">GitHub repository<\/a>. Show your support by giving us a star, forking the project to experiment with your own ideas, and contributing. Whether it\u2019s improving documentation, fixing a bug, or adding a new feature, every contribution helps. Let\u2019s build the future of model deployment together!<\/p>\n<p><a href=\"https:\/\/www.docker.com\/products\/docker-offload\/\">Start with Docker Offload for GPU on demand<\/a> \u2192<\/p>\n<h3 class=\"wp-block-heading\">Learn more<\/h3>\n<p>Check out the Model Runner General Availability <a href=\"https:\/\/www.docker.com\/blog\/announcing-docker-model-runner-ga\/\">announcement<\/a><\/p>\n<p>Visit our<a href=\"https:\/\/github.com\/docker\/model-runner\" target=\"_blank\"> Model Runner GitHub repo<\/a>!<\/p>\n<p>Learn how Compose makes <a href=\"https:\/\/www.docker.com\/blog\/build-ai-agents-with-docker-compose\/\">building AI apps and agents easier<\/a><\/p>\n<p>Check out the Unsloth <a href=\"https:\/\/docs.unsloth.ai\/new\/how-to-train-llms-with-unsloth-and-docker\" target=\"_blank\">documentation<\/a> for more details on the Unsloth Docker image.<\/p>","protected":false},"excerpt":{"rendered":"<p>I\u2019ve been experimenting with local models for a while now, and the progress in making them accessible has been exciting. 
[&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-2549","post","type-post","status-publish","format-standard","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2549","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=2549"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2549\/revisions"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=2549"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=2549"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=2549"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}