{"id":2578,"date":"2025-10-09T13:14:16","date_gmt":"2025-10-09T13:14:16","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/10\/09\/lora-explained-faster-more-efficient-fine-tuning-with-docker\/"},"modified":"2025-10-09T13:14:16","modified_gmt":"2025-10-09T13:14:16","slug":"lora-explained-faster-more-efficient-fine-tuning-with-docker","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/10\/09\/lora-explained-faster-more-efficient-fine-tuning-with-docker\/","title":{"rendered":"LoRA Explained: Faster, More Efficient Fine-Tuning with Docker"},"content":{"rendered":"<p>Fine-tuning a language model doesn\u2019t have to be daunting. In our previous post on <a href=\"https:\/\/www.docker.com\/blog\/fine-tuning-models-with-offload-and-unsloth\/\">fine-tuning models<\/a> with Docker Offload and Unsloth, we walked through how to train small, local models efficiently using Docker\u2019s familiar workflows. This time, we\u2019re narrowing the focus.<\/p>\n<p>Instead of asking a model to be good at everything, we can <strong>specialize it<\/strong>: teaching it a narrow but valuable skill, like consistently masking personally identifiable information (PII) in text. Thanks to techniques like <strong>LoRA (Low-Rank Adaptation)<\/strong>, this process is not only feasible on modest resources, it\u2019s fast and efficient.<\/p>\n<p>Even better, with Docker\u2019s ecosystem the entire fine-tuning pipeline: training, packaging, and sharing, becomes approachable. You don\u2019t need a bespoke ML setup or a research lab workstation. You can iterate quickly, keep your workflow portable, and publish results for others to try with the same Docker commands you already know.<\/p>\n<p>In this post, I\u2019ll walk through a <strong>hands-on fine-tuning experiment<\/strong>: adapting the Gemma 3 270M model into a compact assistant capable of reliably masking PII.<\/p>\n<h2 class=\"wp-block-heading\">What\u2019s Low-Rank Adaptation (LoRA)?<\/h2>\n<p>Fine-tuning starts with a pre-trained model, one that has already learned the general structure and patterns of language.<\/p>\n<p>Instead of training it from scratch (which would consume massive amounts of compute and risk catastrophic forgetting, where the model loses its prior knowledge), we can use a more efficient method called LoRA (Low-Rank Adaptation).<\/p>\n<p>LoRA allows us to teach the model new tasks or behaviors without overwriting what it already knows, by adding small, trainable adapter layers while keeping the base model frozen.<\/p>\n<h3 class=\"wp-block-heading\">How does LoRA work?<\/h3>\n<p>At a high level, LoRA works like this:<\/p>\n<p><strong>Freeze the base model<\/strong>: The model\u2019s original weights (its core knowledge of language) remain unchanged.<\/p>\n<p><strong>Add adapter layers<\/strong>: Small, trainable \u201cside modules\u201d are inserted into specific parts of the model. These adapters learn only the new behavior or skill you want to teach.<\/p>\n<p><strong>Train efficiently<\/strong>: During fine-tuning, only the adapter parameters are updated. The rest of the model stays static, which dramatically reduces compute and memory requirements.<\/p>\n<h2 class=\"wp-block-heading\">LoRA experiment: Fine-tune Gemma 3 270M to mask PII<\/h2>\n<p>For this experiment, the model already knows how to read, write, and follow instructions. 
## LoRA experiment: Fine-tune Gemma 3 270M to mask PII

For this experiment, the model already knows how to read, write, and follow instructions. Our job is simply to teach it the specific pattern we care about, for example:

"*Given some text, replace PII with standardized placeholders while leaving everything else untouched.*"

The fine-tuning process consists of four steps:

1. Prepare the dataset
2. Prepare LoRA adapter
3. Train the model
4. Export the resulting model

*Figure 1: Four steps of fine-tuning with LoRA*

In this example, we use Supervised Fine-Tuning (SFT): each training example pairs raw text containing PII with its correctly redacted version. Over many such examples, the model internalizes the pattern and learns to generalize the redaction rules.

The quality of the dataset is critical: the cleaner and more representative your dataset, the better your fine-tuned model will perform.

Before we dive into the steps, it's crucial to understand Chat Templates.

### Understanding Chat Templates

When you send a request like the one below to Gemma 3 270M, the model doesn't see this JSON structure directly.

```json
{
    "messages": [
        {
            "role": "user",
            "content": "Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021"
        }
    ]
}
```

Instead, the input is transformed into a **chat-formatted prompt** with special tokens:

```text
<start_of_turn>user
Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021<end_of_turn>
```

Notice how the message has been rewrapped and extra tokens like `<start_of_turn>` and `<end_of_turn>` have been inserted. These tokens are part of the model's chat template, the standardized structure it expects at inference time.

Different models use different templates. For example, Gemma uses `<start_of_turn>` markers, while other models might rely on `<bos>` or other special tokens.

This is exactly why the first step is "Prepare the dataset." When fine-tuning, you must format your training data with the same chat template the model will use during inference. This alignment ensures the fine-tuned model is robust, because it has been trained on data that looks exactly like what it will encounter in production.
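You don't have to guess what a template looks like; the tokenizer can render it for you. A small sketch, assuming the Hugging Face `transformers` library and the `unsloth/gemma-3-270m-it` checkpoint used later in this post:

```python
from transformers import AutoTokenizer

# The tokenizer carries the model's chat template.
tok = AutoTokenizer.from_pretrained("unsloth/gemma-3-270m-it")

messages = [{"role": "user", "content": "Mask all PII in the following text. ..."}]

# Render the exact string the model will see; add_generation_prompt=True
# appends the opening "<start_of_turn>model" marker so the model knows to answer.
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```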
## Prepare the dataset: Teaching through examples

The dataset is the bridge between general-purpose language ability and task-specific expertise. Each example is a demonstration of what we want the model to do: a prompt with raw text containing PII, and a response showing the redacted version.

In the script, this is how the raw dataset is formatted using the model's chat template (see the `apply_chat_template` call):

```python
import json

from datasets import Dataset
from unsloth import FastModel

max_seq_length = 2048
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",
    max_seq_length=max_seq_length,
    load_in_4bit=False,
    load_in_8bit=False,
    full_finetuning=False,
)

with open("pii_redaction_train.json", "r", encoding="utf-8") as f:
    data = json.load(f)

ds = Dataset.from_list(data)

def to_text(ex):
    resp = ex["response"]
    if not isinstance(resp, str):
        resp = json.dumps(resp, ensure_ascii=False)
    msgs = [
        {"role": "user", "content": ex["prompt"]},
        {"role": "assistant", "content": resp},
    ]
    return {
        "text": tokenizer.apply_chat_template(
            msgs, tokenize=False, add_generation_prompt=False
        )
    }

dataset = ds.map(to_text, remove_columns=ds.column_names)
```

You can print a few of the pairs to see what they look like:

```python
for i in range(3):
    print(dataset[i]["text"])
    print("=" * 80)
```

An example of a dataset entry:

```text
<bos><start_of_turn>user
Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, and punctuation exactly. Return ONLY the redacted text.

Text:
<p>My child faozzsd379223 (DOB: May/58) will undergo treatment with Dr. faozzsd379223, office at Hill Road. Our ZIP code is 28170-6392. Consult policy M.UE.227995. Contact number: 0070.606.322.6244. Handle transactions with 6225427220412963. Queries? Email: faozzsd379223@outlook.com.</p><end_of_turn>
<start_of_turn>model
<p>My child [USERNAME_2] (DOB: [DATEOFBIRTH_1]) will undergo treatment with Dr. [USERNAME_1], office at [STREET_1]. Our ZIP code is [ZIPCODE_1]. Consult policy M.UE.227995. Contact number: [TELEPHONENUM_1]. Handle transactions with [CREDITCARDNUMBER_1]. Queries? Email: [EMAIL_1].</p><end_of_turn>
```
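Since dataset quality drives everything downstream, it pays to sanity-check the pairs before training. A minimal, hypothetical sketch over the `data` list loaded above; the two regexes are illustrative, not exhaustive:

```python
import re

PLACEHOLDER = re.compile(r"\[[A-Z_]+(?:_\d+)?\]")  # e.g. [EMAIL_1], [PERSON]
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.\w+\b")    # crude detector for unmasked emails

for i, ex in enumerate(data):
    resp = ex["response"] if isinstance(ex["response"], str) else str(ex["response"])
    if not PLACEHOLDER.search(resp):
        print(f"example {i}: redacted text contains no placeholder")
    if EMAIL.search(resp):
        print(f"example {i}: an unmasked email may have leaked into the response")
```

A few minutes spent fixing flagged examples is usually cheaper than another training run.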
## Prepare LoRA adapter: Standing on the shoulders of a base model

Instead of starting from a blank slate, we begin with **Gemma 3 270M-IT**, a small but capable instruction-tuned model. By loading both the weights and the tokenizer, we get not just a model that understands text, but also the exact rules it uses to split and reconstruct sentences.

Fine-tuning isn't reinventing language; it's layering task-specific expertise on top of a foundation that already knows how to read and write.

For that, we'll use the LoRA technique.

### Why we use LoRA

Training a large language model from scratch is extremely costly, because it means adjusting billions of parameters.

The good news is that you usually don't need to change everything to teach the model a new skill.

That's where **LoRA** comes in. Instead of re-training the entire model, LoRA adds a few small, extra components, like "add-ons." When we fine-tune the model, we only adjust these add-ons, while the main model stays the same.

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling factor applied as alpha / r
    target_modules=["q_proj", "v_proj"],  # attach adapters to the attention projections
    lora_dropout=0.05,
)

# base_model is the pre-trained model loaded earlier; only the adapters will train.
model = get_peft_model(base_model, lora_config)
```

These few lines tell the model: *keep your parameters frozen, but learn through a small set of low-rank adapters*. That's why fine-tuning is efficient and affordable.

## Train the model: Fine-tuning in practice

With the dataset ready and the LoRA adapters in place, the actual training looks like classic supervised learning:

1. Feed in the input (a user prompt).
2. Compare the model's output with the expected response.
3. Adjust the adapter weights to minimize the difference.

```python
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    eval_dataset = None,  # Can set up evaluation!
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 1,  # Use GA to mimic a larger batch size!
        warmup_steps = 5,
        num_train_epochs = 1,  # Set this for 1 full training run.
        # max_steps = 100,
        learning_rate = 5e-5,  # Reduce to 2e-5 for long training runs
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",  # Use this for WandB etc
    ),
)

trainer_stats = trainer.train()
```

Over many iterations, the model internalizes the rules of PII masking, learning not only to replace emails with [EMAIL] but also to preserve punctuation, whitespace, and all non-PII content exactly as instructed.

What's important here is that fine-tuning doesn't overwrite the model's general capabilities. The model still knows how to generate coherent text; we're just biasing it toward one more skill.
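Before exporting anything, a quick smoke test shows whether the adapters learned the pattern. A minimal sketch reusing the `model` and `tokenizer` from training; the sample text is invented for illustration:

```python
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": (
        "Mask all PII in the following text. Return ONLY the redacted text. "
        "Text: Contact Jane Doe at jane.doe@example.com or 555-0199."
    )}],
    tokenize=False,
    add_generation_prompt=True,  # end with "<start_of_turn>model" so the model answers
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```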
## Export the resulting model: Merging weights

Once training finishes, we have a base model plus a set of LoRA adapters. That's useful for experimentation, but for deployment we often prefer a single consolidated model.

By merging the adapters back into the base weights, we produce a standalone checkpoint that behaves just like the original model, except it now has PII-masking expertise built in.

```python
model.save_pretrained_merged("result", tokenizer, save_method = "merged_16bit")
```
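As a final check, the merged checkpoint in `result` should load as an ordinary standalone model, with no adapter machinery required. A small sketch, assuming the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The merged directory is a regular Hugging Face checkpoint.
tok = AutoTokenizer.from_pretrained("result")
merged = AutoModelForCausalLM.from_pretrained("result")
print(f"{sum(p.numel() for p in merged.parameters()):,} parameters, no adapters attached")
```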
### Try and share your model

After fine-tuning, the next natural step is to **try your model in action** and, if it works well, **share it with others**. With [Docker Model Runner](https://www.docker.com/products/model-runner/), you can [package your fine-tuned model](https://www.docker.com/blog/how-to-build-run-and-package-ai-models-locally-with-docker-model-runner/), push it to [Docker Hub](https://www.docker.com/products/docker-hub/), and make it instantly runnable anywhere. No messy setup, no GPU-specific headaches, just a familiar Docker workflow for distributing and testing AI models.

So once your adapters are trained and merged, don't stop there: run the model, publish it, and let others try it too. In the [previous post](https://www.docker.com/blog/fine-tuning-models-with-offload-and-unsloth/), I showed how easy it is to do that step by step.

Fine-tuning makes your model **specialized**, but Docker makes it **accessible and shareable**. Together, they turn small local models from curiosities into practical tools ready to be used, and reused, by the community.

### We're building this together!

Docker Model Runner is a community-friendly project at its core, and its future is shaped by contributors like you. If you find this tool useful, please head over to our [GitHub repository](https://github.com/docker/model-runner). Show your support by giving us a star, fork the project to experiment with your own ideas, and contribute. Whether it's improving documentation, fixing a bug, or adding a new feature, every contribution helps. Let's build the future of model deployment together!

### Learn more

- Learn [how to fine-tune local models](https://www.docker.com/blog/fine-tuning-models-with-offload-and-unsloth/) with Docker Offload and Unsloth
- Check out the Docker Model Runner General Availability [announcement](https://www.docker.com/blog/announcing-docker-model-runner-ga/)
- Visit our [Model Runner GitHub repo](https://github.com/docker/model-runner)! Docker Model Runner is open source, and we welcome collaboration and contributions from the community!
- Get started with Model Runner with a simple [hello GenAI application](https://github.com/docker/hello-genai)
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-2578","post","type-post","status-publish","format-standard","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2578","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=2578"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2578\/revisions"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=2578"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=2578"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=2578"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}