How to fine-tune a model using Axolotl

Learn how to fine-tune Large Language Models (LLMs) with Runpod and Axolotl: deploy a GPU pod, train a Llama 3 model with LoRA, and test the customized model.

Model fine-tuning is the process of adapting a pre-trained machine learning model to perform better on a specific task or dataset. This technique allows for improved performance and efficiency compared to training a model from scratch, as it leverages the knowledge already learned by the model.

Fine-tuning is ideal when you have limited data but want to enhance model performance. It is beneficial when the task differs significantly from the original training task of the model.

Fine-tuning is a subset of transfer learning, using knowledge an existing model already has as the starting point for learning new tasks. It’s easier and cheaper to hone an existing model’s capabilities than to train a new model from scratch. For example, you can fine-tune an existing Large Language Model (LLM) to adjust its tone when responding to inquiries, or give it knowledge specific to your domain or business.

What you’ll learn

In this blog post you’ll learn how to:

  • Deploy a pod on Runpod based on the axolotl-runpod template
  • SSH into the pod
  • Fine-tune a Llama 3 model using LoRA
  • Prompt the fine-tuned model using an interactive UI

Requirements

1. Deploy a pod

You can fine-tune a model on Runpod using Axolotl, an open-source tool for fine-tuning AI models. Let’s deploy a pod that will fine-tune a model based on a dataset, and then run that model so we can test how it has changed.

  1. Log in to the Runpod Console and select Pod Templates from the left sidebar.
  2. Search for “axolotl” and select the axolotl-runpod template. This template uses an official Axolotl Docker image.
     [Screenshot: Runpod Hub search results for axolotl with the axolotl-runpod pod template highlighted]
  3. Select Deploy Pod.
     [Screenshot: Runpod console axolotl-runpod template page with the Deploy Pod button highlighted]
  4. Select a GPU to train and run the model. GPUs with more memory (VRAM) typically cost more but perform better, while GPUs with less memory are cheaper but slower. I went with the RTX A4000, a previous-generation NVIDIA GPU. Make sure that you choose an NVIDIA GPU, because the template expects one.
  5. Enter a Pod Name.
  6. Leave the other settings at their defaults and select Deploy On-Demand.
     [Screenshot: Runpod console pricing and pod summary for an RTX A4000 with the Deploy On-Demand button highlighted]
  7. Wait for the pod to initialize. When the pod is ready, a green circle is displayed next to its name.
     [Screenshot: Runpod console breadcrumb showing a running pod named axolotl_fine_tuning]
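
As a rough aid for the GPU choice in the steps above, the sketch below estimates how much GPU memory a model’s weights alone occupy. The figure of 2 bytes per parameter assumes 16-bit weights, and the helper name is hypothetical. Training also needs memory for activations, gradients, and optimizer state, so treat the result as a lower bound rather than a sizing guarantee.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the model weights only, in GiB."""
    return num_params * bytes_per_param / 2**30


# Model sizes that appear later in this post: 1B and 8B parameters.
one_b = weight_memory_gib(1_000_000_000)
eight_b = weight_memory_gib(8_000_000_000)

print(f"1B parameters at 16-bit: {one_b:.2f} GiB")
print(f"8B parameters at 16-bit: {eight_b:.2f} GiB")
```

Under these assumptions, a one-billion-parameter model needs just under 2 GiB for its weights, while an eight-billion-parameter model needs close to 15 GiB before any training overhead is counted.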

2. Explore the workspace

Now that we have a pod that has Axolotl up and running, let’s access the pod from a terminal on our local machine and see the files and directories in our workspace.

  1. If you have added a public SSH key to your Runpod account, the console shows an SSH command for your pod. Copy it and paste it into a terminal on your local machine to connect.
     [Screenshot: Runpod console SSH connection panel showing the SSH command for connecting to a pod]
  2. In your terminal, you should see a welcome message from Axolotl.
     [Screenshot: Terminal showing the Axolotl cloud image welcome banner after connecting to a pod over SSH]
  3. Let’s explore the files in our workspace. By default, you should be in /workspace/axolotl. Enter dir to see the files and folders in this directory. There’s a lot of stuff here!
     [Screenshot: Terminal listing the contents of the Axolotl repository directory]
  4. To train a model with Axolotl, we must use a configuration file. Axolotl provides example configuration files for many different models. Enter cd examples and then dir to list them.
     [Screenshot: Terminal listing model folders in the Axolotl examples directory]
  5. Let’s look at the Llama 3 examples. Llama is a series of LLMs by Meta, and Llama 3 is the previous generation. Enter cd llama-3 and then dir to list the example configuration files.
     [Screenshot: Terminal listing YAML config files in the Axolotl llama-3 examples folder]
  6. There are a lot of confusingly named YAML files here, many of them sounding the same. Let’s look specifically at lora-1b.yml.

    LoRA, which stands for Low-Rank Adaptation, is a fine-tuning technique that adapts a model to a new context without retraining all of its weights. It freezes the original weights and trains a small set of additional low-rank matrices instead, which makes our fine-tuning faster and less costly than full retraining. The 1b in the filename means that the model has one billion parameters.

  7. Let’s read through the configuration file. Open it up in a text editor:

    nano lora-1b.yml
  8. Notice the first few fields.
     [Screenshot: Axolotl YAML config snippet setting Llama-3.2-1B as the base model with an Alpaca-format dataset]

The model we will fine-tune is NousResearch/Llama-3.2-1B, a Llama 3.2 text model with one billion parameters that even lower-end hardware can run.

We will train it on the teknium/GPT4-LLM-Cleaned dataset, a GPT-4 LLM instruction dataset with OpenAI disclaimers and refusals filtered out.
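
To make “Alpaca-format” concrete, here is a minimal sketch of how one dataset record could be turned into a training prompt. The record fields (instruction, input, output) follow the Alpaca convention. The template text is the widely used Stanford Alpaca wording and is an assumption here, since the exact template Axolotl applies may differ. The function name is hypothetical, and the output field is left as a placeholder.

```python
# Assumed templates: the common Stanford Alpaca wording, not read from Axolotl.
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)


def build_prompt(record: dict) -> str:
    """Pick the template based on whether the record has a non-empty input."""
    if record.get("input"):
        return ALPACA_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return ALPACA_NO_INPUT.format(instruction=record["instruction"])


# An Alpaca-style record; the output text is elided on purpose.
record = {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "<expected answer text>",
}

prompt = build_prompt(record)
print(prompt)
```

During training, the model learns to continue a prompt like this with the text in the record’s output field.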

Press Ctrl+X to exit the nano text editor. The shortcut is Ctrl+X on a Mac as well, because nano runs inside the terminal.
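
The saving that LoRA offers can be sketched with simple arithmetic. Rather than updating a full d × k weight matrix, LoRA trains two small matrices of shapes d × r and r × k, whose product is added to the frozen weights. The dimensions and rank below are illustrative assumptions, not values read from lora-1b.yml, and the function name is hypothetical.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in the two low-rank factors: (d x r) plus (r x k)."""
    return r * (d + k)


# Illustrative sizes only: one square weight matrix and a small rank.
d, k, r = 2048, 2048, 16

full = d * k                            # parameters updated by full fine-tuning
lora = lora_trainable_params(d, k, r)   # parameters updated by LoRA

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r={r}): {lora:,} trainable parameters")
print(f"LoRA trains {lora / full:.2%} of the matrix")
```

With these example sizes, LoRA trains 65,536 parameters instead of 4,194,304 for that matrix, which is why it needs less memory and time than updating every weight.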

3. Fine-tune a model

Okay, we’ve familiarized ourselves with the pod’s workspace and chosen a configuration file. Now let’s go ahead and fine-tune our model!

  1. Enter cd /workspace/axolotl to navigate back to the starting directory, so that the relative paths below resolve and our model is saved in the correct place. Now enter the following command:

    axolotl train examples/llama-3/lora-1b.yml
  2. This may take a while depending on the GPU you chose for your pod. Once Axolotl finishes fine-tuning the model, you should see something like the following message:

    Training completed! Saving trained model to ./outputs/lora-out.
  3. Let’s test the model to see if it responds to prompts in a way that aligns with the dataset. Enter the following command to start interactively prompting the model we just fine-tuned. The first argument is the location of the configuration file that we used to train the model, the --lora-model-dir argument is the location of the fine-tuned model, and --gradio starts a web UI:

    axolotl inference examples/llama-3/lora-1b.yml \
    --lora-model-dir="./outputs/lora-out" --gradio
  4. Axolotl should generate URLs that you can open in your browser to prompt the model using a UI called Gradio. Open the public URL.
     [Screenshot: Axolotl Gradio interface showing a health-tips prompt and the fine-tuned model's response]
  5. Enter a prompt from the dataset, such as:

    Give three tips for staying healthy.
  6. Check the model’s output against the expected output from the dataset. For example, in my testing, the above prompt resulted in the following output:

    1. Eat nutritious foods and exercise regularly
    2. Get plenty of sleep and avoid stressors like nicotine, alcohol or excess caffeine intake.
    3. Manage your weight by eating a balanced diet and engaging in physical activity at least 30 minutes per day.


    Whereas the expected output from the dataset is:

    1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.
    2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.
    3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.

    It’s close, but not quite the same. This is one of the drawbacks of using LoRA: the output follows the dataset less precisely than it might after full fine-tuning. But still pretty good!
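
Comparing by eye works for a single prompt. For a rough numeric comparison, a small script can score how similar two texts are. The sketch below uses Python’s standard-library difflib on the first tip from each list above. It measures word overlap only, not meaning, so treat the score as a crude signal. The helper name is hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio between 0 and 1 based on matching lowercase words."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


# First tip from the model's answer and from the dataset's expected answer.
model_out = "Eat nutritious foods and exercise regularly"
expected = "Eat a balanced and nutritious diet"

score = similarity(model_out, expected)
print(f"similarity: {score:.2f}")
```

A score of 1.0 means the word sequences are identical. Lower scores mean less overlap, even when the two answers give similar advice.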

Next steps

Congratulations, you’ve fine-tuned a model based on a dataset! Runpod and Axolotl enable you to take existing models and adapt them to new contexts, without requiring you to create your own model from scratch. Here are some things you can do to take this further:

  • As you saw, our fine-tuned model didn’t exactly match our dataset’s expected output. Try fully fine-tuning a model using another of Axolotl’s example configuration files (examples/llama-3/fft-8b.yaml) and check the output against the dataset’s expected output. Because this configuration uses an eight-billion-parameter model and updates all of its weights, the output should follow the dataset more closely. Expect it to need a GPU with considerably more memory than the 1B LoRA run.
  • Try fine-tuning a model using Quantized Low-Rank Adaptation (QLoRA). QLoRA is similar to LoRA, but quantizes the base model, compressing its parameters into a smaller, less precise representation. Fine-tuning with QLoRA therefore needs even less memory than LoRA, but can also result in less precise output. Axolotl provides example configuration files that use QLoRA, such as examples/llama-3/qlora.yml, which fine-tunes an eight-billion-parameter model. Compare the time it takes to fine-tune a model using full fine-tuning, LoRA, and QLoRA.
  • Runpod also offers an Axolotl serverless template. Try spinning up an endpoint and fine-tuning a model by sending a JSON request.
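
To see why quantization trades precision for size, the sketch below rounds a few example weights to 16 evenly spaced levels (4 bits) and maps them back to floats. This is a simplified uniform scheme chosen for illustration. QLoRA itself uses a 4-bit NormalFloat (NF4) data type, and the weight values and function name here are made up.

```python
def quantize_dequantize(weights: list[float], bits: int = 4) -> list[float]:
    """Round each weight to the nearest of 2**bits evenly spaced levels."""
    levels = 2**bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels
    if scale == 0:
        return list(weights)  # all weights equal: nothing to round
    codes = [round((w - lo) / scale) for w in weights]  # integers 0..levels
    return [lo + c * scale for c in codes]


# Made-up example weights, not taken from any real model.
weights = [-0.42, -0.10, 0.03, 0.27, 0.81]
restored = quantize_dequantize(weights)

max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max round-trip error: {max_error:.4f}")
```

Each weight now fits in 4 bits instead of 16 or 32, at the cost of a small rounding error per weight. That error is where the reduced precision comes from.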

Note: When you’re done with your pod, don’t forget to terminate it. Otherwise, it will keep costing you money!

Author profile: Eliot Cowley
