Axolotl offers a range of tools for fine-tuning large language models (LLMs), building on pre-trained weights and frameworks like Hugging Face Transformers. RunPod is a scalable GPU cloud provider whose on-demand environments are well suited to machine learning workloads, which makes it a good option for resource-hungry LLM fine-tuning. This tutorial shows how to set up Axolotl on RunPod to streamline LLM fine-tuning.
To get the best out of this guide, you need specific resources and technical skills:
When selecting your instance, match the GPU, storage, and RAM to your model’s demands. A 7B-parameter model can be fine-tuned on a single A100 with 40GB of VRAM, but larger models (13B and up) won’t fit on that same instance; they call for a multi-GPU instance or an A100 with 80GB of VRAM.
RunPod’s pricing page gives an overview of the hourly cost of each instance type, so you can choose based on your workload requirements and budget.
If you'd like to skip the setup below, feel free to just deploy this axolotl template by winglian. If you'd rather install it from scratch, you can do that in any PyTorch pod.
Create a virtual environment for your project if you prefer:
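For example, using Python's built-in venv module (the environment name here is arbitrary):

```bash
# Create and activate an isolated environment (the name "axolotl-env" is just an example)
python3 -m venv axolotl-env
source axolotl-env/bin/activate
```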
You can install Axolotl from GitHub in the terminal with the code below:
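A typical from-source install looks like the following; the repository URL and the optional extras have changed across Axolotl versions, so verify against the project README:

```bash
# Clone the repository and install in editable mode; the URL and extras
# below may differ for your Axolotl version -- check the README
git clone https://github.com/axolotl-ai-cloud/axolotl.git
cd axolotl
pip3 install packaging ninja
pip3 install -e '.[flash-attn,deepspeed]'
```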
Axolotl supports data in different formats like CSV, JSON, and JSONL, so it's important to structure your dataset into the training, validation, and test splits your run needs.
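For illustration, here is what a JSONL dataset in the common alpaca instruction format could look like, one JSON object per line (the contents are invented for the example):

```json
{"instruction": "Summarize the following text.", "input": "Axolotl is a tool for fine-tuning LLMs.", "output": "Axolotl helps fine-tune large language models."}
{"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
```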
You can transfer the dataset to RunPod via SCP, or use cloud storage like S3 or SFTP. For example, using SCP to transfer a dataset file:
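Here is a sketch of that transfer; the port, key path, file name, and destination are placeholders you'd replace with your pod's actual SSH details (shown under the pod's Connect options in the RunPod console):

```bash
# Replace the port, IP, and destination path with your pod's SSH details
scp -P 12345 -i ~/.ssh/id_ed25519 data.jsonl root@<pod-ip>:/workspace/data/
```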
If you are working with a small dataset, you can simply drag and drop it into the pod through Jupyter Notebook, or upload it with runpodctl.
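With runpodctl, the transfer is a send/receive pair; the one-time code shown below is illustrative, and the exact output format may differ by version:

```bash
# On your local machine: prints a one-time code for the transfer
runpodctl send data.jsonl

# On the pod: paste the code that the send command printed
runpodctl receive <one-time-code>
```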
Axolotl uses YAML configuration files. Create a file named config.yml with the following structure:
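Below is a minimal LoRA sketch of such a file; the base model, dataset path, and hyperparameter values are placeholders to adapt, and the supported keys can shift between Axolotl versions:

```yaml
base_model: meta-llama/Llama-2-7b-hf   # placeholder; any supported base model
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer

load_in_8bit: true        # 8-bit quantization to reduce VRAM usage
adapter: lora             # parameter-efficient fine-tuning via LoRA
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj

datasets:
  - path: data.jsonl      # placeholder dataset path
    type: alpaca
val_set_size: 0.05

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
output_dir: ./output
```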
You might also look at the /examples/ folder in the repository for several premade .yml files that may already suit your needs.
Adjust parameters like base_model, model_type, lora_target_modules, and resource settings based on your specific model and hardware constraints.
A few key settings:

- load_in_8bit: true enables 8-bit quantization to reduce VRAM usage.
- adapter: lora uses a LoRA adapter for parameter-efficient fine-tuning.
- lora_r and lora_alpha control the rank and scaling of the LoRA adapters.
- micro_batch_size sets the size of each training batch.
- gradient_accumulation_steps accumulates gradients before updating weights.

Start the fine-tuning process with:
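With the config above, training is typically launched through Accelerate (newer Axolotl releases also expose an axolotl train wrapper):

```bash
# Launch training with the YAML config created earlier
accelerate launch -m axolotl.cli.train config.yml
```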
For multi-GPU training with DeepSpeed:
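A sketch of such a launch follows; Axolotl ships sample ZeRO configs, but the path to them (deepspeed_configs/zero2.json below) varies by version:

```bash
# Multi-GPU training with a ZeRO stage 2 DeepSpeed config
accelerate launch -m axolotl.cli.train config.yml --deepspeed deepspeed_configs/zero2.json
```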
Monitor training progress directly in the terminal output. For more detailed monitoring:

- Weights & Biases: if you set wandb_project in your config, you can monitor training metrics in real-time at wandb.ai.
- TensorBoard: launch it against the run's log directory with tensorboard --logdir ./output/tensorboard.
- GPU utilization: keep an eye on memory and load with watch -n 1 nvidia-smi.
Evaluate your model with Axolotl's built-in evaluation:
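The invocation below is an assumption based on Axolotl's CLI layout; evaluation entry points have changed across versions, so confirm against the docs for the version you installed:

```bash
# Entry point assumed from Axolotl's CLI layout; verify against the project docs
accelerate launch -m axolotl.cli.evaluate config.yml
```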
To maximize efficiency and minimize costs on RunPod, tune your hyperparameters deliberately: experiment with different learning rates, LoRA configurations, and batch sizes while monitoring the model's performance.
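As a sketch, a sweep might vary a handful of config values like the following; the numbers are illustrative starting points, not recommendations:

```yaml
# Illustrative variations to sweep; values are examples only
learning_rate: 0.0001   # try 1e-4 against the 2e-4 used above
lora_r: 32              # higher rank means more trainable parameters
lora_alpha: 64          # commonly set to about 2x lora_r
micro_batch_size: 4     # raise until you hit VRAM limits
```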