Meta’s recent release of the Llama 3.1 405B model has made waves in the AI community. This groundbreaking open-source model matches, and on some benchmarks even surpasses, the performance of leading closed-source models. With impressive scores on reasoning tasks (96.9 on ARC Challenge and 96.8 on GSM8K) and code generation (89.0 on the HumanEval benchmark), Llama 3.1 is a game-changer.
Follow this guide to learn how to deploy the model on RunPod using Ollama, a powerful and user-friendly tool for running LLMs. Plus, we’ll show you how to test it in a ChatGPT-like WebUI chat interface with just one Docker command.
Llama 3.1 is groundbreaking for several reasons; for more details on the model itself, check out Meta’s blog.
1) Create your RunPod account and add at least $10 to rent your GPU.
2) Install Docker on your local machine (you’ll need it for the WebUI later).
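If you don’t have Docker yet, one option on Linux is Docker’s convenience script; this is just a quick sketch, and on macOS or Windows you’d install Docker Desktop instead:

# Linux only: downloads and runs Docker's official convenience install script
curl -fsSL https://get.docker.com | sh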
1) In the RunPod console, head to Pods and click Deploy.
2) Select H100 PCIe. The Llama 3.1 405B model pulled by Ollama is 4-bit quantized, so you need at least 240GB of VRAM, which is 3 GPUs at 80GB each. For more details, check our blog on picking the right VRAM.
3) Slide the GPU count to 3.
4) Click Change Template and select "Better Ollama CUDA 12".
5) Click Edit Template and set the Container Disk to 250 GB to leave room for storing the model.
6) Click Set Overrides and Deploy.
7) Find your pod and click Connect.
8) Copy your SSH command.
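The copied command will look roughly like the sketch below; the host, port, and key path here are hypothetical placeholders, so use the exact command RunPod shows for your pod.

# Example shape only -- copy the real command from the Connect dialog
ssh root@<POD-IP> -p <PORT> -i ~/.ssh/id_ed25519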
1) Open your terminal and run the SSH command copied above.
2) Once you’re connected via SSH, run this command in your terminal:
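A minimal sketch of that command, using Ollama’s standard install script and then starting the server:

# Download and run Ollama's official install script
curl -fsSL https://ollama.com/install.sh | sh
# Start the Ollama server (it runs in the foreground; open a second SSH session,
# or append ' &' to background it, before moving to the next step)
ollama serve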
This command fetches the Ollama installation script and executes it, setting up Ollama on your Pod. The ollama serve command then starts the Ollama server so it’s ready to serve AI models.
3) Download the Llama 3.1 405B model (heads up, it may take a while):
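Assuming you want Ollama’s default 4-bit quantized build of the model, the command is along these lines:

# Pulls the 4-bit quantized 405B weights (hundreds of GB), then opens an interactive chat prompt
# Run this in a second SSH session if ollama serve is occupying the first
ollama run llama3.1:405b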
Start chatting with your model from the terminal. Let’s make it more interactive with a WebUI.
1) Open a new terminal window on your local machine.
2) Run the following command, replacing {POD-ID} with your pod’s ID:
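Here is a sketch of that Docker command, assuming the Ollama template exposes Ollama’s default port 11434 through RunPod’s HTTP proxy (the proxy hostname format and port are the assumptions here):

# Starts Open WebUI locally on port 3000, pointed at the Ollama server on your pod
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=https://{POD-ID}-11434.proxy.runpod.net \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main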
3) Once the container is running, go to http://localhost:3000/ and create a local Open WebUI account.
4) Click Select a model and choose the llama3.1:405b model you downloaded.
Done! You now have a chat interface for talking to your Llama 3.1 405B model via Ollama on RunPod.
If you’re still facing issues, comment below on this blog for help, or check RunPod’s docs or Open WebUI’s docs.
To recap: you configure your Pod on RunPod, SSH into it from your terminal, install Ollama and run the Llama 3.1 405B model over that SSH session, and then run the Docker command in a separate terminal tab to start the chat interface.
You now have a taste for the speed and power of running the Llama 3.1 405B model with Ollama on RunPod. By leveraging RunPod’s scalable GPU resources and Ollama’s efficient deployment tools, you can harness the full potential of this cutting-edge model for your projects. Whether you are fine-tuning, conducting research, or developing applications, this setup provides the performance and accessibility needed to push the boundaries of what is possible with AI. Check out our blog on Fine-tuning vs RAG to decide the right option to customize your setup.
Sign up for our RunPod blog for more tutorials and informational content on cutting-edge developments in AI. Add a comment on what you’d like to see next in our blogs!