If you've been using Claude Code with Anthropic's hosted models, you already know how powerful it is for AI-assisted development. But what if you could run the same workflow for a fraction of the cost, with complete control over the underlying model? In this guide, we'll walk you through connecting Claude Code to a self-hosted model running on RunPod using Ollama, with no Anthropic API key required.

Before diving into the setup, it's worth understanding why you'd want to do this in the first place. There are four compelling reasons:
Cost. Ten dollars goes significantly further when you're self-hosting. In this guide, we use a 20B coding model quantized to 4-bit, which runs comfortably on an A4500 at just $0.25/hour — giving you nearly 40 hours of unlimited use for what you might spend in an hour or two with a larger Claude model if you're not careful.
Right-sizing your model to the task. If you're generating boilerplate Python scripts or simple utilities, you don't need Opus — or even Haiku. Practically any competent coding model can one-shot those tasks. Paying per-token rates for a frontier model on simple work is overkill, and self-hosting lets you tune your spend to match the complexity of what you're building.
Compliance and security. If your work involves trade secrets, sensitive data, or specific security requirements around tool calling and OS-level access, large hosted foundational models may not meet your needs. When you bring your own model, you're connecting Claude Code to an LLM engine under your direct control — one you can inspect, configure, and extend as needed.
Domain-specific fine-tuning. You can swap in models fine-tuned for specific domains: a model trained heavily on Python, one optimized for data science, or any other specialized variant. This matters especially with smaller models, which benefit greatly from fine-tuning since they lack the broad general knowledge of larger frontier models.
In the RunPod dashboard, start deploying a new pod: scroll to the A4500 GPU (currently around $0.25/hour) and select the Ollama template. Give the container a bit of extra disk space in case you need it, then deploy.
While the pod boots, think about your model selection. This is important: if you want Claude Code's full tool-calling capabilities — where it edits files autonomously and takes real actions in your codebase — you need a model that explicitly supports tool calling. Not every open-source model does. For this guide, we're using a fine-tuned version of GPT-OSS-20B that has been adapted specifically for tool calling.
Once your pod is running, connect to it via the terminal and pull your model:
ollama run slekrem/gpt-oss-claude-code-32k
You can then test it with a quick "hello world" prompt in the terminal.
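Since tool calling is what makes this whole setup useful, you can also probe it directly against Ollama's /api/chat endpoint before involving Claude Code. This is only a rough check, and the get_weather tool below is a made-up placeholder; a model with working tool support should answer with a tool_calls entry rather than plain prose:
# Run on the Ollama pod; the model was pulled in the previous step.
curl -s http://localhost:11434/api/chat -d '{
  "model": "slekrem/gpt-oss-claude-code-32k",
  "stream": false,
  "messages": [{"role": "user", "content": "What is the weather in Berlin right now?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"]
      }
    }
  }]
}'
If the reply comes back as ordinary text with no tool_calls field, the model probably won't drive Claude Code's autonomous file edits reliably.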
Spin up a second pod — an A6000 running the latest PyTorch template works well. This is the pod where you'll install and run Claude Code.
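Claude Code's standard install route is npm. If it isn't already on the image, a minimal sketch (assuming Node.js and npm are available; install them first if they aren't):
npm install -g @anthropic-ai/claude-code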
Once Claude Code is installed, also install a terminal text editor so you can edit its configuration:
apt-get update && apt-get install nano
Claude Code needs to know where to send its requests. Navigate to the Claude configuration directory and open settings.json:
cd ~/.claude
nano settings.json
Add the environment variables that point Claude Code at your Ollama pod. You'll need your Ollama pod's ID from the RunPod dashboard; paste it into the appropriate field in the config. The full settings snippet is shown below and is also available in the video description.
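It's also worth confirming that the Ollama pod is reachable from this second pod before pointing Claude Code at it. A quick check, assuming the default RunPod proxy URL pattern used in the example config below (swap in your own pod ID):
# Should return a JSON list that includes the model you pulled earlier.
curl -s https://yourpodidgoeshere-11434.proxy.runpod.net/api/tags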
If you don't have an active Anthropic account, you'll need to bypass the authentication screen. Create a small shell script that returns a dummy API key:
# Create api_key_helper.sh
echo '#!/bin/bash' >> api_key_helper.sh
echo 'echo "dummy-key"' >> api_key_helper.sh
chmod +x api_key_helper.sh
Then reference this script in your settings.json under the apiKeyHelper field, giving the full path to the file. When you launch Claude Code, it will skip the login screen entirely and connect directly to your Ollama pod.
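If you prefer, the same helper can be written in one step with a heredoc. This is just an alternative sketch, assuming you want the script stored next to settings.json in ~/.claude (the path used in the example config below):
cat > ~/.claude/api_key_helper.sh <<'EOF'
#!/bin/bash
echo "dummy-key"
EOF
chmod +x ~/.claude/api_key_helper.sh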
Here's an example settings.json that you can use:
{
  "apiKeyHelper": "/root/.claude/api_key_helper.sh",
  "env": {
    "ANTHROPIC_BASE_URL": "https://yourpodidgoeshere-11434.proxy.runpod.net",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "ANTHROPIC_API_KEY": "",
    "ANTHROPIC_MODEL": "slekrem/gpt-oss-claude-code-32k:20b",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "slekrem/gpt-oss-claude-code-32k:20b",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "slekrem/gpt-oss-claude-code-32k:20b",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "slekrem/gpt-oss-claude-code-32k:20b"
  }
}
Launch Claude Code from your workspace directory and ask it a simple question:
Which model am I speaking to?
If everything is configured correctly, you'll see the model identify itself as your Ollama-hosted model — not Claude. You're now routing entirely through your own infrastructure.
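You can also confirm the routing from the Ollama pod's side: while Claude Code is generating a response, the model should show up as loaded and the GPU should be busy.
# Run these on the Ollama pod while Claude Code is answering.
ollama ps     # lists models currently loaded into memory
nvidia-smi    # GPU utilization should spike while a response streams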
We ran a few tests to see how a small quantized model holds up for real coding tasks.
Snake game — We asked the model to build a terminal-based Snake game with arrow key controls, apple collection, and score tracking. It one-shotted a working game on the first attempt, which is impressive for a 4-bit quantized 20B model.
Tetris — Same story. The model one-shotted a terminal Tetris game. When we added a follow-up request for rotation controls and better speed, it integrated those changes cleanly in a second pass.
Web search — The model correctly flagged that it doesn't have native web browsing capability. However, when given a direct URL, it was able to fetch and summarize the page — a useful workaround for targeted lookups even without a true search integration.
Open-ended architecture questions — This is where the limits showed. When asked to "choose the best framework for a REST API" with no additional context, the model got stuck — spending several minutes searching an empty codebase before eventually stalling out. Small models need more direction. They don't carry the same planning and reasoning depth as frontier models, so vague or open-ended prompts tend to produce poor results.
The bottom line: for well-defined coding tasks — generating scripts, building small applications, writing boilerplate — a self-hosted model on RunPod can match or exceed what you'd need from a hosted model at a tiny fraction of the cost. For complex, multi-step reasoning or ambiguous architecture decisions, you may still want to reach for a larger model.
The key to success with smaller models is the same best practice that applies to AI coding assistants generally: be specific. Break work into small, concrete tasks. The more granular your prompt, the better your results — regardless of which model you're using.
Ready to try it yourself? You'll need a RunPod account, one pod running the Ollama template to serve the model, a second pod for Claude Code itself, and an open-source model that supports tool calling.
If you need further help, check out our YouTube video on the topic.
If you build something cool with this setup, drop it in the comments on the video or let us know in the Discord. Happy building!