
Use Claude Code with your own model on Runpod: No Anthropic account required

If you've been using Claude Code with Anthropic's hosted models, you already know how powerful it is for AI-assisted development. But what if you could run the same workflow for a fraction of the cost, with complete control over the underlying model? In this guide, we'll walk you through connecting Claude Code to a self-hosted model running on Runpod using Ollama — no Anthropic API key required.

Author: Brendan McKeag
Date: February 18, 2026

Why bring your own model?

Before diving into the setup, it's worth understanding why you'd want to do this in the first place. There are four compelling reasons:

Cost. Ten dollars goes significantly further when you're self-hosting. In this guide, we use a 20B coding model quantized to 4-bit, which runs comfortably on an A4500 at just $0.25/hour — giving you nearly 40 hours of unlimited use for what you might spend in an hour or two with a larger Claude model if you're not careful.

Right-sizing your model to the task. If you're generating boilerplate Python scripts or simple utilities, you don't need Opus — or even Haiku. Practically any competent coding model can one-shot those tasks. Paying per-token rates for a frontier model on simple work is overkill, and self-hosting lets you tune your spend to match the complexity of what you're building.

Compliance and security. If your work involves trade secrets, sensitive data, or specific security requirements around tool calling and OS-level access, large hosted foundational models may not meet your needs. When you bring your own model, you're connecting Claude Code to an LLM engine under your direct control — one you can inspect, configure, and extend as needed.

Domain-specific fine-tuning. You can swap in models fine-tuned for specific domains: a model trained heavily on Python, one optimized for data science, or any other specialized variant. This matters especially with smaller models, which benefit greatly from fine-tuning since they lack the broad general knowledge of larger frontier models.

What you'll need

  • A Runpod account
  • Two pods: one to run Ollama (the inference server), and one to run Claude Code, which will serve as your dev environment. You could consolidate these into a single pod if you prefer, but we'll use two to keep things compartmentalized.
  • No Anthropic API key: this setup works even if you don't have an active Claude Pro or Console account

Step 1: Set up your Ollama Pod

From the Runpod dashboard, deploy a new pod: scroll to the A4500 GPU (currently around $0.25/hour) and select the Ollama template. Give the container a bit of extra disk space in case you need it, then deploy.

While the pod boots, think about your model selection. This is important: if you want Claude Code's full tool-calling capabilities — where it edits files autonomously and takes real actions in your codebase — you need a model that explicitly supports tool calling. Not every open-source model does. For this guide, we're using a fine-tuned version of GPT-OSS-20B that has been adapted specifically for tool calling.

Once your pod is running, connect to it via the terminal and pull your model:

ollama run slekrem/gpt-oss-claude-code-32k

You can then test it with a quick "hello world" prompt in the terminal.
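
To double-check that the model you pulled actually advertises tool calling, you can inspect its metadata. A minimal sketch, assuming a recent Ollama release (newer versions include a "Capabilities" section, listing things like completion and tools, in this output):

# Show the model's metadata; look for "tools" under Capabilities
ollama show slekrem/gpt-oss-claude-code-32k

# The same information is exposed by the local REST API in recent Ollama builds
curl -s http://localhost:11434/api/show -d '{"model": "slekrem/gpt-oss-claude-code-32k"}' | grep -i -o '"capabilities":[^]]*]'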

Step 2: Set up your Claude Code Pod

Spin up a second pod — an A6000 running the latest PyTorch template works well. This is the pod where you'll install and run Claude Code.

Install Claude Code the same way you would normally, then install a terminal text editor:

apt-get update && apt-get install nano
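
If you haven't set up Claude Code on a fresh machine before, "the same way you would normally" typically means the npm package. A sketch, assuming Node.js 18+ is already available on the PyTorch image (if it isn't, install Node first):

# Claude Code is distributed as an npm package
npm install -g @anthropic-ai/claude-code

# Confirm the CLI landed on your PATH
claude --version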

Step 3: Configure Claude Code to use your Ollama Pod

Claude Code needs to know where to send its requests. Navigate to the Claude configuration directory and open settings.json:

cd ~/.claude
nano settings.json

Add the environment variables that point Claude Code at your Ollama pod. You'll need your Ollama pod's ID from the Runpod dashboard; it becomes part of the proxy URL you set as ANTHROPIC_BASE_URL (the format is https://<pod-id>-11434.proxy.runpod.net, where 11434 is Ollama's default port). The full settings snippet is available in the video description.
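
Before editing the config, it can help to confirm the Ollama pod is reachable from this pod through the Runpod proxy. A quick check, using the same placeholder pod ID as the example config below; substitute your own:

# Replace "yourpodidgoeshere" with your Ollama pod's ID
export OLLAMA_URL="https://yourpodidgoeshere-11434.proxy.runpod.net"

# Should return JSON listing the model you pulled in Step 1
curl -s "$OLLAMA_URL/api/tags"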

If you don't have an active Anthropic account, you'll need to bypass the authentication screen. Create a small shell script that returns a dummy API key:

# Create api_key_helper.sh
echo '#!/bin/bash' > api_key_helper.sh
echo 'echo "dummy-key"' >> api_key_helper.sh
chmod +x api_key_helper.sh

Then reference this script in settings.json by setting the apiKeyHelper field to the script's path. When you launch Claude Code, it will skip the login screen entirely and connect directly to your Ollama pod.

Here's an example settings.json that you can use:


{
  "apiKeyHelper": "/root./claude/api-key-helper.sh", 
  "env": {
    "ANTHROPIC_BASE_URL": "https://yourpodidgoeshere-11434.proxy.runpod.net",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "ANTHROPIC_API_KEY": "",
    "ANTHROPIC_MODEL": "slekrem/gpt-oss-claude-code-32k:20b",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "slekrem/gpt-oss-claude-code-32k:20b",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "slekrem/gpt-oss-claude-code-32k:20b",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "slekrem/gpt-oss-claude-code-32k:20b"
  }
}
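
Two quick sanity checks can save a confusing debugging session later: confirm the JSON parses and that the helper script is executable (paths here match the example above).

# Fails loudly if settings.json has a stray comma or quote
python3 -m json.tool ~/.claude/settings.json

# The helper script must be executable for Claude Code to run it
ls -l /root/.claude/api_key_helper.sh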

Step 4: Verify the connection

Launch Claude Code from your workspace directory and ask it a simple question:

Which model am I speaking to?

If everything is configured correctly, you'll see the model identify itself as your Ollama-hosted model — not Claude. You're now routing entirely through your own infrastructure.
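
You can also confirm from the other side that requests are actually landing on your Ollama pod. After Claude Code's first reply, the model should show up as loaded:

# On the Ollama pod: lists models currently loaded into memory
ollama ps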

Real-world performance: What to expect

We ran a few tests to see how a small quantized model holds up for real coding tasks.

Snake game — We asked the model to build a terminal-based Snake game with arrow key controls, apple collection, and score tracking. It produced a working game on the first attempt. Impressive for a 4-bit quantized 20B model.

Tetris — Same story. The model one-shotted a terminal Tetris game. When we added a follow-up request for rotation controls and better speed, it integrated those changes cleanly in a second pass.

Web search — The model correctly flagged that it doesn't have native web browsing capability. However, when given a direct URL, it was able to fetch and summarize the page — a useful workaround for targeted lookups even without a true search integration.

Open-ended architecture questions — This is where the limits showed. When asked to "choose the best framework for a REST API" with no additional context, the model got stuck — spending several minutes searching an empty codebase before eventually stalling out. Small models need more direction. They don't carry the same planning and reasoning depth as frontier models, so vague or open-ended prompts tend to produce poor results.

                   Hosted Claude Models     Self-hosted via Runpod
Cost               Higher per-token rates   Pennies per hour
Setup              Zero config              Moderate setup
Tool calling       Full support             Depends on model
Direction needed   Handles ambiguity well   Needs specific prompts
Customization      Limited                  Full control
Compliance         Shared infrastructure    Your infrastructure

The bottom line: for well-defined coding tasks — generating scripts, building small applications, writing boilerplate — a self-hosted model on Runpod can deliver what you need at a tiny fraction of the cost of a hosted model. For complex, multi-step reasoning or ambiguous architecture decisions, you may still want to reach for a larger model.

The key to success with smaller models is the same best practice that applies to AI coding assistants generally: be specific. Break work into small, concrete tasks. The more granular your prompt, the better your results — regardless of which model you're using.

Get started

Ready to try it yourself? You'll need:

  • Runpod — spin up your Ollama and Claude Code pods
  • A tool-calling compatible model from the Ollama library
  • The settings snippet from the video description to wire everything together

If you need further help, check out our YouTube video on the topic.

If you build something cool with this setup, drop it in the comments on the video or let us know in the Discord. Happy building!
