Discover how NVIDIA A40 GPUs on Runpod offer unmatched value for machine learning: high performance, low cost, and excellent availability for fine-tuning LLMs.

As artificial intelligence and machine learning workloads grow, so does the need for powerful, cost-effective hardware.
The A40 GPUs address that need: in the benchmarks below, they trade some raw throughput against the H100 for a lower price per million tokens.
They are aimed at professionals and organizations that want to scale machine learning projects without breaking the bank.
Beyond raw specifications, the A40s make advanced machine learning capabilities accessible to more teams. Each GPU has 48 GB of VRAM, which supports memory-intensive workloads such as serving and fine-tuning LLMs.
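As a rough sanity check on the 48 GB figure, the weights-only memory footprint of the two models benchmarked in this post can be estimated from parameter count and precision. This is a minimal sketch under stated assumptions: nominal parameter counts of 13 billion and 7 billion (read off the model names), 2 bytes per parameter for 16-bit weights, and no allowance for KV cache, activations, or framework overhead, all of which add to the real requirement.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

A40_VRAM_GB = 48  # per-GPU VRAM stated in this post


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and framework overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal parameter counts (assumption: taken from the model names).
NOMINAL_PARAMS = {"Llama-2-13B": 13e9, "Mistral-7B": 7e9}

for name, params in NOMINAL_PARAMS.items():
    needed = weight_memory_gb(params)
    print(f"{name}: ~{needed:.0f} GB of fp16 weights, "
          f"~{A40_VRAM_GB - needed:.0f} GB left on one A40")
```

Under these assumptions both models fit on a single A40 in 16-bit precision, with the remaining memory available for KV cache and batching.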
The following benchmarks demonstrate how the A40s stack up against the H100s.
| GPU Model | Number of GPUs | AI Model | Throughput (Tokens/s) | Price ($/1M Tokens) |
|---|---|---|---|---|
| H100 PCIe 80GB | 1 | Llama-2-13B | 1253 | $0.86 |
| H100 PCIe 80GB | 2 | Llama-2-13B | 1829.18 | $1.18 |
| H100 PCIe 80GB | 4 | Llama-2-13B | 2083 | $2.07 |
| H100 PCIe 80GB | 8 | Llama-2-13B | 2125.74 | $4.07 |
| A40 PCIe 48GB | 1 | Llama-2-13B | 283.36 | $0.77 |
| A40 PCIe 48GB | 2 | Llama-2-13B | 773.76 | $0.57 |
| A40 PCIe 48GB | 4 | Llama-2-13B | 1360.29 | $0.65 |
| A40 PCIe 48GB | 8 | Llama-2-13B | 1480.8 | $1.19 |

| GPU Model | Number of GPUs | AI Model | Throughput (Tokens/s) | Price ($/1M Tokens) |
|---|---|---|---|---|
| H100 PCIe 80GB | 1 | Mistral-7B | 3053 | $0.35 |
| H100 PCIe 80GB | 2 | Mistral-7B | 2983.58 | $0.72 |
| H100 PCIe 80GB | 4 | Mistral-7B | 3118.42 | $1.39 |
| H100 PCIe 80GB | 8 | Mistral-7B | 3214.49 | $2.69 |
| A40 PCIe 48GB | 1 | Mistral-7B | 1538.89 | $0.14 |
| A40 PCIe 48GB | 2 | Mistral-7B | 1991.86 | $0.22 |
| A40 PCIe 48GB | 4 | Mistral-7B | 2399.12 | $0.37 |
| A40 PCIe 48GB | 8 | Mistral-7B | 2431.8 | $0.72 |
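The post does not say how the price column was derived. One plausible reading, treated here as an assumption, is total hourly GPU cost divided by the number of tokens generated per hour. The sketch below implements that formula and its inverse, then backs out the per-GPU hourly rate implied by the A40 rows for the 13B model. The function names are illustrative, and the resulting rates are inferred from the table, not published prices.

```python
SECONDS_PER_HOUR = 3600


def price_per_million_tokens(tokens_per_second, gpu_count, hourly_rate_per_gpu):
    """Assumed formula: total hourly cost / tokens per hour, scaled to 1M tokens."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return gpu_count * hourly_rate_per_gpu / tokens_per_hour * 1_000_000


def implied_hourly_rate_per_gpu(tokens_per_second, gpu_count, price_per_million):
    """Inverse of the formula above: the per-GPU hourly rate a table row implies."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return price_per_million * tokens_per_hour / 1_000_000 / gpu_count


# (GPU count, tokens/s, $/1M tokens) for the A40 on the 13B model,
# copied from the benchmark table.
A40_13B_ROWS = [
    (1, 283.36, 0.77),
    (2, 773.76, 0.57),
    (4, 1360.29, 0.65),
    (8, 1480.8, 1.19),
]

for gpus, tps, price in A40_13B_ROWS:
    rate = implied_hourly_rate_per_gpu(tps, gpus, price)
    print(f"{gpus} x A40: implied rate ${rate:.2f} per GPU-hour")
```

Under this assumption, all four rows imply nearly the same per-GPU hourly rate, which is consistent with the price column tracking a fixed rate. It also shows why adding GPUs raises the cost per token whenever throughput scales less than linearly with GPU count.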
Setting up the A40 GPUs takes only a few steps and fits into your existing workflow.
For Pods:
Select the A40 when deploying your Pod.
For more information, see the Pod documentation.
For Serverless:
Select a GPU instance size, such as 48 GB GPU, then select A40 as the GPU type.
For more information, see the Serverless documentation.
The following table ranks the three cheapest GPU configurations for each AI model by price per million tokens, based on the benchmarks above, to help users identify the most cost-effective option for their needs.
| AI Model | Rank | Configuration | Price ($/1M Tokens) |
|---|---|---|---|
| Mistral-7B | 1 | A40 PCIe 48GB (1 GPU) | $0.14 |
| Mistral-7B | 2 | A40 PCIe 48GB (2 GPUs) | $0.22 |
| Mistral-7B | 3 | H100 PCIe 80GB (1 GPU) | $0.35 |
| Llama-2-13B | 1 | A40 PCIe 48GB (2 GPUs) | $0.57 |
| Llama-2-13B | 2 | A40 PCIe 48GB (4 GPUs) | $0.65 |
| Llama-2-13B | 3 | A40 PCIe 48GB (1 GPU) | $0.77 |
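The ranking can be reproduced directly from the benchmark prices. The sketch below copies the price column from the benchmark tables, groups rows by model, and sorts each group by price per million tokens. The helper name `cheapest_configs` is illustrative, not part of any Runpod tooling.

```python
from collections import defaultdict

# (model, GPU, GPU count, $/1M tokens), copied from the benchmark tables.
BENCHMARKS = [
    ("Llama-2-13B", "H100 PCIe 80GB", 1, 0.86),
    ("Llama-2-13B", "H100 PCIe 80GB", 2, 1.18),
    ("Llama-2-13B", "H100 PCIe 80GB", 4, 2.07),
    ("Llama-2-13B", "H100 PCIe 80GB", 8, 4.07),
    ("Llama-2-13B", "A40 PCIe 48GB", 1, 0.77),
    ("Llama-2-13B", "A40 PCIe 48GB", 2, 0.57),
    ("Llama-2-13B", "A40 PCIe 48GB", 4, 0.65),
    ("Llama-2-13B", "A40 PCIe 48GB", 8, 1.19),
    ("Mistral-7B", "H100 PCIe 80GB", 1, 0.35),
    ("Mistral-7B", "H100 PCIe 80GB", 2, 0.72),
    ("Mistral-7B", "H100 PCIe 80GB", 4, 1.39),
    ("Mistral-7B", "H100 PCIe 80GB", 8, 2.69),
    ("Mistral-7B", "A40 PCIe 48GB", 1, 0.14),
    ("Mistral-7B", "A40 PCIe 48GB", 2, 0.22),
    ("Mistral-7B", "A40 PCIe 48GB", 4, 0.37),
    ("Mistral-7B", "A40 PCIe 48GB", 8, 0.72),
]


def cheapest_configs(rows, top_n=3):
    """Return the top_n cheapest (price, gpu, gpu_count) entries per model."""
    by_model = defaultdict(list)
    for model, gpu, gpu_count, price in rows:
        by_model[model].append((price, gpu, gpu_count))
    # Tuples sort by price first; ties fall back to GPU name, then count.
    return {model: sorted(entries)[:top_n] for model, entries in by_model.items()}


for model, ranked in cheapest_configs(BENCHMARKS).items():
    for rank, (price, gpu, gpu_count) in enumerate(ranked, start=1):
        print(f"{model} #{rank}: {gpu} x{gpu_count} at ${price:.2f} per 1M tokens")
```

Sorting on price alone ignores throughput, so a latency-sensitive workload may still prefer a configuration that ranks lower here.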
In these benchmarks, A40 configurations delivered the lowest price per million tokens for both Mistral-7B and Llama-2-13B. That makes them a cost-effective choice for machine learning projects where cost per token matters more than peak throughput.
Explore further by attending a dedicated webinar, visiting the official product page for detailed specifications, or reading case studies to see these GPUs in action.
Embark on your journey with the A40 GPUs and redefine what's possible in machine learning.
Author profile: Brendan McKeag