Serverless GPUs

GPU compute for AI Inference and Training.

Pay by the second.
Autoscale
Bring Your Container
Logs & Metrics
Multi-GPU
Network Storage
Streaming
Webhooks
3,964,561,456 requests
FlashBoot          P70 Cold-Start   P90
StableDiffusion    258ms            355ms
Whisper            254ms            268ms
VRAM     GPU           Flex        Active (40% Discount)
16 GB    A4000         $0.00020    $0.00012
24 GB    A5000         $0.00026    $0.00016
24 GB    4090 (PRO)    $0.00044    $0.00026
48 GB    A6000         $0.00048    $0.00029
48 GB    L40 (PRO)     $0.00069    $0.00041
80 GB    A100          $0.00130    $0.00078
80 GB    H100 (PRO)    $0.00250    $0.00150
flex with ease

cold-start in milliseconds
Starter / Enterprise
Workers: 5 - 30 / 100+
Support: Community
Bandwidth: FREE
Logs: 30 days
Metrics: 90 days
Network Storage: $0.07/GB/mo (> 1TB: $0.05/GB/mo)
Regions: US / Europe
Uptime
Serverless Pricing Calculator
* GPU type will impact execution time.

$1,189.44/mo to handle 720,000 requests per month
Book a Call for Enterprise Support
Cost estimation includes:
  • 20% of requests using the reserved price
  • 20% of requests running into a 5s cold-start
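The estimate above can be reproduced with a short calculation. The sketch below is one formula that yields $1,189.44 for 720,000 requests. The GPU (4090) and the 3-second execution time are assumptions chosen because they match that figure. It also assumes that the listed prices are per-second rates, that the "reserved price" is the Active price, and that cold-start seconds are billed at the Flex price. The function and parameter names are hypothetical, and the calculator's actual logic is not stated on this page.

```python
def estimate_monthly_cost(
    requests: int,
    exec_seconds: float,
    flex_price: float,
    active_price: float,
    active_share: float = 0.20,      # "20% of the requests using reserved price"
    cold_start_share: float = 0.20,  # "20% of the requests running into 5s cold-start"
    cold_start_seconds: float = 5.0,
) -> float:
    """Estimate monthly spend in dollars (a sketch, not the official calculator)."""
    # Blend the two per-second prices by the share of requests billed at each.
    blended_price = (1 - active_share) * flex_price + active_share * active_price
    execution_cost = exec_seconds * blended_price
    # Assumption: cold-start seconds are billed at the Flex price.
    cold_start_cost = cold_start_share * cold_start_seconds * flex_price
    return requests * (execution_cost + cold_start_cost)


# Assumed inputs: the 4090 prices listed above and 3 seconds per request.
cost = estimate_monthly_cost(720_000, 3.0, flex_price=0.00044, active_price=0.00026)
print(f"${cost:,.2f}/mo")
```

Changing `exec_seconds` or the two prices shows how GPU choice and execution time move the monthly total.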
    Input → Your Code → Output
    AI Inference
    We handle millions of inference requests a day. Scale your machine learning inference while keeping costs low.


    AI Training
    Run machine learning training tasks that can take up to 7 days. Spin up GPUs per request and scale down once done.
    Autoscale
    Workers scale from 0 to n on our Secure Cloud with 8+ regions distributed globally.

    Bring Your Container
    Deploy any container; both public and private image repositories are supported. Configure your environment the way you want.

    Cold-Start
    < 500ms
    With FlashBoot, 70% of cold-starts (P70) complete in less than 500ms and 90% of cold-starts (P90) in less than a second across all serverless endpoints, including LLMs.
    FlashBoot is our optimization layer that manages deployment, tear-down, and scale-up activities in real time.
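To make the P70 and P90 figures concrete, here is a minimal sketch of how a cold-start percentile can be computed with the nearest-rank method. The sample timings are made up for illustration and are not measurements. The function name is hypothetical, and this page does not say which percentile method is actually used.

```python
import math


def percentile_nearest_rank(samples_ms: list[float], pct: int) -> float:
    """Return the smallest sample such that at least pct% of samples are <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


# Hypothetical cold-start timings in milliseconds (illustrative only).
cold_starts_ms = [210, 230, 250, 260, 280, 300, 340, 420, 650, 900]

p70 = percentile_nearest_rank(cold_starts_ms, 70)
p90 = percentile_nearest_rank(cold_starts_ms, 90)
print(f"P70 = {p70}ms, P90 = {p90}ms")
```

With these made-up samples, 7 of the 10 cold-starts finish within 340ms and 9 of the 10 within 650ms, so both the "P70 under 500ms" and "P90 under a second" targets would be met.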

    Metrics
    Seamlessly debug containers with access to GPU, CPU, Memory, and other metrics.

    Worker Utilization: CPU 0%, Mem 30%, GPU Util 0%, GPU Mem 0%
    Network Storage
    Serverless workers can access a network storage volume backed by NVMe SSDs with up to 100Gbps network throughput. Storage sizes of 100TB+ are supported; contact us if you need 1PB+.
    Ever wondered how you can store 1000s of fine-tuned models and access them on the fly? Now it's easy!



    /runpod-volume
    -- /data
    -- /models
    -- /output
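As a rough illustration of accessing stored models on the fly, the sketch below lists and reads model files from the mounted volume. It assumes the volume is mounted at /runpod-volume with a models subdirectory, as in the layout above. The helper names and the one-file-per-model convention are hypothetical.

```python
from pathlib import Path

# Mount point and subdirectory taken from the layout above.
VOLUME_ROOT = Path("/runpod-volume")


def list_models(root: Path = VOLUME_ROOT) -> list[str]:
    """Return the names of files stored under <root>/models, sorted."""
    models_dir = root / "models"
    if not models_dir.is_dir():
        return []
    return sorted(p.name for p in models_dir.iterdir() if p.is_file())


def read_model_bytes(name: str, root: Path = VOLUME_ROOT) -> bytes:
    """Read one model file from the volume; raises FileNotFoundError if absent."""
    models_dir = (root / "models").resolve()
    path = (models_dir / name).resolve()
    # Refuse names that would escape the models directory (e.g. "../secret").
    if models_dir not in path.parents:
        raise ValueError(f"invalid model name: {name!r}")
    return path.read_bytes()


if __name__ == "__main__":
    print(list_models())
```

A worker could call `list_models()` at startup and load a specific file with `read_model_bytes()` only when a request asks for it.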


    Logs / SSH
    Full debugging capabilities for your workers through logs and SSH. A web terminal is available for even easier access.
    Webhooks
    Leverage webhooks to get data output as soon as a request is done. Data is pushed instantly to your API.
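A webhook receiver can be as small as the sketch below, which uses only Python's standard library. The payload fields (`id`, `status`, `output`), the port, and the function and class names are assumptions made for illustration. The actual payload format is not described on this page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook(body: bytes) -> dict:
    """Decode a JSON webhook body and pull out the (assumed) fields."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return {
        "id": payload.get("id"),
        "status": payload.get("status"),
        "output": payload.get("output"),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = parse_webhook(self.rfile.read(length))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            self.send_response(400)
            self.end_headers()
            return
        print(f"request {result['id']} finished: {result['status']}")
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block forever, accepting webhook POSTs on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. The publicly reachable URL of that server is what you would register as the webhook target.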
    Contact us for individual use cases and we can help you get ready for production.