Runpod's new FlashBoot technology slashes cold-start times for serverless GPU endpoints, with cold starts as low as 500ms. Available now at no extra cost, FlashBoot dynamically optimizes deployment for high-volume workloads, cutting costs and dramatically improving latency.

Runpod's serverless journey started just a few months ago, yet we've come a long way. In our pursuit of lower costs, greater efficiency, and better performance, we're now making FlashBoot available for all endpoints at no additional cost! 🎉
What is FlashBoot?
We've spent the past month tinkering with ways to reduce cold starts for GPU-intensive tasks like inference. FlashBoot is our optimization layer that manages deployment, tear-down, and scale-up activity in real time. The more popular an endpoint is, the more likely FlashBoot is to reduce its cold-start times. We have seen cold starts as low as 500ms. 😳
How realistic is this?
Let's get dirty with numbers
[Graph: distribution of cold-start times for our Whisper endpoint with FlashBoot enabled]
From the graph above, our lowest cold start was 563 milliseconds and our highest was 42 seconds. Without FlashBoot, every cold start would take the full 42 seconds, since we load all of the Whisper models into GPU VRAM (and that takes a long time).
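For context on why that load is so expensive, here's a minimal sketch of what a serverless Whisper worker typically looks like. This is illustrative, not our production code; the model size and job schema are assumptions:

```python
# Illustrative serverless Whisper worker (model name and job schema are assumed).
import runpod
import whisper

# The model loads at import time, i.e. on every cold start. Pulling gigabytes
# of weights into GPU VRAM is what makes an un-optimized cold start take
# tens of seconds.
model = whisper.load_model("large-v2", device="cuda")

def handler(job):
    # Warm requests skip the load above and go straight to inference.
    audio_path = job["input"]["audio"]
    result = model.transcribe(audio_path)
    return {"text": result["text"]}

# Hand control to the Runpod serverless runtime.
runpod.serverless.start({"handler": handler})
```

Every cold start pays for that load before the first request can be served, which is exactly the cost FlashBoot works to avoid.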

We get a better picture from the percentile metrics: 95% of our cold starts (P95) take less than 2.3 seconds, and 90% (P90) take less than 2s! 😍
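If you want to compute the same figures for your own endpoint, percentiles are easy to derive from raw cold-start samples; a quick sketch with hypothetical numbers:

```python
# Deriving percentile cold-start figures from raw samples (values are made up).
import numpy as np

cold_starts_s = np.array([0.563, 0.8, 1.1, 1.4, 1.9, 2.1, 2.3, 42.0])  # hypothetical

p95 = np.percentile(cold_starts_s, 95)  # 95% of cold starts fall below this value
p90 = np.percentile(cold_starts_s, 90)  # 90% fall below this value
print(f"P95 = {p95:.2f}s, P90 = {p90:.2f}s")
```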
FlashBoot has reduced cold-start costs for our Whisper endpoint by more than 70% while providing faster response times to our users.
Will FlashBoot work for LLMs?
Yes. FlashBoot should work for any type of workload. Results may vary, but as long as you have a good volume of requests, FlashBoot works! We will be testing LLM workloads on Runpod Serverless with FlashBoot in the coming weeks, so stay tuned!
How can I enable FlashBoot?
[Screenshot: the FlashBoot toggle in the endpoint configuration panel]
When you create or edit your endpoint, you can enable FlashBoot with the toggle on the right. While testing, make sure to send several requests, since FlashBoot needs request volume to show its best results.
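One easy way to generate that test volume is to hit your endpoint's synchronous run API in a loop. A minimal sketch, assuming a deployed endpoint; the endpoint ID, payload shape, and environment variable name are placeholders:

```python
# Fire a handful of requests at a deployed endpoint to give FlashBoot volume,
# and time each one to watch cold starts shrink.
import os
import time
import requests

ENDPOINT_ID = "your-endpoint-id"          # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]    # assumed env var

url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
headers = {"Authorization": f"Bearer {API_KEY}"}

for i in range(5):
    started = time.time()
    resp = requests.post(
        url,
        json={"input": {"audio": "https://example.com/sample.wav"}},  # match your worker's schema
        headers=headers,
        timeout=300,
    )
    print(f"request {i}: HTTP {resp.status_code} in {time.time() - started:.1f}s")
```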
We have even more features planned for serverless; until then, enjoy FlashBoot! 😁