This week’s Runpod Roundup covers major releases including a 32k-context version of Llama-2, the public release of SDXL 1.0, and Stability AI’s new Stable Beluga LLMs, all now available to run on Runpod.

Welcome to the Runpod Roundup for the week ending July 29, 2023. In this issue, we'll be discussing the newest advancements in AI models over the past week, with a focus on new offerings that you can run in a Runpod instance right this second: the new SDXL release and the latest LLM advancements.
It has been less than two weeks since Meta and Microsoft released Llama-2, and the community has been hard at work...
togethercomputer has released a 32k-token-context version of Llama-2 7b for anyone to download from Hugging Face. Several closed models (GPT-4, et al.) have offered 32k-token context, but this is, I believe, the first freely available open-source model to make the jump to 32k. Many front ends such as Oobabooga do not yet support 32k context windows, though that is likely to change as these models become more commonplace.
conceptofmind has also released LLongMA-2, a 16k-context version of Llama-2, available in 7b and 13b sizes.
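If you're wondering what a longer context means for picking a pod, here is a rough back-of-the-envelope sketch of the memory the attention key/value cache alone would need. It assumes standard multi-head attention, fp16 values, and the published Llama-2 7b shape (32 layers, hidden size 4096); the function name and parameters are made up for illustration, and real usage will vary with the inference backend, quantization, and batch size.

```python
def kv_cache_bytes(n_layers, hidden_size, context_tokens,
                   bytes_per_value=2, batch_size=1):
    """Bytes needed to cache keys and values for every layer and token.

    Assumes standard multi-head attention, where keys and values each
    hold `hidden_size` elements per token per layer.
    """
    per_token = 2 * n_layers * hidden_size * bytes_per_value  # 2 = keys + values
    return per_token * context_tokens * batch_size


GIB = 1024 ** 3

# Llama-2 7b: 32 layers, hidden size 4096 (assumed from the published config)
default_ctx = kv_cache_bytes(32, 4096, 4096) / GIB
long_ctx = kv_cache_bytes(32, 4096, 32768) / GIB

print(f"7b @ 4k tokens:  {default_ctx:.1f} GiB")   # 2.0 GiB
print(f"7b @ 32k tokens: {long_ctx:.1f} GiB")      # 16.0 GiB
```

This is on top of the model weights themselves, so treat it as a floor for planning rather than an exact figure.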
(Could OpenAI also be looking into releasing an open-source LLM of their own in the not-too-distant future? Maybe...)
The long-awaited SDXL is finally out and available for use on Runpod - no research credentials required! Not to belabor the point, but SDXL really is the next generation (pun intended) of AI art, and we've got a blog post that will help you get it up and running in your pod.

It's clearly been a busy week for Stability AI, as they have also released two LLMs, Stable Beluga 1 and 2. The Stable Beluga 1 entry on Hugging Face only holds delta weights (likely due to licensing issues with Llama-1), but Stable Beluga 2 is available to download as a complete model. The former is a Llama-1 65b model and the latter a Llama-2 70b model, and both have been fine-tuned on Orca-style datasets modeled on Microsoft's Orca work. According to their press release, both models focus on intricate reasoning and on answering complex questions in specialized domains that require subject matter expertise, such as law and mathematical problem solving.
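For anyone who hasn't run into delta weights before: the general idea is that the release stores only the difference between the fine-tuned model and its base, so you need your own copy of the base weights to rebuild the full model. Here is a toy sketch of that idea using plain Python lists in place of real tensors. The tensor names and values are hypothetical, and the model card on Hugging Face is the authority for the actual conversion procedure.

```python
def apply_delta(base_weights, delta_weights):
    """Rebuild fine-tuned weights as base + delta, tensor by tensor."""
    if base_weights.keys() != delta_weights.keys():
        raise ValueError("base and delta must contain the same tensor names")
    merged = {}
    for name, base in base_weights.items():
        delta = delta_weights[name]
        if len(base) != len(delta):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise addition recovers the fine-tuned values
        merged[name] = [b + d for b, d in zip(base, delta)]
    return merged


# Toy stand-ins for real checkpoints (hypothetical names and values)
base = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.5, 0.5]}
delta = {"layer0.weight": [0.25, -0.5], "layer0.bias": [0.0, 0.25]}

tuned = apply_delta(base, delta)
print(tuned)
```

Since the delta is meaningless without the base weights, this approach lets a fine-tune be shared without redistributing the base model itself.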
Feel free to reach out to Runpod directly if you have any questions about these latest developments, and we'll see what we can do for you!