May 12, 2026

OpenAI Parameter Golf: what 1,100 researchers built in six weeks

How 1,100 researchers beat OpenAI's own baseline with 16 megabytes and 10 minutes.

The Team at Runpod


The constraints were exact: fit your entire language model — weights, training code, everything — inside 16 megabytes, train it in under 10 minutes on 8×H100 GPUs, and beat OpenAI's own baseline score. No institutional affiliation required. Just the model. Runpod was OpenAI's compute partner for the challenge, distributing credits and running the infrastructure that made participation possible at scale.
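The 16 MB budget maps directly to a parameter count, depending on how aggressively you quantize. A back-of-the-envelope sketch (the precisions shown are illustrative, not the rules of the challenge):

```python
MB = 1024 * 1024

def max_params(budget_bytes: int, bits_per_param: float, overhead_bytes: int = 0) -> int:
    """How many parameters fit in a byte budget at a given weight precision."""
    usable = budget_bytes - overhead_bytes
    return int(usable * 8 // bits_per_param)

budget = 16 * MB
for bits in (32, 16, 8, 4):
    # e.g. at 8-bit precision, roughly 17M parameters fit in 16 MB
    print(f"{bits:>2}-bit weights: ~{max_params(budget, bits) / 1e6:.0f}M params")
```

The `overhead_bytes` argument is a reminder that the 16 MB limit covered everything, training code included, so the weights never get the whole budget.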

OpenAI designed Parameter Golf as the first in its Model Craft Challenge series, structured in the tradition of competitive mathematics and programming Olympiads, with rigor and creativity valued over credentials. The baseline was a nine-layer transformer scoring 1.2244 bits per byte on the FineWeb validation set, a competent starting point set deliberately to be beaten. Standout participants were eligible for OpenAI interview invitations, with a June hiring cohort planned for undergraduates, recent graduates, and Olympiad competitors.
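Bits per byte is the standard byte-level language-modeling metric: the model's average cross-entropy over the validation bytes, expressed in bits. A minimal sketch of the conversion (the loss value below is chosen to show how a ~0.85 nat/byte loss maps to roughly the baseline's 1.2244 BPB, not a reported number):

```python
import math

def bits_per_byte(loss_nats_per_token: float, tokens: int, num_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) into bits per byte of text."""
    total_nats = loss_nats_per_token * tokens
    return total_nats / (math.log(2) * num_bytes)

# With byte-level tokenization (one token per byte), BPB is simply loss / ln 2:
loss = 0.8487  # illustrative nats per byte
print(round(bits_per_byte(loss, tokens=1, num_bytes=1), 4))
```

Lower is better: a model at 1.0 BPB compresses text to one bit per byte, an eight-fold reduction over raw storage.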

Within five days of launch, the community had already pushed the score to 1.1228 BPB. Six weeks later, the winner, codemath3000, came in at 1.0565. That's a nearly 14% improvement over OpenAI's baseline, driven by 1,100+ researchers iterating on quantization-aware training, cross-sequence attention, and techniques the field hadn't applied to this kind of constraint problem before.
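The headline number is just the relative drop in BPB from baseline to winner:

```python
baseline, winner = 1.2244, 1.0565
improvement = (baseline - winner) / baseline
print(f"{improvement:.1%}")  # 13.7%, rounded to 14% above
```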

For a technical breakdown of how the scoring worked and what drove early leaderboard movement, Brendan's setup guide covers the architecture decisions and how to get started on Runpod.

How the Parameter Golf challenge ran

OpenAI's chief research officer, Mark Chen, described Parameter Golf as designed to test whether candidates could "come up with creative ideas in a sandbox setting." It was built to surface researchers who wouldn't ordinarily come through a standard recruiting pipeline. That kind of challenge only works if the compute is genuinely accessible: running real experiments on H100s costs real money, and the credit barrier is usually what keeps independent researchers from participating.

Runpod was OpenAI's compute partner for the full six weeks. Participants used the platform across the training workflow: spinning up pods in 31 global data center regions, iterating on an official template built by Runpod's developer relations team and named by OpenAI as the recommended starting point in the challenge's GitHub README.

Credit distribution was fully automated: OpenAI's signup form triggered the Runpod API to deliver a credit code by email in one of three tiers: Quick Start ($25), Development ($500), or Advanced ($1,000). The SLA was 48 hours; actual delivery held at 2 to 3 minutes throughout the competition with minimal manual intervention.
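The tier-routing side of that flow can be sketched in a few lines. Everything below (the code format, the function name, the generation logic) is a hypothetical reconstruction for illustration; the real pipeline called the Runpod API and emailed the resulting code:

```python
import secrets
from dataclasses import dataclass

# Credit tiers from the challenge signup form (values from the writeup above).
TIERS = {"Quick Start": 25, "Development": 500, "Advanced": 1000}

@dataclass
class CreditCode:
    code: str
    tier: str
    amount_usd: int

def issue_credit_code(tier: str) -> CreditCode:
    """Generate a one-time credit code for a signup.

    Hypothetical logic: the actual service provisioned codes through
    Runpod's API rather than minting them locally like this.
    """
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    code = f"PGOLF-{secrets.token_hex(4).upper()}"
    return CreditCode(code=code, tier=tier, amount_usd=TIERS[tier])
```

Keeping the handler this small is what made the 2-to-3-minute delivery time possible: the only slow step left is the email send itself.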

Runpod brought around 500 GPUs online at launch, with 1,000+ more ready to handle demand spikes as the challenge spread through press and social.

Mid-competition, a participant named Nathan Maine built a 30-second GPU benchmark script for Runpod pods and shared it with other competitors. Nobody asked him to. That kind of tooling gets built when the infrastructure is part of your workflow, not just a resource you're renting.

By the numbers

Compute utilization:

  • 5,100 credit codes created
  • 2,700 redeemed (~53% redemption rate)
  • $249,550 in compute credits burned across the challenge

GitHub traction (by close of competition):

  • 4,800+ stars
  • 3,200+ forks
  • 1,100+ pull requests submitted
  • Top 0.1% of globally trending repositories within 48 hours of launch

The Model Craft series

Six weeks of Parameter Golf demonstrated that the efficiency frontier in AI is still wide open. A community of independent researchers, given real compute and a tight constraint, moved it faster than most people expected. That happened in the open, with every technique publicly visible in the GitHub repo, building on itself in real time.

