Model Architecture & Efficiency
gpt-oss-120b total parameters: 117 B (overall model size).
gpt-oss-120b active parameters: 5.1 B (parameters engaged per token during inference).
Inference cost reduction: 280-fold (drop in GPT-3.5-level inference cost between November 2022 and October 2024).
Performance gap reduction: 1.7% vs 8% (benchmark gap between open-weight and closed-source models).
Parameter Utilization in gpt-oss Models
The chart contrasts each model's total parameter count with the small fraction activated during inference under the Mixture-of-Experts (MoE) architecture: gpt-oss-120b has 117B total parameters but activates only 5.1B per token, while gpt-oss-20b has 21B total with 3.6B active.
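A toy sketch of top-k expert routing may make the total-versus-active distinction concrete. The dimensions, expert count, and top_k value below are illustrative placeholders rather than the actual gpt-oss configuration; the point is only that each token passes through a small subset of the experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer: each token is routed to only
    top_k of num_experts feed-forward experts, so only a fraction of
    the layer's parameters is active for any given token."""

    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):                       # x: (num_tokens, d_model)
        gate_logits = self.router(x)            # (num_tokens, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # mixing weights for the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = TopKMoE()
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64]); only 2 of 8 experts ran per token
```

The same principle, at far larger scale, is what lets gpt-oss-120b keep only 5.1B of its 117B parameters active per token.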
Inference Cost Reduction
Between November 2022 and October 2024, the cost of running GPT‑3.5‑level systems dropped over 280‑fold, underscoring the economic advantage of deploying efficient models like gpt‑oss.
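As a back-of-envelope check on what the headline figure implies, the snippet below annualizes a 280-fold drop over the stated window, taken here as roughly 23 months; the exact month count is an assumption.

```python
# Headline figure from the text: a ~280-fold drop in GPT-3.5-level inference
# cost between November 2022 and October 2024, taken here as ~23 months.
total_drop = 280
months = 23

annualized = total_drop ** (12 / months)   # equivalent constant yearly factor
monthly = total_drop ** (1 / months)       # equivalent constant monthly factor
print(f"~{annualized:.0f}x cheaper per year")   # ~19x
print(f"~{monthly:.2f}x cheaper per month")     # ~1.28x
```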
Key Insights
MoE routing combined with 4-bit quantization delivers high efficiency: single-GPU deployment with competitive performance, plus a dramatic cost advantage that fuels open-weight adoption.
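The single-GPU claim follows from simple arithmetic. The estimate below assumes a flat 4 bits per weight and ignores quantization scales, activations, and KV-cache overhead, so it is a rough lower bound rather than a precise memory budget.

```python
# Rough estimate of why a 4-bit-quantized 117B-parameter model fits on a
# single 80 GB GPU. Assumes a flat 4 bits per weight and ignores
# quantization scales, activations, and KV-cache overhead.
total_params = 117e9
bits_per_param = 4

weight_gb = total_params * bits_per_param / 8 / 1e9
print(f"4-bit weights: ~{weight_gb:.0f} GB")               # ~58 GB, under 80 GB
print(f"BF16 weights:  ~{total_params * 2 / 1e9:.0f} GB")  # ~234 GB for comparison
```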
Strategic Significance of the Open-Weight Release
OpenAI’s decision to release gpt-oss-120b and gpt-oss-20b under an Apache 2.0 license marks a calculated pivot. By giving developers on-premise flexibility, OpenAI aims to lead the democratization wave, build a developer ecosystem that offsets potential API revenue losses, and position itself as a thought leader in safety and interpretability through raw Chain-of-Thought access. This move also sends a strong signal to competitors, forcing a re-evaluation of how much capability is safe or desirable to open-source.
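For readers who want to try on-premise use, a minimal sketch with Hugging Face Transformers follows. The checkpoint name and generation settings shown are assumptions to verify against the official model card and your hardware; this is a sketch, not a definitive deployment recipe.

```python
# Minimal sketch of on-premise inference via Hugging Face Transformers.
# The "openai/gpt-oss-20b" checkpoint name and generation settings are
# assumptions to check against the official model card and your hardware.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # smaller sibling; use gpt-oss-120b if the GPU allows
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Mixture-of-Experts idea in one sentence."}]
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"])  # full chat transcript, including the model's reply
```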
Competitive Landscape - Open Source Contenders
Key open-source models advancing the field:
Llama 3.3
Mixtral 8x7B
DeepSeek‑R1 & V2.1
Falcon Series
Gemma