Gemma 4: Google Goes All-In with Apache 2.0 and On-Device AI

When Google released Gemma 1 in 2024, the community reaction was mixed: the model was capable, but the license had restrictions that limited commercial use. Gemma 2 improved performance while maintaining licensing ambiguity. Gemma 4, released on April 2, 2026, settled the question with a change the company described as "the biggest shift since Gemma 3": Apache 2.0 across the entire family.

Apache 2.0 is the de facto corporate standard for open source. Unrestricted commercial use, free modification and redistribution, explicit patent protection. For a company the size of Google to release frontier-class models under Apache 2.0 is a statement of intent about the ecosystem they want to build.

The Four Gemma 4 Models

Gemma 4 was released in four variants with distinct positioning.

E2B and E4B are the edge models — designed to run on smartphones, Raspberry Pi, and devices like NVIDIA Jetson Orin Nano. The names reflect the effective inference footprint: 2 billion and 4 billion active parameters, respectively. They run fully offline with near-zero latency. They process text, image, video, and audio natively. 128K token context window.

26B-A4B is the mid-tier model — 26 billion total parameters, 4 billion active via MoE. 256K token context. Positioned for smaller on-premises servers, development laptops, and low-cost APIs. It ranked sixth on the Arena AI text leaderboard at launch.

31B is the family flagship — a dense 31-billion-parameter model with a 256K token context. Third place on the Arena AI text leaderboard at launch, behind only models with far more parameters. It is the family's quality benchmark for reasoning and generation tasks.

"Byte for Byte" — What That Claim Actually Means

Google described Gemma 4 as "byte for byte, the most capable open models." It is a precise technical claim: the ratio between model size (bytes of storage) and output quality is the best among available open models.

The 31B ranking third globally in text quality, despite being significantly smaller than the top models on the list, supports that claim. Parameter efficiency — how much performance is extracted from each billion parameters — is where Google focused Gemma 4 development.

The practical implication is immediate: the 31B runs on hardware that would be insufficient for equivalently performing models from other families. Four consumer GPUs are enough, versus eight or more for comparable models from other providers.

Multimodality on Edge Devices

The most notable characteristic of the E2B and E4B models is native video and audio processing on edge devices. Most models that process video require server hardware — multi-GB VRAM GPUs, network connectivity for external APIs, network latency.

The E4B does this on a Raspberry Pi or smartphone, offline. For industrial IoT use cases — security camera analysis, sensor audio processing, computer vision on production lines — this combination of native multimodality with offline deployment removes infrastructure dependencies that were previously unavoidable.

The E2B and E4B process data directly on the device, with no server round-trip. For applications where data privacy is regulated — healthcare, finance, defense — on-device processing eliminates concerns about transmitting sensitive data to external APIs.

Google's Strategic Positioning

Google faces an inherent tension in releasing open models: Gemini 3.1 Pro, its proprietary frontier model, is the premium offering. Gemma 4 is officially the open-source version.

But the 31B's third-place ranking globally makes that distinction less clear. For a growing number of use cases, Gemma 4-31B delivers results comparable to Gemini 3.1 Pro on text tasks — at a fraction of the cost, without API dependency, under a fully open license.

The strategy can be read as ecosystem building: by having the best available open-source model, Google ensures that PyTorch, JAX, TensorFlow, and the ML infrastructure running Gemma also runs on Google Cloud. The open model feeds the closed platform.

Who Gemma 4 Is Relevant For

For teams that need offline or edge deployment, the Gemma 4 E2B/E4B family has no competitive equivalent with native multimodality as of April 2026.

For those who need frontier-quality output on limited hardware, the 31B offers the best parameter-to-performance ratio available under an open license.

For teams operating in Europe or in jurisdictions with restrictions on Meta's Llama 4 license, Gemma 4 with Apache 2.0 is the direct alternative with no legal barriers.

Google entered 2026 with the most open bet it has ever made on the LLM ecosystem. Gemma 4 is not a second-tier model released for public relations purposes — it is competitive where it matters, open where competitors are restrictive, and was designed for the use cases no one else is adequately covering.