Google Gemma 4 and AI Edge Gallery Bring Free Offline AI to Phones as Cloud Platforms Join In

Mountain View, California, April 14, 2026, 14:03 PDT.

Gemma 4 has landed in Google’s AI Edge Gallery app for both Android and iPhone; the app has already topped 500,000 downloads, according to the listing.
Together AI, along with Microsoft Foundry, just rolled out hosted Gemma 4 offerings, pushing access past Google’s own apps and cloud.

Google is moving Gemma 4, its latest open-weight AI model family, beyond its initial developer crowd and onto more phones, plus non-cloud setups. The company now lists AI Edge Gallery on major mobile app stores. Other providers, meanwhile, are rolling out bigger hosted versions.

This isn’t just Google rolling out another chatbot. The company positions Gemma as open-weight AI—the kind where developers get direct access to the model parameters for local use. That means prompts and files can stay on the device, API costs drop, and everything keeps running even without an internet connection, based on Google’s own product notes and developer comments.

Google DeepMind rolled out Gemma 4 on April 2, releasing it under an Apache 2.0 license and offering four different sizes. The E2B and E4B variants target phones, while the larger 26B A4B and 31B versions suit heftier hardware. Clement Farabet, vice president at Google DeepMind, and group product manager Olivier Lacombe described the launch as “breakthrough capabilities made widely accessible” in a company blog post.

Google describes the family as aimed at coding, reasoning and what it calls “agentic workflows”: systems that can use tools and execute multiple steps rather than halting at a single answer. Depending on the version, the models handle context windows of up to 128,000 or 256,000 tokens. The context window is the amount of text, code or other data a model can take in at once, and tokens are the units it is measured in.

For mobile users, AI Edge Gallery serves as the company’s app hub. According to Google’s developer blog, the app runs on both Android and iOS. Android Authority noted that users are now able to summarize PDFs, generate code, process images, and transcribe audio right on their devices—no need to send data back to Google’s servers.

Karl Weinmeister, a developer relations specialist at Google Cloud, highlighted last week that the revamped app “shows what’s now possible” right on the device: it can both generate structured code and tweak device settings using natural language, even when offline. Google isn’t stopping there. According to product manager David Chou and developer relations engineer Caren Chang, the company is weaving the same model family into Android: Gemma 4 forms the “foundation for the next generation of Gemini Nano,” and devices running Gemini Nano 4 are set to arrive later this year.

Google made Gemma 4 available on Google Cloud on April 3, touting deployment options for enterprises that want to keep their data in-house. Following that, Together AI added Gemma 4 31B to its own serverless lineup, enabling image input and function calling, with a context window stretching to 256,000 tokens. Pricing stands at $0.20 per million input tokens and $0.50 per million output tokens.
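To put those rates in perspective, here is a rough back-of-the-envelope sketch in Python. The two prices are the ones quoted above; the helper name `estimate_cost_usd` and the sample token counts are hypothetical, and the calculation assumes simple linear per-token billing with no minimums, discounts or other charges.

```python
# Rates quoted in the article for Together AI's hosted Gemma 4 31B,
# in US dollars per million tokens.
INPUT_USD_PER_MILLION = 0.20
OUTPUT_USD_PER_MILLION = 0.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the charge for one request (hypothetical helper).

    Assumes purely linear per-token pricing, which is an assumption
    made for illustration, not a statement of the provider's billing rules.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical workload: a 200,000-token prompt with a 2,000-token reply.
print(f"${estimate_cost_usd(200_000, 2_000):.4f}")
```

Under those assumptions, a request that nearly fills the 256,000-token context window would cost a few cents, with the input side accounting for most of the bill.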

The cloud got a little bigger Tuesday after Microsoft announced Gemma 4 is now live in Microsoft Foundry, opening the door for Azure customers to test and run the models right in their own systems. Weinmeister, in a separate note, pointed out that prompts and skills made locally in Edge Gallery will transfer seamlessly to the beefier Gemma 4 models in the cloud—a clear indication Google is aiming for a single model family that covers everything from mobile to backend.

The race is getting crowded. On Tuesday, Artificial Analysis named Alibaba’s Qwen3.5 27B and Google’s Gemma 4 31B as the top open-weight families under 32 billion parameters. Qwen led on the overall benchmark, but Gemma managed better token efficiency, according to the firm.

Trade-offs remain. Google maintains that AI Edge Gallery is a work in progress, with speed hinging on the device’s CPU and GPU. The Android preview requires AICore-ready hardware to show how fast it really runs. According to Artificial Analysis, models of this size still lag behind the leading proprietary tools on factual accuracy and on limiting hallucinations.