Google I/O 2026: Gemini 3.5 Flash Debuts, AI Search Era Begins and More

by Tim Sweezy — Wednesday, May 20, 2026, 11:41 AM EDT

At its annual I/O developer conference, one thing Google made clear during the opening keynote is the company is no longer building AI features on top of its existing products, but instead is moving toward the 'agentic AI' era full steam ahead. Michael Dell talked about the agentic AI era during his presentation at Dell Tech World 2026 too, remarking that companies that do not become agentic AI focused will "struggle to survive." It is abundantly clear Google plans on not only surviving the new era, but thriving as well.

While there was no mention of the upcoming Googlebook, it is clear Google wants to make sure everyone, including its competition, is aware of how much it is focused on creating an effortless flow across devices when working with AI.

A $180–190 Billion Bet on Custom Silicon

Before getting to the products, the scale matters. Google has committed to spending between $180 billion and $190 billion in infrastructure investment this year, roughly six times its 2022 capital expenditure figure. A significant portion of that spending flows into Google's custom Tensor Processing Unit program, now entering its eighth generation exactly a decade after the original TPU debuted.

Two Chips, Two Jobs: TPU 8t and TPU 8i

For the first time, Google is splitting its TPU architecture into two purpose-built chips rather than a single unified design.

The TPU 8t is optimized for large-scale pretraining, delivering nearly three times the raw computing power of the previous generation. Google now claims it can distribute training workloads across more than one million TPUs globally, scaling beyond the limits of any single data center. The TPU 8i, meanwhile, is designed purely for inference, the high-volume, latency-sensitive workload that consumer-facing AI products generate continuously. Both chips deliver up to twice the performance-per-watt compared to prior-generation silicon.

Gemini 3.5 Flash: Speed Meets Frontier Intelligence

On the model side, Google's headline announcement is Gemini 3.5 Flash, the company's first model designed to combine frontier-level capability with low-latency execution. According to Google, it outperforms the older Gemini 3.1 Pro across virtually all internal benchmarks, with a particularly significant jump in complex coding tasks. The company says 3.5 Flash delivers four times the output token generation speed of competing frontier models.

Google CEO Sundar Pichai revealed the company is now processing more than 3.2 quadrillion tokens per month, a massive increase from 480 trillion tokens per month at I/O 2025.

Antigravity 2.0: The Developer Control Hub

Google also updated Antigravity, its agentic development platform, to version 2.0. Antigravity features two primary views: an Editor view that functions like a familiar IDE interface with an agent sidebar, and a Manager view that serves as a control center for orchestrating multiple agents working in parallel across workspaces. Within this environment, Google claims a hyper-optimized version of Gemini 3.5 Flash can reach significantly faster generation speeds than standard frontier model deployments.

Search, YouTube, and the Web Get Permanent AI Agents

The compute horsepower is being funneled directly into Google's core properties. Google said it is placing AI agents directly inside the Search box, capable of handling tasks such as completing purchases, checking ticket availability, and managing schedules in real time. Search will also support longer, more natural language queries.

Google is also deploying persistent "information agents" that run continuously on Google Cloud virtual machines, tracking user objectives and building personalized, interactive dashboards, a significant departure from the traditional query-and-result model.

Other notable integrations announced at I/O include:

Ask YouTube: A coming update that allows users to query complex concepts across the video platform, with the ability to automatically navigate timelines and jump to the most contextually relevant moment.
Gemini Omni Flash: A native multimodal architecture that handles mixed media inputs and outputs text, audio, and video simultaneously.
SynthID Expansion: Google's invisible digital watermarking system is expanding its partner ecosystem. OpenAI, Nvidia, ElevenLabs, and Kakao have agreed to integrate SynthID, establishing a cross-industry standard for AI-generated content verification across Chrome and Search.

If Google's keynote speakers wanted viewers to walk away with one thing in mind from I/O 2026, it was perhaps this: AI should not be a destination you visit, it should be the infrastructure you live inside. Between the dual-chip TPU rollout, the Gemini 3.5 Flash speed gains, and agents embedded across Search and YouTube, Google is making a very expensive, very coherent bet that the agentic era isn't coming, it's already here.

Tags: YouTube, Gemini, AI, (nasdaq:goog), google search, agentic ai, google i/o 2026

Tim Sweezy

Tim's first PC was a Tandy TRS-80 and cut his gaming teeth on Pong, Atari, and the local arcade. He now enjoys sharing his passion for tech with his sons and grandsons. Opinions and content posted by HotHardware contributors are their own.