Google Launches Gemini Spark With 76.2% Terminal-Bench Edge

Google introduced Gemini Spark with Gemini 3.5 and started the series with 3.5 Flash. The new model is built for agentic workflows, coding, and long-horizon tasks. Google says 3.5 Flash is available today to billions of people globally, which gives developers and enterprise teams immediate access.

Gemini 3.5 Flash and Terminal-Bench 2.1

3.5 Flash outperformed Gemini 3.1 Pro on Terminal-Bench 2.1 with a score of 76.2%. It also led on GDPval-AA, MCP Atlas, and CharXiv Reasoning. Those results matter for teams trying to automate multi-step work, because the model is being pitched not as a chat layer but as a system for agents that have to hold context, code, and tool use together.

Google said 3.5 Flash is four times faster than other frontier models when measured by output tokens per second. It also costs less than half as much as other frontier models in many cases. That combination points to a practical tradeoff that developers know well: higher throughput can make an agent feel usable, while lower cost can decide whether it runs at all.
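To make that tradeoff concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is a hypothetical placeholder, not a published price or measured speed: the baseline throughput and price, the step count, and the tokens per step are all assumptions, and the "faster" profile simply applies the article's ratios (four times the output tokens per second, and half the price as an upper bound on "less than half").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    tokens_per_second: float       # output throughput (hypothetical value)
    usd_per_million_tokens: float  # output token price (hypothetical value)


def run_estimate(model: ModelProfile, steps: int, tokens_per_step: int):
    """Return (seconds, dollars) spent generating output for one agent run."""
    total_tokens = steps * tokens_per_step
    seconds = total_tokens / model.tokens_per_second
    dollars = total_tokens / 1_000_000 * model.usd_per_million_tokens
    return seconds, dollars


# Hypothetical baseline; neither figure comes from Google or any vendor.
baseline = ModelProfile("baseline", tokens_per_second=50.0, usd_per_million_tokens=10.0)

# Apply the ratios from the article: 4x throughput, half the price.
faster = ModelProfile(
    "faster",
    tokens_per_second=baseline.tokens_per_second * 4,
    usd_per_million_tokens=baseline.usd_per_million_tokens / 2,
)

# Assumed workload: a 40-step agent run producing 500 output tokens per step.
base_seconds, base_cost = run_estimate(baseline, steps=40, tokens_per_step=500)
fast_seconds, fast_cost = run_estimate(faster, steps=40, tokens_per_step=500)

for label, seconds, cost in [
    (baseline.name, base_seconds, base_cost),
    (faster.name, fast_seconds, fast_cost),
]:
    print(f"{label}: {seconds:.0f} s of generation, ${cost:.2f} per run")
```

The sketch counts only output generation time and output token cost. A real agent run would also include input tokens, tool-call latency, and retries, so the ratios matter more than the absolute figures.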

Antigravity and 3.5 Pro

The updated Antigravity harness lets 3.5 Flash deploy collaborative subagents to tackle problems at scale. Under supervision, it can execute multi-step workflows and coding tasks while sustaining frontier performance. Google also said 3.5 Flash can automatically rename and categorize unstructured assets based on dynamic criteria, and that it used two agents to synthesize the AlphaZero paper and code a fully playable game in six hours.

Google said 3.5 Pro is already being used internally. It expects to roll out 3.5 Pro next month. That leaves the immediate user choice on 3.5 Flash, while the higher-tier model stays behind the curtain for now.

AI Studio and web output

Building on Gemini 3, 3.5 Flash generates richer, more interactive web UIs and graphics. In AI Studio, it creates interactive animations for a research paper, turns a plain text description into interactive hardware, executes multiple concepts in parallel to build a branding concept for a school fundraiser, and generates different UX approaches for a checkout flow in 60 seconds. For teams testing agentic coding and design workflows, the question now is whether those gains hold up outside Google’s own demos, in production systems.
