AI News May 2026: Qwen 3.7, Gemini 3.5 Flash, DeepSeek V4 Price Cut & Everything That Shipped This Month
TL;DR: May 2026 was quietly one of the biggest months in AI this year. Alibaba previewed Qwen 3.7-Max (the strongest Chinese model on public leaderboards), Google shipped Gemini 3.5 Flash at I/O, DeepSeek made its V4-Pro 75% discount permanent, Cursor released a frontier-grade in-house model (Composer 2.5), and xAI launched Grok Build CLI beta. Here's what actually matters for developers, builders, and power users.
📅 May 2026 Release Timeline at a Glance
| Date | Release | Type |
|---|---|---|
| May 1 | Microsoft Agent 365 | Tooling |
| May 13 | Cursor 3.4 | Tooling |
| May 14 | Ollama 0.24 | Tooling |
| May 14 | Grok Build CLI (beta) | Tooling |
| May 14 | Anthropic billing split announcement | Policy |
| May 18 | Composer 2.5 (Cursor's in-house model) | Model |
| May 19 | Gemini 3.5 Flash | Model |
| May 20 | Qwen 3.7-Max-Preview | Model |
| May 22 | DeepSeek V4-Pro permanent price cut | Pricing |
🏆 Frontier Model Releases
Qwen 3.7-Max-Preview (Alibaba, May 20)
Alibaba announced Qwen 3.7-Max at the Apsara Cloud Summit in Hangzhou on May 20. The preview variant had already been live on LMSys Chatbot Arena for a week as qwen3.7-max-preview.
Key specs:
- Context window: 1 million tokens with extended-thinking mode
- Benchmarks: Elo 1,475 on LM Arena (#13 overall), #7 on Math, #10 on Coding
- Agentic capability: Alibaba demoed a 35-hour autonomous run chaining over 1,000 tool calls without measurable degradation
- Pricing: $2.50 / $7.50 per 1M tokens (input / output) on OpenRouter
- Status: Preview only; the flagship Max variant is closed-weight
Why it matters: Qwen 3.7-Max is now the strongest Chinese closed-weight model on public leaderboards, but the open-weight release everyone wants is still pending.
Gemini 3.5 Flash (Google, May 19)
Google announced Gemini 3.5 at I/O 2026, with the Flash variant shipping immediately. This is Google's first model explicitly positioned as agent-first rather than chatbot-first.
Key specs:
- Performance: Google claims 3.5 Flash beats Gemini 3.1 Pro on "nearly all" benchmarks, including coding, agentic tasks, and multimodal reasoning
- Availability: Gemini API in AI Studio, Android Studio, Google Antigravity, Gemini Enterprise, and AI Mode in Search
- What's next: Gemini 3.5 Pro confirmed in internal use, expected June 2026
Why it matters: The agent-first positioning is a strategic shift. Google is betting that the next wave of AI usage is tool-use and automation, not conversation. If you're building agentic workflows, this is worth benchmarking.
Composer 2.5 (Cursor, May 18)
Cursor shipped Composer 2.5 as a frontier-grade in-house coding model — the first time Cursor has positioned its own model as competing head-to-head with Claude Opus 4.7 and GPT-5.5.
Key specs:
- Benchmarks: 79.8% SWE-Bench Multilingual, 63.2% CursorBench v3.1
- Pricing: Standard tier $0.50 / $2.50 per 1M tokens; fast variant $3.00 / $15.00
- Status: Rolled out to all Cursor users
Why it matters: Cursor is no longer just a UI on top of third-party models — it now has its own competitive foundation model. This changes the economics of AI-assisted coding significantly.
💰 Pricing Changes That Actually Matter
DeepSeek V4-Pro Permanent Price Cut (May 22)
This is arguably the biggest news of the month for cost-conscious teams. The 75% promotional discount on V4-Pro — running since April — is now permanent.
New permanent rates:
- Uncached input: $0.435 per 1M tokens
- Output: $0.87 per 1M tokens
- Cached input: $0.003625 per 1M tokens
For context: At these prices, DeepSeek V4-Pro is roughly 8x cheaper than Claude Opus 4.7 on input and 10x cheaper on output, while remaining one of the strongest open-weights coding models.
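To see how much the cached-input rate changes a bill, here is a minimal sketch that blends the three rates quoted above. The `cache_hit_ratio` parameter and the monthly token volumes are hypothetical illustrations, not DeepSeek figures, and real cache behaviour depends on how your prompts are structured.

```python
# Rates quoted above for DeepSeek V4-Pro, in USD per 1M tokens.
UNCACHED_INPUT = 0.435
CACHED_INPUT = 0.003625
OUTPUT = 0.87


def v4_pro_cost(input_tokens: int, output_tokens: int, cache_hit_ratio: float = 0.0) -> float:
    """Estimate the USD cost of a workload.

    cache_hit_ratio is a hypothetical knob: the share of input tokens
    billed at the cached rate instead of the uncached rate.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    total = uncached * UNCACHED_INPUT + cached * CACHED_INPUT + output_tokens * OUTPUT
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical month: 500M input tokens and 50M output tokens.
    for ratio in (0.0, 0.5, 0.9):
        cost = v4_pro_cost(500_000_000, 50_000_000, ratio)
        print(f"cache hit {ratio:.0%}: ${cost:,.2f}")
```

Because the cached rate is a small fraction of the uncached one, workloads that reuse long system prompts or shared context should see most of their input cost disappear, leaving output tokens as the main line item.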
Anthropic Billing Split (Announced May 14, Effective June 15)
Anthropic announced a major restructuring of Claude subscriptions. Starting June 15, billing separates into two pools:
- Chat subscriptions (individual usage)
- Agent SDK credits (API-based agent usage)
This is a signal that Anthropic is preparing for a future where agentic workloads dominate over chat usage — and adjusting pricing accordingly.
🛠 Tooling & Platform Updates
Grok Build CLI (xAI, May 14)
xAI released Grok Build in early beta — a coding agent CLI powered by Grok 4.3 with 2M token context and support for up to 8 parallel subagents. SuperGrok subscription ($30/mo) required.
Why it matters: xAI is entering the AI coding assistant space directly. The 2M token context window is the largest of any coding agent CLI currently available.
Ollama 0.24 (May 14)
The local LLM runner added Codex App integration, a reworked MLX sampler, and Claude Desktop support. A quality-of-life release for anyone running models locally.
Cursor 3.4 (May 13)
Added team-configurable agent environments and in-app PR review. Enterprise teams got the most value from this release.
Microsoft Agent 365 (May 1)
Microsoft launched Agent 365 at $15/user/month — an enterprise governance layer for all AI agents across Copilot Studio, Foundry, and third-party systems. This is Microsoft's play for enterprise agent management.
🇨🇳 Chinese AI Ecosystem Update
Kimi K2.6 (Moonshot, April 20; still shaping the May conversation):
- 1T-parameter MoE, 32B active per request
- 262K context, $0.60/$2.50 per 1M tokens
- Apache 2.0 CLI + modified MIT weights
- 300-agent swarm mode now GA on paid plans
- Scored 87/100 Tier A on coding benchmarks — the only Chinese model to reach that tier
DeepSeek V4-Flash:
- At $0.14/$0.28 per 1M tokens, it's the cheapest frontier-adjacent model in history
- 284B total params, 13B active, 1M context
- MIT licensed
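The "total versus active" parameter counts above are easier to compare as a ratio. The short sketch below computes the active share from the figures quoted in this section; it assumes "1T" means 1,000B and that the DeepSeek V4-Flash numbers describe the same kind of sparse activation as Kimi's MoE, which the figures imply but the source does not state outright.

```python
# Parameter counts as quoted in this roundup, in billions: (total, active).
# Reading "1T" as 1,000B is an assumption made for this illustration.
PARAMS_B = {
    "Kimi K2.6": (1000.0, 32.0),
    "DeepSeek V4-Flash": (284.0, 13.0),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of total parameters that are active, per the quoted figures."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in PARAMS_B.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {share:.1%} of parameters active")
```

Both models activate only a few percent of their weights at a time, which is a large part of why their per-token prices can sit so far below dense models of similar total size.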
🔮 What to Watch in June
- Gemini 3.5 Pro: confirmed in internal use, expected to roll out in June
- Claude Sonnet 4.8: leaked in source code, rumored for late May or early June
- GPT-5.6: no official announcement, but expected in the coming weeks
- Llama 5: Meta's next frontier model, still unconfirmed
- Qwen 3.7 open-weight release: the model weights everyone wants are still pending
💡 How to Use This
- If you're cost-sensitive: Route high-volume API tasks to DeepSeek V4-Flash ($0.14 per 1M input tokens) or V4-Pro ($0.435 per 1M uncached input tokens)
- If you're building agents: Benchmark Gemini 3.5 Flash on your own tool-use workloads, since Google positions it as agent-first rather than chatbot-first
- If you use Cursor: Try Composer 2.5 as your primary model. Cursor positions it as competing head-to-head with Opus 4.7 and GPT-5.5 on coding
- If you need self-hosting: Kimi K2.6 (modified MIT weights) or DeepSeek V4-Flash (MIT) are your best options
- If you're watching the cloud war: GPT-5.5 on AWS Bedrock means multi-cloud AI is now the default
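The cost-based routing advice above can be sketched as a small selector that picks the cheapest model whose context window fits a request. The catalog uses only prices and context sizes quoted in this roundup; the `Model` class, the `exclude` parameter, and the reading of "262K" as 262,000 tokens are hypothetical choices for illustration, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    context_tokens: int
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens


# Figures as quoted in this roundup. "262K" is read as 262,000 (assumption).
CATALOG = [
    Model("DeepSeek V4-Flash", 1_000_000, 0.14, 0.28),
    Model("Kimi K2.6", 262_000, 0.60, 2.50),
    Model("Qwen 3.7-Max-Preview", 1_000_000, 2.50, 7.50),
]


def request_cost(model: Model, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the model's listed rates."""
    return (input_tokens * model.input_price + output_tokens * model.output_price) / 1_000_000


def cheapest_model(input_tokens: int, output_tokens: int,
                   exclude: frozenset = frozenset(), catalog=CATALOG) -> Model:
    """Return the cheapest model that fits the request.

    exclude is a hypothetical policy filter (for example, models your
    team has not approved); names in it are skipped.
    """
    fits = [
        m for m in catalog
        if m.name not in exclude and input_tokens + output_tokens <= m.context_tokens
    ]
    if not fits:
        raise ValueError("no model in the catalog fits this request")
    return min(fits, key=lambda m: request_cost(m, input_tokens, output_tokens))


if __name__ == "__main__":
    choice = cheapest_model(300_000, 2_000, exclude=frozenset({"DeepSeek V4-Flash"}))
    print(choice.name, f"${request_cost(choice, 300_000, 2_000):.4f}")
```

Price alone is a crude signal, so a production router would also weigh quality on your own evaluation set, latency, and rate limits before sending traffic to the cheapest option.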
This roundup will be updated monthly. Bookmark it as your single source of truth for what actually shipped in AI.
All data verified against official vendor announcements, pricing pages, and OpenRouter as of May 28, 2026.