AI News May 2026: Qwen 3.7, Gemini 3.5 Flash, DeepSeek V4 Price Cut & Everything That Shipped This Month

Q: 📅 May 2026 Release Timeline at a Glance

| Date | Release | Type | |------|---------|------| | May 13 | Cursor 3.4 | Tooling | | May 14 | Ollama 0.24 | Tooling | | May 14 | Grok Build CLI (beta) | Tooling | | May 14 | Anthropic billing split announcement | Policy | | May 18 | Composer 2.5 (Cursor's in-house model) | Model | | May 19 | Gemini 3.5 Flash | Model | | May 20 | Qwen 3.7-Max-Preview | Model | | May 22 | DeepSeek V4-Pro permanent price cut | Pricing | ---

Q: 🇨🇳 Chinese AI Ecosystem Update

Kimi K2.6 (Moonshot, April 20 — still shaping May conversation): - 1T-parameter MoE, 32B active per request - 262K context, $0.60/$2.50 per 1M tokens - Apache 2.0 CLI + modified MIT weights - 300-agent swarm mode now GA on paid plans - Scored 87/100 Tier A on coding benchmarks — the only Chinese model to reach that tier DeepSeek V4-Flash: - At $0.14/$0.28 per 1M tokens, it's the cheapest frontier-adjacent model in history - 284B total params, 13B active, 1M context - MIT licensed ---

TL;DR: May 2026 was quietly one of the biggest months in AI this year. Alibaba previewed Qwen 3.7-Max (the strongest Chinese model on public leaderboards), Google shipped Gemini 3.5 Flash at I/O, DeepSeek made its V4-Pro 75% discount permanent, Cursor released a frontier-grade in-house model (Composer 2.5), and xAI launched Grok Build CLI beta. Here's what actually matters for developers, builders, and power users.

📅 May 2026 Release Timeline at a Glance

Date	Release	Type
May 13	Cursor 3.4	Tooling
May 14	Ollama 0.24	Tooling
May 14	Grok Build CLI (beta)	Tooling
May 14	Anthropic billing split announcement	Policy
May 18	Composer 2.5 (Cursor's in-house model)	Model
May 19	Gemini 3.5 Flash	Model
May 20	Qwen 3.7-Max-Preview	Model
May 22	DeepSeek V4-Pro permanent price cut	Pricing

🏆 Frontier Model Releases

Qwen 3.7-Max-Preview (Alibaba, May 20)

Alibaba announced Qwen 3.7-Max at the Apsara Cloud Summit in Hangzhou on May 20. The preview variant had already been live on LMSys Chatbot Arena for a week as qwen3.7-max-preview.

Key specs:

Context window: 1 million tokens with extended-thinking mode
Benchmarks: Elo 1,475 on LM Arena (#13 overall), #7 on Math, #10 on Coding
Agentic capability: Alibaba demoed a 35-hour autonomous run chaining over 1,000 tool calls without measurable degradation
Pricing: $2.50 / $7.50 per 1M tokens (input / output) on OpenRouter
Status: Preview only; the flagship Max variant is closed-weight

Why it matters: Qwen 3.7-Max is now the strongest Chinese closed-weight model on public leaderboards, but the open-weight release everyone wants is still pending.

Gemini 3.5 Flash (Google, May 19)

Google announced Gemini 3.5 at I/O 2026, with the Flash variant shipping immediately. This is Google's first model explicitly positioned as agent-first rather than chatbot-first.

Key specs:

Performance: Google claims 3.5 Flash beats Gemini 3.1 Pro on "nearly all" benchmarks, including coding, agentic tasks, and multimodal reasoning
Availability: Gemini API in AI Studio, Android Studio, Google Antigravity, Gemini Enterprise, and AI Mode in Search
What's next: Gemini 3.5 Pro confirmed in internal use, expected June 2026

Why it matters: The agent-first positioning is a strategic shift. Google is betting that the next wave of AI usage is tool-use and automation, not conversation. If you're building agentic workflows, this is worth benchmarking.

Composer 2.5 (Cursor, May 18)

Cursor shipped Composer 2.5 as a frontier-grade in-house coding model — the first time Cursor has positioned its own model as competing head-to-head with Claude Opus 4.7 and GPT-5.5.

Key specs:

Benchmarks: 79.8% SWE-Bench Multilingual, 63.2% CursorBench v3.1
Pricing: Standard tier $0.50 / $2.50 per 1M tokens; fast variant $3.00 / $15.00
Status: Rolled out to all Cursor users

Why it matters: Cursor is no longer just a UI on top of third-party models — it now has its own competitive foundation model. This changes the economics of AI-assisted coding significantly.

💰 Pricing Changes That Actually Matter

DeepSeek V4-Pro Permanent Price Cut (May 22)

This is arguably the biggest news of the month for cost-conscious teams. The 75% promotional discount on V4-Pro — running since April — is now permanent.

New permanent rates:

Uncached input: $0.435 per 1M tokens
Output: $0.87 per 1M tokens
Cached input: $0.003625 per 1M tokens

For context: At these prices, DeepSeek V4-Pro is roughly 8x cheaper than Claude Opus 4.7 on input and 10x cheaper on output, while remaining one of the strongest open-weights coding models.

Anthropic Billing Split (Announced May 14, Effective June 15)

Anthropic announced a major restructuring of Claude subscriptions. Starting June 15, billing separates into two pools:

Chat subscriptions (individual usage)
Agent SDK credits (API-based agent usage)

This is a signal that Anthropic is preparing for a future where agentic workloads dominate over chat usage — and adjusting pricing accordingly.

🛠 Tooling & Platform Updates

Grok Build CLI (xAI, May 14)

xAI released Grok Build in early beta — a coding agent CLI powered by Grok 4.3 with 2M token context and support for up to 8 parallel subagents. SuperGrok subscription ($30/mo) required.

Why it matters: xAI is entering the AI coding assistant space directly. The 2M token context window is the largest of any coding agent CLI currently available.

Ollama 0.24 (May 14)

The local LLM runner added Codex App integration, reworked MLX sampler, and Claude Desktop support. A quality-of-life release for anyone running models locally.

Cursor 3.4 (May 13)

Added team-configurable agent environments and in-app PR review. Enterprise teams got the most value from this release.

Microsoft Agent 365 (May 1)

Microsoft launched Agent 365 at $15/user/month — an enterprise governance layer for all AI agents across Copilot Studio, Foundry, and third-party systems. This is Microsoft's play for enterprise agent management.

🇨🇳 Chinese AI Ecosystem Update

Kimi K2.6 (Moonshot, April 20 — still shaping May conversation):

1T-parameter MoE, 32B active per request
262K context, $0.60/$2.50 per 1M tokens
Apache 2.0 CLI + modified MIT weights
300-agent swarm mode now GA on paid plans
Scored 87/100 Tier A on coding benchmarks — the only Chinese model to reach that tier

DeepSeek V4-Flash:

At $0.14/$0.28 per 1M tokens, it's the cheapest frontier-adjacent model in history
284B total params, 13B active, 1M context
MIT licensed

🔮 What to Watch in June

Gemini 3.5 Pro — confirmed in internal use, expected roll out in June
Claude Sonnet 4.8 — leaked in source code, rumored for late May/early June
GPT-5.6 — no official announcement but expected in the coming weeks
Llama 5 — Meta's next frontier model, still unconfirmed
DeepSeek V4 open-weight release — the model weights everyone wants are still pending

💡 How to Use This

If you're cost-sensitive: Route high-volume API tasks to DeepSeek V4-Flash ($0.14/M tokens) or V4-Pro ($0.435/M)
If you're building agents: Benchmark Gemini 3.5 Flash — its agent-first architecture is a genuine differentiator
If you use Cursor: Try Composer 2.5 as your primary model — it matches Opus 4.7 and GPT-5.5 on coding
If you need self-hosting: Kimi K2.6 (MIT weights) or DeepSeek V4 (MIT) are your best options
If you're watching the cloud war: GPT-5.5 on AWS Bedrock means multi-cloud AI is now the default

This roundup will be updated monthly. Bookmark it as your single source of truth for what actually shipped in AI.

All data verified against official vendor announcements, pricing pages, and OpenRouter as of May 28, 2026.