Meituan, LongCat, Digital Human, Open Source AI, Reasoning Benchmark

Meituan LongCat Double Drop: Open-Source Commercial Avatar Video + General 365 Reasoning Benchmark

2026-06-07 · 8 min read · 未然

On June 7, 2026, Meituan's LongCat team dropped two significant releases in one day — one that creators can use immediately, and another that exposes just how far AI reasoning still has to go.

Part 1: LongCat-Video-Avatar 1.5 — From "Demo-Ready" to "Production-Ready"

Digital human video generation exploded in 2025-2026, but most open-source solutions had a tell: impressive demos that fell apart in real-world scenarios.

LongCat-Video-Avatar 1.5 is designed for commercial deployment from day one.

Key Upgrades

| Area | Improvement | Real Impact |
|---|---|---|
| Lip Sync | Wav2Vec2 → Whisper-Large | Accurate Chinese lip matching |
| Physical Plausibility | Enhanced body pose & gestures | No more "floating heads" |
| Long Video Stability | Temporal consistency optimization | Stable minute-long clips |
| Multi-Person | Multi-character interaction | Interview & dialogue ready |
| Inference Speed | Model optimization | Runs on a single GPU |

Who Should Care

  • Short-video creators: AI avatars can replace on-camera talent
  • Livestream merchants: 24/7 automated digital hosts
  • Online education: Auto-generated virtual instructors
  • Cross-border e-commerce: Multilingual digital human localization

The v1.5 upgrade is notable because it targets a problem earlier open-source avatar models had not solved: Chinese lip sync at practical accuracy. By replacing the Wav2Vec2 audio encoder with Whisper-Large, the model achieves usable lip matching for Mandarin, which the team presents as a first for open-source digital human models.

Part 2: General 365 — A Reality Check for the Whole Industry

If LongCat-Video-Avatar is a gift for creators, General 365 is a warning shot for the industry.

The Numbers

The Meituan LongCat team evaluated 26 mainstream LLMs:

  • Best score: Gemini 3 Pro — 62.8%
  • Traditional passing grade: 60%
  • Models scoring below 60%: more than half

More than half of the 26 models tested, many of them among today's most advanced, can't even "pass" a dedicated reasoning test.
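These headline numbers boil down to a simple leaderboard summary: find the top scorer, list everything under the pass mark, and check whether those failures form a strict majority. The sketch below is a minimal illustration, assuming scores are percentages keyed by model name. `summarize`, `BenchmarkSummary`, and the placeholder model names and scores in the usage lines (apart from the 62.8% top score) are hypothetical, not part of any General 365 tooling.

```python
from dataclasses import dataclass

# The "traditional passing grade" cited in the article, in percent.
PASS_MARK = 60.0


@dataclass
class BenchmarkSummary:
    best_model: str
    best_score: float
    failed: list[str]      # models scoring below the pass mark, sorted by name
    majority_failed: bool  # True if failures are a strict majority of models


def summarize(scores: dict[str, float], pass_mark: float = PASS_MARK) -> BenchmarkSummary:
    """Summarize a hypothetical leaderboard of {model_name: score_percent}."""
    if not scores:
        raise ValueError("scores must not be empty")
    best_model = max(scores, key=scores.get)  # model with the highest score
    failed = sorted(name for name, score in scores.items() if score < pass_mark)
    return BenchmarkSummary(
        best_model=best_model,
        best_score=scores[best_model],
        failed=failed,
        majority_failed=len(failed) > len(scores) / 2,
    )


# Usage with placeholder names and scores (not real General 365 results).
demo = {"model-a": 62.8, "model-b": 58.1, "model-c": 41.0}
s = summarize(demo)
print(s.best_model, s.failed, s.majority_failed)  # model-a ['model-b', 'model-c'] True
```

"More than half" is checked as a strict majority, so with 26 models it takes at least 14 below the pass mark for `majority_failed` to be true.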

What Makes General 365 Different?

Unlike benchmarks that test knowledge recall or language fluency, General 365 is built to test pure reasoning, so a model can't game it by memorizing patterns from its training data.

This reveals an uncomfortable truth: most AI progress over the past two years has been in knowledge coverage and language fluency, not genuine logical reasoning.

What This Means for Users

If you use AI for serious decision-making (data analysis, strategy, code review), don't trust model outputs by default. Even the top model scored only 62.8% on General 365, so it got more than a third of the benchmark wrong.
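For reference, the "more than a third" figure follows directly from the best reported score of 62.8%:

$$
\text{error rate} = 1 - 0.628 = 0.372 > \tfrac{1}{3} \approx 0.333
$$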

Part 3: The Bigger Picture

These two releases tell the same story:

AI is moving from the "demo economy" to the "real economy."

Half of the story is tools that actually work in production, such as digital humans in livestreams and classrooms. The other half is a benchmark that pops the "looks smart" bubble by showing that genuine reasoning is still a long way off.

For everyday users, the takeaway is:

  1. More tools are genuinely useful now (avatar video generation is real)
  2. But don't be fooled by impressive demos
  3. Test everything yourself — always
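One low-effort way to act on "test everything yourself" is to keep a small set of questions whose answers you already know and score any model against it before relying on it. The sketch below assumes a plain question/expected-answer format and a crude case-insensitive exact-match rule. `spot_check` and `fake_model` are hypothetical names, and `ask` stands in for whatever client you actually use to query a model.

```python
from typing import Callable, Iterable


def spot_check(
    ask: Callable[[str], str],
    cases: Iterable[tuple[str, str]],
    pass_mark: float = 0.60,
) -> tuple[float, bool, list[str]]:
    """Score a model on your own (question, expected_answer) pairs.

    Returns (accuracy, passed, missed_questions). Answers are compared after
    trimming whitespace and lowercasing, a deliberately simple matching rule.
    """
    total = 0
    correct = 0
    misses: list[str] = []
    for question, expected in cases:
        total += 1
        answer = ask(question)
        if answer.strip().lower() == expected.strip().lower():
            correct += 1
        else:
            misses.append(question)
    if total == 0:
        raise ValueError("need at least one test case")
    accuracy = correct / total
    return accuracy, accuracy >= pass_mark, misses


# Usage with a stand-in "model" that only knows two answers.
def fake_model(question: str) -> str:
    known = {"2+2?": "4", "Capital of France?": "Paris"}
    return known.get(question, "not sure")


cases = [("2+2?", "4"), ("Capital of France?", "paris"), ("Is 91 prime?", "no")]
acc, passed, misses = spot_check(fake_model, cases)
print(round(acc, 3), passed, misses)  # 0.667 True ['Is 91 prime?']
```

Exact matching keeps the sketch short; for open-ended answers you would need a looser comparison or a manual review of the `misses` list.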

This article is based on publicly available information released by Meituan's LongCat team on June 7, 2026.
