chinese-llm-benchmark
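ReLE evaluation: capability evaluation of Chinese AI large language models (continuously updated). It currently includes 335 large models, covering commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.5, Wenxin ERNIE-X1.1, ERNIE-5.0-Thinking, qwen3-max, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as open-source models such as kimi-k2, ernie4.5, minimax-M2, deepseek-v3.2, qwen3-2507, llama4, Zhipu GLM-4.6, gemma3, and mistral. It provides not only leaderboards but also a large-model defect library of more than 2 million entries, making it easier for the community to study, analyze, and improve large models.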
ReLE Benchmark (formerly CLiB) provides a continuously updated evaluation for Chinese AI large language models, covering over 337 commercial and open-source LLMs. It offers multi-dimensional capability assessments across various domains, along with comprehensive rankings and a large defect library for model improvement.
Comments
- Rebel Brown, May 22, 2026
The reliable agent design scales well from prototype to production. Good documentation, reduces onboarding time.
- Taylor Zhang, May 3, 2026
The clean approach to agent memory is more reliable than alternatives. Would recommend for clean use cases.
- Robin Brown, Mar 29, 2026
The robust agent design scales well from prototype to production. Runs fine on Python 3.11.
- Sam Jackson, Mar 14, 2026
The solid approach to agent memory is more reliable than alternatives. The maintainers are responsive to issues.