AgentIndex
chinese-llm-benchmark

Active · ★ 6.1k · Updated 2026-05-23
★ Trending · ★ Essential

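ReLE evaluation: a continuously updated capability evaluation of Chinese AI large models. It currently covers 335 large models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.5, ERNIE-X1.1, ERNIE-5.0-Thinking, qwen3-max, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as open-source models such as kimi-k2, ernie4.5, minimax-M2, deepseek-v3.2, qwen3-2507, llama4, Zhipu GLM-4.6, gemma3, and mistral. Beyond leaderboards, it also provides a defect library of more than 2 million entries, so the community can research, analyze, and improve large models.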

ReLE Benchmark (formerly CLiB) provides a continuously updated evaluation for Chinese AI large language models, covering over 337 commercial and open-source LLMs. It offers multi-dimensional capability assessments across various domains, along with comprehensive rankings and a large defect library for model improvement.

#LLM Evaluation #Chinese LLMs #AI Benchmark #Model Ranking #Defect Analysis #Data Analysis #Communication
01 Features

01 Extensive coverage of 337+ commercial and open-source Chinese LLMs.
02 Multi-dimensional evaluation across 7 main domains and ~300 sub-dimensions.
03 Provides detailed ranking lists for various capabilities and specific domains.
04 Offers a large defect library with over 2 million LLM flaws for research and improvement.
05 Supports customized model selection and free evaluation services for private models.

02 Compatibility

OpenAI (GPT series): Supported (verified via docs)
Google (Gemini series): Supported (verified via docs)
Anthropic (Claude series): Supported (verified via docs)
Baidu (ERNIE series): Supported (verified via docs)
Alibaba (Qwen series): Supported (verified via docs)
DeepSeek: Supported (verified via docs)

03 Use cases

↳ Comparing and selecting the best-performing LLMs for specific applications.
↳ Identifying weaknesses and improving the capabilities of large language models.
↳ Benchmarking private or custom LLMs against public models for performance and cost optimization.
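
As a rough illustration of the use cases above, the sketch below ranks models from a table of per-domain scores and lists the domains where a private model trails the best public model. Everything in it is an assumption made for illustration: the model names, the three domain names, the scores, and the unweighted-mean aggregation are hypothetical and are not taken from ReLE's data or methodology.

```python
from statistics import mean

# Hypothetical scores (0-100) per model and domain; NOT real ReLE results.
SCORES = {
    "public-model-a": {"reasoning": 82.0, "coding": 75.0, "medical": 68.0},
    "public-model-b": {"reasoning": 77.0, "coding": 80.0, "medical": 71.0},
    "private-model": {"reasoning": 79.0, "coding": 62.0, "medical": 73.0},
}


def rank_by_domain(scores, domain):
    """Return model names sorted by their score in one domain, best first."""
    return sorted(scores, key=lambda m: scores[m][domain], reverse=True)


def rank_overall(scores):
    """Return model names sorted by unweighted mean across domains, best first."""
    return sorted(scores, key=lambda m: mean(scores[m].values()), reverse=True)


def weakest_domains(scores, model):
    """Return (domain, gap) pairs, largest gap first.

    gap = best score among the other models minus this model's score,
    so a positive gap means the model trails in that domain.
    """
    gaps = {}
    for domain, own in scores[model].items():
        best_other = max(s[domain] for name, s in scores.items() if name != model)
        gaps[domain] = best_other - own
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print("Overall:", rank_overall(SCORES))
    print("Coding:", rank_by_domain(SCORES, "coding"))
    print("Gaps for private-model:", weakest_domains(SCORES, "private-model"))
```

A real comparison would likely weight domains by how much they matter to the target application rather than averaging them equally.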

04 Alternatives

mindsdb (★ 39.2k): Federated Query Engine for AI - The only MCP Server you'll ever need
Brave Search MCP (★ 86.5k): Allow your AI Agent to search the real-time internet using Brave Search API. Essential for getting up-to-date information.
Claude Flow (★ 56.4k): The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms.
CopilotKit (★ 31.8k): React UI + elegant infrastructure for AI Copilots, AI chatbots, and in-app AI agents. The Agentic Frontend.
awesome-n8n-templates (★ 22.6k): Supercharge your workflow automation with this curated collection of n8n templates! Instantly connect your favorite apps, like Gmail, Telegram, Google Drive, Slack, and more, with ready-to-use, AI-powered automations. Save time, boost productivity, and unlock the true potential of n8n in just a few clicks.
dagster (★ 15.6k): An orchestration platform for the development, production, and observation of data assets.
genai-toolbox (★ 15.4k): MCP Toolbox for Databases is an open source MCP server for databases.
mcp-chrome (★ 11.8k): Chrome MCP Server is a Chrome extension-based Model Context Protocol (MCP) server that exposes your Chrome browser functionality to AI assistants like Claude, enabling complex browser automation, content analysis, and semantic search.


Comments

  • Rebel Brown, May 22, 2026

    The reliable agent design scales well from prototype to production. Good documentation, reduces onboarding time.

  • Taylor Zhang, May 3, 2026

    The clean approach to agent memory is more reliable than alternatives. Would recommend for clean use cases.

  • Robin Brown, Mar 29, 2026

    The robust agent design scales well from prototype to production. Runs fine on Python 3.11.

  • Sam Jackson, Mar 14, 2026

    The solid approach to agent memory is more reliable than alternatives. The maintainers are responsive to issues.

Stats
GitHub Stars: ★ 6.1k
Last commit: 1w ago
Status: Active
License: —
Category: RAG / Knowledge Base
Trend (30d): +0.2k (↑ 4.6%)
© 2026 AgentIndex.app|Built by a 10-year iOS Developer.

Not affiliated with Anthropic, OpenAI or Microsoft.