chinese-llm-benchmark vs AgentBench
chinese-llm-benchmark (★ 6.1k) vs AgentBench (★ 3.5k)

chinese-llm-benchmark: ReLE Benchmark (formerly CLiB) is a continuously updated evaluation of Chinese large language models, covering 337+ commercial and open-source LLMs. It offers multi-dimensional capability assessments across various domains, along with comprehensive rankings and a large defect library for model improvement.

AgentBench: A comprehensive benchmark for evaluating Large Language Models (LLMs) as agents across diverse environments, now featuring a function-calling version integrated with AgentRL. It provides a containerized setup for tasks such as OS interaction, database operations, and web shopping, enabling robust and reproducible agent evaluation.

01

TL;DR

Choose chinese-llm-benchmark if…

You are comparing and selecting the best-performing LLMs for specific applications.

Choose AgentBench if…

You want to systematically benchmark the performance of LLM-based agents.

02

Side-by-Side Comparison

| Field | chinese-llm-benchmark | AgentBench |
| --- | --- | --- |
| Category | RAG / Knowledge Base | Observability |
| Stars | ★ 6.1k | ★ 3.5k |
| License | — | Apache-2.0 |
| Updated | 1w ago | 3mo ago |
| Open Source | Yes | Yes |
| Tags | LLM Evaluation, Chinese LLMs, AI Benchmark | LLM Evaluation, Agent Benchmarking, Function Calling |
03

Features

chinese-llm-benchmark
01 Extensive coverage of 337+ commercial and open-source Chinese LLMs.
02 Multi-dimensional evaluation across 7 main domains and ~300 sub-dimensions.
03 Provides detailed ranking lists for various capabilities and specific domains.
04 Offers a large defect library with over 2 million LLM flaws for research and improvement.
05 Supports customized model selection and free evaluation services for private models.
AgentBench
01 Comprehensive LLM-as-Agent evaluation across diverse environments.
02 Function-calling integration for advanced agent interaction.
03 Fully containerized deployment using Docker Compose for reproducibility.
04 Multi-task and multi-turn interaction for realistic agent assessment.
05 Extensible framework for adding new evaluation tasks.
04

Use Cases

chinese-llm-benchmark
↳ Comparing and selecting the best-performing LLMs for specific applications.
↳ Identifying weaknesses and improving the capabilities of large language models.
↳ Benchmarking private or custom LLMs against public models for performance and cost optimization.
AgentBench
↳ Systematically benchmarking the performance of LLM-based agents.
↳ Developing and refining advanced LLM agent architectures and strategies.
↳ Conducting academic research on the capabilities and limitations of agentic AI.
05

Best For

chinese-llm-benchmark: Trending, Essential
AgentBench: Trending, Essential
FAQ

What is the difference between chinese-llm-benchmark and AgentBench?
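chinese-llm-benchmark is listed under RAG / Knowledge Base and evaluates and ranks Chinese LLMs across many domains; AgentBench is listed under Observability and evaluates LLMs as agents in containerized task environments. chinese-llm-benchmark has 6.1k stars, while AgentBench has 3.5k stars.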
Which is better, chinese-llm-benchmark or AgentBench?
The best choice depends on your use case. Choose chinese-llm-benchmark if you are comparing and selecting the best-performing LLMs for specific applications; choose AgentBench if you want to systematically benchmark the performance of LLM-based agents.
Is chinese-llm-benchmark free or open source?
Yes, chinese-llm-benchmark is open source on GitHub; this listing does not specify its license.
Is AgentBench free or open source?
Yes, AgentBench is open source on GitHub (Apache-2.0).

© 2026 AgentIndex.app | Built by a 10-year iOS Developer.

Not affiliated with Anthropic, OpenAI or Microsoft.