chinese-llm-benchmark: The ReLE Benchmark (formerly CLiB) provides a continuously updated evaluation of Chinese large language models, covering over 337 commercial and open-source LLMs. It offers multi-dimensional capability assessments across various domains, along with comprehensive rankings and a large defect library to support model improvement; Awesome-LLM-in-Social-Science: This repository curates a collection of papers at the intersection of large language models and social science. It covers evaluation, alignment, applications, surveys, and datasets, with particular emphasis on psychology and intrinsic values.
Comparing and selecting the best-performing LLMs for specific applications.
Evaluating LLM characteristics (e.g., values, personality, morality) from a social science perspective.