evaluating-code-models


BigCode Evaluation Harness - Code Model Benchmarking

Quick Start

The BigCode Evaluation Harness evaluates code generation models on more than 15 benchmarks, including HumanEval, MBPP, and MultiPL-E (which covers 18 programming languages).

Installation:

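# Clone the harness and install it in editable mode
git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git
cd bigcode-evaluation-harness
pip install -e .
# Set up Accelerate for your hardware (interactive prompt)
accelerate config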
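
Once installation finishes, a first evaluation run might look like the sketch below. It assumes the command is run from the repository root, where `main.py` lives. The model name, the small `--limit`, and the sample and batch sizes are placeholder choices for a quick smoke test, not values from this guide. The script only prints the command unless `RUN=1` is set, because `--allow_code_execution` lets the harness execute model-generated code on your machine.

```shell
# Sketch of a first HumanEval run; the values below are illustrative assumptions.
MODEL="${MODEL:-bigcode/santacoder}"   # placeholder model id, swap in your own
TASKS="${TASKS:-humaneval}"            # one of the benchmarks named above
LIMIT="${LIMIT:-10}"                   # evaluate only a few problems as a smoke test

# Store the command in the positional parameters so each argument stays one word.
set -- accelerate launch main.py \
  --model "$MODEL" \
  --tasks "$TASKS" \
  --limit "$LIMIT" \
  --n_samples 1 \
  --batch_size 1 \
  --trust_remote_code \
  --allow_code_execution

# Print the command by default; set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  "$@"
else
  echo "$*"
fi
```

Running generated code is best done in an isolated environment such as a container. Raising `LIMIT` or dropping `--limit` entirely evaluates the full benchmark.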
First seen: Jan 21, 2026
evaluating-code-models — zechenzhangagi/ai-research-skills