evaluating-code-models


BigCode Evaluation Harness - Code Model Benchmarking

BigCode Evaluation Harness evaluates code generation models across 15+ benchmarks including HumanEval, MBPP, and MultiPL-E (18 languages).

Installation:

git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git
cd bigcode-evaluation-harness
pip install -e .
accelerate config
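
Once the install steps above are done, an evaluation run is typically started from the repository root with `accelerate launch main.py`. The sketch below only assembles such a command as an argument list; it does not execute anything. The helper name `build_eval_command` is hypothetical, the model name is an illustrative placeholder, and the flags (`--model`, `--tasks`, `--n_samples`, `--batch_size`, `--allow_code_execution`) are taken from the project's README as recalled here, so they should be checked against `python main.py --help` before use.

```python
import shlex


def build_eval_command(model, task, n_samples=1, batch_size=1,
                       allow_code_execution=False):
    """Assemble an `accelerate launch main.py ...` argument list.

    Hypothetical helper: nothing is executed here, and the flag names
    are assumptions to verify against the harness's own --help output.
    """
    cmd = [
        "accelerate", "launch", "main.py",
        "--model", model,
        "--tasks", task,
        "--n_samples", str(n_samples),
        "--batch_size", str(batch_size),
    ]
    # Benchmarks such as HumanEval run generated code, so the harness
    # requires an explicit opt-in flag before executing it.
    if allow_code_execution:
        cmd.append("--allow_code_execution")
    return cmd


if __name__ == "__main__":
    # Placeholder model and task, chosen only for illustration.
    cmd = build_eval_command(
        "bigcode/santacoder", "humaneval",
        n_samples=10, batch_size=10, allow_code_execution=True,
    )
    # Print a shell-quoted command line that can be reviewed before running.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps quoting explicit and makes it easy to pass to `subprocess.run` after review. Because the benchmarks execute model-generated code, running them inside a sandbox or container is a sensible precaution.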

GitHub Stars: 9.4K

First Seen: Jan 21, 2026

evaluating-code-models — zechenzhangagi/ai-research-skills