EvalPlus is a rigorous evaluation framework for LLM4Code: it evaluates AI coders with rigorous tests. Want to know more details? Read our NeurIPS'23 paper as well as our Google Slides!
The EvalPlus team aims to build high-quality benchmarks for evaluating LLMs for code. Below are the benchmarks we have been building so far:
✨ HumanEval+: 80x more tests than the original HumanEval!
✨ MBPP+: 35x more tests than the original MBPP!

EvalPlus was first proposed (May 2023) as a code synthesis evaluation framework to rigorously benchmark the functional correctness of LLM-synthesized code.
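For reference, here is a minimal sketch of how a model is typically scored against HumanEval+, assuming the pip-installable `evalplus` package and its `evalplus.data` helpers; `generate_one_completion` is a hypothetical placeholder for your own model call, not part of EvalPlus:

```python
# Minimal sketch: generate solutions for HumanEval+ tasks and write them to
# a JSONL file that the EvalPlus evaluator can consume.
from evalplus.data import get_human_eval_plus, write_jsonl

def generate_one_completion(prompt: str) -> str:
    # Placeholder: call your LLM here and return the completed code.
    raise NotImplementedError

samples = [
    dict(task_id=task_id, solution=generate_one_completion(problem["prompt"]))
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)

# The samples can then be checked against the extended test suite, e.g.:
#   evalplus.evaluate --dataset humaneval --samples samples.jsonl
```

MBPP+ can be scored analogously (recent versions expose a corresponding MBPP+ loader and `--dataset mbpp` option).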
EvalPlus: Rigorous Evaluation of LLMs for Code Generation · GitHub repo: evalplus/evalplus · Leaderboard: evalplus.github.io
Approach: We propose EvalPlus, an evaluation framework to reveal the real correctness of LLM-synthesized code. The test-case generation approach of EvalPlus combines LLM-generated seed inputs with type-aware mutation to automatically scale them into a much larger test suite.
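To make the idea concrete, below is a simplified, illustrative sketch of type-aware input mutation. It is not the actual EvalPlus implementation (which also seeds inputs with an LLM and validates them against ground-truth solutions); it only shows how each input value can be perturbed according to its Python type to grow a pool of test inputs:

```python
# Illustrative sketch of type-aware test-input mutation (not EvalPlus's code).
import random

def mutate(value):
    """Perturb a test input according to its type."""
    if isinstance(value, bool):      # check bool before int (bool subclasses int)
        return not value
    if isinstance(value, int):
        return value + random.choice([-1, 1, random.randint(-100, 100)])
    if isinstance(value, float):
        return value * random.uniform(0.5, 2.0)
    if isinstance(value, str):
        return value + random.choice([" ", "a", "0"]) if value else "a"
    if isinstance(value, list):
        return [mutate(v) for v in value]
    if isinstance(value, tuple):
        return tuple(mutate(v) for v in value)
    if isinstance(value, dict):
        return {k: mutate(v) for k, v in value.items()}
    return value

def generate_inputs(seed_inputs, n=1000):
    """Grow a pool of test inputs by repeatedly mutating random members."""
    pool = list(seed_inputs)
    while len(pool) < n:
        pool.append(mutate(random.choice(pool)))
    return pool

# Example: expand a tiny seed set for a function taking (list[int], int).
print(generate_inputs([([1, 2, 3], 5)], n=10))
```

In the full framework, a candidate solution's outputs on such generated inputs are compared against the outputs of a ground-truth implementation, which is how the extended test suites expose wrong-but-plausible LLM code that the original benchmarks miss.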