HumanEval is a benchmark created by OpenAI to test code generation capabilities. It consists of 164 Python programming tasks.

Technical Structure

Each task includes:

Function signatures: The name and parameters of the code to be written.
Docstrings: A text description of what the code should do.
Unit tests: Verification scripts to ensure the generated code actually works.

Why People Download It

Many developers host mirrors of the HumanEval dataset for easy integration into testing pipelines.
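
To show how the signature, docstring, and verification script of a task fit together in a testing pipeline, here is a minimal sketch. The field names (`prompt`, `entry_point`, `test`) follow the published HumanEval record layout, but the task itself is made up for illustration, and the `passes` helper is a hypothetical name, not part of any official harness. It also runs code with a plain `exec`, which a real pipeline should replace with a sandbox.

```python
# Illustrative task record. The field names mirror the HumanEval layout,
# but this task is invented for the example and is not from the dataset.
task = {
    "task_id": "Example/0",
    "prompt": (
        "def add(a: int, b: int) -> int:\n"
        '    """Return the sum of a and b."""\n'
    ),
    "entry_point": "add",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
}


def passes(task: dict, completion: str) -> bool:
    """Return True if prompt + completion satisfies the task's unit tests.

    Hypothetical helper for illustration only.
    """
    # The model only writes the function body; the prompt supplies the
    # signature and docstring, and the test supplies check(candidate).
    program = task["prompt"] + completion + "\n\n" + task["test"]
    namespace: dict = {}
    try:
        # WARNING: exec runs untrusted code in this process.
        # A real pipeline should isolate this step in a sandbox.
        exec(program, namespace)
        namespace["check"](namespace[task["entry_point"]])
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as a fail.
        return False
    return True


good = "    return a + b\n"  # a correct completion
bad = "    return a - b\n"   # an incorrect completion

print(passes(task, good))
print(passes(task, bad))
```

A completion counts as correct only when every assertion in the task's `check` function holds, which is why the benchmark measures functional correctness rather than textual similarity to a reference answer.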