OctoPack

Study / Research

A dataset from BigCode that pairs GitHub commit messages with the code changes they introduce, providing high-quality diff data for training LLMs on code.
