Nemotron

Software / App

A dataset from NVIDIA built by using a prompt-based model to score the educational value of text and by adding synthetic data, for a total of about 6 trillion tokens.

Mentioned in 1 video