Building with Instruction-Tuned LLMs: A Step-by-Step Guide

DeepLearning.AI
Entertainment | 5 min read | 60 min video
May 31, 2023 | 57,190 views | 1,573 | 59

TL;DR

Instruction-tuned LLMs significantly outperform base models, but fine-tuning for specific tasks can be done cheaply and efficiently using techniques like QLoRA, even on consumer hardware.

Key Insights

1. Instruction tuning improves an LLM's ability to follow human instructions and, compared with a base model, improves truthfulness and reduces toxicity. The "orange green airplane" example illustrates the difference.

2. The Dolly 15K dataset contains 15,000 human-generated prompt-response pairs across various instruction categories and can be used for commercial purposes.

3. QLoRA, a refined parameter-efficient fine-tuning technique, enables fine-tuning of large LLMs such as OpenLLaMA 7B on a single A100 GPU. Its 4-bit quantization drastically reduces compute and memory requirements.

4. Fine-tuning the input-output schema of an instruction-tuned model lets it specialize in a single task. The demo shows effective results with as few as 17 data points and 100 training steps.

5. Fine-tuning large LLMs has become far more accessible: QLoRA allows a 7B-parameter model to be fine-tuned on Google Colab Pro for less than the cost of a one-month subscription, and techniques like LoRA and QLoRA reduce the number of trainable parameters by over 99%.

6. Although complex LLM applications often rely on tools such as LangChain and vector databases for data integration, the core of model specialization lies in instruction tuning and fine-tuning the input-output schema.
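
The "over 99%" reduction in trainable parameters follows from simple arithmetic on a single weight matrix. The sketch below is illustrative only: the 4096 x 4096 layer size and the rank of 8 are assumed values, not figures from the workshop, and `lora_trainable_fraction` is a hypothetical helper name.

```python
def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of a d_out x d_in weight matrix's parameter count that LoRA trains.

    LoRA keeps the full matrix frozen and trains two small factors instead:
    B with shape (d_out, rank) and A with shape (rank, d_in).
    """
    full_params = d_out * d_in
    lora_params = rank * (d_out + d_in)
    return lora_params / full_params


# Assumed, illustrative sizes: a 4096 x 4096 projection adapted with rank 8.
fraction = lora_trainable_fraction(4096, 4096, 8)
print(f"trainable fraction: {fraction:.4%}")
```

With these assumed sizes, the adapter adds well under 1% of the layer's parameter count, which is the kind of reduction the insight above refers to.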

Instruction tuning vastly improves LLM responses over base models

The workshop begins by demonstrating the power of instruction tuning with a simple "odd one out" task. Given a list that includes 'orange', 'green', and 'airplane', a base model wrongly picks 'orange' and gives a nonsensical explanation. An instruction-tuned model correctly picks 'airplane' and offers a coherent rationale. The example sets the stage for how instruction tuning aligns LLMs with human expectations, leading to more useful and reliable outputs.

Understanding LLM training: from pre-training to fine-tuning

The evolution of LLMs like OpenAI's GPT series starts with unsupervised pre-training on vast internet data, followed by supervised fine-tuning to improve performance on classic NLP benchmarks. Prompt engineering, including zero-shot and few-shot learning, allows interaction with these general models. However, for specific applications, fine-tuning the input-output schema is crucial, effectively carving out a specialized region within the LLM's latent space for a single, high-powered task. Instruction tuning, a subset of supervised fine-tuning, specifically focuses on aligning models with human instructions, improving truthfulness, reducing toxicity, and enhancing overall usability.
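
The difference between zero-shot and few-shot prompting can be sketched as plain string construction. This is a minimal illustration, not the workshop's code: the `Input:`/`Output:` layout, the `build_prompt` helper, and the exact wording of the questions are all assumptions.

```python
from typing import Sequence, Tuple


def build_prompt(query: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    parts = []
    for example_input, example_output in examples:
        # Each worked example shows the model the expected input/output pattern.
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The final block leaves the output empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


question = "Which is the odd one out: orange, green, airplane?"
zero_shot = build_prompt(question)
few_shot = build_prompt(
    question,
    examples=[("Which is the odd one out: cat, dog, hammer?", "hammer")],
)
print(few_shot)
```

Either string would then be sent to a general model; fine-tuning becomes relevant when prompts like these are no longer enough for the target task.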

Leveraging open-source tools for efficient instruction tuning

The first demo showcases instruction tuning using OpenLLaMA, a reproduction of Meta's LLaMA, and the Dolly 15K dataset. Dolly 15K comprises 15,000 high-quality, human-generated prompt-response pairs suitable for commercial use. The data is prepared by merging each record's instruction, context, and response into a single text column in the format the training library expects. The demo then highlights QLoRA, a technique that drastically reduces the computational resources needed for fine-tuning. It combines 4-bit quantization of the frozen base model (each weight is stored in 4 bits rather than 32) with LoRA's low-rank adaptation, which leaves the original weight matrices untouched and trains a pair of much smaller matrices for each adapted layer. Together these sharply cut both memory use and the number of trainable parameters. As a result, a 7-billion-parameter model can be fine-tuned on a single A100 GPU for less than the cost of a month of Google Colab Pro, which makes training powerful LLMs far more accessible.
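
The data preparation step can be sketched as a small formatting function. The `instruction`, `context`, and `response` field names match the Dolly 15K record layout; the `### Instruction:` style template, the `format_example` helper, and the sample record are assumptions for illustration, and the workshop's notebook may use a different template.

```python
def format_example(record: dict) -> dict:
    """Merge instruction, optional context, and response into one 'text' field."""
    sections = [f"### Instruction:\n{record['instruction']}"]
    if record.get("context"):  # context may be empty, so include it only if present
        sections.append(f"### Context:\n{record['context']}")
    sections.append(f"### Response:\n{record['response']}")
    return {"text": "\n\n".join(sections)}


# Hypothetical record in the instruction/context/response shape.
sample = {
    "instruction": "Which item is the odd one out?",
    "context": "orange, green, airplane",
    "response": "Airplane, because the other two are colors.",
}
print(format_example(sample)["text"])
```

Keeping the template identical at training time and at inference time matters more than the particular markers chosen, since the model learns to continue text that follows this exact layout.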

Fine-tuning the input-output schema for task-specific superpowers

The second demo shifts focus to fine-tuning the input-output schema: taking an off-the-shelf instruction-tuned model (such as BLOOMZ) and training it further for one very specific task. The workshop presents this as an unsupervised fine-tuning process in which the model learns to generate outputs that match a desired format and style. The example uses synthetically generated data for marketing email copy, and the goal is to teach the model to write emails in a specific company voice and tone. Even with a tiny dataset of just 17 examples and only 100 training steps, the fine-tuned model generates noticeably better marketing emails than the same model before fine-tuning, which shows how effective data-centric fine-tuning can be for specialized applications. The demo uses LoRA with 8-bit quantization on a BLOOM 3B model, which dramatically reduces the number of trainable parameters and makes this kind of customization feasible on consumer-grade hardware.
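
To give a feel for what 8-bit quantization does to a weight, here is a toy absmax scheme in plain Python. It is a deliberate simplification under stated assumptions: the weight values are made up, the helper names are hypothetical, and real quantization libraries use more elaborate schemes than a single shared scale.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Absmax quantization: map floats to integers in [-127, 127] with one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


weights = [0.12, -0.5, 0.031, 0.9, -0.27]  # made-up values
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"max error: {max_error:.5f}")
```

Each weight now occupies one byte instead of four, at the cost of a rounding error of at most half the scale per weight.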

Key takeaways: accessibility and the future of LLM development

The workshop emphasizes that instruction tuning is a subset of fine-tuning focused on human alignment, while input-output schema fine-tuning specializes the model for a single task. The emergence of techniques like LoRA and QLoRA has democratized LLM fine-tuning, making it possible to achieve impressive results with limited compute resources – even on free Google Colab tiers for smaller models or with consumer GPUs for larger ones. The cost for fine-tuning can be as low as pennies. The speakers encourage beginners to start by experimenting with existing APIs like ChatGPT and then gradually move towards fine-tuning, highlighting that the barrier to entry for both inference and training has never been lower. The future points towards increasingly efficient and accessible LLM development, enabling specialized applications that rival larger, more general models in performance for specific tasks.
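
A rough back-of-the-envelope calculation shows why lower precision brings a 7-billion-parameter model within reach of a single GPU. The sketch counts weight storage only; it ignores activations, gradients, optimizer state, and any quantization overhead, so real memory use is higher. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Weight storage for a 7-billion-parameter model at different precisions.
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
```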

Addressing common questions: hallucinations, confidential data, and getting started

During the Q&A, key concerns are addressed. Hallucinations can be reduced, and answers tied to specific data, by adding a retrieval step, for example using LangChain to return source documents alongside LLM responses. For confidential data, sanitization and pre/post-processing steps are recommended, though fully eliminating leakage risk without removing the data itself is difficult. The speakers confirm that building with LLMs no longer requires massive computational resources: methods like LoRA and QLoRA dramatically reduce trainable parameters and compute needs, which makes fine-tuning feasible on consumer hardware. Beginners are advised to start with basic prompting on platforms like ChatGPT, then move to API usage, and eventually explore fine-tuning, with an emphasis on hands-on building and iterative learning.
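
The retrieval idea from the Q&A can be sketched without any framework. This toy version ranks documents by word overlap as a stand-in for a real vector search and returns the source names with the prompt; it is not LangChain's API, and the document names, contents, and helper functions are all hypothetical.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: Dict[str, str], top_k: int = 2) -> List[Tuple[str, str]]:
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents.items(),
        key=lambda item: len(question_words & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_grounded_prompt(question: str, documents: Dict[str, str], top_k: int = 2):
    """Return a prompt that includes the retrieved sources, plus their names."""
    sources = retrieve(question, documents, top_k)
    context = "\n".join(f"[{name}] {text}" for name, text in sources)
    prompt = (
        "Answer using only the sources below and cite the source name.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, [name for name, _ in sources]


docs = {  # hypothetical documents
    "shipping.txt": "Orders ship within two business days.",
    "refunds.txt": "Refund requests are accepted within 30 days of purchase.",
}
prompt, cited = build_grounded_prompt("What is the refund window?", docs, top_k=1)
print(cited)
print(prompt)
```

Returning the source names alongside the prompt lets the application show users where an answer came from, which is the point of pairing retrieval with generation.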

Building with Instruction-Tuned LLMs: Cheat Sheet

Practical takeaways from this episode

Do This

Prioritize instruction-tuned models for building AI applications.
Start with zero-shot and few-shot prompting before fine-tuning.
Adopt a data-centric approach when curating data for fine-tuning.
Use QLoRA for efficient LLM fine-tuning with reduced compute.
Consider 4-bit QLoRA for maximum compute reduction, and monitor evaluation metrics while doing so.
Leverage tools like LangChain for complex LLM applications and data integration.
Sanitize outputs or use pre/post-processing for confidential data.
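
A post-processing step for confidential data can be as simple as a redaction pass over model output. The sketch below handles email addresses only, using an assumed regular expression and a hypothetical `redact_emails` helper; a real deployment would need broader rules and review.

```python
import re

# Simplified pattern for email addresses; an assumption, not a complete validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[REDACTED_EMAIL]") -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)


raw_output = "Please contact jane.doe@example.com for a quote."
print(redact_emails(raw_output))
```

The same function can run on inputs before they reach the model (pre-processing) or on outputs before they reach the user (post-processing).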

Avoid This

Do not rely solely on base LLMs without instruction tuning for new applications.
Avoid deep dives into complex topics like vector databases or chaining when first building an LLM application.
Do not expect synthetically generated data to be suitable for commercial use; use proprietary company data instead.
Do not use masked language modeling (MLM) when fine-tuning causal language models.
Do not ignore the importance of verifying model outputs and checking for potential hallucinations.
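
The advice about masked language modeling comes down to the training objective. A causal model predicts each token from the tokens to its left only, whereas masked language modeling hides tokens inside the sequence and uses context on both sides. The sketch below shows the causal objective's training pairs on a made-up token list; `causal_lm_pairs` is a hypothetical helper.

```python
from typing import List, Tuple


def causal_lm_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Pair each left-hand context with the next token the model must predict."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


tokens = ["Subject", ":", "Spring", "sale", "starts", "today"]  # made-up example
for context, target in causal_lm_pairs(tokens):
    print(" ".join(context), "->", target)
```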

Common Questions

Instruction tuning is a subset of supervised fine-tuning focused on aligning LLMs with human instructions, improving performance on benchmarks and metrics like truthfulness. Fine-tuning the input/output schema, on the other hand, makes a general model highly specialized for a single task.

Mentioned in this video

Software & Apps
GPT

The GPT lineage (GPT, GPT-2, GPT-3) is discussed as the foundation of LLMs, built on unsupervised pre-training. GPT-4 is mentioned as a tool for synthetic data generation.

Dolly

An open-source LLM released with the Dolly 15K dataset, which can be used for commercial purposes.

OpenLLaMA

A reproduction of Meta's LLaMA by Berkeley's OpenLM Research, used in the first demo for supervised instruction tuning. A 7-billion parameter preview was discussed.

LLaMA

Large Language Model Meta AI, whose reproduction led to OpenLLaMA. Discussed as a base for research and development in LLMs.

BitsAndBytes

A library used for quantization, enabling efficient LLM fine-tuning, especially in conjunction with QLoRA.

Google Colab

A cloud-based notebook environment used for demonstrating LLM fine-tuning, capable of running even large models with Pro subscriptions.

OpenLLaMA 7B

A 7 billion parameter model used in the first demonstration for supervised instruction tuning.

TRL

A library from Hugging Face that includes a supervised fine-tuning (SFT) trainer, used for efficient model training.

BLOOM 3B

A 3-billion-parameter model discussed in the second demo. With LoRA, fewer than 1% of its parameters are trainable.

OpenAI GPT-4

Used to synthetically generate data for the AI marketing assistant example in the second demo. Real-world applications should use proprietary company data.

BLOOMZ

An instruction-tuned version of the large BLOOM model, used in the second demo for unsupervised fine-tuning to create an AI marketing assistant.

LangChain

A framework mentioned as a low-barrier entry method for incorporating custom data into LLM applications and for building complex LLM applications.

Coursera

An online learning platform where DeepLearning.AI offers courses, with promo codes provided to select attendees.

Chatbot

Mentioned as a type of application that can be built using LLMs, with examples provided in the GitHub repo.

ChatGPT

Recommended as a starting point for beginners to understand prompting and zero/few-shot learning.

GPT-3.5 Turbo

Mentioned as an example of how instruction tuning improved upon earlier models like davinci.
