How to Fine-Tune GPT-3.5

OpenAI recently announced the release of their GPT-3.5 Turbo fine-tuning APIs. At Scale, we have always believed that building custom LLMs through fine-tuning is the key to unlocking greater performance for any given organization’s specific use case. OpenAI has also named Scale as their preferred enterprise fine-tuning partner for GPT-3.5, and we have already demonstrated the power of fine-tuning GPT-3.5 for customers like Brex. While the pre-trained GPT-3.5 model can often solve tasks with prompt engineering, the model becomes even more powerful when fine-tuned, in some cases surpassing GPT-4 in performance. In this post, we’ll walk through how to fine-tune GPT-3.5.

The first and one of the most critical steps in fine-tuning is creating the right training dataset. OpenAI recommends creating a diverse training set of conversations that are similar to the conversations that the model will see in production. These datasets can be created with Scale’s Data Engine, which is trusted by leading ML teams and enterprises to provide large volumes of high-quality data. Scale has worked with OpenAI since 2019 on powering LLMs with better data. Scale's Data Engine has powered most of the leading LLMs, and is now also powering custom LLMs for leading companies like Brex, Chegg, and Accenture. Our cost-effective operations can give you expertly labeled and diverse data at any scale, making it an indispensable asset for fine-tuning.

Fine-tuning also requires provisioning the right compute resources and correctly implementing the training and inference code. OpenAI’s new fine-tuning API makes this process easy and accessible. Let's dive into how you can use this API and explore some examples.

Dataset Preparation

For this example, we will use ScienceQA, a popular dataset consisting of a diverse set of multiple-choice science questions. We used LLM Engine to fine-tune Llama 2 on this dataset in our previous blog post, and you can follow the data preparation steps in that post.
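If you want to follow along without that post, here is a minimal sketch of the preparation, assuming the Hugging Face "derek-thomas/ScienceQA" mirror of the dataset; the prompt template and the `prompt`/`response` column names are illustrative choices, not the exact ones from the previous post:

```python
# Minimal data-preparation sketch (assumes the "derek-thomas/ScienceQA"
# Hugging Face mirror; prompt format and column names are illustrative).
from datasets import load_dataset
import pandas as pd

dataset = load_dataset("derek-thomas/ScienceQA", split="train")
df = pd.DataFrame(dataset)

# Keep text-only questions (rows without an image).
df = df[df["image"].isnull()]

letters = "ABCDE"

def build_prompt(row):
    # Render the answer choices as "(A) ... (B) ..." options.
    options = " ".join(
        f"({letters[i]}) {choice}" for i, choice in enumerate(row["choices"])
    )
    return (
        f"Context: {row['hint']}\n"
        f"Question: {row['question']}\n"
        f"Options: {options}\n"
        f"Answer:"
    )

df["prompt"] = df.apply(build_prompt, axis=1)
# "answer" is the index of the correct choice; map it to its letter.
df["response"] = df["answer"].apply(lambda i: letters[i])
```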

Now, let's convert this dataset into OpenAI’s supported format: a JSONL file in which each line is a conversation. Each example in the dataset should be a conversation in the same format as the chat completions endpoint, specifically a list of messages where each message has a role and content (and optionally a name). The provided assistant messages should be the ideal responses you want the model to produce.

```python
import json

def format_chat(row):
    # Serialize one row as a single-turn conversation in the chat format.
    return json.dumps(
        {"messages": [
            {"role": "user", "content": row["prompt"]},
            {"role": "assistant", "content": str(row["response"])},
        ]}
    )

def convert_dataset(df, file_name):
    df["conversation"] = df.apply(format_chat, axis=1)
    # Write one JSON-encoded conversation per line (JSONL).
    with open(file_name, 'w') as jsonl_file:
        for example in df["conversation"]:
            jsonl_file.write(example + '\n')
```

This results in our dataset looking like this:

{"messages:" [{"role": "user", "content": "Context: In a group of cows, ..."},  {"role": "assistant", "content": "B"}]}["messages": {"role": "user", "content": "Context: In a group of guppies..."},
{"role": "assistant", "content": "A"}]}The OpenAI API supports training with a training and validation dataset, and provides loss numbers on both during the course of training. These samples will be your initial signal of how much the model is improving, when compared to generations from the base model on the same test conversations.

These dataset files must be uploaded to OpenAI’s file endpoint. Make sure to add your OpenAI API key to your system environment variables for authentication.
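As a minimal sketch using the current `openai` Python client (which reads the key from the `OPENAI_API_KEY` environment variable), the upload looks like this; hold on to the returned file IDs, since the fine-tuning job will reference them:

```python
from openai import OpenAI

# The client picks up OPENAI_API_KEY from the environment.
client = OpenAI()

# Upload both dataset files with purpose="fine-tune".
train_file = client.files.create(
    file=open("train.jsonl", "rb"), purpose="fine-tune"
)
val_file = client.files.create(
    file=open("validation.jsonl", "rb"), purpose="fine-tune"
)

print(train_file.id, val_file.id)
```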
