

While Latitude focuses on prompt engineering and evaluation, the datasets you create and curate within the platform can be valuable assets for fine-tuning language models using external tools and services.

Why Use Latitude Datasets for Fine-tuning?

  • Curated Data: Datasets often contain carefully selected inputs paired with high-quality outputs (either expected outputs or manually reviewed model responses).
  • Real-World Examples: Datasets created from production logs represent actual user interactions.
  • Structured Format: Latitude datasets are already in a structured format (CSV), making them easier to process for fine-tuning.

Exporting Datasets from Latitude

  1. Navigate to the “Datasets” section in your project.
  2. Locate the dataset you want to use for fine-tuning.
  3. Find the option to Download or Export the dataset (usually represented by a download icon).
  4. Save the resulting CSV file to your local machine.

Preparing Data for Fine-tuning

Once exported, you’ll likely need to transform the CSV data into the specific format required by your chosen fine-tuning platform or library (e.g., OpenAI’s JSONL format, Hugging Face datasets format). Common steps include:
  1. Selecting Columns: Identify the columns containing the input prompt/context and the desired completion/output.
  2. Formatting: Convert each row into the required structure. For example, for OpenAI fine-tuning, you might create JSON objects like:
    {"prompt": "<Input from CSV column A>", "completion": "<Output from CSV column B>"}
    // or for chat models:
    {"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "<Input>"}, {"role": "assistant", "content": "<Output>"}]}
    
  3. Data Cleaning: Review the data for quality and consistency, and remove any low-quality or irrelevant examples.
  4. Splitting Data: You might need to split your exported dataset into training and validation sets.
Consult the documentation of your specific fine-tuning tool or platform for detailed formatting requirements.
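As a sketch of the selection, formatting, and splitting steps above, the following Python script reads an exported CSV and writes prompt/completion JSONL training and validation files. The column names `input` and `output` are placeholders; substitute the actual headers from your exported dataset, and check your provider's documentation for the exact record format it expects.

```python
import csv
import json
import random

def csv_to_jsonl(csv_path, train_path, valid_path, valid_fraction=0.1, seed=42):
    """Convert an exported dataset CSV into prompt/completion JSONL files.

    The column names 'input' and 'output' are placeholders; replace them
    with the headers from your own exported CSV.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [
            {"prompt": row["input"], "completion": row["output"]}
            for row in csv.DictReader(f)
            # Basic cleaning: drop rows with an empty input or output.
            if row["input"].strip() and row["output"].strip()
        ]

    # Shuffle deterministically, then split into training and validation sets.
    random.Random(seed).shuffle(rows)
    split = int(len(rows) * (1 - valid_fraction))

    for path, subset in [(train_path, rows[:split]), (valid_path, rows[split:])]:
        with open(path, "w", encoding="utf-8") as f:
            for example in subset:
                f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

For example, `csv_to_jsonl("dataset.csv", "train.jsonl", "valid.jsonl")` produces two files with roughly a 90/10 split, one JSON object per line.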

Example Scenario

Imagine you have a Latitude dataset created from manually reviewed chat logs (input_query, high_quality_response).
  1. Export: Download this dataset as a CSV from Latitude.
  2. Transform: Write a script (e.g., Python with pandas) to read the CSV and convert each row into the JSONL format required by the fine-tuning API you plan to use.
  3. Fine-tune: Upload the formatted JSONL file and run the fine-tuning job using the provider’s tools.
  4. Evaluate: After fine-tuning, you can even evaluate the new model’s performance back in Latitude by configuring it as a new provider/model and running evaluations against your datasets.
By leveraging the data curation work done in Latitude, you can streamline the preparation process for fine-tuning models for specialized tasks.
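The transform step in this scenario could look like the sketch below, which uses the standard library rather than pandas for portability. It assumes the CSV columns `input_query` and `high_quality_response` from the example; the system message is a placeholder you would replace with the instructions your production prompt actually used.

```python
import csv
import json

# Placeholder system prompt; replace with the instructions your
# production prompt actually used.
SYSTEM_PROMPT = "You are a helpful support assistant."

def to_chat_jsonl(csv_path, jsonl_path):
    """Convert (input_query, high_quality_response) rows into the
    chat-style JSONL format used by several fine-tuning APIs."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            record = {
                "messages": [
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": row["input_query"]},
                    {"role": "assistant", "content": row["high_quality_response"]},
                ]
            }
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The resulting `jsonl_path` file is what you would upload to the provider in the fine-tune step.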

Next Steps

  • Learn about Creating and Using Datasets in Latitude.
  • Refer to external documentation for specific fine-tuning platforms (OpenAI, Hugging Face, etc.).