How to adapt OpenAI models to your specific segment of data and application domain.
Introduction

Hello everyone! I’m happy to write a new post to share the new things I learned last month. We had a company hackathon, and in this article I’ll explain how I used the OpenAI Fine-Tuning service to train a GPT-3 base model.
To fully understand this article, you need to know some basic concepts of artificial intelligence. You may want to start with my first article, Basic concepts of machine learning and artificial intelligence.
Nice, let’s start this journey!
Contextualization
OpenAI is a company focused on ML and AI research; they are very popular because of ChatGPT and DALL·E, which you have almost certainly heard of. ChatGPT is a generative AI, which means it can generate content and answer questions based on a large amount of data collected from the internet about many subjects. But sometimes you need a tuned model, one that can reply to or classify data related to your own product domain.
This was exactly my idea for the hackathon. At WhyLabs we have some interesting services for monitoring datasets and machine learning models. We provide a set of metrics to monitor your data and you can define different analyzers to send you notifications when you get an anomaly based on your definitions.
For example, we have the metric count_null_ratio, which tracks the percentage of null inputs in your columns. Say you have a regression model that predicts your customers’ credit scores: you can select all the columns you want to monitor and define a threshold for the ratio. You can set up a monitor to flag an anomaly when the ratio is greater than 10%, helping you investigate and understand your issues.
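To make this concrete, here is a toy illustration in plain pandas (not WhyLabs code, just the idea) of what such a null-ratio check with a 10% threshold boils down to:
import pandas as pd

# Toy dataset with missing values in the columns we want to monitor
df = pd.DataFrame({
    "income": [52_000, None, 48_000, None, 61_000],
    "age": [34, 41, None, 29, 55],
})

THRESHOLD = 0.10  # flag columns where more than 10% of the values are null

null_ratio = df.isnull().mean()  # fraction of null values per column
anomalies = null_ratio[null_ratio > THRESHOLD]
print(anomalies)  # the columns you would want to investigate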
Okay, now that you have an overview of the context, I’ll elaborate on the idea!
The challenge
I’m not a machine learning expert, so I found it difficult to understand and remember all the metrics, and there are many. It’s easy to understand a metric’s name when you have its description, like the one above. So why not reverse the process? You tell me what you need, and I’ll tell you the right metric. Sounds really helpful, right? Let’s get to the fun part.
Proof of Concept
For my proof of concept, I chose 3 of our metrics:
- Count Null Ratio: “1#both#count_null_ratio”
- Estimated Unique Values: “2#both#unique_est”
- Classification Accuracy: “3#model#classification.accuracy”
You already know about the count null ratio, so I’ll give you a quick explanation of the others.
Classification Accuracy
This metric is related to the performance of classification models; it’s an important piece of information for understanding whether the predictions are correct. It is calculated by dividing the number of correct predictions by the total number of correct and incorrect predictions.
Estimated Unique Values
This metric is used in discrete columns (categorized data) and helps you estimate the number of different categories in a single column.
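To make both definitions concrete, here is a tiny plain-Python illustration (note that the WhyLabs metric is an estimate computed over streaming data; this toy version counts exactly):
# Classification accuracy: correct predictions over all predictions
correct, incorrect = 90, 10
accuracy = correct / (correct + incorrect)  # 0.9

# Unique values in a discrete column (exact count; unique_est approximates this)
categories = ["red", "blue", "red", "green", "blue"]
unique_count = len(set(categories))  # 3

print(accuracy, unique_count)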
Dive into Fine Tuning
With the Fine Tuning API, you can select a GPT-3 base model (ada, babbage, curie, davinci) and upload training data that adds completions for a prompt. The completions will be our classes here. The pattern I used to create them looks like `3#model#classification.accuracy`. The OpenAI tooling recommends starting a class with a distinct token; a token is a piece of text, and you can check the OpenAI definition here. In my case, I start with a distinct number, then the type of resource the metric applies to, separated by “#” (some metrics are exclusive to models or to datasets), and then the metric itself.
My idea was to handle the class with regex in my app to extract the metric name and the resource type to validate the usage.
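My app’s actual code isn’t shown in this article, but a minimal sketch of that regex step could look like this (parse_completion is a hypothetical name, and I’m assuming the resource type is one of both, model, or dataset):
import re

# Matches completions like " 3#model#classification.accuracy\n":
# class id, resource type, then the metric name.
CLASS_PATTERN = re.compile(r"^\s*(\d+)#(both|model|dataset)#(\S+)\s*$")

def parse_completion(completion: str):
    match = CLASS_PATTERN.match(completion)
    if match is None:
        raise ValueError(f"Unexpected completion: {completion!r}")
    class_id, resource_type, metric = match.groups()
    return int(class_id), resource_type, metric

print(parse_completion(" 3#model#classification.accuracy\n"))
# -> (3, 'model', 'classification.accuracy')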
{"prompt": "I want see the percentage of columns that have null values\n\n###\n\n", "completion":" 1#both#count_null_ratio\n"}
{"prompt": "I want to analyze the unique values in my discrete columns\n\n###\n\n", "completion":" 2#both#est_unique\n"}
{"prompt": "I want to understand if my model have a good true predictions rate\n\n###\n\n","completion":" 3#model#classification.accuracy\n"}
*Note that the OpenAI documentation suggests some patterns, such as adding a unique separator token at the end of each prompt; the recommended default is “\n\n###\n\n”. Completion values should also start with a space and end with a ‘\n’.
These were my examples, but of course we need at least a few hundred training examples per class to make the model work and perform well. You probably don’t want to write all of them manually; this data could be scraped from your application.
In my case, for a proof of concept, I just wanted a bunch of different ways to ask for each metric. So I used ChatGPT to generate similar phrases for each of them, and after a few minutes I had a training dataset of 150 prompts, which was enough for a demonstration.
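The article doesn’t show how the file was assembled, but a minimal sketch, assuming the generated phrases are collected per class in a dict (phrases_by_class is a made-up name), could look like this:
import json

PROMPT_SUFFIX = "\n\n###\n\n"  # separator token recommended by OpenAI

# One entry per class; each list would hold the ChatGPT-generated variations
phrases_by_class = {
    "1#both#count_null_ratio": ["I want see the percentage of columns that have null values"],
    "2#both#unique_est": ["I want to analyze the unique values in my discrete columns"],
    "3#model#classification.accuracy": ["I want to understand if my model has a good true predictions rate"],
}

with open("hackathon_data/train_data.jsonl", "w") as f:
    for label, phrases in phrases_by_class.items():
        for phrase in phrases:
            record = {
                "prompt": phrase + PROMPT_SUFFIX,
                "completion": f" {label}\n",  # leading space + trailing newline
            }
            f.write(json.dumps(record) + "\n")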
With my JSONL dataset ready, I jumped to building and training the model.
The training
You can use any tool you prefer to run Python; I’m using Google Colab here.
First let’s add the dependencies and define our apiKey:
!pip install --upgrade openai

import os
import openai

# Replace the placeholder with your own OpenAI API key
%env OPENAI_API_KEY=your_openai_api_key
Now we can run a tool to prepare our data. This tool will suggest corrections, remove duplicate lines, and split your data into training and validation datasets.
!openai tools fine_tunes.prepare_data -f hackathon_data/train_data.jsonl
With the prepared datasets, we can easily create our fine-tuning job by specifying the base model and the number of classes: 3 in my case.
!openai api fine_tunes.create -m curie -t "hackathon_data/train_data_prepared_train.jsonl" -v "hackathon_data/train_data_prepared_valid.jsonl" --compute_classification_metrics --classification_n_classes 3
*This command will open a stream that reports the job status, but you can close it if you want; that will not stop the job from processing.
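If you do close the stream, the same legacy CLI lets you re-attach to it later (replace the placeholder with the job ID printed by the create command):
!openai api fine_tunes.follow -i <YOUR_FINE_TUNE_JOB_ID>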
We need to wait a few minutes to get the model name. The time will depend on the base model and the size of the datasets; in my case it took between 5 and 10 minutes.
You can run the following command to list your models. If the “fine_tuned_model” property has a null value, you will have to wait a little longer and run it again later.
!openai api fine_tunes.list
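If you prefer to stay in Python, the same library (the 0.x version used here) can poll a single job by its ID until the model name appears; job_id below is a placeholder:
import time

job_id = "ft-your-job-id"  # placeholder: the ID printed by fine_tunes.create

while True:
    job = openai.FineTune.retrieve(id=job_id)
    if job["fine_tuned_model"] is not None:
        print("Model ready:", job["fine_tuned_model"])
        break
    print("Status:", job["status"])
    time.sleep(30)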
When you get the name, you can start sending prompts and getting your predictions! Yeah, I know, it’s easier than writing a hello world in Java. Just kidding, Java fans. Love y’all ❤
Testing
# fine_tuned_model name returned by the list command
MODEL = "davinci:ft-personal-2023-04-26-22-20-54"
# test prompt text + end token `\n\n###\n\n`
PROMPT = "How well is my model performing\n\n###\n\n"

openai.api_key = os.getenv("OPENAI_API_KEY")

res = openai.Completion.create(model=MODEL, prompt=PROMPT, max_tokens=10, temperature=0, stop='\n')
print(res['choices'][0]['text'])
*max_tokens sets the maximum number of tokens in the completion, so it depends on the length of your class text. You can also set a stop token for the completion: ‘\n’ if you followed the recommendation above.
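Putting it together, a small wrapper (suggest_metric is a hypothetical name, and the printed output is just illustrative) makes the fine-tuned model feel like a classifier:
def suggest_metric(question: str) -> str:
    # Append the same end token used in training, then strip the
    # leading space from the predicted class
    res = openai.Completion.create(
        model=MODEL,
        prompt=question + "\n\n###\n\n",
        max_tokens=10,
        temperature=0,
        stop="\n",
    )
    return res["choices"][0]["text"].strip()

print(suggest_metric("How many distinct categories are in this column?"))
# e.g. 2#both#unique_est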

My work didn’t end here; it was just beginning. I built a new UI to help users set up the data and type a prompt to get the metrics. I integrated it into our React app using the Node.js library, but that is ordinary UI work; the magic happens on the ML side :)
That’s all folks. Thank You!
Matheus Mendes — LinkedIn
Frontend Software Engineer