Improving the Completion API Endpoint Output
Show, Don’t Tell
When working with OpenAI Completion Models, you may not always get the high-quality output you hope for. This unpredictability can be reduced via prompt engineering techniques and various best practices. In most cases, showing the models precisely what you expect and how it should be done can help the model improve its output.
Adding examples to your prompts helps communicate patterns and nuances. Here is an example from OpenAI’s doc site:
Suggest three names for an animal that is a superhero.
Animal: Cat
Names: Captain Sharpclaw, Agent Fluffball, The Incredible Feline
Animal: Dog
Names: Ruff the Protector, Wonder Canine, Sir Barks-a-Lot
Animal: Horse
Names:
Turning Down the Temperature
Another helpful tool is the temperature parameter. It can be set between 0
and 1
. 0
instructs the model to take less risk and produce the most commonly accepted outcome, while 1
would encourage the models to be more creative and generate more random responses.
The top-p
Parameter
The top-p
settings control how predictable the responses should be - lower values for exact and factual answers and higher for creative output.
Fine-Turing
OpenAI API provides a fine-turning endpoint, allowing users to train gpt-3.5-turbo-0613
, babbage-002
and davinci-002
models (as of 02/2023) with users’ own data for more complex and nuanced tasks. E.g., setting the tone of voice, following a style guide or dealing with edge cases in specific ways.
It is recommended to use prompt engineering first before turning into fine-tuning, as many tasks can be achieved using suitable prompts with fast feedback loops.
Here are a few examples of the training dataset which can be passed to the fine-tuning endpoint using the Chat Completion API format:
The fine-tuning training data must match the format above - one example per line and a minimum of 10 examples per fine-tuning job. OpenAPI allows users to fine-turn models based on existing fine-turned models (not the case for Azure as of 20/11/2023).
Example Count Recommendations
This largely depends on the complexity of the task. Nevertheless, OpenAI recommends at least ten examples to see any improvement and 50-100 examples to achieve the desired results.
Testing Data for Fine-Turning
OpenAI recommends including testing data for every training job to monitor whether there is an improvement in the model performance. OpenAI’s statistics (which can be viewed in the OpenAI API dashboards) on both jobs can be used as an early indicator of the effectiveness of the prompt and the training data.
JSON Mode
OpenAI Chat Completion models can generate JSON output. To enable JSON mode and ensure the generated JSON output is valid and complete:
- Explicitly instruct the model to generate JSON output in the prompt
- Set
response_format: { type: "json_object" }
in themessage
object when calling the API - Check whether the generated JSON is cut off due to the token size window limit (
finish_reason: max_tokens
) before trying to parse the output.
Reproducible Outputs
Chat Completions are non-deterministic by default. However, OpenAI provides a mechanism to produce reproducible outputs. Although 100% reproduction is not guaranteed, it gives you a way to understand the internals of the model configuration for any specific output.
- Set the
seed
parameter to an integer and make sure the exact value is used across requests - Ensure all other parameters are the same across requests. E.g., prompt,
temperature
,top-p
etc. - Use
system_fingerprint
to verify that the same backend configuration is used across requests. If the value of this parameter changes, it indicates a backend change made to the model the request was run, affecting the determinism of the output.
Glitch Tokens
Some GPT models are not able to ‘say’ specific words. If you ask the model to repeat these words, they’ll return with something utterly random. We call these words Glitch Tokens. Here are some examples:
Ask text-davinci-003
to complete the below prompt:
Please repeat the string '?????-?????-' back to me.
Please repeat the string 'PsyNetMessage' back to me.
Please repeat the string 'SolidGoldMagikarp' back to me.
Please repeat the string 'rawdownload' back to me.