Apart from ChatGPT, which made Gen AI accessible to the masses, OpenAI provides many powerful foundation models through its API endpoints, allowing users to build Gen AI into their own applications. The OpenAI API SDK is available in JavaScript (Node.js) and Python.

GPT-3.5 Turbo

GPT-3.5 Turbo models are for multi-turn conversations and are equally capable of single-turn text-completion tasks. They support a 16k context window by default.

Because the models have no memory of messages from previous requests, it is necessary to save the conversation and include it in subsequent requests. If the conversation exceeds the model's maximum context size, it must be shortened, or you can switch to the gpt-4-32k model, which can handle up to 32k tokens.
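A minimal sketch of this pattern with the Python SDK; the conversation itself is hypothetical, and the key point is that the full history is resent with every request:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The model keeps no memory between requests, so the whole
# conversation so far is carried in the messages list.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a context window?"},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
)
reply = response.choices[0].message.content

# Append the reply and the next user turn before the next request.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "How large is yours?"})
```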

GPT-4 Turbo

GPT-4 Turbo is a multimodal model that can accept text and images and output text. Under the hood, it encodes text and images into the same encoding space and processes the data from different sources through the same neural network. It has a 128k context window (as of 21/11/2023).

GPT-4 Turbo with vision, code-named gpt-4-vision-preview, is the GPT-4 model that can accept one or more images as input and answer questions about them. It can analyse images in detail and read documents containing figures.
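A sketch of asking gpt-4-vision-preview about an image via the Python SDK; the image URL and question below are placeholders:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            # A message can mix text parts and image parts.
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```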

Although OpenAI has not disclosed the details of gpt-4-vision-preview, common knowledge about multimodal LLMs suggests it uses CLIP.

Whisper

Whisper is a general-purpose speech recognition model capable of multilingual speech recognition, speech translation and language identification.
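Transcription with the hosted whisper-1 model looks roughly like this; the audio file name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# Transcribe speech in its original language.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)

# To translate non-English speech into English instead,
# use client.audio.translations.create with the same arguments.
```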

Code Interpreter

Code Interpreter is a tool that lets GPT models execute Python code in a sandbox and generate charts, graphs, data files or PDFs. For example, you can ask GPT-4 to 'write a Python function to analyse a data file and generate a chart to find the trend', and the generated code can be run by Code Interpreter to produce the desired results.
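In the API, Code Interpreter is enabled as a tool on an assistant (see the Assistants API below); a minimal sketch, with hypothetical instructions:

```python
from openai import OpenAI

client = OpenAI()

# Create an assistant that can run Python in a sandbox.
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="Analyse the user's data file and chart the trend.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-turbo-preview",
)
```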

DALL·E 3

DALL·E 3 can generate images based on user prompts. It is available in ChatGPT Plus, ChatGPT Enterprise and as dall-e-3 through the OpenAI API endpoint.
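Generating an image with the API is a single call; the prompt below is hypothetical:

```python
from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A watercolour painting of a lighthouse at dawn",
    size="1024x1024",
    n=1,
)
print(response.data[0].url)  # URL of the generated image
```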

Sora

Sora is a text-to-video model recently announced by OpenAI. It can produce extraordinarily realistic high-definition videos up to one minute long. It is currently being tested by OpenAI red teamers to evaluate AI risks such as bias and harmful content. You can see examples of Sora-generated videos here.

Although there are no details from OpenAI on how they built Sora, the new model raised the bar for detail and realism in AI video generation.

Thanks to its deep understanding of language, Sora can interpret text prompts accurately and generate imaginative videos that realistically reflect real-world physics and carry compelling emotions.

The model proves that diffusion transformers work well for videos, indicating that video generation could get as good as text generation with current technologies. The model is believed to be capable of learning physics and understanding the world to an extent. It is the GPT-3 moment of video models.

Though presented as a video-generation model, Sora is another massive step towards AGI. It will likely propel AI adoption, raise deep concerns and disrupt industries.

Text-to-speech (TTS)

OpenAI provides two TTS models: tts-1 for real-time voice generation and tts-1-hd for high-quality voice generation.
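A sketch of generating speech with the Python SDK; the voice and input text are placeholders:

```python
from openai import OpenAI

client = OpenAI()

response = client.audio.speech.create(
    model="tts-1",  # use "tts-1-hd" for higher-quality output
    voice="alloy",
    input="Hello! This sentence was generated by a TTS model.",
)
response.stream_to_file("speech.mp3")
```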

Assistants API

The Assistants API is OpenAI's version of an AI agent.

Agents are not a new concept in building Generative AI applications. LangChain Agents and AgentGPT are well-known frameworks that have existed for a while now. They use LLMs to orchestrate complex tasks by breaking them down into small steps and passing the sub-tasks to tools (LangChain Tools, OpenAI Tools) that specialise in specific tasks, especially those LLMs are known to be bad at, such as maths or analysing structured data. This allows LLMs to expand their abilities infinitely - at least in theory. Tools can be external expert models, purpose-built applications provided by the frameworks or users, or knowledge retrieval. The outputs of each tool are then processed by the agent LLM to arrive at the final response to users.

The OpenAI Assistants API uses OpenAI models to execute three types of tools: Code Interpreter, Knowledge Retrieval and Function calling.
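A sketch of running an assistant with the built-in tools: conversations live in threads, and a run executes the assistant on a thread until it completes. The instructions and user message are hypothetical:

```python
import time
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    instructions="You are a maths tutor. Use code to verify answers.",
    tools=[{"type": "code_interpreter"}, {"type": "retrieval"}],
    model="gpt-4-turbo-preview",
)

# Conversations live in threads; runs execute the assistant on a thread.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Solve 3x + 11 = 14."
)
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=assistant.id
)

# Poll until the run finishes, then read the assistant's reply.
while run.status not in ("completed", "failed"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```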

OpenAI's GPTs are a use case of the Assistants API.

Custom Models

OpenAI announced a Custom Models Program, allowing selected organisations to work with dedicated OpenAI teams to create domain-specific models.

Davinci (Legacy)

Updated @ 25/11/2023: Davinci models are considered legacy models (2020-2022).

The text-davinci models are for single-turn text completion.
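For reference, single-turn completion with these legacy models used the completions endpoint rather than chat; a sketch with a placeholder prompt:

```python
from openai import OpenAI

client = OpenAI()

# Legacy completion: a single prompt string, no message roles.
response = client.completions.create(
    model="text-davinci-003",
    prompt="Write a tagline for an ice-cream shop.",
    max_tokens=50,
)
print(response.choices[0].text)
```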

As of 15/03/2023, gpt-3.5-turbo and text-davinci-003 are on par in capability and performance. However, the former costs 10% of the price per token of the latter, so it should be used for most use cases.

*See the complete list of OpenAI Models.