OpenAI Tools & Services

Apart from ChatGPT, which made Gen AI accessible to the masses, OpenAI provides many powerful foundational models and services through APIs. This allows application developers to build Gen AI into their products. The OpenAI API SDK is available in JavaScript (node.js) and Python.

OpenAI Agent Toolkit

OpenAI released a set of tools for building agents in March 2025, including the new Responses API,

Responses API

Responses API combines the features of Chat Completions and Assistant API to provide a powerful wrapper over OpenAPI models and tools (web search, computer use, file search, function calls, code interpreter, etc)

Moderation, Guildrails & Model Performance Evaluation

OpenAI provides a Moderation API, Guildrails and an Evals API to help facilitate agentic application development.

Agent SDK

Apart from the Moderation API and the Guildrails features, OpenAI also released a set of orchestration and tracing tools to make developing agentic systems easier.

Deep Research

Deep Research is an agentic tool powered by GPT-o3. It is a packaged generic-purpose agentic capability for research tasks that involve extensive web browsing, information filtering, and synthesis actions. It is trained on research tasks with reinforcement learning; therefore, it ‘knows’ how to plan a research step by step, gather relevant information from the internet, and synthesise, reason and produce good reports. It is currently available in ChatGPT Pro, with plans to make it part of ChatGPT Plus and Enterprise.

GPT-o1, GPT-o1 mini, GPT-o3 mini

Not long after GPT-o1 and GPT-o1 mini, OpenAI released GPT-o3 mini via ChatGPT Pro and the OpenAI API platform in Jan 2025. The o-series are OpenAI’s reasoning models. It is rumoured that the GPT-4.5 will be released very soon with better vibes.

GPT-4o, GPT-4o mini, GPT-4.1, GPT-4.1 mini & GPT-4.1 nano

GTP-4o (“o” for “omni”) is the latest model from OpenAI (as of May 2024). Unlike previous OpenAI models that can only accept a single input type, GTP-4o can interact with humans through any combination of text, audio, images, and video and output the same rich-format responses. For example, it enables users to converse with the AI model through voice chat instead of typing. OpenAI states that the model can ‘see, hear and speak’. Another improvement of GPT-4o over GTP-4 is understanding text in non-English languages.

On April 14, 2025, OpenAI launched three new models (GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano) in its API, outperforming GTP-4o across benchmarks (especially on coding and instruction-following) with lower latency, a larger context window, and lower prices.

GPT-4 Turbo

GPT-4 Turbo is a multimodal model that can accept text and images and output text. Under the hood, it encodes text and images into the same encoding space and processes the data from different sources through the same neural network. It has a 128k context window (as of 21/11/2023).

GP4- Turbo with vision, code-named gpt-4-vision-preview, is the GPT-4 model that can accept one or more images as input and answer questions about them. It can analyse images in detail and read documents with figures.

Although OpenAI did not disclose the details of gpt-4-vision-preview, the common knowledge about multimodal LLMs suggests it uses CLIP.

GPT-3.5 Turbo

GPT-3.5 Turbo models are for multi-turn conversations and are equally capable of single-turn text-completion tasks. They support a 16k context window by default.

Because the models have no memory of the messages from the previous requests, it is necessary to save the conversations and include them in the consequent requests. If the conversation exceeds the maximum token size of the model, it must be shortened or use the gpt-4-32k model, which can handle up to 32k tokens.

Whisper

Whisper is a general-purpose speech recognition model that is capable of multilingual recognition, translation and identification.

Code Interpreter

Code Interpreter is a GPT model that can execute Python code in a sandbox and generate charts, graphs, data files or PDFs. For example, you can ask GPT-4 to ‘write a Python function to analyse a data file and generate a chart to find the trend’, and the generated code can be fed to the Code Interpreter model to derive the desired results.

DALL·E 3

DALL-E 3 can generate images based on user prompts. It is available in ChatGPT Plus, ChatGPT Enterprise and as dall-e-d through the OpenAI API endpoint.

Sora

Although there are no details from OpenAI on how they built Sora, the new model raised the bar in detail and the realism of AI video generation models.

Sora is a text-to-video model recently announced by OpenAI. It can produce extraordinarily realistic high-definition videos up to one minute. It is currently being tested by OpenAI red teamers to evaluate AI risks such as bias and harmful content. You can see examples of Sora-generated videos here.

Thanks to its deep understanding of languages, Sora can interpret text prompts accurately and generate imaginative videos that realistically reflect real-world physics and carry compelling emotions.

The model proves that diffusion transformers work well for videos, indicating that video generation could get as good as text generation with current technologies. The model is believed to be capable of learning physics and understanding the world to an extent. It is the GPT-3 moment of video models.

Under the guise of a video-generation model, this is another massive step towards AGI. It will likely propel AI adoption, raise deep concerns and disrupt industries.

Text-to-speech (TTS)

OpenAI provides two TTS models: tts-1 for real-time voice generation and tts-1-hd for high-quality voice generation.

OpenAI API

OpenAI API allows developers to build LLM-powered custom applications—either new native AI applications or add LLM capacity to existing business applications.

Projects

Projects is an OpenAI API feature that allows enterprise customers to scope permissions for model usage, internal file access, and cost management. Customers can assign roles and dedicated API keys to specific projects to deny/allow access to models and rate limits.

Batch API

Batch API allows users to save up to 50% in API calls for tasks that do not require a real-time response from the AI models. Users can send all the tokens in a single request, and OpenAI guarantees that a response will be returned within 24 hours. For most real-world use cases, the Batch API returns a response within 20 - 30 minutes.

Assistants API

Assistants API is OpenAI’s version of AI Agent.

Agent is not a new concept in the development of Generative AI applications. LangChain Agent and AgentGPT are all well-known frameworks that have existed since the beginning of the Gen AI excitement. They use LLMs to orchestrate complex tasks by breaking the tasks down into smaller simple steps and passing the sub-tasks to other tools (LangChain Tools, OpenAI Tools) that specialise in specific tasks - especially the tasks that LLMs are known to be bad at, such as maths or analysing structured data. This allows LLMs to expand their abilities infinitely - at least in theory. Tools can be external expert models, purpose-built applications either by the frameworks or users, or knowledge retrievals The outputs of each tool are then processed by the agent LLM to arrive at the final response to users.

OpenAI Assistant API uses OpenAI models to execute three types of tools in parallel: Code Interpreter (code_interpreter), knowledge retrieval (file_search) and function calling (tools you build/host).

OpenAI’s GPTs are a use case of the Assistants API.

Customer Models

OpenAI announced a Custom Models Program, allowing selected organisations to work with dedicated OpenAI teams to create domain-specific models.

Davinci (Legacy)

Updated @ 25/11/2023: Davinci models are considered legacy models (2020-2022.)

The text-davinci models are for single-turn text completion.

As of 15/03/2023, gpt-3.5-turbo and text-davinci-003 are on par in terms of capability & performance. However, the former is 10% of the price per token compared to the latter, and it should be used for most use cases.

*See the complete list of OpenAI Models.

References

Fulford, I. & Sun, Z. (2025) Introducing deep research. 2 February 2025. https://openai.com/index/introducing-deep-research/ [Accessed: 19 February 2025].

Liwen's Notes

Explorer

Recent Notes

How to Read Effectively

LangChain

Mixture of Expert (MoE)

The `NextRequest` and `NextResponse` Objects

LLM Explainability & Interpretability

Accountability & Motivation

Reasoning Models

Gen AI: The Future Is Agentic