Claude allows users to cache frequently used context, such as conversation history, code snippets, questions about large documents, or detailed instructions, between API calls, reducing costs by up to 90% and latency by up to 85% [@anthropicPromptCachingClaude2024].
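As a concrete illustration, the sketch below marks a large document in the system prompt as cacheable using the Anthropic Python SDK. The model name and document text are placeholders; it assumes a current Claude model with prompt caching available.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LARGE_DOCUMENT = "..."  # placeholder: a long document reused across many calls

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model name
    max_tokens=1024,
    system=[
        {"type": "text", "text": "Answer questions about the document below."},
        {
            "type": "text",
            "text": LARGE_DOCUMENT,
            # Mark this block as a cache breakpoint: later calls that share an
            # identical prefix read it from the cache instead of reprocessing it.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "What is the main conclusion?"}],
)

# usage reports how many input tokens were written to or read from the cache
print(response.usage.cache_creation_input_tokens,
      response.usage.cache_read_input_tokens)
```

The first call writes the shared prefix to the cache; subsequent calls within the cache lifetime that reuse the same prefix read it back at a fraction of the normal input-token price, which is where the cost and latency savings come from.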

OpenAI provides a feature called Memory that serves a similar purpose, though it works differently under the hood. Instead of explicitly caching user prompts, it searches past conversations, documents, and interactions between the user and ChatGPT, extracts relevant information, and feeds it into GPT models as part of the prompt context.
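OpenAI does not expose Memory through a public API, so the sketch below is only a rough illustration of this retrieve-then-prepend pattern. Every name in it (`MemoryItem`, `retrieve_relevant`, `build_prompt`, the toy word-overlap scoring) is hypothetical; a real system would use embedding-based retrieval.

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    text: str

# Hypothetical store of facts extracted from past conversations.
memory_store = [
    MemoryItem("User prefers Python examples."),
    MemoryItem("User is building a documentation chatbot."),
]

def retrieve_relevant(query: str, store: list[MemoryItem], k: int = 2) -> list[str]:
    """Toy relevance scoring by word overlap; stands in for embedding search."""
    def score(item: MemoryItem) -> int:
        return len(set(query.lower().split()) & set(item.text.lower().split()))
    return [m.text for m in sorted(store, key=score, reverse=True)[:k]]

def build_prompt(query: str) -> list[dict]:
    # Prepend the retrieved memories to the prompt as extra context.
    memories = retrieve_relevant(query, memory_store)
    context = "Relevant facts about the user:\n" + "\n".join(f"- {m}" for m in memories)
    return [
        {"role": "system", "content": context},
        {"role": "user", "content": query},
    ]

print(build_prompt("Show me a Python example for my chatbot"))
```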

Both mechanisms are similar to the LangChain concept of memory, albeit more integrated and slightly more sophisticated.
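For comparison, LangChain's classic memory is an explicit buffer that the application manages itself. A minimal sketch, assuming an older LangChain release where `ConversationBufferMemory` lives in `langchain.memory`:

```python
from langchain.memory import ConversationBufferMemory

# Stores the raw conversation and replays it as prompt context on each call.
memory = ConversationBufferMemory()
memory.save_context(
    {"input": "What is prompt caching?"},
    {"output": "A way to reuse a shared prompt prefix across API calls."},
)

# The accumulated history is what gets injected into the next prompt.
print(memory.load_memory_variables({})["history"])
# Human: What is prompt caching?
# AI: A way to reuse a shared prompt prefix across API calls.
```

Unlike the vendor-side mechanisms above, this buffer lives entirely in application code, which is why Claude's prompt caching and OpenAI's Memory feel more integrated.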

References