OpenAI has introduced a new feature called prompt caching, which aims to reduce the cost and latency associated with input tokens. The feature is now available on TypingMind and applies automatically to the latest versions of the GPT-4o, GPT-4o mini, o1-preview, and o1-mini models. Prompt caching gives a 50% discount on cached input tokens and reduces latency by up to 80% by reusing recently seen input tokens. It requires no API changes; cached prefixes are retained for 5-10 minutes of inactivity, with a hard expiration of one hour. The feature follows a similar one released by Anthropic, which can save up to 90% on input tokens and reduce latency by 85%. The announcement was made during OpenAI's DevDay.
🚀 Prompt Caching: OpenAI vs Claude 🧠

OpenAI
- Available for GPT-4o and o1 models
- Reuses input tokens across API calls, 50% discount on inputs
- Automatic application, no API changes needed
- 5-10 minute cache retention, 1 hour hard expiration
- GPT-4o: $2.50 uncached, $1.25… https://t.co/LNn9OweDjL
At #devday this week @OpenAI released prompt caching, which reduces input token costs by 50% and latency by 80% 🤯 Not too long ago, @AnthropicAI released its version of prompt caching, which can save up to 90% on input tokens and reduce latency by 85% 🚀 This is a huge deal.… https://t.co/Rnp9hJUByg
catch up on all the action at https://t.co/vuXXPxqrrV
+ prompt caching saves you 50% on cached inputs over uncached inputs and reduces latency by reusing recently seen input tokens
+ realtime api: create low-latency speech-to-speech experiences
+ model distillation to train… https://t.co/daLuC4lnnV