OpenAI has introduced audio input and output capabilities to its Chat Completions API. This new feature allows users to pass text or audio inputs and receive responses in text, audio, or both, making it ideal for asynchronous audio applications. Unlike the Realtime API, which is better suited for low-latency speech-to-speech interactions, the Chat Completions API is designed for multimodal applications that do not require real-time responses. The pricing for this new feature is noted to be relatively high. The GPT-4o API now also supports audio in and out. A live stream about this update is scheduled.
chat completions now support audio for async audio experiences! 🚀returns audio, text or both! live stream about this in about an ~hour or so. keep an eye out Link to doc in comment https://t.co/dTvrZk0XCF
https://t.co/KB18e2rL2Z “The Chat Completions API supports audio now. Pass text or audio inputs, then receive responses in text, audio, or both.” This is different than the Realtime API.
Breaking - you can now send and receive audio from the chat completions API for @OpenAI 👏 Unlike RealTime audio, this is well suited for multimodal applications that do not require real time, but also the input is incredible! Pricing is... not cheap https://t.co/JhTkudewyK