Groq has launched its first multimodal endpoint, hosting the LLaVA v1.5 7B model, which supports image and text inputs. This lets developers and businesses build applications that combine visual and textual data. Initial benchmarking indicates that Groq's response times are more than four times faster than GPT-4 on OpenAI. The new model, available via the API and console as 'llava-v1.5-7b-4096-preview', is expected to significantly expand GroqCloud's capabilities, bringing Groq's fast inference speeds to vision models.
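A minimal sketch of calling the new endpoint, assuming the Groq Python SDK's OpenAI-compatible chat-completions interface, a `GROQ_API_KEY` set in the environment, and a placeholder image URL:

```python
# Sketch: send an image + text prompt to the preview multimodal model on GroqCloud.
# Assumes the Groq Python SDK (pip install groq) and GROQ_API_KEY in the environment;
# the image URL below is a placeholder, not a real asset.
from groq import Groq

client = Groq()  # picks up GROQ_API_KEY automatically

completion = client.chat.completions.create(
    model="llava-v1.5-7b-4096-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
```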
LongLLaVA leverages a Mamba-Transformer hybrid to efficiently process 1,000 images on a single 80GB GPU, enhancing long-context MLLM capabilities.
-----
**Results** 📊:
• Outperforms open-source models on MileBench, surpassing Claude3-Opus
• Excels in retrieval tasks and video… https://t.co/KLamBdev2V
Groq is now multimodal. Available on their cloud developer console: https://t.co/65CWJq0xnx
whoa, @GroqInc is now multi-modal!! https://t.co/7Iy1TslBjv