Recent advancements in locally run large language models (LLMs) have significantly enhanced their capabilities and accessibility. The latest open-source Llama 3.2 model can now be used to chat with PDF files through a Retrieval-Augmented Generation (RAG) approach, running fully locally and privately on Apple Silicon Macs via llama.cpp, on top of free access to pre-deployed models such as Qwen 2.5 72B, Llama 3.1 70B, and Command R+. The LM Studio 0.3.4 update adds an Apple MLX engine, letting Llama 3.2 1B run at roughly 250 tokens per second on an M3 chip, and also adds enforced structured JSON responses. A new blog post covers document RAG with Llama 3.2 vision models and ColQwen2, including how to handle complex PDFs and slide decks with ColPali, the new SOTA visual document retriever.
Local Models ftw! Chat with your PDFs with Llama 3.2 on your Mac at the click of a button 🔥 100% local, fully private - powered by llama.cpp ⚡ This is on top of free access to pre-deployed LLMs like Qwen 2.5 72B, Llama 3.1 70B, Command R+ and many more! Download today!! https://t.co/AUdZ4TrNUJ
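The post above is a one-click product feature, but the same local, private PDF chat can be approximated in a few lines. A minimal sketch, assuming llama-cpp-python and pypdf are installed and a Llama 3.2 instruct GGUF file is available locally (the model path, chunk sizes, and the naive keyword retrieval are all illustrative assumptions, not how the product works internally):

```python
# Minimal local "chat with your PDF" sketch using llama-cpp-python + pypdf.
# Assumptions: a Llama 3.2 instruct GGUF exists at MODEL_PATH; retrieval is
# naive keyword overlap rather than a real embedding index.
from llama_cpp import Llama
from pypdf import PdfReader

MODEL_PATH = "llama-3.2-3b-instruct-q4_k_m.gguf"  # assumed local file

def load_chunks(pdf_path: str, chunk_chars: int = 1500) -> list[str]:
    """Extract text from the PDF and split it into fixed-size chunks."""
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    return [text[i : i + chunk_chars] for i in range(0, len(text), chunk_chars)]

def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by crude keyword overlap with the question."""
    words = set(question.lower().split())
    return sorted(chunks, key=lambda c: -len(words & set(c.lower().split())))[:k]

llm = Llama(model_path=MODEL_PATH, n_ctx=8192, verbose=False)
chunks = load_chunks("report.pdf")

question = "What are the key findings of this report?"
context = "\n---\n".join(top_chunks(question, chunks))
reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(reply["choices"][0]["message"]["content"])
```

Everything here runs on-device; swapping the keyword scorer for an embedding index would bring it closer to a production RAG setup.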
New Blog: Document RAG with Llama 3.2 Vision and ColQwen2! We discuss: - How to perform RAG with complex PDFs & slide decks - How Llama 3.2 vision models can be used for multimodal RAG - The new SOTA visual document retriever ColPali https://t.co/akluzGuwBL https://t.co/GwSeKagMGH
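The blog's pipeline retrieves PDF pages as images with ColQwen2 and then passes the best pages to a Llama 3.2 vision model for answering. A sketch of the retrieval half, assuming the colpali-engine and pdf2image packages and the vidore/colqwen2-v0.1 checkpoint (the checkpoint name, file names, and device settings are assumptions, not taken from the post):

```python
# Late-interaction visual retrieval over PDF pages with ColQwen2 (sketch).
# Assumptions: colpali-engine, pdf2image, and torch are installed, and a GPU
# with bfloat16 support is available; the checkpoint name may differ.
import torch
from pdf2image import convert_from_path
from colpali_engine.models import ColQwen2, ColQwen2Processor

model_name = "vidore/colqwen2-v0.1"  # assumed checkpoint
model = ColQwen2.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="cuda"
).eval()
processor = ColQwen2Processor.from_pretrained(model_name)

# Each PDF page becomes one image; ColQwen2 embeds pages directly,
# with no OCR or layout parsing step.
pages = convert_from_path("slide_deck.pdf")
queries = ["What was Q3 revenue growth?"]

batch_images = processor.process_images(pages).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)

with torch.no_grad():
    image_embeddings = model(**batch_images)
    query_embeddings = model(**batch_queries)

# MaxSim-style late-interaction scores: one row per query, one column per page.
scores = processor.score_multi_vector(query_embeddings, image_embeddings)
best_page = scores[0].argmax().item()
print(f"Most relevant page: {best_page}")
```

The top-scoring page image would then be handed to a Llama 3.2 vision model as context, which is what makes the approach work on slide decks and figure-heavy PDFs where plain text extraction falls apart.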
LM Studio 0.3.4 ships with Apple MLX 🚢🍎 Run on-device LLMs super fast, 100% locally and offline on your Apple Silicon Mac! Includes: > run Llama 3.2 1B at ~250 tok/sec (!) on M3 > enforce structured JSON responses > use via chat UI, or from your own code > run multiple… https://t.co/XQ32Bp0Fcv
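LM Studio's local server speaks the OpenAI-compatible API, so the "use from your own code" and structured-JSON points above can be combined in a short script. A sketch assuming the server is running on its default port 1234 with a Llama 3.2 model loaded; the model identifier and schema are illustrative assumptions:

```python
# Calling a local LM Studio server (OpenAI-compatible API) with an enforced JSON schema.
# Assumptions: LM Studio's local server is running on the default port 1234 and the
# model id below matches whatever is loaded in the app.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

schema = {
    "name": "book_info",
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "year": {"type": "integer"},
        },
        "required": ["title", "year"],
    },
}

response = client.chat.completions.create(
    model="llama-3.2-1b-instruct",  # assumed model id as shown in LM Studio
    messages=[{"role": "user", "content": "Name one classic sci-fi novel and its year."}],
    response_format={"type": "json_schema", "json_schema": schema},
)
print(response.choices[0].message.content)  # should parse as JSON matching the schema
```

Because everything stays on localhost, the request never leaves the machine, which is the whole point of the on-device MLX backend.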