
Ollama, an AI model serving system, has released version 0.2, which enables concurrency by default: the server can run different models simultaneously and process multiple requests in parallel. The update also loads and unloads models dynamically based on available memory, making better use of system resources.
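A minimal sketch of what parallel requests against a local Ollama 0.2 server look like, assuming the default endpoint `http://localhost:11434/api/generate`; the model name `llama3` is a placeholder for whatever model you have pulled:

```python
import concurrent.futures
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def generate(prompt, model="llama3", url=OLLAMA_URL):
    """Send one non-streaming generate request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def generate_many(prompts, worker=generate, max_workers=4):
    """Fan prompts out concurrently; with 0.2 the server handles them in parallel."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, prompts))
```

The `worker` parameter is there so the fan-out logic can be exercised without a running server; in normal use you would just call `generate_many(["prompt 1", "prompt 2"])`.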
🚀 Exciting news! Following GLM-4, CodeGeeX also supports Ollama🎉 This high-performance multi-language code generation model is perfect for code completion and AI development. 🌐👨‍💻✨ Ollama: https://t.co/zXVmQKY2Cq
Soft capping is now merged into Flash Attention. This means you can now run inference and fine-tune Gemma 2 without some of the issues observed earlier. https://t.co/3wYN0uNVFh
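For context, logit soft capping squashes attention logits into a bounded range with a scaled tanh, which is what Gemma 2 expects at inference and training time. A minimal sketch of the operation (the cap value 50.0 matches Gemma 2's published attention-logit cap, but confirm it for your checkpoint):

```python
import math

def soft_cap(logits, cap=50.0):
    """Soft-cap logits into (-cap, cap): cap * tanh(x / cap).

    Near zero this is approximately the identity; large magnitudes
    saturate smoothly at +/-cap instead of growing unboundedly.
    """
    return [cap * math.tanh(x / cap) for x in logits]
```

Without this step, kernels that skip soft capping produce attention scores that diverge from the reference Gemma 2 implementation, which is the class of issue the merged Flash Attention support addresses.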
And guess what? You can now run tasks and crews in parallel using local models through @ollama 😎👉👉 Major release from their team, congrats! https://t.co/B2S24wtUjp