Google has unveiled PaliGemma 2, its latest open-source family of vision-language models, which pairs a SigLIP vision encoder with a Gemma 2 text decoder. The family ships in three parameter sizes: 3 billion, 10 billion, and 28 billion (built on the 2B, 9B, and 27B Gemma 2 variants, respectively, plus the vision encoder), and supports image resolutions of 224x224, 448x448, and 896x896 pixels. Designed for straightforward fine-tuning across vision-language tasks, the models generate detailed, context-sensitive image descriptions and show improved performance in applications such as chemical formula recognition, music score interpretation, spatial reasoning, and chest X-ray report generation. PaliGemma 2 is accessible on platforms including Kaggle Models and Hugging Face Transformers, facilitating widespread adoption by developers and researchers.
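The three parameter sizes and three input resolutions combine into a grid of nine pretrained checkpoints. As a rough sketch (the `google/paligemma2-{size}-pt-{resolution}` naming pattern is an assumption here; verify the exact repository names on Kaggle Models or the Hugging Face Hub before use):

```python
# Enumerate the size x resolution grid of PaliGemma 2 pretrained checkpoints.
# NOTE: the repo-id naming pattern below is assumed, not confirmed by the source.
SIZES = ["3b", "10b", "28b"]          # total parameter counts of the three variants
RESOLUTIONS = [224, 448, 896]         # supported square input resolutions in pixels

checkpoints = [
    f"google/paligemma2-{size}-pt-{res}"
    for size in SIZES
    for res in RESOLUTIONS
]

for repo_id in checkpoints:
    print(repo_id)
```

Each checkpoint could then be loaded for fine-tuning or inference with the Transformers library (for example via `PaliGemmaForConditionalGeneration.from_pretrained(repo_id)`), choosing the resolution to match the task: higher resolutions help with fine-grained inputs like music scores or chemical formulas at the cost of more compute.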
Google releases PaliGemma 2, its latest open-source vision-language model @bimedotcom @Khulood_Almani @theomitsa @FmFrancoise @sulefati7 @NathaliaLeHen @IanLJones98 @bamitav @sallyeaves @BetaMoroney @sonu_monika @TheAIObserverX https://t.co/Em4zEfZjiN
Google has released the next generation of its open-source vision-language model, PaliGemma 2. The new model pairs enhanced image-description capabilities with improved performance across multiple applications. https://t.co/jTyk195Ngx