Jina AI has announced two new Small Language Models (SLMs), Reader-LM-0.5B and Reader-LM-1.5B. Inspired by Jina Reader, these models are purpose-built to convert raw HTML into clean markdown efficiently. Reader-LM-0.5B has 494 million parameters and Reader-LM-1.5B has 1.54 billion, yet both support multiple languages and long contexts, and on HTML-to-Markdown tasks they have outperformed much larger models such as GPT-4, Gemini-1.5, LLaMA-3.1-70B, and Qwen2-7B-Instruct. This makes them well suited to web data extraction and cleaning pipelines.
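Since the models are published as standard causal language models, conversion can be sketched with the Hugging Face `transformers` library. This is a minimal sketch, not confirmed usage from the announcement: the model id `"jinaai/reader-lm-1.5b"` and the single-turn chat prompt format are assumptions based on common Hugging Face conventions.

```python
# Hedged sketch of HTML-to-Markdown conversion with Reader-LM.
# Assumptions (not from the announcement): model id "jinaai/reader-lm-1.5b"
# and a single-turn chat prompt carrying the raw HTML.

def build_messages(raw_html: str) -> list[dict]:
    """Wrap raw HTML as one user turn; the model is trained to reply
    with the cleaned markdown, so no system prompt is needed."""
    return [{"role": "user", "content": raw_html}]

def html_to_markdown(raw_html: str, model_id: str = "jinaai/reader-lm-1.5b") -> str:
    # Lazy import so build_messages stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Render the chat turn into model input ids.
    input_ids = tokenizer.apply_chat_template(
        build_messages(raw_html), add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=1024)
    # Decode only the newly generated tokens (the markdown).
    return tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
```

A small, decoder-only model like this can run on modest hardware, which is the point of shipping 0.5B and 1.5B variants rather than relying on a frontier-scale model for a narrow conversion task.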