
Recently, deepset_ai and mixedbreadai collaborated to release a SOTA German text embedding model. This new model outperforms existing models like multilingual-e5-large and jina-embeddings-v2-base-de. The model, with 478 million parameters, is small enough to run on both CPU and GPU. A livestream discussing the creation of this model, focusing on Binary Quantization and MRL, is scheduled for August 7th.
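Binary Quantization, one of the two techniques named above, maps each float dimension of an embedding to a single sign bit, cutting storage roughly 32x while similarity is approximated by Hamming distance. A minimal sketch (the vectors and function names here are illustrative, not the model's actual API):

```python
# Hedged sketch: binary quantization of float embeddings.
# Each dimension becomes 1 bit (its sign); similarity between codes
# is then approximated with Hamming distance on the packed bits.

def binary_quantize(vec):
    """Pack the sign bits of a float vector into a bytes object."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    n_bytes = (len(vec) + 7) // 8
    return bits.to_bytes(n_bytes, "little")

def hamming_distance(a, b):
    """Count differing bits between two packed binary codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Toy 8-dim vectors standing in for real embeddings.
query = [0.12, -0.40, 0.33, -0.05, 0.91, -0.27, 0.08, -0.66]
doc   = [0.10, -0.35, 0.29,  0.02, 0.80, -0.31, 0.05, -0.70]

q_code = binary_quantize(query)
d_code = binary_quantize(doc)
print(hamming_distance(q_code, d_code))  # -> 1 (signs differ in one dim)
```

In practice a rerank step over the top Hamming matches, using the original float vectors, recovers most of the lost accuracy.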
[IR] LitSearch: A Retrieval Benchmark for Scientific Literature Search https://t.co/EU8ylWfjnD - LitSearch is a new retrieval benchmark for scientific literature search, consisting of 597 realistic literature search queries. - The queries are created in two ways: 1)… https://t.co/TudVxAzTkk
Check out our new retrieval benchmark! We curated a large set of challenging questions about the recent ML literature and evaluated SOTA retrievers, Google, and more! Looking forward to seeing the next generation of retrieval systems support scientific research! https://t.co/PntEfNCpMj
Just over a week ago, @deepset_ai and @mixedbreadai jointly released a new embedding model. Join me and @aaxsh18 on August 7th for a livestream about how this model was created, with a special focus on Binary Quantization and MRL (PS: first livestream I'm hosting!)…
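MRL (Matryoshka Representation Learning), the other technique the livestream covers, trains a model so that coarse-to-fine information is packed into prefixes of the embedding; at inference you can simply truncate to a shorter dimension and re-normalize. A minimal sketch, with a made-up toy vector (not the model's real output):

```python
# Hedged sketch: using a Matryoshka (MRL) embedding at reduced dimension.
# MRL-trained models keep the most important information in the leading
# dimensions, so truncating the vector and L2-normalizing the prefix
# retains most retrieval quality at a fraction of the storage cost.
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` dimensions and L2-normalize the result."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

full = [0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005]  # toy 8-dim embedding
short = truncate_embedding(full, 4)                    # 4-dim, unit length
print(len(short))  # -> 4
```

Re-normalizing matters because cosine similarity assumes unit-length vectors; without it, truncated prefixes of different vectors would be compared on inconsistent scales.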