Recent advances in artificial intelligence have produced several new methods for video and image processing. Video-P2P edits videos from text prompts, and DeiT-LT improves the training of Vision Transformers on imbalanced (long-tailed) datasets. HiGen generates realistic videos from text prompts by decoupling spatial and temporal factors. Other notable contributions include EventPS, which performs real-time photometric stereo with event cameras, and InstructVideo, which uses human feedback to improve the quality of generated videos. MetaCloak aims to protect images from unauthorized AI-based synthesis, while VCoder improves object recognition in images by incorporating additional perception inputs. Together, these developments reflect a broader trend toward multimodal prompting in AI image generation, which lets users control specific aspects of the generated content, such as character poses and color grading. Researchers continue to look for ways to make AI systems more efficient and effective at complex visual tasks.
SFOD: Spiking Fusion Object Detector TLDR: Event cameras record per-pixel brightness changes asynchronously instead of capturing full frames like regular cameras. SFOD is a spiking neural network (SNN) object detector designed for this event-camera data. ✨ Interactive paper: https://t.co/nwLM1XWK4r
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimod... TLDR: Even advanced multimodal models struggle with simple visual questions, a failure this work attributes to inaccurate visual grounding. ✨ Interactive paper: https://t.co/XtIb6IocrR
D^4: Dataset Distillation via Disentangled Diffusion M... TLDR: This paper introduces Dataset Distillation via Disentangled Diffusion Model, a method that condenses a large dataset into a much smaller one so that models can be trained faster. ✨ Interactive paper: https://t.co/7rDC7z812g