OpenAI Whisper deployment on @huggingface Inference Endpoints, achieving up to 8x faster transcriptions. The new Whisper endpoint uses vLLM for inference, targeting NVIDIA GPUs (Ada Lovelace cards like the L4 & L40S). Optimizations include PyTorch compilation (torch.compile) for https://t.co/0HiyCWaM7G https://t.co/5BhmdKnloZ
Transcribing 1 hour of audio for less than $0.01 🤯 The @huggingface team cooked with 8x faster Whisper speech recognition - @OpenAI whisper-large-v3-turbo transcribes at 100x real time on a $0.80/hr L4 GPU! https://t.co/JvEjIlH8bL
Wow, 8x faster transcription with Whisper Large V3! https://t.co/e6hWF0jOyH
Hugging Face has introduced a new deployment of OpenAI's Whisper speech recognition model on its Inference Endpoints platform, achieving transcription speeds up to eight times faster than previous versions. This enhancement leverages the vLLM project for optimized inference on NVIDIA GPUs such as the Ada Lovelace-architecture L4 and L40S, incorporating PyTorch compilation (torch.compile). The whisper-large-v3-turbo model transcribes audio at 100 times real-time speed on a $0.80-per-hour L4 GPU, enabling the transcription of one hour of audio for less than one cent. These improvements significantly reduce latency and cost without requiring any configuration changes from users. Additionally, PortkeyAI has rolled out optimizations on its Gateway that also reduce latency for OpenAI embeddings, further enhancing response times.
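The sub-cent cost figure follows directly from the quoted numbers (100x real-time throughput on a $0.80/hr L4). A quick back-of-the-envelope check in Python; the function name is ours, not from the announcement:

```python
# Sanity-check the cost claim: at 100x real time, one hour of audio
# consumes 36 s of GPU time, so the cost is far below one cent.
# Inputs (realtime factor, GPU price) are taken from the announcement.

def transcription_cost_usd(audio_seconds: float,
                           realtime_factor: float,
                           gpu_price_per_hour: float) -> float:
    """Dollar cost to transcribe `audio_seconds` of audio."""
    gpu_seconds = audio_seconds / realtime_factor
    return gpu_price_per_hour * gpu_seconds / 3600.0

cost = transcription_cost_usd(audio_seconds=3600,
                              realtime_factor=100,
                              gpu_price_per_hour=0.80)
print(f"${cost:.4f} per hour of audio")  # $0.0080 -> "less than $0.01"
```

At $0.008 per audio-hour, even a 2–3x lower effective throughput in practice would still land under the one-cent mark claimed above.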