
Hugging Face has joined forces with NVIDIA to bring inference-as-a-service capabilities to one of the world’s largest AI communities. This collaboration, announced at the SIGGRAPH conference, will provide Hugging Face’s four million developers with streamlined access to NVIDIA-accelerated inference on popular AI models.

The new service enables developers to swiftly deploy leading large language models, including the Llama 3 family and Mistral AI models, with optimisation from NVIDIA NIM microservices running on NVIDIA DGX Cloud. This integration aims to simplify the process of prototyping with open-source AI models hosted on the Hugging Face Hub and deploying them in production environments.

For Enterprise Hub users, the offering includes serverless inference, promising increased flexibility, minimal infrastructure overhead, and optimised performance through NVIDIA NIM. This service complements the existing Train on DGX Cloud AI training service available on Hugging Face, creating a comprehensive ecosystem for AI development and deployment.
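
To make the workflow concrete, the sketch below shows what calling a hosted model through serverless inference might look like using the huggingface_hub client library. The model ID, token placeholder, and prompt are illustrative assumptions, not values from the announcement.

from huggingface_hub import InferenceClient

# Hypothetical example: the model ID and token are placeholders.
client = InferenceClient(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    token="hf_...",  # your Hugging Face access token
)

# A standard chat-completion call; GPU scheduling, batching, and
# optimisation happen server-side, so there is no infrastructure to manage.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Summarise what NVIDIA NIM does."}],
    max_tokens=200,
)
print(response.choices[0].message.content)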

The new tools are designed to address the challenges faced by developers navigating the growing landscape of open-source models.

By providing a centralised hub for model comparison and experimentation, Hugging Face and NVIDIA are lowering the barriers to entry for cutting-edge AI development. Accessibility is a key focus, with the new features available through simple “Train” and “Deploy” drop-down menus on Hugging Face model cards, allowing users to get started with minimal friction.

At the heart of this offering is NVIDIA NIM, a collection of AI microservices that includes both NVIDIA AI foundation models and open-source community models. These microservices are optimised for inference using industry-standard APIs, offering significant improvements in token processing efficiency – a critical factor in language model performance.
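
In NIM's case, those industry-standard APIs are OpenAI-compatible HTTP endpoints, so an existing OpenAI client can typically be pointed at a NIM deployment by swapping the base URL. The endpoint address and model name in this sketch are hypothetical placeholders for illustration.

from openai import OpenAI

# Hypothetical NIM endpoint and model name; substitute real values
# from your own deployment.
client = OpenAI(
    base_url="https://your-nim-endpoint.example.com/v1",
    api_key="YOUR_API_KEY",
)

completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # name as exposed by the microservice
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(completion.choices[0].message.content)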

The benefits of NIM extend beyond mere optimisation. When accessed as a NIM, models like the 70-billion-parameter version of Llama 3 can achieve up to 5x higher throughput compared to off-the-shelf deployment on NVIDIA H100 Tensor Core GPU-powered systems. This performance boost translates to faster, more robust results for developers, potentially accelerating the development cycle of AI applications.

Underpinning this service is NVIDIA DGX Cloud, a platform purpose-built for generative AI. It offers developers scalable GPU resources that support every stage of AI development, from prototype to production, without the need for long-term infrastructure commitments. This flexibility is particularly valuable for developers and organisations looking to experiment with AI without significant upfront investments.

As AI continues to evolve and find new applications across industries, tools that simplify development and deployment will play a crucial role in driving adoption. This collaboration between NVIDIA and Hugging Face empowers developers with the resources they need to push the boundaries of what’s possible with AI.
