Hugging Face has joined forces with NVIDIA to bring inference-as-a-service capabilities to one of the world's largest AI communities. This collaboration, announced at the SIGGRAPH conference, will provide Hugging Face's four million developers with streamlined access to NVIDIA-accelerated inference on popular AI models.
The new service enables developers to swiftly deploy leading large language models, including the Llama 3 family and Mistral AI models, with optimisation from NVIDIA NIM microservices running on NVIDIA DGX Cloud. This integration aims to simplify the process of prototyping with open-source AI models hosted on the Hugging Face Hub and deploying them in production environments.
For Enterprise Hub users, the offering includes serverless inference, promising increased flexibility, minimal infrastructure overhead, and optimised performance through NVIDIA NIM. This service complements the existing Train on DGX Cloud AI training service available on Hugging Face, creating a comprehensive ecosystem for AI development and deployment.
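As a rough illustration of what serverless inference looks like from a developer's seat, the sketch below uses the official `huggingface_hub` client. The model ID, prompt, and the assumption of an `HF_TOKEN` environment variable (plus Enterprise Hub access) are all illustrative, not details from the announcement:

```python
# Hypothetical sketch: querying a serverless inference endpoint on the
# Hugging Face Hub via the huggingface_hub client. Model ID, prompt, and
# the HF_TOKEN environment variable are illustrative assumptions.
import os


def build_chat_request(model_id, user_message, max_tokens=256):
    """Assemble the chat-completion parameters the client would send."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


def run_inference(request):
    # Imported lazily so the sketch stays importable without the package.
    from huggingface_hub import InferenceClient

    client = InferenceClient(token=os.environ["HF_TOKEN"])
    response = client.chat_completion(**request)
    return response.choices[0].message.content


request = build_chat_request(
    "meta-llama/Meta-Llama-3-70B-Instruct",
    "Summarise the benefits of serverless inference in one sentence.",
)
# run_inference(request) would perform the actual network call.
```

The appeal of the serverless model is visible even in this toy: there is no endpoint to provision or scale, just a model ID and a request.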
The new tools are designed to address the challenges faced by developers navigating the growing landscape of open-source models.
By providing a centralised hub for model comparison and experimentation, Hugging Face and NVIDIA are lowering the barriers to entry for cutting-edge AI development. Accessibility is a key focus, with the new features available through simple "Train" and "Deploy" drop-down menus on Hugging Face model cards, allowing users to get started with minimal friction.
At the heart of this offering is NVIDIA NIM, a collection of AI microservices that includes both NVIDIA AI foundation models and open-source community models. These microservices are optimised for inference using industry-standard APIs, offering significant improvements in token processing efficiency, a critical factor in language model performance.
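Because NIM microservices speak an OpenAI-compatible REST dialect, querying a hosted model can be a plain HTTP POST. The sketch below is a minimal, hedged example: the base URL, the `NVIDIA_API_KEY` environment variable, and the model name are assumptions for illustration, not endpoints confirmed by the announcement:

```python
# Hedged sketch: NIM exposes an OpenAI-style chat-completions API, so a
# standard-library HTTP POST suffices. Base URL, API key variable, and
# model name are illustrative assumptions.
import json
import os
import urllib.request

NIM_BASE_URL = "https://integrate.api.nvidia.com/v1"  # assumed endpoint


def build_completion_payload(model, prompt, temperature=0.2):
    """Build the JSON body for an OpenAI-style chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def post_completion(payload):
    """POST the payload to the assumed NIM endpoint and decode the reply."""
    req = urllib.request.Request(
        f"{NIM_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_completion_payload(
    "meta/llama3-70b-instruct", "What does NIM stand for?"
)
# post_completion(payload) would perform the actual network call.
```

The design point worth noting is the "industry-standard" part: because the request shape matches existing OpenAI-style clients, swapping a NIM-served model into an application is mostly a change of base URL and model name.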
The benefits of NIM extend beyond mere optimisation. When accessed as a NIM, models like the 70-billion-parameter version of Llama 3 can achieve up to 5x higher throughput compared to off-the-shelf deployment on NVIDIA H100 Tensor Core GPU-powered systems. This performance boost translates to faster, more robust results for developers, potentially accelerating the development cycle of AI applications.
Underpinning this service is NVIDIA DGX Cloud, a platform purpose-built for generative AI. It offers developers scalable GPU resources that support every stage of AI development, from prototype to production, without the need for long-term infrastructure commitments. This flexibility is particularly valuable for developers and organisations looking to experiment with AI without significant upfront investments.
As AI continues to evolve and find new applications across industries, tools that simplify development and deployment will play a crucial role in driving adoption. This collaboration between NVIDIA and Hugging Face empowers developers with the resources they need to push the boundaries of what's possible with AI.