Mar 28, 2025
Universal NIM Acceleration With GPU-Sharing Containers

Jeongkyu Shin
Founder / Researcher / CEO

Joongi Kim
Co-Founder / CTO
Overview
Multi-modal, multi-agent AI systems are becoming the next-generation norm, and NVIDIA NIM charts a path toward them with optimized container templates. We'll explain how our GPU-native container engine further accelerates NIM to deliver such advanced multi-agent AI systems at low cost and high performance. It exploits a novel fractional GPU-sharing technology to colocate multiple models with diverse performance bottlenecks on a single GPU, automates resource allocation and model placement with memory-size estimation techniques, and auto-scales NIM containers based on inference runtime metrics. All of these features work on both air-gapped on-premises clusters and cloud-native setups. On top of that, we've built a streamlined UI to import, fine-tune, and serve open models in a single click, hiding the technical details from the end user. Putting it all together, let's dive into the universal world of NIM.
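To make the fractional GPU-sharing idea concrete, here is a minimal sketch (not the actual Backend.AI scheduler; all names and the footprint heuristic are illustrative assumptions) of how a scheduler might estimate each model's memory footprint and first-fit-pack several models onto fractional slices of shared GPUs:

```python
# Illustrative sketch only: estimate per-model memory footprints and pack
# multiple models onto shared GPUs with a first-fit heuristic. The 20%
# overhead factor and fp16 weight assumption are illustrative, not measured.
from dataclasses import dataclass, field

@dataclass
class GPU:
    total_mem_gb: float
    models: list = field(default_factory=list)  # (name, mem_gb) pairs placed here

    @property
    def free_mem_gb(self) -> float:
        return self.total_mem_gb - sum(mem for _, mem in self.models)

def estimate_mem_gb(params_b: float, bytes_per_param: int = 2,
                    overhead: float = 1.2) -> float:
    """Rough footprint: fp16 weights plus ~20% for KV cache and activations."""
    return params_b * bytes_per_param * overhead

def first_fit(models: dict[str, float], gpus: list[GPU]) -> dict[str, int]:
    """Place each model (largest first) on the first GPU with enough free memory."""
    placement = {}
    for name, mem in sorted(models.items(), key=lambda kv: -kv[1]):
        for i, gpu in enumerate(gpus):
            if gpu.free_mem_gb >= mem:
                gpu.models.append((name, mem))
                placement[name] = i
                break
        else:
            raise RuntimeError(f"no GPU can fit {name} ({mem:.1f} GB)")
    return placement

# Example: four models of different sizes sharing two 80 GB GPUs.
gpus = [GPU(80.0), GPU(80.0)]
sizes_b = {"llama-8b": 8, "embedder-1b": 1, "reranker-4b": 4, "vlm-26b": 26}
models = {name: estimate_mem_gb(size) for name, size in sizes_b.items()}
print(first_fit(models, gpus))
```

A production scheduler would also account for compute and bandwidth contention between colocated models, which is why mixing models with different bottlenecks (e.g. a compute-bound LLM with a memory-bound embedder) on one GPU is attractive.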