Mar 28, 2025
Universal NIM Acceleration With GPU-Sharing Containers

Jeongkyu Shin
Founder / Researcher / CEO

Joongi Kim
Co-Founder / CTO
Overview
Multi-modal, multi-agent AI systems are becoming the next-generation norm, and NVIDIA NIM charts a path toward them with optimized container templates. We'll explain how our GPU-native container engine further accelerates NIM to deliver such advanced multi-agent AI systems at low cost and high performance. It exploits a novel fractional GPU-sharing technology to colocate multiple models with diverse performance bottlenecks on a single GPU, automates resource allocation and model placement with memory-size estimation techniques, and auto-scales NIM containers based on inference runtime metrics. All of these features work on both air-gapped on-premises clusters and cloud-native setups. On top of that, we've built a streamlined UI to import, fine-tune, and serve open models in a single click, hiding the technical details from the end user. Putting it all together, let's dive into the universal world of NIM.
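To make the fractional GPU-sharing idea concrete, here is a minimal sketch (not the actual Backend.AI scheduler; all names and the footprint heuristic are illustrative assumptions) of how a scheduler might estimate each model's memory footprint and first-fit-pack several models onto fractional slices of shared GPUs:

```python
# Illustrative sketch only: estimate per-model memory footprints and pack
# multiple models onto shared GPUs with a first-fit heuristic. The 20%
# overhead factor and fp16 weight assumption are illustrative, not measured.
from dataclasses import dataclass, field

@dataclass
class GPU:
    total_mem_gb: float
    models: list = field(default_factory=list)  # (name, mem_gb) pairs placed here

    @property
    def free_mem_gb(self) -> float:
        return self.total_mem_gb - sum(mem for _, mem in self.models)

def estimate_mem_gb(params_b: float, bytes_per_param: int = 2,
                    overhead: float = 1.2) -> float:
    """Rough footprint: fp16 weights plus ~20% for KV cache and activations."""
    return params_b * bytes_per_param * overhead

def first_fit(models: dict[str, float], gpus: list[GPU]) -> dict[str, int]:
    """Place each model (largest first) on the first GPU with enough free memory."""
    placement = {}
    for name, mem in sorted(models.items(), key=lambda kv: -kv[1]):
        for i, gpu in enumerate(gpus):
            if gpu.free_mem_gb >= mem:
                gpu.models.append((name, mem))
                placement[name] = i
                break
        else:
            raise RuntimeError(f"no GPU can fit {name} ({mem:.1f} GB)")
    return placement

# Example: four models of different sizes sharing two 80 GB GPUs.
gpus = [GPU(80.0), GPU(80.0)]
sizes_b = {"llama-8b": 8, "embedder-1b": 1, "reranker-4b": 4, "vlm-26b": 26}
models = {name: estimate_mem_gb(size) for name, size in sizes_b.items()}
print(first_fit(models, gpus))
```

A production scheduler would also account for compute and bandwidth contention between colocated models, which is why mixing models with different bottlenecks (e.g. a compute-bound LLM with a memory-bound embedder) on one GPU is attractive.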