Mar 28, 2025

Universal NIM Acceleration With GPU-Sharing Containers

    Jeongkyu Shin

    Founder / Researcher / CEO

    Joongi Kim

    Co-Founder / CTO

Overview

Multi-modal, multi-agent AI systems are becoming the next-generation norm, and NVIDIA NIM microservices chart a path to building them with optimized container templates. We'll explain how our GPU-native container engine further accelerates NIM to deliver such advanced multi-agent AI systems at low cost and high performance. The engine exploits a novel fractional GPU-sharing technology to accommodate multiple models with diverse performance bottlenecks on a single GPU, automates resource allocation and model combinations with memory-size estimation techniques, and auto-scales NIM containers by incorporating inference-runtime metrics. All of these features work on both air-gapped on-premises clusters and cloud-native setups. On top of that, we've built a streamlined UI to import, fine-tune, and serve open models in a single click, hiding the technical details from the end user. Putting it all together, let's dive into the universal world of NIM.
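To make the idea of memory-estimation-driven packing concrete, here is a minimal, hypothetical sketch (not Lablup's actual algorithm): each model's GPU memory footprint is estimated from its parameter count, precision, and KV-cache headroom, and models are then packed onto fractional shares of each GPU with a first-fit-decreasing heuristic. All names and numbers are illustrative assumptions.

```python
# Hypothetical sketch of memory-estimation-based model packing onto
# fractional GPU shares. Not Backend.AI's or NIM's real implementation.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    params_b: float       # parameter count in billions
    bytes_per_param: int  # e.g. 2 for FP16, 1 for INT8
    kv_cache_gib: float   # rough serving-time KV-cache headroom

    def mem_gib(self) -> float:
        # weights (billions of params x bytes, ~GiB) + KV cache,
        # plus ~10% runtime overhead as a safety margin
        weights = self.params_b * self.bytes_per_param
        return (weights + self.kv_cache_gib) * 1.10

def pack(models: list[Model], gpu_mem_gib: float) -> list[list[Model]]:
    """First-fit-decreasing: place each model on the first GPU with room."""
    gpus: list[list[Model]] = []
    for m in sorted(models, key=lambda m: m.mem_gib(), reverse=True):
        for gpu in gpus:
            if sum(x.mem_gib() for x in gpu) + m.mem_gib() <= gpu_mem_gib:
                gpu.append(m)
                break
        else:
            gpus.append([m])  # open a new GPU when nothing fits
    return gpus

# Illustrative workload: one LLM plus two small auxiliary models,
# which together fit on a single 40 GiB GPU.
models = [
    Model("llama-8b-fp16", 8, 2, 6.0),
    Model("embed-0.3b-fp16", 0.3, 2, 0.5),
    Model("rerank-1b-int8", 1, 1, 1.0),
]
placement = pack(models, gpu_mem_gib=40.0)
for i, gpu in enumerate(placement):
    print(f"GPU {i}: {[m.name for m in gpu]}")
```

In this toy run all three models share one GPU, since their combined estimate stays under 40 GiB; an autoscaler could then use inference-runtime metrics to decide when to replicate a container onto a new share.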


© Lablup Inc. All rights reserved.