Tag: NVIDIA GTC

  • Universal NIM Acceleration With GPU-Sharing Containers

    By Jeongkyu Shin, Joongi Kim

    Multi-modal, multi-agent AI systems are becoming the next-generation norm, and NVIDIA NIMs envision the path to achieve it using optimized container templates. We'll explain how our GPU-native container engine further accelerates NIMs to deliver such advanced multi-agent AI systems at low cost and high performance. It exploits a novel fractional GPU-sharing technology to accommodate multiple models with diverse performance bottlenecks on a single GPU, automate resource allocation and model combinations with memory-size estimation techniques, and auto-scale the NIM containers by incorporating inference runtime metrics. All these features work on both air-gapped on-premises clusters and cloud-native setups. On top of that, we've also built a streamlined UI to import, fine-tune, and serve open models in just one click, effectively hiding all those technical details from the end user. Putting it all together, let's dive into the universal world of NIMs.
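    The fractional GPU-sharing idea above can be viewed as a bin-packing pass over memory estimates. The sketch below is illustrative only, not Backend.AI's actual allocator: the memory formula, the 1.2 overhead factor, and all names are assumptions for demonstration.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    params_b: float       # parameters, in billions
    bytes_per_param: int  # 2 for fp16, 1 for int8

def estimated_mem_gib(m: Model, overhead: float = 1.2) -> float:
    """Rough weight-memory estimate with a fixed overhead factor
    standing in for activations and KV cache (an assumption)."""
    return m.params_b * m.bytes_per_param * overhead

def pack_models(models, gpu_mem_gib=80.0):
    """First-fit decreasing: give each model a fractional share of a
    GPU so several models can co-reside on one device."""
    gpus = []        # remaining free GiB per GPU
    placement = {}   # model name -> GPU index
    for m in sorted(models, key=estimated_mem_gib, reverse=True):
        need = estimated_mem_gib(m)
        for i, free in enumerate(gpus):
            if free >= need:
                gpus[i] -= need
                placement[m.name] = i
                break
        else:
            gpus.append(gpu_mem_gib - need)
            placement[m.name] = len(gpus) - 1
    return placement, gpus
```

    With a 40 GiB device, an 8B and a 2B fp16 model both land on GPU 0, sharing the card instead of occupying one each.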

    28 March 2025

  • Resilient Edge-Cloud Hybrid AI Infrastructure: Orchestrating Multi-Modal Agents in Resource-Constrained Environments

    By Jeongkyu Shin

    Explore new solutions for building resilient AI infrastructure that seamlessly integrates edge and cloud computing through intelligent orchestration. We'll demonstrate how an eight-node NVIDIA Jetson Nano cluster serves multiple LLMs (including Gemma 2 2B and Llama 3.2 3B) and supports multi-modal AI agents while maintaining cloud GPU integration. Learn how to orchestrate distributed AI systems that process text, images, and sensor data, enabling robust edge computing with cloud failover capabilities. Our implementation shows how to maintain continuous AI operations in air-gapped environments and during network disruptions. Through real-world scenarios, you'll discover how this hybrid architecture provides both local processing reliability and cloud scalability, offering practical solutions for modern AI deployment challenges. We'll share specific deployment strategies and performance optimizations for running sophisticated multi-agent workloads across edge and cloud environments.
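    The cloud-failover behavior described above boils down to trying inference backends in priority order, edge first. A minimal sketch, with hypothetical `edge` and `cloud` callables standing in for real endpoints:

```python
def infer_with_failover(prompt, backends):
    """Try each (name, callable) backend in priority order and
    return the first successful response along with the name."""
    last_err = None
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. network disruption, timeout
            last_err = exc
    raise RuntimeError("all backends failed") from last_err

# Illustrative stand-ins: the edge node is unreachable, so the
# request falls through to the cloud backend.
def edge(prompt):
    raise ConnectionError("edge node unreachable")

def cloud(prompt):
    return f"cloud answer to: {prompt}"

backend_used, answer = infer_with_failover("hello", [("edge", edge), ("cloud", cloud)])
```

    In an air-gapped deployment the same loop simply runs with edge-only backends, so operations continue without the cloud tier.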

    28 March 2025

  • Personalized Generative AI: Operation and Fine-Tuning in Household Form Factors

    By Jeongkyu Shin

    The advent of personal computers has fundamentally transformed our lives over the past 40 years. To name just a few changes, we witnessed the digitization of life through the internet and smartphones. Now we're on the cusp of a new era, moving beyond the age of PCs to the age of personalized agents (PAs) powered by artificial intelligence. At the heart of this transformation is the rapid progress of large language models (LLMs) and multimodal AI. It's no longer about generating smart replies from somewhere in the cloud; with the advent of powerful consumer GPUs, it's now possible to run personalized generative AI at home.

    We'll introduce automated methods for running generative AI models on compact form factors like PCs or home servers, and for personalizing them via fine-tuning. We'll show how PAs can be more closely integrated into our daily lives. Furthermore, we'll showcase the results of these methods in a live demo, inviting you to contemplate the future of personalized AI with us.

    1 March 2024

  • From Idea To Crowd: Manipulating Local LLMs At Scale

    By Jeongkyu Shin, Joongi Kim

    Large language models (LLMs) are the pinnacle of generative AI. While cloud-based LLMs have enabled mass adoption, local on-premises LLMs are garnering attention for personalization, security, and air-gapped setups. From personal hobbies to professional domains, both open-source foundation LLMs and fine-tuned models are used across diverse fields. We'll introduce the technology and use cases for fine-tuning and running LLMs at scales ranging from a single PC GPU to data centers serving mass users. We combine resource-saving and model-compression techniques such as quantization and QLoRA with vLLM and TensorRT-LLM.

    Additionally, we'll illustrate the scaling-up process of such generative AI models through the fine-tuning pipeline, with concrete, empirical examples. You'll gain a deep understanding of how to operate and expand personalized LLMs, and inspiration about the possibilities this opens up.
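    To see why quantization matters for small-scale serving, a back-of-the-envelope memory estimate (weights only; KV cache and activations excluded) is instructive. The formula and numbers below are illustrative assumptions, not figures from the talk:

```python
def weight_mem_gib(params_b: float, bits: int) -> float:
    """Approximate memory for model weights alone: params_b billion
    parameters stored at the given bit width."""
    return params_b * 1e9 * bits / 8 / 2**30

# A 7B model: fp16 weights vs. 4-bit quantized weights (the
# precision QLoRA uses for the frozen base model).
fp16_gib = weight_mem_gib(7, 16)  # roughly 13 GiB
int4_gib = weight_mem_gib(7, 4)   # roughly 3.3 GiB
```

    The 4x reduction is what moves a 7B model from data-center hardware into the memory budget of a single consumer PC GPU.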

    1 March 2024

  • MLOps Platforms, NVIDIA Tech Integrations to Scale for AI and ML Development

    By Joongi Kim

    Lablup CTO Joongi Kim participated in this joint webinar to introduce Backend.AI.

    MLOps is key to accelerating AI deployments. In this session, you'll hear interviews with three ISVs and their customers on how, together, they've found success implementing MLOps solutions to accelerate their respective AI deployments. We'll focus on some of the most common deployment challenges enterprise customers face and how the MLOps partner ecosystem can help address them.

    You’ll hear from Run.AI and Wayve on how they trailblazed the scaling of AI/ML in autonomous vehicles. You’ll also hear how Weights & Biases works with John Deere/Blue River Technology to achieve advancement of AI in agriculture. Finally, you’ll hear how Backend.AI has supported LG Electronics in making smart factories more efficient.

    The session will highlight specific use cases and best practices across the MLOps life cycle. Come learn about NVIDIA MLOps partners and how they've deployed with enterprise customers. This session will feature real solution examples that you won't want to miss.

    1 March 2022

  • Leveraging Heterogeneous GPU Nodes for AI

    By Jeongkyu Shin

    In this session, Lablup Inc. will present three solutions for achieving optimal performance when combining various GPUs into one AI/high-performance computing cluster. The solutions are based on Backend.AI, an open-source container-based resource management platform specialized for AI and high-performance computing. They'll include real-world examples that provide AI developers and researchers with an optimal scientific computing environment and massive cost savings.

    1 October 2020

  • Accelerating Hyperparameter Tuning with Container-Level GPU Virtualization

    By Jeongkyu Shin, Joongi Kim

    It's commonly believed that hyperparameter tuning requires a large number of GPUs to get quick, optimal results. It's generally true that higher computation power delivers more accurate results quickly, but to what extent? We'll present our work and empirical results on finding a sweet spot to balance both costs and accuracy by exploiting partitioned GPUs with Backend.AI's container-level GPU virtualization. Our benchmark includes distributed MNIST, CIFAR-10 transfer learning, and TGS salt identification cases using AutoML with network morphism and ENAS tuner with NNI running on Backend.AI's NGC-optimized containers. Attendees will get a tangible guide to deploy their GPU infrastructure capacity in a more cost-effective way.
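    The cost/accuracy sweet spot comes down to trial throughput per GPU. A toy calculation with assumed trial durations (illustrative numbers, not benchmark results from the talk) shows why partitioned GPUs can win for small tuning workloads:

```python
def trials_per_hour(num_workers: int, minutes_per_trial: float) -> float:
    """Trial throughput for a pool of identical tuning workers."""
    return num_workers * 60.0 / minutes_per_trial

# Assumption: a tuning trial takes 10 min on a full GPU but only
# 16 min on a half GPU, since small models rarely saturate a whole
# device (sub-linear slowdown).
whole_gpus = trials_per_hour(num_workers=4, minutes_per_trial=10)  # 4 full GPUs
half_gpus  = trials_per_hour(num_workers=8, minutes_per_trial=16)  # same 4 GPUs, split in half
```

    Under these assumed numbers, the same four physical GPUs complete 30 trials per hour when partitioned versus 24 unpartitioned, so the search explores more hyperparameter candidates at identical hardware cost.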

    1 October 2020


Headquarters & HPC Lab

KR Office: 8F, 577, Seolleung-ro, Gangnam-gu, Seoul, Republic of Korea

US Office: 3003 N First St, Suite 221, San Jose, CA 95134

© Lablup Inc. All rights reserved.