Tag : NVIDIA GTC
Personalized Generative AI: Operation and Fine-Tuning in Household Form Factors
By Jeongkyu Shin

The advent of personal computers has fundamentally transformed our lives over the past 40 years. To name just a few changes, we have witnessed the digitization of life through the internet and smartphones. Now we're on the cusp of a new era, moving beyond the age of PCs to the age of personalized agents (PAs), or personalized artificial intelligence. At the heart of this transformation is the rapid progress of large language models (LLMs) and multimodal AI. It's no longer about generating smart replies from somewhere in the cloud; with the advent of powerful consumer GPUs, it's now possible to run personalized generative AI at home.
We'll introduce automated methods for running generative AI models on compact form factors like PCs or home servers, and for personalizing them via fine-tuning. We'll show how PAs can be more closely integrated into our daily lives. Furthermore, we'll showcase the results obtained through these methods via a live demo, inviting you to contemplate the future of personalized AI with us.
1 March 2024
From Idea To Crowd: Manipulating Local LLMs At Scale
By Jeongkyu Shin, Joongi Kim

Large language models (LLMs) are the pinnacle of generative AI. While cloud-based LLMs have enabled mass adoption, local on-premises LLMs are garnering attention for personalization, security, and air-gapped setups. From personal hobbies to professional domains, both open-source foundational LLMs and fine-tuned models are utilized across diverse fields. We'll introduce the technology and use cases for fine-tuning and running LLMs at scales ranging from a single PC GPU to data centers serving mass users. We combine resource-saving and model-compression techniques like quantization and QLoRA with vLLM and TensorRT-LLM.
Additionally, we illustrate how such generative AI models scale up through the fine-tuning pipeline, with concrete, empirical examples. You'll gain a deep understanding of how to operate and expand personalized LLMs, along with inspiration for the possibilities this opens up.
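The abstract above hinges on quantization shrinking a model enough to run on a PC GPU. A minimal back-of-the-envelope sketch of why 4-bit weights (as QLoRA uses) make the difference; the 7B parameter count and bit widths here are illustrative assumptions, not figures from the talk:

```python
# Rough weight-memory arithmetic for LLM quantization.
# Activations, KV cache, and quantization overhead are ignored,
# so these are lower bounds on real GPU memory use.

def model_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights, in GiB."""
    return num_params * bits_per_weight / 8 / 1024**3

PARAMS_7B = 7e9  # hypothetical 7B-parameter model

fp16_gib = model_memory_gib(PARAMS_7B, 16)  # ~13 GiB: exceeds many consumer cards
int4_gib = model_memory_gib(PARAMS_7B, 4)   # ~3.3 GiB: fits a mid-range GPU

print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {int4_gib:.1f} GiB")
```

The same arithmetic explains why QLoRA fine-tuning is feasible on household hardware: the frozen base weights stay 4-bit while only small LoRA adapters train in higher precision.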
1 March 2024
MLOps Platforms, NVIDIA Tech Integrations to Scale for AI and ML Development
By Joongi Kim

Lablup CTO Joongi Kim participated in the joint webinar below to introduce Backend.AI.
MLOps is key to accelerating AI deployments. In this session, you'll hear interviews with three ISVs and their customers on how, together, they've found success implementing MLOps solutions to accelerate their respective AI deployments. We'll focus on addressing some of the most common deployment challenges that enterprise customers face and how the MLOps partner ecosystem can help in addressing those challenges.
You'll hear from Run.AI and Wayve on how they trailblazed the scaling of AI/ML in autonomous vehicles. You'll also hear how Weights & Biases works with John Deere/Blue River Technology to advance AI in agriculture. Finally, you'll hear how Backend.AI has supported LG Electronics in making smart factories more efficient.
The session will highlight specific use cases and best practices across the MLOps life cycle. Come learn about NVIDIA MLOps partners and how they've deployed with enterprise customers. This session will feature real solution examples that you won't want to miss.
1 March 2022
Leveraging Heterogeneous GPU Nodes for AI
By Jeongkyu Shin

In this session, Lablup Inc. will present three solutions for achieving optimal performance when combining various GPUs into one AI/high-performance computing cluster. Their solutions are based on Backend.AI, an open-source container-based resource management platform specialized for AI and high-performance computing. They'll include real-world examples that provide AI developers and researchers with an optimal scientific computing environment and massive cost savings.
1 October 2020
Accelerating Hyperparameter Tuning with Container-Level GPU Virtualization
By Jeongkyu Shin, Joongi Kim

It's commonly believed that hyperparameter tuning requires a large number of GPUs to get quick, optimal results. It's generally true that higher computation power delivers more accurate results quickly, but to what extent? We'll present our work and empirical results on finding a sweet spot that balances cost and accuracy by exploiting partitioned GPUs with Backend.AI's container-level GPU virtualization. Our benchmark includes distributed MNIST, CIFAR-10 transfer learning, and TGS salt identification cases using AutoML with network morphism and the ENAS tuner with NNI, running on Backend.AI's NGC-optimized containers. Attendees will get a tangible guide to deploying their GPU infrastructure capacity in a more cost-effective way.
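To see why partitioned GPUs can hit a cost/accuracy sweet spot, here is a toy wall-clock model of a hyperparameter search. The sublinear slowdown factor is a hypothetical assumption for illustration (small training jobs often underutilize a full GPU), not a measurement from the benchmark:

```python
# Toy model: one physical GPU split into fractional containers
# (as container-level GPU virtualization allows), each running one
# hyperparameter trial concurrently.

def wall_clock(num_partitions: int, total_trials: int,
               base_time: float = 10.0, slowdown: float = 0.3) -> float:
    """Wall-clock time to finish all trials on one partitioned GPU.

    Hypothetical model: a trial on 1/num_partitions of the GPU slows
    down sublinearly (factor `slowdown` per extra partition) because a
    small job cannot saturate a whole GPU anyway.
    """
    per_trial = base_time * (1 + slowdown * (num_partitions - 1))
    waves = -(-total_trials // num_partitions)  # ceiling division
    return waves * per_trial

# 16 trials sequentially on a whole GPU vs. 4 at a time on quarter-GPUs:
print(wall_clock(1, 16))  # 16 waves x 10.0  = 160.0
print(wall_clock(4, 16))  #  4 waves x 19.0  =  76.0
```

Under this assumed slowdown, quartering the GPU finishes the same search in less than half the time; with a linear slowdown (slowdown = 1.0) the advantage disappears, which is exactly the trade-off the benchmark explores empirically.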
1 October 2020