Mar 1, 2024
From Idea To Crowd: Manipulating Local LLMs At Scale
신정규
Founding Member / Researcher / CEO
김준기
Founding Member / CTO
Overview
Large language models (LLMs) are the pinnacle of generative AI. While cloud-based LLMs have driven mass adoption, local on-premises LLMs are gaining attention for personalization, security, and air-gapped environments. From personal hobby projects to professional domains, both open-source foundation LLMs and fine-tuned models are being used across diverse fields. We introduce the technologies and use cases for fine-tuning and running LLMs at small scale, such as on a single PC GPU, all the way up to data-center scale serving mass users. We combine resource-saving and model-compression techniques such as quantization and QLoRA with inference engines like vLLM and TensorRT-LLM. We also illustrate how such generative AI models are scaled up through a fine-tuning pipeline, with concrete, empirical examples. You will gain a deep understanding of how to operate and scale personalized LLMs, along with inspiration for the possibilities this opens up.
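To make the fine-tuning side of this concrete, the following is a minimal sketch of a QLoRA-style setup using Hugging Face Transformers, bitsandbytes, and PEFT: the base model is loaded in 4-bit and small LoRA adapters are attached for training. The checkpoint name, target modules, and LoRA hyperparameters are illustrative assumptions, not the exact configuration used in the talk.

```python
# Sketch: 4-bit (QLoRA-style) loading plus LoRA adapters for single-GPU fine-tuning.
# Checkpoint name and hyperparameters below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-7b-hf"  # assumed example checkpoint

# 4-bit NF4 quantization keeps the frozen base weights small enough for a PC GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Only the small LoRA adapter matrices are trained; the quantized base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

On the serving side, a comparable sketch with vLLM shows how a fine-tuned checkpoint can be batch-served to many users; the prompts and sampling parameters are again only assumptions for illustration.

```python
# Sketch: batched offline inference with vLLM (continuous batching + PagedAttention).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf", dtype="auto")  # assumed example checkpoint
sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

prompts = [
    "Explain why on-premises LLMs matter for air-gapped environments.",
    "Summarize the trade-offs of 4-bit quantization.",
]
for out in llm.generate(prompts, sampling):
    print(out.outputs[0].text)
```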