Mar 1, 2024
From Idea To Crowd: Manipulating Local LLMs At Scale
Jeongkyu Shin
Founder / Researcher / CEO
Joongi Kim
Co-Founder / CTO
Overview
Large language models (LLMs) are the pinnacle of generative AI. While cloud-based LLMs have enabled mass adoption, local on-premises LLMs are garnering attention for personalization, security, and air-gapped setups. From personal hobbies to professional domains, both open-source foundational LLMs and fine-tuned models are being used across diverse fields. We'll introduce the technology and use cases for fine-tuning and running LLMs at scales ranging from a single PC GPU to data-center deployments serving mass users. We combine resource-saving and model-compression techniques such as quantization and QLoRA with serving engines like vLLM and TensorRT-LLM.
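As a rough illustration of the resource-saving side of this combination, the sketch below shows a typical QLoRA setup: the base model is loaded with 4-bit quantization and only small low-rank adapters are trained. The model name and hyperparameters are illustrative placeholders, not values from the talk.

```python
# Minimal QLoRA-style sketch: 4-bit quantized base model plus trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

# 4-bit NF4 quantization keeps the frozen base weights small enough for a single PC GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config)

# Only the low-rank adapter weights are trained; the quantized base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```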
Additionally, we illustrate how such generative AI models are scaled up through the fine-tuning pipeline, using concrete, empirical examples. You'll gain a deep understanding of how to operate and expand personalized LLMs, along with inspiration for the possibilities this opens up.
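On the serving side, a fine-tuned checkpoint can be deployed with an engine such as vLLM for high-throughput inference. The sketch below assumes a hypothetical merged checkpoint path and shows only the basic offline batch API.

```python
# Minimal vLLM serving sketch for a fine-tuned model (model path is a placeholder).
from vllm import LLM, SamplingParams

# Load the fine-tuned checkpoint; tensor_parallel_size can be raised for multi-GPU nodes.
llm = LLM(model="my-org/fine-tuned-llm", tensor_parallel_size=1)

sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
prompts = ["Summarize the benefits of on-premises LLM deployment."]

# vLLM batches requests internally (continuous batching + PagedAttention).
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```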