Mar 1, 2024
From Idea To Crowd: Manipulating Local LLMs At Scale
신정규
Founding Member / Researcher / CEO
김준기
Founding Member / CTO
Overview
Large language models (LLMs) are the pinnacle of generative AI. While cloud-based LLMs have driven mass adoption, local on-premises LLMs are gaining attention for personalization, security, and air-gapped environments. From personal hobby projects to professional domains, both open-source foundation LLMs and fine-tuned models are being used across diverse fields. We introduce the technologies and use cases for fine-tuning and running LLMs at small scale, such as on a single PC GPU, all the way up to data-center scale serving mass users. We combine resource-saving and model-compression techniques such as quantization and QLoRA with inference engines like vLLM and TensorRT-LLM. We also illustrate how such generative AI models are scaled up through a fine-tuning pipeline, with concrete, empirical examples. You will gain a deep understanding of how to operate and scale personalized LLMs, along with inspiration for the possibilities this opens up.
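To make the fine-tuning side of this concrete, the following is a minimal sketch of a QLoRA-style setup using Hugging Face Transformers, bitsandbytes, and PEFT: the base model is loaded in 4-bit and small LoRA adapters are attached for training. The checkpoint name, target modules, and LoRA hyperparameters are illustrative assumptions, not the exact configuration used in the talk.

```python
# Sketch: 4-bit (QLoRA-style) loading plus LoRA adapters for single-GPU fine-tuning.
# Checkpoint name and hyperparameters below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-7b-hf"  # assumed example checkpoint

# 4-bit NF4 quantization keeps the frozen base weights small enough for a PC GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Only the small LoRA adapter matrices are trained; the quantized base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

On the serving side, a comparable sketch with vLLM shows how a fine-tuned checkpoint can be batch-served to many users; the prompts and sampling parameters are again only assumptions for illustration.

```python
# Sketch: batched offline inference with vLLM (continuous batching + PagedAttention).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf", dtype="auto")  # assumed example checkpoint
sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

prompts = [
    "Explain why on-premises LLMs matter for air-gapped environments.",
    "Summarize the trade-offs of 4-bit quantization.",
]
for out in llm.generate(prompts, sampling):
    print(out.outputs[0].text)
```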