- Uncharted Waters
- Uncharted AI
- Make AI Scalable
- Open Source
- Backend.AI CLI Installer: Easy installation experience with TUI
- bndev: Easily build your own AI infrastructure
- Backend.AI Core
- Key Updates
- Next-gen Sokovan
- Backend.AI WebUI
- Lablup Enterprise
- Scaling made easy: FastTrack 2, Finetun.ing, Cluster Designer
- FastTrack 2
- Finetun.ing
- Cluster Designer
- Backend.AI Helmsman
- Acceleration made easy
- Inference made easy
- PALI, PALI PALI (PALI2), PALANG
- G
- From Uncharted AI to Industrial Revolution
- Engine of AI Infrastructure
This article is a summary of Jeongkyu Shin's keynote speech on September 24, 2024 at lab | up > /conf/4.
On September 24, 2024, Lablup's 4th conference, lab | up > /conf/4, was held. The event was attended by a variety of external speakers as well as Lablup employees, and the keynote address was given by Lablup's CEO, Jeongkyu Shin.
Photo by 'iT dongA'
This article will cover the advancements in the AI era as introduced by Jeongkyu Shin in his keynote speech, the future trajectory of Lablup, updates on the current products, and some of our new product releases.
Uncharted Waters
The title of this keynote, "Uncharted AI - The Age of AI," draws inspiration from the classic game "Uncharted Waters," fondly remembered by many. However, Uncharted Waters is not merely a game; the era it depicts is a significant chapter in real-world history.
During the Age of Discovery, beginning in the 15th century, numerous explorers ventured across the oceans in pursuit of spices such as pepper, which is commonplace today. I was not alive to witness that era firsthand, so I experienced it through the game. We may not consider spices so valuable today, but numerous adventurers risked their lives in pursuit of them.
Uncharted AI
Like the many people who risked their lives crossing the ocean in search of spices back then, we're in a new era of artificial intelligence (AI), and we're risking our lives and working with a diverse set of partners to advance AI. What makes this effort necessary is the question of accessibility. If I could harvest pepper in my backyard, I wouldn't have to cross the ocean. At the dawn of a new era, this difference in access creates a skills gap for some and a challenge for others. For Lablup, the skills gap introduced by emerging technologies was the catalyst for a new era.
At Lablup, our motto has been clear since our founding in 2015. We've made it our core mission to Make AI Accessible, making technology more accessible and lowering barriers. Our goal was to reduce the barriers to AI accessibility by making the technology itself comprehensible and user-friendly, not merely available as an API.
As the field of AI advances, the challenge of scaling emerges. As AI technology expands, the data it processes grows, computation intensifies, and workloads move from a single node to multiple nodes and from tens of GPUs to hundreds of thousands. Simultaneously, AI is becoming more compact, operating on devices in the palm of your hand, such as Samsung's Galaxy AI and Apple Intelligence, as well as on IoT sensors like thermometers.
Simultaneously, we are witnessing efforts to operate AI with greater power and more resources, as well as a surge in endeavors to run AI with less power and fewer resources. If we consider the traditional spectrum of AI, it is expanding both upwards (larger) and downwards (smaller), with the technology needed to shift the scale in either direction being entirely distinct.
Back in 2015, we were able to construct models using just a GeForce GTX 970. However, workloads have expanded so rapidly that for the past four or five years, their growth has outpaced the rate of semiconductor performance improvement described by Moore's Law. Consequently, the focus has shifted from enhancing a single chip's performance to combining several chips and utilizing them in parallel.
Make AI "Scalable"
Over the past four years, the distributed computing paradigm in AI has undergone significant evolution. We have moved beyond parallel processing to witness a variety of computations occurring concurrently. Diverse tasks like data processing, model training, and service provisioning are now integrated. Simultaneous demands for heterogeneous computational resources have emerged, encompassing databases, training, data processing, fleet management, RAS, and others that align more closely with the service stack.
Accelerators such as GPUs have become essential for modern computing. We no longer use CPUs and GPUs separately; instead, we must integrate them more closely. The driving force behind this integration is the universal need for GPUs, which leads to bottlenecks that are both physical—such as power, network, and data—and non-physical, including hardware instability, platform management, and software issues. At Lablup, our goal is to eliminate these obstacles to scaling.
This year at Lablup, we've set a new objective: Make AI Scalable. Our aim is to expand AI workloads across the full range, from accelerators to individual nodes to hyperscale environments. This goal builds upon our initial mission of “Making AI Accessible,” as we eliminate obstacles to scaling, incorporate elements that facilitate scaling, and persist in dismantling barriers to accessing AI technology.
Through the years, the company's dedication to making AI both accessible and scalable has resulted in numerous innovations. As a result, the number of enterprise GPUs running on Backend.AI has grown to nearly 13,000, with some sites managing more than 1,500 GPUs. Additionally, the number of teams (customers) utilizing our products has increased to over 100. In varied sectors such as cloud services, AI accelerator testbeds, and autonomous driving, Backend.AI has established itself as a crucial infrastructure component for AI.
This massive scale significantly increased the technical challenge. We've had to develop technologies that span the entire spectrum, from single servers to thousands of clusters. We had to “take away everything that blocks scaling, and add everything needed for scaling.” We would like to use this opportunity to share our recent innovations, the ongoing developments, and the future we are striving to create.
Open Source
Lablup is a company that is deeply involved in the open source ecosystem. We are developing and releasing various projects such as Backend.AI, Callosum, aiodocker, aiomonitor (aiotools), Raftify, and many more. Open source is in our DNA. Our experience running the open source software we create, publish, or contribute to across various on-premises environments is one of our significant competitive edges. Backend.AI's support for on-premises environments, compatibility with cloud environments, and more are all capabilities that we've gained from our open source experience.
Backend.AI CLI Installer: Easy installation experience with TUI
The Backend.AI CLI Installer is an open-source initiative designed to enhance the accessibility of Backend.AI. It features a text-based user interface (TUI) for simplified installation, automates the package-based installation process, and includes meta settings for streamlined automatic setup.
bndev: Easily build your own AI infrastructure
For enthusiasts who enjoy tinkering and hacking beyond mere package-based installations, we have introduced a development tool named bndev. This tool simplifies the process of constructing and maintaining intricate Backend.AI development environments. The concept behind bndev is to empower everyone to own and maintain their personal AI infrastructure.
Backend.AI Core
Backend.AI conducts major version releases biannually, in March and September. The release of version 24.03 took place in March 2024, and the upcoming release of version 24.09 is imminent. Significant updates to Backend.AI Core are expected to influence future releases. Allow me to introduce these changes to you.
Key Updates
- Support for NVIDIA NGC (NVIDIA GPU Cloud) NIM (NeMo Inference Microservice): Key NGC features, like license-based container image loading, are compatible with Backend.AI.
- Expanded support for new accelerators including Intel Gaudi2, Rebellions ATOM+, and Furiosa RNGD: Backend.AI allows you to flexibly choose the best AI accelerator to match the characteristics of your workload.
- General availability of Backend.AI model store, browser, and serving: A comprehensive solution that integrates the essential features of MLOps, simplifying the process for customers to find AI models and deploy them seamlessly into their workflows.
- Enhanced Task Scheduling: The new Priority Scheduler enables the independent prioritization of tasks, ensuring that tasks of high importance are addressed swiftly and dependably.
- Agent Selector Concept: The Agent Selector is responsible for determining which nodes the scheduler actually runs the selected tasks on. This part is now easily customizable as a standalone plugin. You can use it to distribute jobs based on different criteria, such as power usage or temperature of each node. We expect this to be a great help in optimizing the operation of your infrastructure by balancing the load across nodes, increasing power efficiency, and more.
- Our own Docker network plugin: Expanded support for GPUDirect Storage for large-scale data processing, minimizing bottlenecks in moving data within a single node.
- Cilium-based networking stack for inter-container communication: The implementation has enhanced large-scale distributed learning, resulting in a 30% increase in network performance compared to the previous setup.
- OpenID Connect (OIDC)-based federated authentication scheme: Access various infrastructure services, such as Backend.AI and others, using a single account to significantly streamline account management.
- Expanded support for enterprise environments: It works with a variety of private container registries, including GitLab, GitHub Enterprise, AWS ECR, and more, and makes it easy to set up hybrid configurations that span both on-premises legacy resources and the cloud.
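To make the Agent Selector idea from the list above more concrete, here is a minimal sketch of a selection policy that prefers nodes with lower power draw and temperature. It is not the Backend.AI plugin interface: the `AgentInfo` class, its fields, the `select_agent` function, and the equal weights are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AgentInfo:
    """Hypothetical snapshot of one compute node (not a Backend.AI type)."""
    name: str
    free_gpus: int
    power_watts: float     # current power draw of the node
    temperature_c: float   # current temperature of the node


def select_agent(
    agents: Sequence[AgentInfo],
    requested_gpus: int,
    power_weight: float = 0.5,
    temperature_weight: float = 0.5,
) -> Optional[AgentInfo]:
    """Pick the node with the lowest weighted power/temperature score.

    Nodes without enough free GPUs are skipped.
    Returns None when no node can host the task.
    """
    candidates = [a for a in agents if a.free_gpus >= requested_gpus]
    if not candidates:
        return None
    # Normalize each metric by the largest value among the candidates so
    # that watts and degrees Celsius contribute on a comparable scale.
    max_power = max(a.power_watts for a in candidates) or 1.0
    max_temp = max(a.temperature_c for a in candidates) or 1.0

    def score(agent: AgentInfo) -> float:
        return (
            power_weight * agent.power_watts / max_power
            + temperature_weight * agent.temperature_c / max_temp
        )

    return min(candidates, key=score)


# Illustrative node data only; these are not measurements.
agents = [
    AgentInfo("node-a", free_gpus=4, power_watts=2400.0, temperature_c=71.0),
    AgentInfo("node-b", free_gpus=8, power_watts=1800.0, temperature_c=58.0),
    AgentInfo("node-c", free_gpus=1, power_watts=900.0, temperature_c=45.0),
]
chosen = select_agent(agents, requested_gpus=2)
print(chosen.name if chosen else "no agent available")  # prints "node-b"
```

In this sketch, node-c is excluded because it has too few free GPUs, and node-b wins over node-a because it is both cooler and draws less power. Changing the weights shifts the policy toward power efficiency or thermal balance.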
Leveraging these updates, Backend.AI is broadening its scope as a cutting-edge AI infrastructure, serving both high-performance computing (HPC) and enterprise needs. Further enhancements will accompany the launch of Backend.AI 24.09.
Next-gen Sokovan
We continue to develop the next-generation Sokovan, scheduled for release early next year. Here is a brief overview of what to expect from Next-gen Sokovan.
- Dual-engine architecture supporting Kubernetes: In addition to the current proprietary cluster management system, it will function as a native Kubernetes service. This includes managing accelerators through the Kubernetes Operator Proxy. We will seamlessly integrate NVIDIA and AMD device plugins, Intel GPU plugins, among others, to uphold industry standards.
- Database load balancing with Raftify during high-availability (HA) config: Minimize bottlenecks for metadata services and ensure reliable operation in clusters of tens of thousands of units.
- Enhanced automatic scaling for serving large language models: API metrics such as request patterns and latency, along with resource usage, are analyzed for optimal scaling.
- Strengthening the project unit: The ability to manage datasets, models, pipelines, and more collectively. The objective is to facilitate fine-grained role-based access control (RBAC) to accommodate diverse collaborative scenarios.
- Enhanced management capabilities for enterprise customers: You'll have integrated logging and monitoring, as well as audit log tracking for regulatory compliance.
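As an illustration of the metric-driven scaling described in the list above, the following sketch derives a replica count from request rate, latency, and GPU utilization. It is an assumption-laden example and does not describe Sokovan's actual algorithm: `ServingMetrics`, `desired_replicas`, and every threshold are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingMetrics:
    """Hypothetical metrics sampled over one evaluation window."""
    requests_per_second: float
    p95_latency_ms: float
    gpu_utilization: float  # 0.0 to 1.0, averaged across replicas


def desired_replicas(
    current: int,
    metrics: ServingMetrics,
    target_rps_per_replica: float = 10.0,  # assumed capacity of one replica
    latency_slo_ms: float = 500.0,         # assumed latency objective
    low_utilization: float = 0.3,          # below this, scale-in is allowed
    min_replicas: int = 1,
    max_replicas: int = 16,
) -> int:
    """Return the replica count suggested by the sampled metrics."""
    # Baseline: enough replicas to keep the per-replica request rate at target.
    wanted = math.ceil(metrics.requests_per_second / target_rps_per_replica)
    if metrics.p95_latency_ms > latency_slo_ms:
        # Latency objective is violated: add at least one replica.
        wanted = max(wanted, current + 1)
    elif wanted < current and metrics.gpu_utilization >= low_utilization:
        # Load alone suggests scaling in, but the GPUs are still busy: hold.
        wanted = current
    return max(min_replicas, min(max_replicas, wanted))


# Request rate of 45 rps needs ceil(45 / 10) = 5 replicas.
print(desired_replicas(2, ServingMetrics(45.0, 300.0, 0.8)))  # prints 5
```

The design choice here is to let the request rate set the baseline while latency can only push the count up and utilization can only block a scale-in, which keeps the rule easy to reason about.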
All of these changes are being made with one goal in mind: to accelerate our customers' AI projects. With the new AI accelerator and connections to other Kubernetes-based solutions, our team is looking forward to further maturing the Backend.AI Core and MLOps features. Stay tuned for Next-gen Sokovan's journey as it takes on a broader role.
Backend.AI WebUI
In the near future, the Backend.AI WebUI will be getting a new look. From a user's perspective, the user interface is probably the most important factor that determines the first impression of Backend.AI. We have always recognized the importance of the WebUI and have been innovating on it. We launched ML Desktop last year and GenAI Desktop earlier this year to test different user experiences, and we recently brought a user-friendly UI to our products with Neo Session Launcher.
Introducing WebUI Neo, the third evolution of WebUI. Created in close collaboration with Vice Versa Design Studio with the goal of delivering a rich user experience, this new design language keeps the user in mind from start to finish. To coincide with the relaunch of Backend.AI, we've redesigned the entire UI/UX to give it a sleeker, more futuristic look and feel.
WebUI Neo was designed with the concepts of “reducing cognitive load” and “maintaining consistency in visual metaphors.” In terms of reducing cognitive load, we wanted to minimize the amount of complex information users had to type in or search for. For example, when setting up large-scale experiments, we limited the amount of information shown in each step by exposing information sequentially, rather than presenting dozens of options at once.
In terms of “maintaining consistency in visual metaphors,” we've organized UI/UX elements, from screen composition to icons to colors, into similar design patterns for similar concepts, such as experiments, models, and data sets. This way, users can reuse what they've learned once without having to relearn how to use similar features. WebUI Neo will be applied across both Backend.AI Core and Enterprise.
In recognition of this innovation, WebUI Neo was awarded the Excellence Award, which is only given to four consortia, at the Seoul Design Foundation and Seoul Metropolitan Government's Industrial Design Development Support Project for Small and Medium-sized Enterprises.
WebUI Neo will not be included in the Backend.AI 24.09 update right away, but is still being developed and tested with the goal of a general release later this year. We're also finalizing the move from Web Components, which is the codebase used since the first version of WebUI, to React. WebUI Neo is more than just a repackaging of past features; it will continue to add new functionality that is tightly aligned with machine learning workflows and will be the foundation for achieving the high level of automation and ease of use that Backend.AI strives for. This is the future we envision with WebUI Neo, a world where everyone can easily understand and benefit from AI infrastructure beyond its complexity.
Lablup Enterprise
The core of Lablup Enterprise, centered on Backend.AI Enterprise, can be described as ___ made easy. Lablup Enterprise aims to make deep-level AI technology innovation easy with end-to-end technology from device driver level to AIOps. We have three ___ made easy concepts: “Scaling made easy”, “Acceleration made easy”, and “Inference made easy”.
Scaling made easy: FastTrack 2, Finetun.ing, Cluster Designer
FastTrack 2
FastTrack 2, released with 24.09, is an automation solution for AI projects at scale. It provides pipeline management based on project groups, making it easy to define and execute complex workflows. It offers a wide range of reusable templates to minimize repetitive tasks. In addition, FastTrack 2 enables you to better leverage your resources by connecting with external partners. You can add model compression nodes and model serving services from partners to your pipeline.
Finetun.ing
Finetun.ing is a cloud-based fine-tuning service created in collaboration with FastTrack. It stands out from traditional fine-tuning services by eliminating the need for users to prepare their own data. Typically, fine-tuning involves uploading data to adjust a model, but Finetun.ing simplifies this process by allowing users to interactively input prompts. The service then generates synthetic data from these interactions to fine-tune the model. The fine-tuned models are automatically evaluated and made available for download, complete with a model card. Finetun.ing operates on NVIDIA Nemotron and supports Llama 3.1 and Gemma 2. Ongoing tests aim to enable fine-tuning for an array of new models, with plans to expand the selection in the future.
Finetun.ing is currently gearing up for its final unveiling, and we've decided to take a waitlist for the first time at this event. You can sign up for the waitlist at https://finetun.ing.
Cluster Designer
Backend.AI Cluster Designer is a GUI-based cluster design tool. It automatically calculates the effective performance of a cluster of your desired size and performance, along with the required hardware configuration and estimated cost. It's perfect for those who want to validate the optimal architecture before actually building.
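To give a feel for the kind of calculation such a design tool performs, here is a simplified sizing sketch. It is not how Cluster Designer works internally: the `NodeSpec` and `ClusterEstimate` classes, the `estimate_cluster` function, the single `efficiency` factor, and all numbers are hypothetical placeholders, not real hardware figures or prices.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical node description; all values are placeholders."""
    gpus_per_node: int
    peak_tflops_per_gpu: float
    price_per_node: float  # any currency unit


@dataclass(frozen=True)
class ClusterEstimate:
    nodes: int
    gpus: int
    effective_tflops: float
    total_price: float


def estimate_cluster(
    target_effective_tflops: float,
    node: NodeSpec,
    efficiency: float,
) -> ClusterEstimate:
    """Size a cluster so that peak performance times efficiency meets the target.

    `efficiency` is an assumed fraction of peak performance that workloads
    actually achieve (for example after communication overhead).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    effective_per_node = node.gpus_per_node * node.peak_tflops_per_gpu * efficiency
    nodes = math.ceil(target_effective_tflops / effective_per_node)
    return ClusterEstimate(
        nodes=nodes,
        gpus=nodes * node.gpus_per_node,
        effective_tflops=nodes * effective_per_node,
        total_price=nodes * node.price_per_node,
    )


# Placeholder node: 8 GPUs at 100 TFLOPS peak each, 250,000 units per node.
node = NodeSpec(gpus_per_node=8, peak_tflops_per_gpu=100.0, price_per_node=250000.0)
estimate = estimate_cluster(target_effective_tflops=2000.0, node=node, efficiency=0.5)
print(estimate.nodes, estimate.gpus, estimate.total_price)  # prints 5 40 1250000.0
```

A real design tool would also account for networking, storage, power, and cooling; this sketch only shows the core relationship between target performance, achievable efficiency, node count, and cost.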
Backend.AI Helmsman
Backend.AI Helmsman is an interactive cluster management interface. It makes complex cluster operations possible just by chatting in a terminal. Under the hood, it utilizes a fine-tuned Gemma-based model to accurately understand user intent. It combines packages such as TorchTune, LangGraph, and LangChain to build interactive fine-tuning pipelines for on-premises environments. The UI packages and models, delivered through the Helmsman CLI and WebUI, will be released after the Backend.AI 24.09 release, by the end of the year.
Acceleration made easy
The second is “Acceleration made easy”. We support a wider variety of accelerators for AI workloads than any other AI infrastructure platform in existence.
CPU architecture coverage includes x86 as well as heterogeneous architectures such as Arm and RISC-V. We work closely with the latest accelerators, including NVIDIA's Grace Hopper, AMD's MI Series, Intel Gaudi, Graphcore Bow, GroqCard, Rebellions ATOM+, and Furiosa RNGD, to ensure you get the same user experience and peak performance on Backend.AI.
Inference made easy
Finally, “Inference made easy”.
We've simplified the sharing and distribution of pre-trained models with a unified model store. Inspired by package managers like Choco on Windows and Homebrew on macOS, Lablup ION model recipes allow you to install models and services contributed by the community via GitHub with a single command.
PALI, PALI PALI (PALI2), PALANG
There's also something new to introduce in terms of model service operations. It's PALI, PALI2, PALANG.
Performant AI Launcher for Inference (PALI) is a high-performance inference runtime that combines the Backend.AI model player with a curated model catalog and predefined models. It features flexible scalability and high performance. Anyone can easily install and run NVIDIA NIM, Hugging Face models, and Lablup ION recipes right out of the box as model services.
PALI2 is a dedicated hardware infrastructure appliance for PALI. You can easily scale by connecting multiple appliances with PALI. PALI2 is an architecture optimized for AI workloads, delivering high performance and low latency. Depending on your installation, we can provide and update models for different architectures and chip environments.
We are also preparing a PALI2 appliance that incorporates the NVIDIA reference platform GH200, and KYOCERA Mirai Envision Co., Ltd. in Japan will launch Instant.AI as the first reference platform for PALI2, which will be available for purchase on October 1.
Reference platforms for the Korean market will be available to reserve in October and for sale in Q4. PALI2 appliances targeting the U.S. and European markets will be available as early as Q4 of this year.
PALANG is a language model inference platform that includes PALI, FastTrack, Talkativot, and Helmsman. It provides ready-to-use inference and fine-tuning settings, greatly simplifying the deployment and operation of large-scale language models. Talkativot makes it easy to create custom chatbot interfaces and provides software components for model comparison and interface building during development. You can use PALI and PALI2 if you only need inference, or PALANG if you need both language model fine-tuning and inference.
G
Finally, One More Thing... We'd like to give you a sneak peek at a new project we're currently working on: G, a language model based on Gemma 2. It features easy customization with Finetun.ing. It will be used for a variety of purposes, including as a backend model for Helmsman and as an enterprise agent. Details will be revealed soon.
From Uncharted AI to Industrial Revolution
During the Age of Discovery, countless adventurers sailed the globe in search of pepper. Their adventures led to the discovery of many parts of the world that remained uncharted, and the world became more connected through the routes they opened. Shipbuilding and navigation were improved, new trade routes were opened, and innovations were made in medicine, military technology, and more. But that's not all: the Age of Discovery spawned another important event: the Industrial Revolution.
We are currently living in what is known as the Age of Great AI. It's akin to the dawn of the Age of Discovery, where the doors to new possibilities are just now opening. One person is returning with pepper, while another is constructing a larger vessel to demonstrate that the Earth is round. We are witnessing the equivalent of the Industrial Revolution that the Age of Discovery brought about.
Engine of AI Infrastructure
The Industrial Revolution began with James Watt's steam engine. The invention of the steam engine ushered in an era of mass production and mechanization. Now we're in the midst of another revolution. In the face of the tidal wave that is the Age of Great AI, Lablup is building a new engine.
Lablup is the engine of AI infrastructure. Our technology fuels innovation across industries. While the steam engine harnessed the power of coal, our engine is fueled by data. Just as a car engine converts the energy of gasoline into motion, Lablup provides an efficient and powerful engine that converts the fuel of data into AI, and the value it brings.
Just as the internal combustion engine gave birth to the automotive industry, AI engines will reshape the data-driven IT industry. Lablup is preparing for the time when everyone and every organization will be able to derive insights and value from their own data, rather than just storing and managing it. Lablup's AI engine is unrivaled in scale and speed. It has the scale to run dozens to tens of thousands of GPUs simultaneously, processing petabytes of data in real time, for the IoT and beyond. Just as the performance of an engine determines the speed of a car, our infrastructure will determine your success in the AI ecosystem.
So far, you've seen the engines that we have built. With these engines, we want to drive the AI revolution beyond the Age of Great AI. We're going to work on designing and improving the engine so that each and every one of you can be in the driver's seat. We invite you to step on the gas pedal of the AI era with Lablup.