Jan 27, 2025

Culture

2024 Winter Intern in Lablup

  • Yonggeun Kwon

    Research Intern

Overview

In May 2024, I came across a Facebook post from Lablup announcing their internship recruitment, and I applied through the Typeform link provided. I still vividly remember struggling to fill out the form in English (even though it stated that either Korean or English was acceptable, I felt I had to write in English). Since Lablup develops Backend.AI, an open-source software platform, I thought technical development skills would be essential. However, as my studies had been focused on AI research, I confidently applied to the research team.

The interview was conducted in English with three interviewers: Sergey and Eunjin from the research team, and Kyujin from the DevOps team. I did my best to explain the projects in my portfolio in English, and before I knew it, 40 minutes had flown by. Although I felt I couldn’t fully convey my prepared points due to my limited English conversation skills, I unexpectedly received an acceptance email three days later.

That’s how I began my internship in July with three other interns. Lablup was my first experience working at a company, so I tried to dress as neatly as possible for my first day: a long-sleeved shirt and slacks despite the summer heat. When I arrived at the office just before 10 a.m., I was surprised to find that Jong-eun, dressed casually in shorts and sneakers, was the only person there. He explained, "There are no assigned seats here, so feel free to sit wherever you’d like," and mentioned that the others were working remotely. During the 10 a.m. all-hands meeting, I got to meet the team through Jong-eun’s screen. The atmosphere was quite different from the corporate image I had in mind, which initially caught me off guard, but through self-introductions and coffee chats I gradually adapted to Lablup's unique environment.

Work

Developing a Domain-Adaptive Language Model

After onboarding (installing Backend.AI), I joined the research team to begin my first task: developing a language model specialized for the trade domain.

The general workflow is depicted in the diagram above. The goal was to train a language model to extract keywords and summaries from trade-related email conversations. These outputs would then help customers generate quotations. Detailed information about the model development and evaluation process can be found on the Lablup blog.

To develop this language model, the process involved several steps: [Dataset Collection and Preprocessing] → [Model Training] → [Evaluation]. Throughout this process, I encountered many questions and faced initial challenges working in the Backend.AI environment. Since the internship was only two months long, every day felt precious. If I got stuck on something, it could cost me an entire day, so I quickly learned the importance of asking for help without hesitation.
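As a rough illustration of what that pipeline looks like in code, here is a minimal sketch of the [Dataset Collection and Preprocessing] → [Model Training] → [Evaluation] loop using Hugging Face Transformers. The base model, file name, and field names are placeholders I chose for this post, not the actual assets used in the project.

```python
# Minimal sketch of the dataset -> training -> evaluation loop.
# All names (model, file, fields) are placeholders for illustration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL_NAME = "gpt2"  # placeholder base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# 1. Dataset collection and preprocessing: turn each email/summary pair into a prompt.
raw = load_dataset("json", data_files={"train": "trade_emails.jsonl"})["train"]

def to_features(example):
    text = f"### Email:\n{example['email']}\n### Summary:\n{example['summary']}"
    return tokenizer(text, truncation=True, max_length=1024)

dataset = raw.map(to_features, remove_columns=raw.column_names)
split = dataset.train_test_split(test_size=0.1)

# 2. Model training with a standard causal-LM objective.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="trade-lm", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=split["train"],
    eval_dataset=split["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# 3. Evaluation: held-out loss as a first sanity check before task-specific evaluation.
print(trainer.evaluate())
```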

One of the best things about Lablup was its asynchronous communication style. Whenever I had a question, I could post it on the relevant channel. While it felt intimidating at first, knowing everyone could see my question, it was far better than wasting time being stuck. The team members were always kind in their responses, which helped me save time and work on other tasks while waiting for answers. This open-source culture of collaboration was truly one of Lablup's strengths.

Evaluating the Agent Helmsman with LLM

As mentioned earlier, Lablup develops Backend.AI, a software platform with numerous CLI-based commands. For new users, memorizing all the commands can be challenging. While there is a WebUI, even this can feel unfamiliar for first-time users.

Helmsman is an LLM-based agent designed to address this issue. Users can provide natural language instructions (e.g., "Create a session" or "I want to train the Llama 7b model") through a chat interface. Helmsman then either provides the appropriate CLI command or executes it directly. This allows users to achieve their goals through simple chat instructions, making the platform much more accessible.

However, Helmsman didn’t always provide accurate responses. Since it relied on Backend.AI’s CLI documentation and commands, outdated or incorrect documentation could lead to errors. Additionally, as an LLM-based agent, it was susceptible to hallucinations. To evaluate Helmsman’s performance and identify issues in the documentation, I developed the following LLM-based evaluation system after discussions with Sergey and Eunjin.

1. LLM1: Instruction Generator

The first step is performed by the Instruction Generator LLM. This LLM analyzes Backend.AI's documentation and generates the kinds of user instructions that are likely to come up when using Backend.AI.

For example, it might generate an instruction like, "Create a session with 4 CPU cores and 16GB of memory."
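A minimal sketch of this step could look like the following, assuming an OpenAI-style chat completion API. The model name and prompt wording are illustrative placeholders, not the exact setup used for Helmsman.

```python
# Sketch: generate plausible user instructions from a chunk of Backend.AI docs.
# The model name and prompt are placeholders, not the actual Helmsman prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_instructions(doc_chunk: str, n: int = 5) -> list[str]:
    prompt = (
        "You are simulating Backend.AI users. Based on the documentation below, "
        f"write {n} realistic natural-language requests a user might type, one per line.\n\n"
        f"Documentation:\n{doc_chunk}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return [line.strip("- ").strip()
            for line in response.choices[0].message.content.splitlines()
            if line.strip()]

# e.g. generate_instructions(open("session_cli.md").read())
# -> ["Create a session with 4 CPU cores and 16GB of memory", ...]
```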


2. LLM2: CLI Command Generator

The second step is performed by the CLI Command Generator LLM. This LLM generates CLI commands based on the instruction created by the first LLM, along with the CLI documentation and a few-shot examples.

For instance, it might generate the following command:

```
backend.ai session create --name cli-test-session \
  --resources cpu=4 --resources mem=16g --resources cuda.shares=2 …
```
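The second step can be sketched in a similar way. Again, the model name, few-shot examples, and prompt wording are placeholders for illustration rather than the actual prompts used in the evaluation.

```python
# Sketch: turn a generated instruction into a Backend.AI CLI command,
# grounding the model with CLI docs and a couple of few-shot examples.
from openai import OpenAI

client = OpenAI()

FEW_SHOTS = """\
Instruction: Create a session named demo with 2 CPU cores.
Command: backend.ai session create --name demo --resources cpu=2

Instruction: List my running sessions.
Command: backend.ai ps
"""

def generate_command(instruction: str, cli_docs: str) -> str:
    prompt = (
        "Given the Backend.AI CLI documentation and the examples, answer with a single "
        "CLI command and nothing else.\n\n"
        f"Documentation:\n{cli_docs}\n\nExamples:\n{FEW_SHOTS}\n"
        f"Instruction: {instruction}\nCommand:"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```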

3. Executing and Logging CLI Commands

The generated CLI command is executed, and the results are logged as follows:

  • Success/Failure Status: Checks whether the command executed as intended.
  • Error Logs: Records error messages if the command fails.
  • Reference Documentation: Logs the documentation used to generate the command.

Through this process, users can verify the results of command execution and, if issues arise, obtain additional information to troubleshoot the problem.
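A minimal sketch of this execution-and-logging step, writing the three fields above to a JSONL file, might look like this (the log format is my own illustration, not the actual evaluation harness):

```python
# Sketch: run a generated command and record the outcome fields listed above.
import json
import shlex
import subprocess

def run_and_log(command: str, reference_doc: str, log_path: str = "eval_log.jsonl") -> dict:
    proc = subprocess.run(shlex.split(command), capture_output=True, text=True)
    record = {
        "command": command,
        "success": proc.returncode == 0,   # success/failure status
        "error_log": proc.stderr.strip(),  # error messages if the command fails
        "reference_doc": reference_doc,    # documentation used to generate the command
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# e.g. run_and_log("backend.ai session create --name cli-test-session --resources cpu=4",
#                  reference_doc="session_cli.md")
```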

Result Analysis

Errors can be categorized into three types:

  1. Instruction Errors: Caused by insufficient user input, which can often be resolved through user-agent interaction.
  2. Documentation Errors: Rare but critical, as they result in repeated failures. These require corrections in the documentation.
  3. Hallucination Errors: Mitigated by adding few-shot examples. However, prompt length becomes a concern, so matching few-shots to specific commands can help.

Due to time constraints, the tasks following the result analysis were left as future work.

Reflection

What started as a two-month internship was extended to six months. Looking back, I realize that two months might have been too short to fully experience and contribute to Lablup, given the depth of Backend.AI and the remote-friendly culture.

However, as mentioned earlier, Lablup encourages a culture of asking questions. By taking advantage of this, even two months can be enough for meaningful growth.

Beyond work, I also had the chance to participate in events like PyCon and the Lablup Conference. Notably, I even had the opportunity to be a speaker at the Lablup Conference. Though I felt there was much to improve in my presentation, it was an invaluable experience.

Lablup is a company filled with unique individuals. Everyone’s character stood out so much that I often thought of myself as the most ordinary person there. Yet, I found that their distinctiveness also came with extraordinary talents. I laughed a lot, learned a lot, and received a lot of help. This made work enjoyable, and I never dreaded going to the office.

My time at Lablup was not only a period of professional growth but also a time of personal enrichment through meaningful interactions and experiences. I hope to carry forward the values I learned here and continue to grow.
