Backend.AI Blog

Top Stories

How to save GPU memory in LLM serving: Principles and operating conditions of KV cache offloading
By Kyujin Cho, Jinho Heo
Learn how KV cache offloading works in LLM serving for Agentic AI—covering architecture, data movement paths, and when offloading helps or hurts inference performance.
27 April 2026
- KV cache
- Inference
Read more
Building Production RAG Systems: Lessons from Tariff Support
By Sergey Leksikov
Over the past year, we have built two production RAG systems addressing completely different tasks. One is HSense, a multi-agent system for Korean customs item classification, and the other is the Backend.AI RAG Assistant, which processes customer support queries based on seven document projects.
23 April 2026
- RAG
- LLM
Read more
Inside NVIDIA DGX Spark: Is DGX Spark Actually Blackwell?
By Jeongkyu Shin, Kyujin Cho
DGX Spark is a desktop AI supercomputer that packs 128GB of unified memory and 1 PFLOP-class Grace Blackwell (GB10) performance into a palm-sized box. However, its internal GPU belongs to the SM12x series, distinct from the data center-grade Blackwell (SM100). For the latest LLM stacks, such as GLM-5, that rely heavily on MLA- and DSA-specific kernels, "Blackwell support" alone doesn't guarantee immediate compatibility. This creates a subtle architectural gap that requires separate code management for Hopper, data center Blackwell, and consumer Blackwell. The engineering team examines Spark, which is based on Blackwell but features a slightly different architecture.
19 February 2026
- Architecture
- Backend.AI:GO
Read more

News
See all News
- Lablup Joins the Python Software Foundation as a Participating Sponsor
  By Lablup
  Lablup is now a Participating Sponsor of the Python Software Foundation (PSF).
  13 February 2026
  Read more
- Behind the Success: Lablup x Upstage Pass Phase 1 Evaluation for Sovereign AI Foundation Model Project
  By Lablup
  In January 2026, the Upstage consortium that Lablup is part of successfully passed the Phase 1 evaluation for the Korean government's Sovereign AI Foundation Model project. This initiative aims to protect national AI sovereignty by having the government provide support for GPUs, data, and talent development, while the private sector actively leverages these resources to develop frontier-grade AI foundation models. We sat down with team members from Upstage and Lablup to hear the behind-the-scenes story of our Phase 1 journey.
  6 February 2026
  Read more
- Meet Lablup at CES 26
  By Lablup
  6 January 2026
  Read more
Releases
See all Releases
- Release: Backend.AI FastTrack 3 25.18
  By Lablup
  This article covers the major changes in Backend.AI FastTrack 3 25.18.
  5 January 2026
  Read more
- Release: Backend.AI 25.15 (LTS)
  By Lablup
  Backend.AI 25.15 LTS is now officially available. This release brings comprehensive system-level optimization and user experience improvements, reinforcing the platform’s reliability and scalability for large-scale AI model training, deployment, and research.
  2 October 2025
  Read more
- Release: Backend.AI 25.6 (LTS)
  By Lablup
  We're excited to announce Backend.AI 25.6, the first Long Term Support (LTS) release of 2025. This update brings significant improvements to system monitoring, audit logging, and model service auto-scaling, making operations more convenient than ever.
  17 April 2025
  Read more
Engineering
See all Engineering
- How to save GPU memory in LLM serving: Principles and operating conditions of KV cache offloading
  By Kyujin Cho, Jinho Heo
  Learn how KV cache offloading works in LLM serving for Agentic AI—covering architecture, data movement paths, and when offloading helps or hurts inference performance.
  27 April 2026
  - KV cache
  - Inference
  Read more
- Building Production RAG Systems: Lessons from Tariff Support
  By Sergey Leksikov
  Over the past year, we have built two production RAG systems addressing completely different tasks. One is HSense, a multi-agent system for Korean customs item classification, and the other is the Backend.AI RAG Assistant, which processes customer support queries based on seven document projects.
  23 April 2026
  - RAG
  - LLM
  Read more
- Writing Stories for 50 Components: Foundation, Automation, and AI
  By Seunghyun Lim
  To write Storybook stories for 50+ BAI components in the Backend.AI WebUI, I started by setting up the infrastructure (i18n, theming, and branding), then upgraded to Storybook v10 and merged two instances into one. An automation pipeline combining a 1,000-line guideline, Claude-based story generation, and GitHub Actions CI checks kept quality consistent from PR creation through deployment. The key takeaway: build the foundation and automation first, and when working with AI, the human role shifts from writing code to defining and refining standards.
  5 March 2026
  - Frontend
  - Guide
  Read more
