From idle to ideal

Container-level GPU virtualization

Backend.AI's patented technology intercepts CUDA API calls inside containers to precisely control GPU resources at the software level. Share a single physical GPU across multiple users safely while maintaining complete workload isolation, with no hardware changes required.
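The interception idea can be sketched in a few lines: a wrapper sits between the application's allocation calls and the real device allocator, and rejects any request that would push a container past its quota. This is an illustrative model only, not Backend.AI's actual implementation; all names here (`QuotaedAllocator`, `fake_device_alloc`) are hypothetical.

```python
class QuotaedAllocator:
    """Illustrative model of API-call interception: every allocation
    request is checked against the container's quota before being
    forwarded to the real device allocator."""

    def __init__(self, quota_bytes, real_alloc):
        self.quota = quota_bytes
        self.used = 0
        self.real_alloc = real_alloc  # stand-in for the real device allocator

    def alloc(self, nbytes):
        if self.used + nbytes > self.quota:
            raise MemoryError("container quota exceeded")
        self.used += nbytes
        return self.real_alloc(nbytes)


# Toy "device" that hands out opaque handles instead of real pointers.
def fake_device_alloc(nbytes):
    return object()


GB = 1024 ** 3
alloc = QuotaedAllocator(quota_bytes=28 * GB, real_alloc=fake_device_alloc)
alloc.alloc(20 * GB)          # fits within the 28 GB slice
try:
    alloc.alloc(10 * GB)      # would exceed the quota: rejected
except MemoryError:
    pass
```

Because the check happens at the API boundary rather than in hardware, the application above never knows it is sharing the device, which is why no code changes are required.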

16,000+ GPUs under management

400% GPU utilization increase

75% Infrastructure cost reduction

3 Registered patents (KR/US/JP)

How It Works

How to share GPUs without changing a single line of code

Maintain your existing AI workloads exactly as they are. Backend.AI's virtualization layer sits transparently between your applications and physical GPUs.

01 Transparent software virtualization

Backend.AI's GPU virtualization layer lets users run existing applications without rewriting or recompiling them.

02 Precise resource partitioning

GPU compute and memory are divided per container, down to fractional GPU units as fine as 0.01 of a GPU, for fine-grained allocation.

03 Workload isolation

Each container's GPU workload is completely isolated, preventing interference between users.

04 Security sandboxing

Multi-user security threats are blocked with enterprise-grade sandboxing at the container level.

05 Dynamic resource adjustment

Reallocate GPU resources in real time without restarting GPUs or interrupting running workloads.
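Runtime adjustment works because the limit is a piece of software state consulted on every request, not a hardware partition. A minimal sketch, assuming a mutable quota object (the `LiveQuota` name and API are hypothetical, not Backend.AI's):

```python
import threading

class LiveQuota:
    """Hypothetical sketch of a runtime-adjustable GPU memory quota.
    The limit is read on every allocation, so resizing takes effect
    immediately, with no GPU reset and no workload restart."""

    GB = 1024 ** 3

    def __init__(self, limit_gb):
        self._lock = threading.Lock()
        self.limit = limit_gb * self.GB
        self.used = 0

    def alloc(self, nbytes):
        with self._lock:
            if self.used + nbytes > self.limit:
                raise MemoryError("over quota")
            self.used += nbytes

    def resize(self, limit_gb):
        with self._lock:
            # Only bookkeeping changes; running kernels are untouched.
            self.limit = limit_gb * self.GB


q = LiveQuota(16)
q.alloc(12 * LiveQuota.GB)
q.resize(32)                  # grow the slice while the workload runs
q.alloc(12 * LiveQuota.GB)    # now fits: 24 GB used of a 32 GB limit
```

Contrast this with hardware partitioning, where changing a slice typically means tearing down and recreating instances.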

Backend.AI Virtualization Architecture

Application Layer: User A (PyTorch) · User B (TensorFlow) · User C (vLLM) · User D (Jupyter)

Security Layer: Security sandboxing

Virtualization Layer: Backend.AI GPU virtualization (container-level GPU virtualization, patented)

System Layer: Operating system & device drivers

Hardware: NVIDIA GPU · AMD / Intel NPU · High-speed network

GPU Sharing Modes

Flexible GPU resource allocation

GPU fractional sharing

Split a physical GPU in precise fractional units for concurrent multi-user sharing. Ideal for education, inference, and development workloads.

Physical GPU: NVIDIA A100 80 GB

Container A (User A): 0.35 fGPU, 28 GB
Container B (User B): 0.30 fGPU, 24 GB
Container C (User C): 0.20 fGPU, 16 GB
Container D (User D): 0.15 fGPU, 12 GB

Mixed MIG usage supported: fGPU (software) + NVIDIA MIG (hardware)
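The fractional units above map proportionally onto the physical GPU's memory. A quick check of the arithmetic, assuming a simple proportional split (an illustration, not Backend.AI's exact accounting):

```python
TOTAL_GB = 80  # NVIDIA A100 80 GB

# fGPU fractions per container, as in the sharing example
fractions = {"A": 0.35, "B": 0.30, "C": 0.20, "D": 0.15}

# Each container's memory slice is its fraction of the whole device.
slices_gb = {user: round(f * TOTAL_GB) for user, f in fractions.items()}
# → {'A': 28, 'B': 24, 'C': 16, 'D': 12}, matching the figures above

# The fractions must never oversubscribe the device.
assert abs(sum(fractions.values()) - 1.0) < 1e-9
```

The four slices sum to exactly 80 GB, so the device is fully packed with no idle remainder.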

Multi-GPU · Multi-node

Run a single job across multiple GPUs and nodes with automatic overlay network and RDMA support.

Node 1: GPU 1-8
Node 2: GPU 1-8
Node 3: GPU 1-8

Container A: 16 GPUs across 2 nodes
Container B: 2 GPUs
Container C: 6 GPUs
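A multi-node allocation like this can be sketched as a greedy placement over nodes of 8 GPUs each, spanning nodes when a single request is larger than one node. This is a hypothetical scheduler sketch for illustration; Backend.AI's actual scheduler is more sophisticated (topology- and RDMA-aware).

```python
NODE_GPUS = 8  # GPUs per node in this example

def place(requests, num_nodes):
    """Greedily place each container's GPU request onto nodes,
    splitting a request across nodes when it exceeds one node."""
    free = [NODE_GPUS] * num_nodes      # free GPUs per node
    placement = {}
    for name, want in requests.items():
        got = {}                        # node index -> GPUs taken
        for node in range(num_nodes):
            if want == 0:
                break
            take = min(free[node], want)
            if take:
                free[node] -= take
                got[node] = take
                want -= take
        if want:
            raise RuntimeError(f"not enough GPUs for container {name}")
        placement[name] = got
    return placement

# Container A: 16 GPUs (spans two nodes), B: 2 GPUs, C: 6 GPUs.
plan = place({"A": 16, "B": 2, "C": 6}, num_nodes=3)
# A fills nodes 0 and 1; B and C share node 2.
```

In the real system, containers that span nodes are joined by an automatically configured overlay network, with RDMA used for inter-node GPU traffic.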

Technology Comparison

GPU virtualization technology comparison

| Feature | Backend.AI fGPU | NVIDIA MIG | NVIDIA MPS |
| --- | --- | --- | --- |
| Implementation method | Pure software | Hardware-based | Process level |
| Partition flexibility | Dynamic (0.01 unit) | Fixed instances (max 7) | Dynamic (memory-based) |
| Runtime adjustment | Supported | Requires GPU reset | Limited |
| Error isolation | Full isolation | HW-level isolation | Weak isolation |
| Multi-tenancy | Native support | Limited | Not supported |
| Heterogeneous GPU support | NVIDIA; AMD (beta); Intel (beta); +α | NVIDIA A100/H100+ | NVIDIA only |
| On-premise / air-gapped | Full support | Supported (HW dep.) | Supported (HW dep.) |
| Performance overhead | Similar to MPS | Near zero (HW partition) | Low |
| Mixed MIG usage | Supported | Not supported | |

Why GPU virtualization

Business impact

400%

Maximize GPU utilization

Transform average 20-30% GPU utilization into near-full capacity through software virtualization, eliminating idle resources.
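The 400% and 75% figures follow from simple consolidation arithmetic: if several workloads each keep a dedicated GPU only a quarter busy, they can share one virtualized GPU instead. A back-of-the-envelope illustration (the 25% figure is the midpoint of the 20-30% range above, not a measured value):

```python
per_workload_util = 0.25   # each workload keeps a dedicated GPU ~25% busy
workloads = 4

dedicated_gpus = workloads # one GPU per workload, mostly idle
shared_gpus = 1            # all four packed onto one virtualized GPU

# Effective utilization of the shared GPU vs. one dedicated GPU.
utilization_gain = (per_workload_util * workloads) / per_workload_util
# → 4.0, i.e. a 400% increase in per-GPU utilization

# GPUs no longer needed once the workloads are consolidated.
cost_saving = 1 - shared_gpus / dedicated_gpus
# → 0.75, i.e. 75% fewer GPUs for the same work
```

Real packing ratios depend on how bursty the workloads are and how well their busy periods interleave, but the direction of the arithmetic is the same.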

110%

Pipeline performance boost

Accelerate end-to-end ML pipelines by overlapping data preprocessing, training, and inference on shared GPUs, eliminating idle wait between stages.

75%

Reduce infrastructure costs

Process more workloads with the same physical GPUs. Expand AI capabilities without additional GPU purchases.

2x

GPU lifecycle management

Repurpose training GPUs for inference workloads. Maximize legacy equipment utilization through flexible allocation.

Use Cases

Use cases

Various industries and workloads utilizing GPU virtualization.

Education

University GPU cluster sharing

Hundreds of researchers and students share limited GPU resources equitably. Multi-tenancy and metering enable transparent resource allocation.

Finance

Financial air-gapped LLM operations

Develop and operate internal LLMs in air-gap environments. Guarantee complete data sovereignty while efficiently utilizing limited resources.

Cloud

Cloud GPU-as-a-Service

Enables CSPs to distribute fine-grained GPU resources to their customers. Powered the world's first commercial inference service based on fractional GPUs.

Training

Large-scale distributed training

Foundation model training with 500+ GPUs. Achieved 73 days of uninterrupted operation and 47% faster failure recovery.

Research

Bioscience simulation

Execute GPU-accelerated molecular simulations in partitioned GPU environments. Reduce research costs while conducting diverse experiments simultaneously.

Enterprise

Enterprise AI development

Isolate GPU resources by department and meter usage for fair distribution. Transition from manual GPU allocation to automated unified management.

See the GPU utilization improvement for yourself

Request a demo of Backend.AI's GPU virtualization on your GPU infrastructure.



Headquarters & HPC Lab

KR Office: 8F, 577, Seolleung-ro, Gangnam-gu, Seoul, Republic of Korea
US Office: 3003 N First St, Suite 221, San Jose, CA 95134

© Lablup Inc. All rights reserved.
