From idle to ideal
Container-level GPU virtualization
Backend.AI's patented technology intercepts CUDA API calls inside containers to precisely control GPU resources at the software level. Share a single physical GPU across multiple users safely while maintaining complete workload isolation, with no hardware changes required.
16,000+
GPUs under management
400%
GPU utilization increase
75%
Infrastructure cost reduction
3
Registered patents (KR/US/JP)
How It Works
How to share GPUs without changing a single line of code
Maintain your existing AI workloads exactly as they are. Backend.AI's virtualization layer sits transparently between your applications and physical GPUs.
Backend.AI's GPU virtualization layer allows users to run existing applications without rewriting or recompiling.
GPU compute and memory are divided per container, down to precise fractional GPU units for fine-grained allocation.
Each container's GPU workload is completely isolated, preventing interference between users.
Multi-user security threats are blocked with enterprise-grade sandboxing at the container level.
Reallocate GPU resources in real-time without restarting GPUs or interrupting running workloads.
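Conceptually, intercepting GPU API calls lets the virtualization layer enforce each container's budget at allocation time. The sketch below is a pure-Python analogue of that idea under assumed numbers; it is not Backend.AI code, which operates at the CUDA driver API level inside the container.

```python
# Illustrative sketch only: a quota-enforcing wrapper that mimics what an
# intercepted allocation call can do. Backend.AI's actual interception
# happens at the CUDA API level; this pure-Python analogue just shows the
# enforcement idea. All sizes and shares below are hypothetical.

class FractionalGPU:
    """Tracks one container's share of a physical GPU's memory."""

    def __init__(self, total_mem_mb: int, share: float):
        self.quota_mb = int(total_mem_mb * share)  # e.g. 0.35 of 80,000 MB
        self.used_mb = 0

    def malloc(self, size_mb: int) -> bool:
        """Intercepted allocation: admit only requests within the quota."""
        if self.used_mb + size_mb > self.quota_mb:
            return False  # denied, the way a real out-of-memory would surface
        self.used_mb += size_mb
        return True

# A container granted 0.35 of an 80,000 MB GPU gets a 28,000 MB budget.
gpu_a = FractionalGPU(total_mem_mb=80_000, share=0.35)
assert gpu_a.malloc(20_000) is True   # fits inside the 28,000 MB quota
assert gpu_a.malloc(10_000) is False  # would exceed it, so it is refused
```

Because enforcement sits in front of the allocation call rather than in hardware, the quota itself can be changed at runtime, which is what makes live reallocation without a GPU reset possible.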
Backend.AI Virtualization Architecture
User A
PyTorch
User B
TensorFlow
User C
vLLM
User D
Jupyter
Security sandboxing
Backend.AI GPU virtualization
Container-level GPU virtualization · Patented
Operating system & device drivers
NVIDIA GPU
AMD / Intel NPU
High-speed Network
GPU Sharing Modes
Flexible GPU resource allocation
GPU fractional sharing
Split a physical GPU into precise fractional units for concurrent multi-user sharing. Ideal for education, inference, and development workloads.
Container A
0.35
Container B
0.30
Container C
0.20
Container D
0.15
Mixed MIG Usage Supported
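The shares in the diagram above (0.35 + 0.30 + 0.20 + 0.15) fill exactly one physical GPU. A minimal sketch of checking such a plan, assuming the 0.01 fGPU granularity cited in the comparison table; the helper is illustrative and not part of the Backend.AI API:

```python
# Sketch: validate that a fractional-GPU allocation plan fits one physical
# GPU. Assumes the 0.01 minimum unit from the comparison table; the numbers
# mirror the diagram above. Illustrative only, not Backend.AI's API.

GRANULARITY = 0.01  # smallest fGPU unit

def fits_one_gpu(shares: dict) -> bool:
    """True if every share respects the granularity and the total <= 1.0."""
    units = [round(s / GRANULARITY) for s in shares.values()]
    ok_granularity = all(abs(u * GRANULARITY - s) < 1e-9
                         for u, s in zip(units, shares.values()))
    return ok_granularity and sum(units) <= round(1.0 / GRANULARITY)

plan = {"Container A": 0.35, "Container B": 0.30,
        "Container C": 0.20, "Container D": 0.15}
assert fits_one_gpu(plan)                      # sums to exactly 1.00
assert not fits_one_gpu({"X": 0.7, "Y": 0.4})  # 1.10 oversubscribes the GPU
```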
Multi-GPU · Multi-node
Run a single job across multiple GPUs and nodes with automatic overlay network and RDMA support.
Node 1
Node 2
Node 3
Container A
16 GPU · 2 Nodes
Container B
2 GPU
Container C
6 GPU
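The placement above can be sketched with a trivial first-fit placer, assuming 8-GPU nodes (an assumption; the diagram does not state node size). Container A's 16 GPUs span two nodes, while B and C pack onto the third:

```python
# Illustrative first-fit GPU placement across nodes. NODE_GPUS = 8 is an
# assumption for this sketch; Backend.AI's real scheduler is not shown here.

NODE_GPUS = 8  # assumed GPUs per node

def place(requests: dict, num_nodes: int) -> dict:
    """First-fit placement; multi-node jobs split into node-sized chunks."""
    free = [NODE_GPUS] * num_nodes
    placement = {}
    for name, gpus in requests.items():
        nodes, remaining = [], gpus
        for i in range(num_nodes):
            if remaining == 0:
                break
            take = min(remaining, free[i])
            if take > 0:
                free[i] -= take
                remaining -= take
                nodes.append(i)
        if remaining:
            raise RuntimeError(f"not enough GPUs for {name}")
        placement[name] = nodes
    return placement

jobs = {"Container A": 16, "Container B": 2, "Container C": 6}
result = place(jobs, num_nodes=3)
assert result["Container A"] == [0, 1]  # 16 GPUs span nodes 1 and 2
assert result["Container B"] == [2]     # B and C share node 3
assert result["Container C"] == [2]
```

In the real system, cross-node chunks of a job are stitched together over the overlay network with RDMA, so the job sees them as one resource pool.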
Technology Comparison
GPU virtualization technology comparison
| Feature | Backend.AI fGPU | NVIDIA MIG | NVIDIA MPS |
|---|---|---|---|
| Implementation Method | Pure Software | Hardware-based | Process Level |
| Partition Flexibility | Dynamic (0.01 unit) | Fixed Instances (Max 7) | Dynamic (memory-based) |
| Runtime Adjustment | Supported | Requires GPU Reset | Limited |
| Error Isolation | Full Isolation | HW-level Isolation | Weak Isolation |
| Multi-tenancy | Native Support | Limited | Not Supported |
| Heterogeneous GPU Support | NVIDIA, AMD beta, Intel beta, +α | NVIDIA A100/H100+ | NVIDIA Only |
| On-Premise / Air-gapped | Full Support | Supported (HW dep.) | Supported (HW dep.) |
| Performance Overhead | Similar to MPS | Near Zero (HW partition) | Low |
| Mixed MIG Usage | Supported | — | Not Supported |
Why GPU virtualization
Business impact
400%
Maximize GPU utilization
Transform average 20-30% GPU utilization into near-full capacity through software virtualization, eliminating idle resources.
110%
Pipeline performance boost
Accelerate end-to-end ML pipelines by overlapping data preprocessing, training, and inference on shared GPUs, eliminating idle wait between stages.
75%
Reduce infrastructure costs
Process more workloads with the same physical GPUs. Expand AI capabilities without additional GPU purchases.
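The arithmetic behind these two headline figures fits in a few lines: lifting average utilization from 20% to 80% (a 4x increase, the "400%" above) means the same fleet does four times the work, so infrastructure cost per workload falls by 75%. The 80% figure is illustrative, taken from the "near-full capacity" claim above:

```python
# Worked arithmetic linking the 400% utilization figure to the 75% cost
# figure. The 20% baseline comes from the text; 80% is an illustrative
# stand-in for "near-full capacity".

baseline_util_pct = 20  # typical unshared-GPU utilization
shared_util_pct = 80    # utilization with fractional sharing (illustrative)

capacity_gain = shared_util_pct / baseline_util_pct  # 4.0x effective capacity
cost_per_workload = 1 / capacity_gain                # 0.25 of the old cost
cost_reduction = 1 - cost_per_workload               # 0.75 -> 75% saved

assert capacity_gain == 4.0
assert cost_reduction == 0.75
```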
2x
GPU lifecycle management
Repurpose training GPUs for inference workloads. Maximize legacy equipment utilization through flexible allocation.
Use Cases
Use cases
Various industries and workloads utilizing GPU virtualization.
University GPU cluster sharing
Hundreds of researchers and students share limited GPU resources equitably. Multi-tenancy and metering enable transparent resource allocation.
Financial air-gapped LLM operations
Develop and operate internal LLMs in air-gap environments. Guarantee complete data sovereignty while efficiently utilizing limited resources.
Cloud GPU-as-a-Service
Cloud service providers distribute fine-grained GPU resources to customers. Enabled the world's first commercial inference service based on fractional GPUs.
Large-scale distributed training
Foundation model training with 500+ GPUs. Achieved 73 days of uninterrupted operation and 47% faster failure recovery.
Bioscience simulation
Execute GPU-accelerated molecular simulations in partitioned GPU environments. Reduce research costs while conducting diverse experiments simultaneously.
Enterprise AI development
Isolate GPU resources by department and meter usage for fair distribution. Transition from manual GPU allocation to automated unified management.
See the GPU utilization improvement for yourself
Request a demo of Backend.AI's GPU virtualization on your GPU infrastructure.