- OpenSource1.0 ~ 24.03
- EnterpriseR1 ~ R2
Sokovan: Python-based container orchestrator
Sokovan addresses the challenges of running resource-intensive batch workloads in a containerized environment. It offers acceleration-aware, multi-tenant, batch-oriented job scheduling and fully integrates multiple hardware acceleration technologies across the system layers to unlock their potential performance.
Sokovan has two levels of scheduling: a cluster-level node assignment scheduler and a node-level resource/device assignment scheduler. The cluster-level scheduler allows users to customize job placement strategies and control the density and priority of workloads, while the node-level scheduler optimizes per-container performance by automatically detecting and enabling hardware accelerators for each container.
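The two-level flow above can be sketched in Python. This is a minimal illustration, not Sokovan's actual API: the names (`cluster_schedule`, `node_schedule`, `Node`, `Job`) and the first-fit placement strategy are assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    name: str
    free_devices: List[int]  # accelerator device indices not yet assigned

@dataclass
class Job:
    name: str
    gpus: int  # number of accelerators requested

def cluster_schedule(job: Job, nodes: List[Node]) -> Optional[Node]:
    """Cluster level: choose a node with enough free accelerators.

    A real placement strategy is customizable; first-fit is used here
    only as a stand-in.
    """
    for node in nodes:
        if len(node.free_devices) >= job.gpus:
            return node
    return None  # no node fits right now; the job stays pending

def node_schedule(job: Job, node: Node) -> List[int]:
    """Node level: pin concrete device indices for the job's container."""
    assigned = node.free_devices[:job.gpus]
    node.free_devices = node.free_devices[job.gpus:]
    return assigned

nodes = [Node("node-a", [0, 1]), Node("node-b", [0, 1, 2, 3])]
job = Job("train", gpus=3)
target = cluster_schedule(job, nodes)     # picks node-b (node-a is too small)
devices = node_schedule(job, target)      # pins devices [0, 1, 2] on node-b
```

The key point the sketch captures is the separation of concerns: the cluster level decides *where* a job runs, while the node level decides *which* devices the container actually sees.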
This improves the performance of AI workloads compared with Slurm and other existing tools. Sokovan has also been deployed at large scale across various industries for a range of GPU workloads, including AI training and serving. Its design and capabilities can help container-based MLOps platforms better utilize the latest hardware technologies.
Generic Kubernetes Pod-based GPU resource allocation
- Maps GPU and other computing resources at the Pod level only
- Creates Pods in advance and assigns jobs to the existing Pods
- Some jobs may remain pending because resources cannot be flexibly reclaimed from existing Pods
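The pending-job problem described above can be shown with a small sketch. This is a simplified model, not Kubernetes itself: Pods are pre-created with fixed GPU shares, so a job that needs more GPUs than any single Pod holds stays pending even when the cluster's total free GPUs would suffice.

```python
from typing import Optional

# Two pre-created Pods, each holding 1 GPU (2 free GPUs in total).
pods = [{"name": "pod-a", "gpus": 1}, {"name": "pod-b", "gpus": 1}]

def assign(job_gpus: int, pods: list) -> Optional[str]:
    """Assign a job to a pre-created Pod that already holds enough GPUs."""
    for pod in pods:
        if pod["gpus"] >= job_gpus:
            return pod["name"]
    return None  # no single Pod is large enough; the job stays pending

assign(1, pods)  # fits in pod-a
assign(2, pods)  # pending: 2 GPUs exist, but split across Pods
```

This fragmentation is exactly the inflexibility the bullet points describe: the Pod boundaries are fixed before the jobs arrive.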
Dynamic GPU allocation with Sokovan / Backend.AI
- Maps GPU and other computing resources at the per-container level
- Creates containers on demand as jobs are scheduled
- The node-level scheduler detects available accelerators and assigns them to each container, reducing pending jobs
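For contrast, a minimal sketch of the dynamic model: the container is created only when the job is scheduled, so device assignment follows the job rather than a pre-built Pod pool. Again, the names here (`launch`, `free_gpus`) are hypothetical, not Sokovan's API.

```python
from typing import Dict, List, Optional, Tuple

# Free GPU device indices per node; nothing is pre-allocated to containers.
free_gpus: Dict[str, List[int]] = {"node-a": [0, 1]}

def launch(job_gpus: int, free_gpus: Dict[str, List[int]]) -> Optional[Tuple[str, List[int]]]:
    """Create a container for the job, taking devices straight from the node pool."""
    for node, devices in free_gpus.items():
        if len(devices) >= job_gpus:
            assigned = devices[:job_gpus]
            free_gpus[node] = devices[job_gpus:]
            return node, assigned
    return None  # insufficient total capacity

# The 2-GPU job that stayed pending under the static Pod pool now starts,
# because both free devices on node-a can be handed to one new container.
result = launch(2, free_gpus)
```

Because no Pod boundary was drawn in advance, the scheduler is free to hand any combination of free devices to the newly created container.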