Mitra, Siddharth (2021) Comparative Study Between GPU Utilization and Inflight Requests Based Autoscaling systems. Master's Thesis / Essay, Computing Science.
Abstract
With advances in Machine Learning, deploying Deep Learning models that require GPUs at inference time is becoming increasingly common. GPUs are expensive, and projects often have only a limited number of them available. In a Kubernetes environment where inference services run on a serverless platform, autoscaling GPUs at inference time is challenging. Companies designing and implementing an inference serving system on such platforms therefore need to make informed decisions about which autoscaling approach to use. In this thesis, we design and implement a simple autoscaling system that scales the number of GPUs based on average GPU memory utilization. We compare this system with another that scales the number of GPUs based on the number of inflight requests, studying how both behave under different environmental conditions that incrementally simulate real-world characteristics. The simulations cover one scenario in which the systems receive inference requests at a constant rate and another in which they receive variable traffic. Through experiments, we show that the request-based autoscaling approach is better suited to use cases that prioritize low inference latency over high GPU utilization. In contrast, the GPU-utilization-based approach uses GPUs more conservatively, generally leaving GPUs available for other uses, but at the cost of slower inference response times.
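The abstract does not spell out the scaling formulas used by either system. As a hedged illustration only, the sketch below contrasts the two kinds of decision rule. It assumes a ratio rule for GPU memory utilization, modeled on the Kubernetes Horizontal Pod Autoscaler formula desired = ceil(current × observed / target), and a concurrency rule for inflight requests, desired = ceil(inflight / target per replica). The function names, the replica bounds, and any target values are hypothetical and are not taken from the thesis.

```python
import math


def _clamp(value: int, low: int, high: int) -> int:
    """Keep a replica count within [low, high]."""
    return max(low, min(high, value))


def desired_replicas_by_gpu_memory(current_replicas: int,
                                   avg_gpu_mem_util: float,
                                   target_util: float,
                                   min_replicas: int = 1,
                                   max_replicas: int = 8) -> int:
    """Hypothetical utilization-based rule (HPA-style ratio).

    avg_gpu_mem_util and target_util are fractions in [0, 1], e.g. 0.8 = 80%.
    """
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    raw = current_replicas * avg_gpu_mem_util / target_util
    # Round away floating-point noise (e.g. 3.0000000000000004) before ceil.
    desired = math.ceil(round(raw, 6))
    return _clamp(desired, min_replicas, max_replicas)


def desired_replicas_by_inflight(inflight_requests: int,
                                 target_per_replica: int,
                                 min_replicas: int = 1,
                                 max_replicas: int = 8) -> int:
    """Hypothetical request-based rule: enough replicas that each one
    handles at most target_per_replica inflight requests."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    desired = math.ceil(inflight_requests / target_per_replica)
    return _clamp(desired, min_replicas, max_replicas)


if __name__ == "__main__":
    # Two replicas running at 80% average GPU memory against a 50% target.
    print(desired_replicas_by_gpu_memory(2, 0.8, 0.5))
    # 25 inflight requests with a target of 10 per replica.
    print(desired_replicas_by_inflight(25, 10))
```

Under these assumptions, the request-based rule reacts directly to the amount of queued work, while the utilization-based rule only reacts once memory usage on the existing GPUs has risen. The `min_replicas=1` default is also an assumption; a serverless platform may allow scaling to zero.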
Item Type: | Thesis (Master's Thesis / Essay) |
---|---|
Supervisor name: | Lazovik, A. and Medema, M. |
Degree programme: | Computing Science |
Thesis type: | Master's Thesis / Essay |
Language: | English |
Date Deposited: | 23 Sep 2021 08:11 |
Last Modified: | 23 Sep 2021 08:11 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/26121 |