Comparitive Study Between GPU Utilization and Inflight Requests Based Autoscaling systems

Mitra, Siddharth (2021) Comparitive Study Between GPU Utilization and Inflight Requests Based Autoscaling systems. Master's Thesis / Essay, Computing Science.

Text: s4138430_master_thesis.pdf (1MB)
Text: toestemming.pdf (96kB, restricted to registered users only)
Abstract

With advances in Machine Learning, deploying Deep Learning models that require GPUs at inference time is becoming increasingly common. GPUs are expensive and are often available to a project only in limited numbers. In a Kubernetes environment where inference services run on a serverless platform, autoscaling GPUs at inference time is a challenge. Companies need to make an informed choice of autoscaling approach when designing and implementing an inference serving system on such platforms. In this thesis, we design and implement a simple autoscaling system that scales the number of GPUs based on the average GPU memory utilization. We compare it with a system that scales the number of GPUs based on the number of inflight requests, studying how both behave under environmental conditions that incrementally simulate real-world characteristics. The simulations cover two scenarios: one in which the systems receive inference requests at a constant rate, and one in which they are loaded with variable traffic. Through experiments, we show that the request-based autoscaling approach is better suited to use cases where the focus is on lower inference latency rather than better GPU utilization. In contrast, the GPU utilization-based approach uses GPUs more conservatively, generally leaving GPUs available for other uses, but at the cost of slower inference response times.
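
The two scaling signals compared in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract gives no formulas, so the utilization rule below is assumed to follow the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, and the request rule is assumed to target a fixed number of inflight requests per replica. All function names, parameter names, and the clamping to a minimum of one replica are hypothetical choices made for illustration.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding surprises."""
    return -(-numerator // denominator)


def replicas_for_gpu_utilization(
    current_replicas: int,
    avg_memory_utilization_pct: int,
    target_utilization_pct: int,
    max_replicas: int,
) -> int:
    """Assumed proportional rule: scale replicas by observed/target utilization.

    Utilization values are whole percentages (e.g. 90 for 90%).
    """
    desired = ceil_div(
        current_replicas * avg_memory_utilization_pct, target_utilization_pct
    )
    # Clamp to [1, max_replicas]; the lower bound of 1 is an assumption.
    return max(1, min(desired, max_replicas))


def replicas_for_inflight_requests(
    inflight_requests: int,
    target_inflight_per_replica: int,
    max_replicas: int,
) -> int:
    """Assumed concurrency rule: enough replicas to keep each at the target load."""
    desired = ceil_div(inflight_requests, target_inflight_per_replica)
    # Clamp to [1, max_replicas]; the lower bound of 1 is an assumption.
    return max(1, min(desired, max_replicas))
```

Under these assumptions, two replicas at 90% average memory utilization with a 60% target would be scaled to three, while 25 inflight requests with a target of 10 per replica would also call for three. The rules diverge when request volume and GPU memory use do not move together, which is the kind of difference the comparison in the abstract examines.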

Item Type: Thesis (Master's Thesis / Essay)
Supervisor name: Lazovik, A. and Medema, M.
Degree programme: Computing Science
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 23 Sep 2021 08:11
Last Modified: 23 Sep 2021 08:11
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/26121
