Toward Energy-Efficient LLM Inference Serving Systems
Today, LLM inference clusters receive a large number of queries with strict Service Level Objectives (SLOs). To achieve the desired performance, these models execute on power-hungry GPUs, causing the inference clusters to 1) consume large amounts of energy...