Data Center Energy Portfolio Optimization
This project deals with the problem of online inventory management with demand constraints. In particular, given time-varying price and demand for electricity and the ability to store a limited amount of electricity for future use (inventory), the task is to minimize the cost of buying electricity while meeting the demand at every time instant.
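The setting above can be made concrete with a small simulator. The sketch below is illustrative only (the function names, units, and the two example policies are my own, not from the project): at each step a policy decides how much electricity to buy, demand must be covered from the purchase plus stored inventory, and any surplus is stored up to a capacity limit.

```python
import numpy as np

def simulate_cost(prices, demands, policy, capacity):
    """Simulate online electricity purchasing under a given policy.

    policy(t, price, demand, inventory) -> amount to buy at step t.
    Demand must be met at every step from the purchase plus inventory;
    any surplus is stored, capped at `capacity`.
    """
    inventory = 0.0
    total_cost = 0.0
    for t, (p, d) in enumerate(zip(prices, demands)):
        buy = policy(t, p, d, inventory)
        # hard demand constraint: purchase + inventory must cover demand
        assert buy + inventory >= d - 1e-9, "demand violated"
        inventory = min(capacity, inventory + buy - d)
        total_cost += p * buy
    return total_cost

# Hypothetical policy 1: buy exactly the shortfall at every step.
def just_in_time(t, price, demand, inventory):
    return max(0.0, demand - inventory)

# Hypothetical policy 2: stockpile when the price is low (here, at t = 0).
def prebuy(t, price, demand, inventory):
    return 2.0 if t == 0 else max(0.0, demand - inventory)
```

With rising prices (e.g. `prices=[1, 2]`, `demands=[1, 1]`, `capacity=1`), `prebuy` is cheaper than `just_in_time`, which is exactly the leverage the inventory provides.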
The existing state-of-the-art technique proposes a rule-based policy that is analytically shown to achieve the lowest possible worst-case cost ratio. The authors also show that the method performs well in the average case against several competitive baselines. However, this rule-based method does not learn or adapt to the distributions of price and demand. This project attempted to develop Reinforcement Learning methods to see whether learning from interaction with the environment leaves room for improvement.
After formulating the RL problem with the price and demand processes as the environment and the battery controller as the agent, I applied a range of standard RL algorithms as well as imitation learning (DAgger). The oracle for imitation learning was the optimal offline solution.
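The DAgger procedure can be summarized in a few lines. This is a minimal, self-contained sketch under my own assumptions (a linear policy fit by least squares, generic `env_reset`/`env_step`/`oracle` callables standing in for the actual environment and the offline-optimal solver), not the project's implementation: roll out the current policy, label every visited state with the oracle's action, aggregate the data, and refit.

```python
import numpy as np

def dagger(env_reset, env_step, oracle, horizon, iters):
    """Minimal DAgger loop (illustrative sketch).

    Each iteration rolls out the current learned policy, queries the
    oracle (e.g. the optimal offline solution) for the correct action in
    every visited state, aggregates the labeled data, and refits the
    policy on the whole dataset.
    """
    X, Y = [], []          # aggregated (state, oracle-action) dataset
    w = np.zeros(2)        # toy linear policy: action = w @ state

    def policy(s):
        return float(w @ s)

    for _ in range(iters):
        s = env_reset()
        for _ in range(horizon):
            # first pass follows the oracle; later passes follow the learner
            a = policy(s) if X else oracle(s)
            X.append(s)
            Y.append(oracle(s))              # oracle labels the visited state
            s = env_step(s, a)
        # refit the policy via least squares on all data gathered so far
        w, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)
    return policy
```

The key difference from plain behavior cloning is that later rollouts are generated by the learner itself, so the dataset covers the states the learner actually visits rather than only the oracle's trajectory.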
Outcomes: The RL methods are difficult to train on this problem, and even with hyperparameter tuning, the best-performing agents cannot beat the state-of-the-art baseline. The imitation learning agent fares better than the RL agents and is competitive with the state of the art, but does not surpass it.
I concluded the project by outlining avenues for future work, including dynamically changing action spaces and two-stage control. Although I am no longer working on this problem, it is being actively pursued by Prof. Hajiesmaili.