This repository contains code developed for the CS394R: Reinforcement Learning: Theory and Practice course project, offered in Fall 2019 at the University of Texas at Austin.
This project applies reinforcement learning agents to the CityLearn environment. The code was developed by Vaibhav Sinha and Piyush Jain.
This code has been written in Python 3 and requires numpy, gym, matplotlib, pytorch and pandas.
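If needed, the dependencies can typically be installed with pip (note that the PyTorch package installs as torch):
pip install numpy gym matplotlib torch pandas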
The executable file is main.py. Here are the main command-line parameters for the code:
usage: main.py [-h]
--building_uids BUILDING_UIDS [BUILDING_UIDS ...]
--agent {RBC,DDP,TD3,Q,DDPG,SarsaLambda,N_Sarsa,QPlanningTiles,Degenerate,Random}
[--action_levels ACTION_LEVELS]
[--min_action_val MIN_ACTION_VAL]
[--max_action_val MAX_ACTION_VAL]
[--charge_levels CHARGE_LEVELS]
[--min_charge_val MIN_CHARGE_VAL]
[--max_charge_val MAX_CHARGE_VAL]
[--start_time START_TIME]
[--end_time END_TIME]
[--episodes EPISODES]
[--n N]
[--target_cooling TARGET_COOLING]
[--use_adaptive_learning_rate USE_ADAPTIVE_LEARNING_RATE]
[--use_parameterized_actions USE_PARAMETERIZED_ACTIONS]
[--lamda LAMDA]
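For orientation, here is a minimal sketch of how such an interface could be declared with Python's argparse; the defaults shown are illustrative assumptions, and the actual definitions live in main.py/utils.py.

import argparse

# Illustrative sketch of the command-line interface above; the defaults are
# assumptions for this example, not the repository's actual values.
def str2bool(s):
    return s.lower() == "true"

parser = argparse.ArgumentParser()
parser.add_argument("--building_uids", type=int, nargs="+", required=True)
parser.add_argument("--agent", required=True,
                    choices=["RBC", "DDP", "TD3", "Q", "DDPG", "SarsaLambda",
                             "N_Sarsa", "QPlanningTiles", "Degenerate", "Random"])
parser.add_argument("--action_levels", type=int, default=5)
parser.add_argument("--min_action_val", type=float, default=-0.5)
parser.add_argument("--max_action_val", type=float, default=0.5)
parser.add_argument("--charge_levels", type=int, default=5)
parser.add_argument("--min_charge_val", type=float, default=0.0)
parser.add_argument("--max_charge_val", type=float, default=1.0)
parser.add_argument("--start_time", type=int, default=3500)
parser.add_argument("--end_time", type=int, default=3600)
parser.add_argument("--episodes", type=int, default=1)
parser.add_argument("--n", type=int, default=1)
parser.add_argument("--lamda", type=float, default=0.9)
parser.add_argument("--target_cooling", type=float, default=1.0)
parser.add_argument("--use_adaptive_learning_rate", type=str2bool, default=False)
parser.add_argument("--use_parameterized_actions", type=str2bool, default=False)
args = parser.parse_args()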
As an example, to run Q-Learning on building 8 with a reduced action space of -0.5 to 0.5, 5 levels of discretization for both charge and action, and 80 episodes, run:
python main.py --agent Q --building_uids 8 --max_action_val 0.5 --min_action_val=-0.5 --action_levels 5 --charge_levels 5 --episodes 80
To run Q-Learning (or any other algorithm) on multiple buildings, simply list the building uids one after another:
python main.py --agent Q --building_uids 8 21 67 --max_action_val 0.5 --min_action_val=-0.5 --action_levels 5 --charge_levels 5 --episodes 80
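The tabular agents work with discretized actions and charge levels. As an illustration of what action_levels together with min_action_val/max_action_val plausibly amount to (the actual discretization lives in value_approx_agent.py):

import numpy as np

# Hypothetical illustration: 5 evenly spaced action levels between -0.5 and 0.5.
actions = np.linspace(-0.5, 0.5, 5)
print(actions)  # [-0.5  -0.25  0.    0.25  0.5 ]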
To run Sarsa, use N-step Sarsa and set n=1. For general N-step Sarsa, set n appropriately; n is ignored if the agent is not N_Sarsa.
python main.py --agent N_Sarsa --building_uids 8 --max_action_val 0.5 --min_action_val=-0.5 --action_levels 5 --charge_levels 5 --episodes 80 --n 1
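For reference, the n-step target such an agent bootstraps toward is the standard one from Sutton and Barto. A generic sketch (not necessarily the exact form used in value_approx_agent.py):

# Generic n-step Sarsa return: G = R_{t+1} + gamma*R_{t+2} + ... + gamma^n * Q(S_{t+n}, A_{t+n}).
# With n = 1 (a single stored reward) this reduces to the plain Sarsa target.
def n_step_return(rewards, bootstrap_q, gamma=0.99):
    g = bootstrap_q
    for r in reversed(rewards):   # rewards = [R_{t+1}, ..., R_{t+n}]
        g = r + gamma * g
    return g

# The tabular update is then: Q[s, a] += alpha * (n_step_return(...) - Q[s, a])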
Similarly, for Sarsa Lambda pass the lamda parameter; lamda is ignored if the agent is not SarsaLambda.
python main.py --agent SarsaLambda --building_uids 8 --max_action_val 0.5 --min_action_val=-0.5 --action_levels 5 --charge_levels 5 --episodes 80 --lamda 0.9
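The lamda parameter is the usual eligibility-trace decay. A generic tabular Sarsa(lambda) update with accumulating traces looks roughly like the following (sarsa.py may differ in details):

def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, lam=0.9):
    # Q and E are (num_states, num_actions) numpy arrays; E holds eligibility traces.
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]   # one-step TD error
    E[s, a] += 1.0                                    # accumulate trace for the visited pair
    Q += alpha * delta * E                            # credit every pair in proportion to its trace
    E *= gamma * lam                                  # decay all traces by gamma * lambda
    return Q, E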
To measure the performance of the Random (Q-Learning/Sarsa) agent, use:
python main.py --agent Random --building_uids 8 --max_action_val 0.5 --min_action_val=-0.5 --action_levels 5 --charge_levels 5
Note that this agent does not take an episodes argument; it always runs for one episode.
To get the RBC Baseline values, use:
python main.py --agent RBC --building_uids 8
RBC does not take any additional parameters and always runs for one episode.
To get the Degenerate Baseline values, use:
python main.py --agent Degenerate --building_uids 8
Degenerate likewise takes no additional parameters and always runs for one episode.
To run TD3 or DDPG, use the following command:
python main.py --agent TD3 --building_uids 8 --episodes 10
Change the agent to DDPG if using that algorithm. These agents take no other arguments.
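For context, TD3 differs from DDPG mainly in its target computation (clipped double-Q with target policy smoothing). A hedged PyTorch sketch of that piece, which the adapted implementation in policy_grad_agent.py follows in spirit:

import torch

def td3_target(reward, done, next_state, actor_t, critic1_t, critic2_t,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    # Target policy smoothing: add clipped noise to the target action.
    with torch.no_grad():
        action = actor_t(next_state)
        noise = (torch.randn_like(action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (action + noise).clamp(-max_action, max_action)
        # Clipped double-Q: bootstrap from the smaller of the two target critics.
        q = torch.min(critic1_t(next_state, next_action), critic2_t(next_state, next_action))
        return reward + gamma * (1.0 - done) * q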
To run the DP baseline (theoretical best):
python main.py --agent DDP --building_uids 8 --action_levels 9 --start_time 3500 --end_time 3600 --target_cooling 1
To run QPlanningTiles with adaptive tile coding and action parameterization:
python main.py --agent QPlanningTiles --building_uids 8 --action_levels 9 --start_time 3500 --end_time 3600 --target_cooling 1 --use_adaptive_learning_rate True
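QPlanningTiles builds on tile coding of the continuous state. A toy one-dimensional coder illustrates the idea of several offset tilings (the adaptive variant in q_planning_tiles.py refines the tiling on the fly):

import numpy as np

def active_tiles(x, lo=0.0, hi=1.0, num_tilings=4, tiles_per_tiling=8):
    # Map x onto one active tile index per tiling, each tiling offset by a fraction of a tile.
    scaled = (x - lo) / (hi - lo) * tiles_per_tiling
    indices = []
    for t in range(num_tilings):
        offset = t / num_tilings
        idx = int(np.floor(scaled + offset))
        idx = min(max(idx, 0), tiles_per_tiling)   # allow one padded tile at the upper edge
        indices.append(t * (tiles_per_tiling + 1) + idx)
    return indices

# Example: active_tiles(0.37) returns four indices, one active tile per tiling.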
The files citylearn.py, energy_models.py and reward_function.py contain the code for the environment.
ddp.py contains the code for obtaining the theoretical best cost (DDP baseline).
policy_grad_agent.py contains the RBC and Degenerate agents as well as the TD3 and DDPG agents (code adapted from the official implementation).
sarsa.py contains the Sarsa Lambda agent.
main.py and utils.py handle the interfacing.
value_approx_agent.py implements the Q-Learning, N-step Sarsa and Random agents.
q_planning_tiles.py contains the Q-Planner.
The data used can be found in the data directory.
This code is provided under the MIT License.