MiniChess-RL is a fully custom reinforcement learning project in which a Deep Q-Network (DQN) agent is trained end-to-end on a 4×4 chess variant. The project focuses on environment modeling, reward shaping, illegal action masking, and structured evaluation with reproducible experiments.
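Illegal action masking in a DQN typically means suppressing Q-values for moves the rules forbid before taking an argmax. A minimal sketch of one common approach (the function name and shapes here are illustrative, not taken from this repository):

```python
import torch

def masked_argmax(q_values: torch.Tensor, legal_mask: torch.Tensor) -> int:
    """Pick the greedy action among legal moves only.

    q_values:   (num_actions,) raw Q-values from the network
    legal_mask: (num_actions,) bool tensor, True where the move is legal
    """
    # Illegal actions are set to -inf so argmax can never select them
    masked = q_values.masked_fill(~legal_mask, float("-inf"))
    return int(masked.argmax().item())

q = torch.tensor([0.5, 2.0, -1.0, 0.1])
mask = torch.tensor([True, False, True, True])  # action 1 is illegal
best = masked_argmax(q, mask)  # action 1 is skipped despite its high Q-value
```

The same mask can also be applied to the target network's Q-values when computing bootstrap targets, so that illegal successor moves never inflate the TD target.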
The project investigates a single research question: how do reward design choices and environment modeling decisions affect learning stability and measurable performance in a compact chess-like MDP with a large discrete action space?
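One concrete shape this design question takes is the reward function itself: a sparse terminal signal versus dense shaping terms. The sketch below is a hypothetical shaping scheme (piece values, bonus scale, and step penalty are illustrative assumptions, not values from this project):

```python
from typing import Optional

# Hypothetical shaped reward for a compact chess MDP: terminal win/loss signal
# plus small dense terms (capture bonus scaled by material, per-step penalty).
PIECE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0}

def shaped_reward(outcome: Optional[str], captured: Optional[str],
                  step_penalty: float = -0.01) -> float:
    """outcome: 'win', 'loss', 'draw', or None for non-terminal steps."""
    if outcome == "win":
        return 1.0
    if outcome == "loss":
        return -1.0
    if outcome == "draw":
        return 0.0
    r = step_penalty  # discourage aimless shuffling
    if captured is not None:
        r += 0.1 * PIECE_VALUES[captured]  # small capture bonus
    return r
```

Keeping the shaping terms an order of magnitude smaller than the terminal reward is one way to bias early learning without letting the agent farm intermediate rewards instead of winning.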
Results demonstrate partial learning behavior and highlight how sensitive learning is to reward shaping and exploration scheduling in compact adversarial environments.
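Exploration scheduling for a DQN is usually an epsilon schedule that anneals from near-full exploration to mostly greedy play. A minimal linear schedule, with illustrative default values (not the project's actual hyperparameters):

```python
def epsilon_at(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
               decay_steps: int = 10_000) -> float:
    """Linear epsilon decay: explore heavily early, exploit later.

    Returns eps_start at step 0, eps_end at and after decay_steps.
    """
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

In a compact adversarial MDP, decaying too fast can lock the agent into a shallow policy before it has seen enough tactical positions, which is one plausible source of the sensitivity noted above.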
Train:
python3 training/train.py
Evaluate:
python3 training/evaluate.py
Experiments were conducted using PyTorch with a fixed evaluation protocol and greedy policy testing.