MiniChess-RL: Deep Q-Network Agent in a Custom 4×4 Chess Environment

MiniChess-RL is a fully custom reinforcement learning environment in which a Deep Q-Network (DQN) agent is trained end-to-end on a 4×4 chess variant. The project focuses on environment modeling, reward shaping, illegal-action masking, and structured evaluation with reproducible experiments.
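Illegal-action masking in a DQN usually appears in two places: when the agent picks an action, and when the bootstrap target is computed. The following is a minimal sketch of both. It assumes a flat action space in which each index encodes a (from-square, to-square) pair on the 16-square board, giving 16 × 16 = 256 indices. That encoding and the names `encode_action`, `legal_mask`, `masked_argmax`, and `masked_td_target` are hypothetical illustrations, not the project's code.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed flat action encoding: one index per (from-square, to-square) pair.
BOARD_SIZE = 4
NUM_SQUARES = BOARD_SIZE * BOARD_SIZE    # 16 squares
NUM_ACTIONS = NUM_SQUARES * NUM_SQUARES  # 256 action indices


def encode_action(from_sq: int, to_sq: int) -> int:
    """Map a move (squares numbered 0..15) to a flat action index (hypothetical)."""
    return from_sq * NUM_SQUARES + to_sq


def legal_mask(legal_moves: Sequence[Tuple[int, int]]) -> List[bool]:
    """Boolean mask over all action indices; True marks a legal move."""
    mask = [False] * NUM_ACTIONS
    for from_sq, to_sq in legal_moves:
        mask[encode_action(from_sq, to_sq)] = True
    return mask


def masked_argmax(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    """Greedy action over legal indices only; Q-values of illegal actions are ignored."""
    best: Optional[int] = None
    for i, (q, ok) in enumerate(zip(q_values, mask)):
        if ok and (best is None or q > q_values[best]):
            best = i
    if best is None:
        raise ValueError("no legal actions in this state")
    return best


def masked_td_target(reward: float, next_q: Sequence[float],
                     next_mask: Sequence[bool], gamma: float, done: bool) -> float:
    """DQN target r + gamma * max Q(s', a'), with the max taken over legal a' only.

    A state with no legal moves is treated as terminal (no bootstrapping).
    """
    if done or not any(next_mask):
        return reward
    best_next = max(q for q, ok in zip(next_q, next_mask) if ok)
    return reward + gamma * best_next


if __name__ == "__main__":
    q = [0.0] * NUM_ACTIONS
    q[encode_action(0, 5)] = 0.9  # highest value, but (0, 5) is not legal here
    q[encode_action(1, 5)] = 0.2
    mask = legal_mask([(1, 5), (2, 6)])
    print(masked_argmax(q, mask))  # 21, i.e. the move (1, 5)
```

The same mask matters in the target. Taking the max over all 256 outputs would let the target bootstrap from the values of moves the agent can never play.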

Research Question

How do reward design choices and environment modeling decisions affect learning stability and measurable performance in a compact chess-like MDP with a large discrete action space?
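To make "reward design choices" concrete, here is a hypothetical shaped reward. It pairs a sparse terminal signal (+1 win, 0 draw, −1 loss) with small material-based shaping and a per-move penalty. The piece values, `capture_scale`, `step_penalty`, and the function itself are illustrative assumptions. The source does not state the project's actual reward function or its piece set.

```python
from typing import Dict, Optional

# Hypothetical piece values used only for shaping; the project's piece set is not specified.
PIECE_VALUES: Dict[str, float] = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0, "K": 0.0}
TERMINAL_REWARDS: Dict[str, float] = {"win": 1.0, "draw": 0.0, "loss": -1.0}


def shaped_reward(outcome: Optional[str] = None,
                  captured_by_agent: Optional[str] = None,
                  captured_by_opponent: Optional[str] = None,
                  capture_scale: float = 0.01,
                  step_penalty: float = 0.001) -> float:
    """Reward for one agent step (illustrative design, not the project's function).

    outcome: "win", "draw", "loss", or None if the game continues.
    captured_by_*: letter of the piece captured during this step, if any.
    """
    if outcome is not None:
        if outcome not in TERMINAL_REWARDS:
            raise ValueError(f"unknown outcome: {outcome!r}")
        return TERMINAL_REWARDS[outcome]  # sparse terminal signal
    reward = -step_penalty  # small per-move cost, intended to discourage long games
    if captured_by_agent is not None:
        reward += capture_scale * PIECE_VALUES[captured_by_agent]
    if captured_by_opponent is not None:
        reward -= capture_scale * PIECE_VALUES[captured_by_opponent]
    return reward
```

The shaping terms are kept small relative to the ±1 terminal reward so they do not dominate the win/loss signal. Varying `capture_scale` and `step_penalty` is one way to study the question above empirically.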

System Design

Experimental Results

Training episodes: 2000
Win rate vs. random opponent: 19%
Average reward (100 evaluation episodes): -0.588

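These results indicate only partial learning. A 19% win rate against a random opponent and a negative average evaluation reward show that the agent has not yet learned a reliably winning policy. They also highlight how sensitive learning in compact adversarial environments is to reward shaping and exploration scheduling.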
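Exploration scheduling is one of the knobs these results are sensitive to. The sketch below shows two common epsilon schedules (linear and exponential decay) and epsilon-greedy selection restricted to legal actions. The defaults (start 1.0, floor 0.05, 10,000 decay steps, decay rate 1e-3) are placeholders, not the settings used for the 2000-episode run.

```python
import math
import random
from typing import Sequence


def linear_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
                   decay_steps: int = 10_000) -> float:
    """Linearly anneal epsilon from eps_start to eps_end, then hold it constant."""
    if step >= decay_steps:
        return eps_end
    return eps_start + (step / decay_steps) * (eps_end - eps_start)


def exponential_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
                        decay_rate: float = 1e-3) -> float:
    """Exponentially decay epsilon toward eps_end (approaching it asymptotically)."""
    return eps_end + (eps_start - eps_end) * math.exp(-decay_rate * step)


def epsilon_greedy(q_values: Sequence[float], legal_actions: Sequence[int],
                   epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random legal action, else the best legal one."""
    if not legal_actions:
        raise ValueError("no legal actions")
    if rng.random() < epsilon:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_values[a])
```

With a short training budget, how quickly epsilon reaches its floor determines how much of training is spent exploring versus refining near-greedy play.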

Figure: MiniChess-RL reward curve
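Per-episode rewards in a short adversarial game are noisy, so reward curves like this one are often plotted as a trailing moving average. The helper below is a hypothetical sketch of that smoothing. The source does not say whether this curve was smoothed, or with what window; the `window` default of 100 is an assumption.

```python
from collections import deque
from typing import Deque, Iterable, List


def moving_average(values: Iterable[float], window: int = 100) -> List[float]:
    """Trailing mean over the last `window` values (uses fewer values at the start)."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf: Deque[float] = deque(maxlen=window)
    total = 0.0
    out: List[float] = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        out.append(total / len(buf))
    return out
```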

Reproducibility

Train:

python3 training/train.py

Evaluate:

python3 training/evaluate.py

Experiments were run in PyTorch under a fixed evaluation protocol, with the trained policy tested greedily (no exploration).
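A fixed evaluation protocol of this kind can be sketched as a seeded loop over a fixed number of greedy episodes. The loop reports the same metrics as the results table: win rate against a random opponent and average reward over 100 episodes. Several pieces are assumptions for illustration, and `training/evaluate.py` may work differently:

- the `play_episode` callable and its `(outcome, total_reward)` return format;
- the "win"/"draw"/"loss" labels;
- the per-episode seeding scheme.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interface: plays one full game with the agent acting greedily
# (epsilon = 0) against a random opponent that draws its moves from `rng`,
# and returns (outcome, total_reward) with outcome in {"win", "draw", "loss"}.
PlayEpisode = Callable[[random.Random], Tuple[str, float]]


@dataclass(frozen=True)
class EvalSummary:
    episodes: int
    win_rate: float
    draw_rate: float
    loss_rate: float
    average_reward: float


def evaluate(play_episode: PlayEpisode, num_episodes: int = 100,
             base_seed: int = 0) -> EvalSummary:
    """Run a fixed, seeded set of greedy evaluation episodes and aggregate them."""
    if num_episodes <= 0:
        raise ValueError("num_episodes must be positive")
    counts = {"win": 0, "draw": 0, "loss": 0}
    total_reward = 0.0
    for i in range(num_episodes):
        rng = random.Random(base_seed + i)  # episode i always sees the same seed
        outcome, reward = play_episode(rng)
        if outcome not in counts:
            raise ValueError(f"unknown outcome: {outcome!r}")
        counts[outcome] += 1
        total_reward += reward
    n = num_episodes
    return EvalSummary(
        episodes=n,
        win_rate=counts["win"] / n,
        draw_rate=counts["draw"] / n,
        loss_rate=counts["loss"] / n,
        average_reward=total_reward / n,
    )
```

Seeding each episode deterministically means two evaluations of the same checkpoint face identical opponent behavior, so differences in the reported metrics come from the policy rather than from evaluation noise.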

Future Directions

Topics: Reinforcement Learning, Deep Q-Network, Environment Design, Illegal Action Masking, Evaluation, PyTorch