This page describes the technical implementation of the Cassville Checkers environment and the AI agents used for analysis.
Gymnasium Environment¶
The game is implemented as a Gymnasium environment, making it compatible with standard reinforcement learning libraries.
Observation Space¶
The environment provides a dictionary observation containing:
| Key | Shape | Description |
|---|---|---|
| marble_positions | [num_players * 5] | Position index for each marble |
| current_player | () | Which player’s turn (0 to num_players-1) |
| die_roll | () | Current die roll (0-5, representing 1-6) |
| lap_completed | [num_players * 5] | Boolean flags for lap completion |
| mercy_counters | [num_players] | Failed deployment attempts per player |
| action_mask | [30] | Binary mask of legal actions |
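As a concrete illustration, an observation space matching the table above could be declared with gymnasium.spaces roughly as follows. This is a sketch only: the player count, the position upper bound, and the mercy-counter cap are assumptions, not values taken from the implementation.

```python
from gymnasium import spaces

NUM_PLAYERS = 4            # assumed player count; the real env may be configurable
MARBLES_PER_PLAYER = 5
NUM_ACTIONS = 30

observation_space = spaces.Dict({
    # 64 is an assumed upper bound on the position index encoding
    "marble_positions": spaces.MultiDiscrete([64] * (NUM_PLAYERS * MARBLES_PER_PLAYER)),
    "current_player": spaces.Discrete(NUM_PLAYERS),
    "die_roll": spaces.Discrete(6),                  # 0-5 encodes a roll of 1-6
    "lap_completed": spaces.MultiBinary(NUM_PLAYERS * MARBLES_PER_PLAYER),
    "mercy_counters": spaces.MultiDiscrete([16] * NUM_PLAYERS),  # assumed counter cap
    "action_mask": spaces.MultiBinary(NUM_ACTIONS),
})
```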
Action Space¶
The action space is discrete with 30 possible actions (6 action types × 5 marbles):
| Action Type | Description |
|---|---|
| HOME_TO_STAGING | Move marble from home to staging (requires roll of 1 or 6) |
| STAGING_TO_RING | Move marble from staging onto the ring |
| RING_MOVE | Advance marble around the ring by die value |
| RING_TO_GOAL | Enter goal area (requires exact landing after lap completion) |
| MERCY_MOVE | Use mercy rule to bypass the 1/6 deployment requirement |
| SKIP | Skip turn when no valid moves are available |
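The exact flat index layout is not documented here, but one natural encoding of 6 action types × 5 marbles is sketched below; the enum ordering and the `type * 5 + marble` formula are illustrative assumptions.

```python
from enum import IntEnum

class ActionType(IntEnum):
    HOME_TO_STAGING = 0
    STAGING_TO_RING = 1
    RING_MOVE = 2
    RING_TO_GOAL = 3
    MERCY_MOVE = 4
    SKIP = 5

def encode_action(action_type: ActionType, marble_index: int) -> int:
    """Map (action type, marble 0-4) to a flat index in [0, 30)."""
    return int(action_type) * 5 + marble_index

def decode_action(action: int) -> tuple:
    """Inverse of encode_action: recover (action type, marble index)."""
    return ActionType(action // 5), action % 5
```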
Reward Structure¶
| Event | Reward |
|---|---|
| Marble reaches goal | +100 |
| Win game | +5000 |
| Capture opponent | +15 |
| Get captured | -10 |
| Time penalty (per turn) | -0.1 |
| Lose game | -1000 |
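For reference, the same values can be collected as constants and combined into a per-step reward roughly as follows. The constant names and the helper are hypothetical; the environment may apply these rewards differently.

```python
# Reward constants from the table above (names are assumptions).
REWARD_GOAL = 100.0
REWARD_WIN = 5000.0
REWARD_CAPTURE = 15.0
REWARD_CAPTURED = -10.0
REWARD_TIME_PENALTY = -0.1
REWARD_LOSS = -1000.0

def step_reward(goals: int, captures: int, times_captured: int, won: bool, lost: bool) -> float:
    """Hypothetical helper: combine one turn's events into a scalar reward."""
    reward = REWARD_TIME_PENALTY                     # applied every turn
    reward += goals * REWARD_GOAL
    reward += captures * REWARD_CAPTURE
    reward += times_captured * REWARD_CAPTURED
    if won:
        reward += REWARD_WIN
    if lost:
        reward += REWARD_LOSS
    return reward
```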
Agent Implementations¶
RandomAgent¶
The simplest baseline—selects uniformly at random from all legal actions. Provides a lower bound on expected performance.
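A minimal sketch of such an agent, assuming the dictionary observation and action mask described above; the `act` interface is an assumption, not the repository's exact API.

```python
import numpy as np

class RandomAgent:
    """Pick uniformly among the actions allowed by the observation's action mask."""

    def __init__(self, seed=None):
        self.rng = np.random.default_rng(seed)

    def act(self, observation: dict) -> int:
        legal = np.flatnonzero(observation["action_mask"])
        return int(self.rng.choice(legal))
```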
Heuristic Agents¶
Priority-based agents that prefer certain action types over others. Three variants explore different strategic priorities; a sketch of the shared priority-selection mechanism follows the list.
HeuristicAgentAdvance (best heuristic):
Priority: goal > ring > staging > home > mercy > skip. Focuses on advancing marbles already on the ring before deploying new ones.
HeuristicAgentBalanced:
Priority: goal > staging > ring > home > mercy > skip. Moves marbles from staging to the ring first, then advances them.
HeuristicAgentDeploy:
Priority: goal > mercy > home > staging > ring > skip. Aggressively deploys new marbles before advancing existing ones.
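The sketch below shows the shared mechanism, assuming the flat action encoding sketched earlier (`action // 5` gives the action type). The priority list corresponds to HeuristicAgentAdvance; the other variants would only reorder it.

```python
import numpy as np

# Action-type indices follow the assumed ActionType enum ordering from above.
ADVANCE_PRIORITY = [
    3,  # RING_TO_GOAL    ("goal")
    2,  # RING_MOVE       ("ring")
    1,  # STAGING_TO_RING ("staging")
    0,  # HOME_TO_STAGING ("home")
    4,  # MERCY_MOVE      ("mercy")
    5,  # SKIP
]

class HeuristicAgent:
    """Pick the first priority tier with a legal action; tie-break among marbles at random."""

    def __init__(self, priority=ADVANCE_PRIORITY, seed=None):
        self.priority = priority
        self.rng = np.random.default_rng(seed)

    def act(self, observation: dict) -> int:
        legal = np.flatnonzero(observation["action_mask"])
        for action_type in self.priority:
            candidates = [a for a in legal if a // 5 == action_type]
            if candidates:
                return int(self.rng.choice(candidates))
        return int(legal[0])  # fallback; unreachable if SKIP is always maskable
```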
GreedyAgent¶
A score-based agent that assigns point values to each action type:
| Action Type | Base Score | Notes |
|---|---|---|
| RING_TO_GOAL | 100 | Highest priority: finish marbles |
| STAGING_TO_RING | 40 | Get staged marbles moving |
| RING_MOVE | 25 | +15 bonus for lap-completed marbles |
| MERCY_MOVE | 20 | Helps when stuck |
| HOME_TO_STAGING | 15 | Reduced: avoid early deployment |
| SKIP | 0 | Last resort |
The greedy agent selects the legal action with the highest score.
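A sketch of this scoring scheme under the same assumed flat encoding; how the real agent indexes the current player's lap_completed flags is an assumption.

```python
import numpy as np

# Base scores keyed by the assumed action-type index (see the ActionType sketch above).
BASE_SCORES = {0: 15, 1: 40, 2: 25, 3: 100, 4: 20, 5: 0}

class GreedyAgent:
    """Score every legal action and play the highest-scoring one."""

    def act(self, observation: dict) -> int:
        legal = np.flatnonzero(observation["action_mask"])
        player = int(observation["current_player"])

        def score(action: int) -> float:
            action_type, marble = action // 5, action % 5
            value = BASE_SCORES[action_type]
            # RING_MOVE gets a +15 bonus when the marble has completed its lap.
            if action_type == 2 and observation["lap_completed"][player * 5 + marble]:
                value += 15
            return value

        return int(max(legal, key=score))
```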
PPO Agents¶
Neural network policies trained using MaskablePPO from sb3-contrib. Action masking ensures the policy only selects legal moves.
Training configurations tested:
| Version | Training Regime | Win/Loss Rewards |
|---|---|---|
| v2 | vs heuristic_balanced only | 500 / -100 |
| v3 | vs mixed opponents (balanced, advance, greedy) + v2 PPO | 500 / -100 |
| v4 | Same as v3 | 5000 / -1000 (10x) |
All versions used the same training setup (see the sketch below):
- 1 million timesteps
- MLP policy with 2×64 hidden layers
- CPU training (~1000 FPS)
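A minimal training sketch with MaskablePPO matching the listed configuration (1 million timesteps, two 64-unit layers). The environment constructor, import path, and mask accessor are assumptions, and the real train.py may wire opponents and logging differently.

```python
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

from cassville_checkers.env import CassvilleCheckersEnv  # assumed import path

def mask_fn(env):
    # Expose the environment's current legal-action mask to MaskablePPO.
    # action_mask() is an assumed accessor; the wrapper is unnecessary if the
    # env already implements action_masks() itself.
    return env.unwrapped.action_mask()

env = ActionMasker(CassvilleCheckersEnv(), mask_fn)

model = MaskablePPO(
    "MultiInputPolicy",                     # dict observations -> multi-input policy
    env,
    policy_kwargs=dict(net_arch=[64, 64]),  # 2x64 MLP as listed above
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
model.save("ppo_cassville")                 # hypothetical output name
```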
Code Structure¶
cassville_checkers/
├── board.py # Board topology, position encoding
├── game_state.py # Game logic, move validation, captures
├── env.py # Gymnasium environment wrapper
├── agents.py # Agent implementations
└── train.py # PPO training utilities
Key Classes¶
Board: Manages the 48-position ring topology and maps between player-relative and absolute positions.
GameState: Handles all game logic including move validation, capturing, bonus turns, and win detection.
CassvilleCheckersEnv: Gymnasium-compatible environment with action masking support.
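A hypothetical interaction loop against the environment using the standard Gymnasium API and the dictionary observation described above; the constructor arguments and import path are assumptions.

```python
import numpy as np
from cassville_checkers.env import CassvilleCheckersEnv  # assumed import path

env = CassvilleCheckersEnv()
obs, info = env.reset(seed=0)
rng = np.random.default_rng(0)

terminated = truncated = False
while not (terminated or truncated):
    legal = np.flatnonzero(obs["action_mask"])       # restrict to legal moves
    action = int(rng.choice(legal))                  # here: a random legal action
    obs, reward, terminated, truncated, info = env.step(action)
```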
Source Code¶
The full implementation is available on GitHub.