Implementation

This page describes the technical implementation of the Cassville Checkers environment and the AI agents used for analysis.

Gymnasium Environment

The game is implemented as a Gymnasium environment, making it compatible with standard reinforcement learning libraries.

Observation Space

The environment provides a dictionary observation containing:

| Key | Shape | Description |
| --- | --- | --- |
| `marble_positions` | `[num_players * 5]` | Position index for each marble |
| `current_player` | `()` | Which player’s turn (0 to num_players-1) |
| `die_roll` | `()` | Current die roll (0-5, representing 1-6) |
| `lap_completed` | `[num_players * 5]` | Boolean flags for lap completion |
| `mercy_counters` | `[num_players]` | Failed deployment attempts per player |
| `action_mask` | `[30]` | Binary mask of legal actions |
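
As a minimal sketch, this space could be declared with `gymnasium.spaces` as below. The document only fixes the shapes, so the player count, total position count, and mercy-counter cap here are assumptions:

```python
from gymnasium import spaces

NUM_PLAYERS = 4           # assumption: the document does not fix the player count
MARBLES_PER_PLAYER = 5
NUM_ACTIONS = 30          # 6 action types x 5 marbles
NUM_POSITIONS = 64        # assumption: 48 ring slots plus home/staging/goal slots

observation_space = spaces.Dict({
    "marble_positions": spaces.MultiDiscrete(
        [NUM_POSITIONS] * (NUM_PLAYERS * MARBLES_PER_PLAYER)),
    "current_player": spaces.Discrete(NUM_PLAYERS),
    "die_roll": spaces.Discrete(6),  # 0-5 encodes rolls of 1-6
    "lap_completed": spaces.MultiBinary(NUM_PLAYERS * MARBLES_PER_PLAYER),
    "mercy_counters": spaces.MultiDiscrete([8] * NUM_PLAYERS),  # assumed counter cap
    "action_mask": spaces.MultiBinary(NUM_ACTIONS),
})
```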

Action Space

The action space is discrete with 30 possible actions (6 action types × 5 marbles):

| Action Type | Description |
| --- | --- |
| `HOME_TO_STAGING` | Move marble from home to staging (requires roll of 1 or 6) |
| `STAGING_TO_RING` | Move marble from staging onto the ring |
| `RING_MOVE` | Advance marble around the ring by die value |
| `RING_TO_GOAL` | Enter goal area (requires exact landing after lap completion) |
| `MERCY_MOVE` | Use mercy rule to bypass 1/6 deployment requirement |
| `SKIP` | Skip turn when no valid moves available |
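
Since the space is 6 action types × 5 marbles, a flat index splits naturally into a (type, marble) pair. The sketch below assumes a type-major encoding; the repository may order the factors differently:

```python
from enum import IntEnum

class ActionType(IntEnum):
    HOME_TO_STAGING = 0
    STAGING_TO_RING = 1
    RING_MOVE = 2
    RING_TO_GOAL = 3
    MERCY_MOVE = 4
    SKIP = 5

def decode_action(action: int) -> tuple[ActionType, int]:
    """Split a flat action index (0-29) into (action type, marble index)."""
    return ActionType(action // 5), action % 5

def encode_action(action_type: ActionType, marble: int) -> int:
    """Inverse mapping: (action type, marble index) -> flat index."""
    return int(action_type) * 5 + marble
```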

Reward Structure

| Event | Reward |
| --- | --- |
| Marble reaches goal | +100 |
| Win game | +5000 |
| Capture opponent | +15 |
| Get captured | -10 |
| Time penalty (per turn) | -0.1 |
| Lose game | -1000 |
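
One way such shaping terms are often kept together is a single constants mapping; the sketch below mirrors the table, but the key names are illustrative, not taken from the repository:

```python
# Per-event reward constants matching the table above (illustrative names).
REWARDS = {
    "marble_reaches_goal": 100.0,
    "win_game": 5000.0,
    "capture_opponent": 15.0,
    "get_captured": -10.0,
    "time_penalty_per_turn": -0.1,
    "lose_game": -1000.0,
}
```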

Agent Implementations

RandomAgent

The simplest baseline: it selects uniformly at random from all legal actions, providing a lower bound on expected performance.
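
A minimal sketch, assuming the agent receives the dictionary observation described above and exposes an `act()` method (the method name is an assumption):

```python
import numpy as np

class RandomAgent:
    """Baseline: sample uniformly from the legal actions in the mask."""

    def __init__(self, seed: int | None = None):
        self.rng = np.random.default_rng(seed)

    def act(self, observation: dict) -> int:
        legal = np.flatnonzero(observation["action_mask"])  # indices of legal actions
        return int(self.rng.choice(legal))
```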

Heuristic Agents

Priority-based agents that prefer certain action types over others. Three variants explore different strategic priorities (a sketch of the selection logic they share follows the variant descriptions):

HeuristicAgentAdvance (best heuristic):

Priority: goal > ring > staging > home > mercy > skip

Focuses on advancing marbles already on the ring before deploying new ones.

HeuristicAgentBalanced:

Priority: goal > staging > ring > home > mercy > skip

Moves marbles from staging to ring first, then advances them.

HeuristicAgentDeploy:

Priority: goal > mercy > home > staging > ring > skip

Aggressively deploys new marbles before advancing existing ones.
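
A minimal sketch of the selection logic the three variants could share, reusing the `ActionType` and `decode_action` helpers from the action-space sketch above. The priority orderings come from the lists in this section; everything else is an assumption:

```python
import numpy as np

PRIORITIES = {
    "advance":  [ActionType.RING_TO_GOAL, ActionType.RING_MOVE,
                 ActionType.STAGING_TO_RING, ActionType.HOME_TO_STAGING,
                 ActionType.MERCY_MOVE, ActionType.SKIP],
    "balanced": [ActionType.RING_TO_GOAL, ActionType.STAGING_TO_RING,
                 ActionType.RING_MOVE, ActionType.HOME_TO_STAGING,
                 ActionType.MERCY_MOVE, ActionType.SKIP],
    "deploy":   [ActionType.RING_TO_GOAL, ActionType.MERCY_MOVE,
                 ActionType.HOME_TO_STAGING, ActionType.STAGING_TO_RING,
                 ActionType.RING_MOVE, ActionType.SKIP],
}

def heuristic_act(observation: dict, variant: str) -> int:
    """Return the first legal action of the highest-priority available type."""
    legal = np.flatnonzero(observation["action_mask"])
    for preferred in PRIORITIES[variant]:
        for action in legal:
            if decode_action(int(action))[0] is preferred:
                return int(action)
    raise RuntimeError("action mask contained no legal action")
```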

GreedyAgent

A score-based agent that assigns point values to each action type:

| Action Type | Base Score | Notes |
| --- | --- | --- |
| `RING_TO_GOAL` | 100 | Highest priority: finish marbles |
| `STAGING_TO_RING` | 40 | Get staged marbles moving |
| `RING_MOVE` | 25 | +15 bonus for lap-completed marbles |
| `MERCY_MOVE` | 20 | Helps when stuck |
| `HOME_TO_STAGING` | 15 | Reduced, to avoid early deployment |
| `SKIP` | 0 | Last resort |

The greedy agent selects the legal action with the highest score.
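A minimal sketch of that selection, reusing the `ActionType` and `decode_action` helpers from the action-space sketch above. The scores come from the table; the player-major indexing of the lap flags is an assumption:

```python
import numpy as np

# Base scores from the table above. The +15 lap bonus depends on the
# lap_completed flags, so scoring is done per concrete (type, marble) pair.
BASE_SCORES = {
    ActionType.RING_TO_GOAL: 100,
    ActionType.STAGING_TO_RING: 40,
    ActionType.RING_MOVE: 25,
    ActionType.MERCY_MOVE: 20,
    ActionType.HOME_TO_STAGING: 15,
    ActionType.SKIP: 0,
}

def greedy_act(observation: dict) -> int:
    """Return the legal action with the highest score."""
    player = int(observation["current_player"])
    lap_flags = observation["lap_completed"]

    def score(action: int) -> float:
        action_type, marble = decode_action(action)
        s = BASE_SCORES[action_type]
        # Assumed indexing: lap flags stored player-major, 5 marbles per player.
        if action_type is ActionType.RING_MOVE and lap_flags[player * 5 + marble]:
            s += 15
        return s

    legal = np.flatnonzero(observation["action_mask"])
    return int(max(legal, key=lambda a: score(int(a))))
```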

PPO Agents

Neural network policies trained using MaskablePPO from sb3-contrib. Action masking ensures the policy only selects legal moves.

Training configurations tested:

| Version | Training Regime | Win/Loss Rewards |
| --- | --- | --- |
| v2 | vs `heuristic_balanced` only | 500 / -100 |
| v3 | vs mixed opponents (balanced, advance, greedy) + v2 PPO | 500 / -100 |
| v4 | Same as v3 | 5000 / -1000 (10x) |

All versions otherwise shared the same underlying training setup.
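
A minimal sketch of how such a run might be wired up with sb3-contrib's `ActionMasker` wrapper. The no-argument constructor, the `action_masks()` method on the env, and the timestep budget are assumptions; opponent scheduling and hyperparameters are omitted:

```python
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

from cassville_checkers.env import CassvilleCheckersEnv  # path from the tree below

def mask_fn(env):
    # Assumption: the env exposes its current legal-action mask via a method;
    # the action_mask entry in the observation suggests it tracks one internally.
    return env.action_masks()

env = ActionMasker(CassvilleCheckersEnv(), mask_fn)  # assumed no-arg constructor
model = MaskablePPO("MultiInputPolicy", env, verbose=1)  # Dict obs -> MultiInputPolicy
model.learn(total_timesteps=1_000_000)  # illustrative budget, not from the document
```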

Code Structure

```
cassville_checkers/
├── board.py          # Board topology, position encoding
├── game_state.py     # Game logic, move validation, captures
├── env.py            # Gymnasium environment wrapper
├── agents.py         # Agent implementations
└── train.py          # PPO training utilities
```

Key Classes

`Board`: Manages the 48-position ring topology and maps between player-relative and absolute positions.

`GameState`: Handles all game logic including move validation, capturing, bonus turns, and win detection.

`CassvilleCheckersEnv`: Gymnasium-compatible environment with action masking support.
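
As a usage sketch, a random rollout against the standard Gymnasium API might look like the following; the no-argument constructor is an assumption:

```python
import numpy as np
from cassville_checkers.env import CassvilleCheckersEnv  # path from the tree above

rng = np.random.default_rng(0)
env = CassvilleCheckersEnv()  # assumption: no-argument constructor
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    legal = np.flatnonzero(obs["action_mask"])          # mask from the observation
    obs, reward, terminated, truncated, info = env.step(int(rng.choice(legal)))
env.close()
```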

Source Code

The full implementation is available at: github.com/cranmer/cassville-checkers