
Strategy Analysis Conclusions

This document summarizes the key findings from the benchmark analysis of Cassville Checkers strategies.

Performance Rankings

Based on 2-player head-to-head matchups across 20 games each:

| Rank | Strategy | Win Rate | Key Strength |
| --- | --- | --- | --- |
| 1 | greedy | ~69% | Score-based optimization balances all priorities |
| 2 | heuristic_advance | ~64% | Advances marbles before deploying new ones |
| 3 | heuristic_balanced | ~61% | Good balance of staging and advancing |
| 4 | random | ~38% | Baseline performance |
| 5 | heuristic_deploy | ~19% | Aggressive deployment causes congestion |
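
As a rough sketch of how such a benchmark can be run, the loop below plays every pairing for a fixed number of games and tallies win rates. `play_game` and the strategy-name strings are placeholders for the project’s actual engine and agent interfaces, not its real API.

```python
# Hedged sketch of a 2-player round-robin benchmark. play_game() is a placeholder
# for the real game engine; here it picks a winner at random so the script runs.
import itertools
import random
from collections import defaultdict

STRATEGIES = ["greedy", "heuristic_advance", "heuristic_balanced", "random", "heuristic_deploy"]
GAMES_PER_MATCHUP = 20

def play_game(strategy_a: str, strategy_b: str) -> str:
    """Play one 2-player game and return the winning strategy's name (placeholder)."""
    return random.choice([strategy_a, strategy_b])  # replace with the real engine

wins = defaultdict(int)
games_played = defaultdict(int)
for a, b in itertools.combinations(STRATEGIES, 2):
    for _ in range(GAMES_PER_MATCHUP):
        winner = play_game(a, b)
        wins[winner] += 1
        games_played[a] += 1
        games_played[b] += 1

# Print strategies sorted by overall win rate.
for name in sorted(STRATEGIES, key=lambda s: wins[s] / games_played[s], reverse=True):
    print(f"{name:20s} win rate: {wins[name] / games_played[name]:.0%}")
```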

Key Insights

1. Advance Before Deploy

The most critical insight is that advancing existing marbles before deploying new ones is essential for winning. The data shows this clearly: the two top-ranked strategies (greedy and heuristic_advance) both prioritize advancing marbles already on the ring, while heuristic_deploy, which pushes new marbles out first, finishes last.

2. Captures Are Costly

Strategies that minimize captures finish games faster and win more:

| Strategy | 2P Captures | 4P Captures |
| --- | --- | --- |
| greedy | ~1.8 | ~14 |
| heuristic_balanced | ~1.9 | ~13 |
| heuristic_advance | ~1.5 | ~12 |
| random | ~9 | ~93 |
| heuristic_deploy | ~12 | ~139 |

The heuristic_deploy strategy creates massive ring congestion, leading to 10x more captures than efficient strategies.

3. Greedy Optimization Works

The greedy agent’s score-based move selection is highly effective: rather than applying a fixed rule order, it evaluates every legal move against a single score that balances all of the priorities above.
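
A minimal sketch of what score-based selection can look like; the `Move` fields and weights below are illustrative assumptions, not the benchmark agent’s actual scoring terms.

```python
# Illustrative greedy move selection: score every legal move, take the best.
# The Move fields and weights are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Move:
    enters_goal: bool    # the marble would complete its journey
    ring_advance: int    # how far the marble advances along the ring
    deploys_new: bool    # brings a new marble out of home/staging onto the ring

def score(move: Move) -> float:
    s = 0.0
    if move.enters_goal:
        s += 100.0                # finishing a marble outweighs everything else
    s += 2.0 * move.ring_advance  # reward progress on the ring
    if move.deploys_new:
        s -= 5.0                  # mild penalty: deployment adds congestion
    return s

def choose_move(legal_moves: list[Move]) -> Move:
    # Greedy selection: evaluate all legal moves and pick the highest-scoring one.
    return max(legal_moves, key=score)
```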

4. Player Count Scaling

Games scale predictably with player count: captures per game rise sharply from two to four players for every strategy (compare the 2P and 4P columns in the capture table above).

PPO Agent Analysis

We trained multiple versions of PPO agents using MaskablePPO from sb3-contrib:
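
For reference, a minimal MaskablePPO training sketch in the style of the sb3-contrib examples; `CassvilleCheckersEnv`, its `legal_action_mask()` method, the hyperparameters, and the save path are assumptions standing in for the project’s actual environment and configuration.

```python
# Minimal MaskablePPO setup (sb3-contrib). CassvilleCheckersEnv is a hypothetical
# Gymnasium env; legal_action_mask() is a hypothetical method returning a boolean
# array over the action space with True where the move is legal.
import numpy as np
from sb3_contrib import MaskablePPO
from sb3_contrib.common.maskable.utils import get_action_masks
from sb3_contrib.common.wrappers import ActionMasker

def mask_fn(env) -> np.ndarray:
    # Provide the current legal-move mask to the wrapper.
    return env.legal_action_mask()

env = ActionMasker(CassvilleCheckersEnv(num_players=2), mask_fn)  # hypothetical env

model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("maskable_ppo_cassville_v4")  # hypothetical output path

# Evaluation: predict() takes the current action mask explicitly.
obs, _ = env.reset()
action, _ = model.predict(obs, action_masks=get_action_masks(env), deterministic=True)
```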

2-Player Performance (200 games each)

| Model | vs balanced | vs advance | vs greedy | Avg |
| --- | --- | --- | --- | --- |
| v2 | 50.5% | 47.0% | 48.0% | 48.5% |
| v3 | 51.0% | 50.0% | 51.5% | 50.8% |
| v4 | 52.5% | 52.5% | 50.0% | 51.7% |

All models perform at ~50% win rate, meaning they are evenly matched with the heuristic opponents.

4-Player Performance (200 games each, fair baseline: 25%)

| Model | vs balanced | vs advance | vs greedy | Avg |
| --- | --- | --- | --- | --- |
| v2 | 33.5% | 28.0% | 27.5% | 29.7% |
| v3 | 23.0% | 19.5% | 27.0% | 23.2% |
| v4 | 20.5% | 21.0% | 18.5% | 20.0% |

The 4-player agents perform at ~20-30%, which is near the fair baseline of 25% for 4 equally-matched players.

Key PPO Findings

  1. PPO agents are competitive with heuristics - they learned to play at a similar skill level

  2. Training variations made no significant difference - v2, v3, and v4 all perform similarly within statistical uncertainty (±7% for 200 games; see the quick check after this list)

  3. Diverse opponents didn’t help - training against mixed strategies (v3) didn’t improve generalization

  4. Increased rewards didn’t help - 10x reward scaling (v4) didn’t improve learning

  5. PPO matches but doesn’t exceed heuristics - the neural network learned competent play but not superior strategies
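
As a quick check of the ±7% figure (assuming each game is an independent win/loss with win probability near 0.5), the 95% half-width of a win-rate estimate over $n = 200$ games is

$$
1.96\sqrt{\frac{p(1-p)}{n}} = 1.96\sqrt{\frac{0.5 \times 0.5}{200}} \approx 1.96 \times 0.035 \approx 0.07,
$$

i.e. about ±7 percentage points, so the differences between v2, v3, and v4 are within noise.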

Why heuristic_deploy Fails

The aggressive deployment strategy fails because:

  1. Ring congestion: Multiple marbles deployed before any advance

  2. Capture cascades: More marbles on ring = more capture opportunities

  3. Wasted progress: Captured marbles lose all ring progress

  4. Longer games: 5-10x more captures per game than efficient strategies

The data consistently shows this is the worst strategy across all configurations.

For optimal human or AI play:

  1. Prioritize entering the goal when a marble can complete its journey

  2. Advance marbles on the ring before deploying new ones from home

  3. Move marbles from staging to ring promptly to avoid blocking

  4. Use home-to-staging only when the ring is relatively clear

  5. Consider score-based evaluation for complex decisions

The greedy and heuristic_advance strategies best exemplify these principles.
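
Put as code, the ordering above amounts to a fixed-priority policy. The move categories below are illustrative assumptions about how moves might be classified, not the engine’s actual representation.

```python
# Fixed-priority move selection mirroring the recommendations above.
# MoveKind and its ordering are illustrative assumptions only.
from enum import IntEnum

class MoveKind(IntEnum):
    ENTER_GOAL = 0       # 1. complete a marble's journey into the goal
    ADVANCE_RING = 1     # 2. advance a marble already on the ring
    STAGING_TO_RING = 2  # 3. move a staged marble onto the ring promptly
    HOME_TO_STAGING = 3  # 4. deploy from home last, ideally when the ring is clear

def choose_move(legal_moves: list[MoveKind]) -> MoveKind:
    # Take the highest-priority (lowest-valued) category that has a legal move.
    return min(legal_moves)
```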

Future Work

Given that PPO achieved parity with heuristics but not superiority, potential improvements to explore: