<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Benny&apos;s Mind Hack</title>
    <description>Programming a computer to draw surely teaches us the most important lesson that creative spirit is in the details.
</description>
    <link>https://bennycheung.github.io/</link>
    <atom:link href="https://bennycheung.github.io/feed.xml" rel="self" type="application/rss+xml" />
    <pubDate>Sat, 18 Apr 2026 03:11:51 +0000</pubDate>
    <lastBuildDate>Sat, 18 Apr 2026 03:11:51 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
    
      <item>
        <title>Card Grammar - Teaching Machines the Rules of Complex Card Games</title>
        <description>&lt;!--excerpt.start--&gt;
&lt;p&gt;We built a pipeline that generates mechanically coherent cards, scales them in five-card batches, exports directly to Tabletop Simulator, and stress-tests balance using tournament algorithms. It sounds like the future of card game design. But when we took 13 of the most influential card games ever published and tried to fit their mechanics into the pipeline’s five-field schema, the results were humbling. Dominion mapped perfectly. Sushi Go worked trivially. Then Wingspan shattered the box, Terraforming Mars overwhelmed it, and KeyForge broke it entirely. This is the story of where automated card design hits its limits, what those limits reveal about the nature of game complexity, and how the solution required not better algorithms but a fundamentally different way of thinking about what a card actually is.
&lt;!--excerpt.end--&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/post_cover.jpg&quot; alt=&quot;Card Grammar - Teaching Machines the Rules of Complex Card Games&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is Part 3 of the &lt;a href=&quot;three-waves-of-card-game-design-tools&quot;&gt;Card Architecture series&lt;/a&gt;. In &lt;a href=&quot;three-waves-of-card-game-design-tools&quot;&gt;Part 1&lt;/a&gt;, I traced the evolution of card game tools from scripting to design platforms. In &lt;a href=&quot;how-ai-actually-designs-a-card&quot;&gt;Part 2&lt;/a&gt;, I went inside the pipeline itself and examined which parts of card design are mechanistic and which parts are not. This article asks the harder question: what happens when the pipeline meets real games?&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-stress-test&quot;&gt;The Stress Test&lt;/h2&gt;

&lt;p&gt;The previous articles in this series described a powerful card generation pipeline: a system that reads a game’s ontology, generates cards with real mechanical depth, scales them through a batch loop, and exports playable prototypes. It is genuinely impressive technology.&lt;/p&gt;

&lt;p&gt;But impressive technology deserves honest testing. To understand the real limits of this approach, we took 13 of the most influential card games ever published, spanning seven distinct archetypes, and aggressively tried to map their cards into the basic five-field schema that the pipeline uses.&lt;/p&gt;

&lt;p&gt;That schema, to refresh, is a rigid card template with five fields: card name, card type, effect text, cost, and strategic role. Every generated card must fit inside this template. If you have ever prototyped with index cards, you know the feeling: five lines on the card, and you write “Village / Action / 3 coins / Draw 1 card, +2 Actions.” Clean, legible, complete.&lt;/p&gt;
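&lt;p&gt;As a sketch, the five-field template can be written as a tiny typed record. The field names here are illustrative, not the pipeline’s actual identifiers:&lt;/p&gt;

```python
# A minimal sketch of the five-field schema: name, type, effect text,
# cost, and strategic role. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class Card:
    name: str            # "Village"
    card_type: str       # "Action"
    effect_text: str     # "Draw 1 card, +2 Actions"
    cost: int            # 3
    strategic_role: str  # e.g. "engine enabler"

village = Card(
    name="Village",
    card_type="Action",
    effect_text="Draw 1 card, +2 Actions",
    cost=3,
    strategic_role="engine enabler",
)
print(village)
```

&lt;p&gt;Every generated card must collapse into exactly these five slots.&lt;/p&gt;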

&lt;p&gt;The question is: what happens when a game’s cards need more than five lines?&lt;/p&gt;

&lt;p&gt;The results sorted themselves into four distinct coverage tiers, from perfect fit to total structural mismatch.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/Schema_Coverage_Tiers.jpg&quot; alt=&quot;The Edge of the Current Engine&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The coverage cliff from Tier A to Tier D, where the market opportunity lives.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;tier-a-full-five-lines-is-enough&quot;&gt;Tier A (Full): Five Lines Is Enough&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Tier&lt;/th&gt;
      &lt;th&gt;Schema Fit&lt;/th&gt;
      &lt;th&gt;What Happens&lt;/th&gt;
      &lt;th&gt;Games&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Full (Tier A)&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Near-perfect&lt;/td&gt;
      &lt;td&gt;Cards map perfectly. Every mechanical detail survives compression. Balance testing reflects the actual game.&lt;/td&gt;
      &lt;td&gt;Dominion, Star Realms, Sushi Go!&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/CardGame_Tier_A_Games.jpg&quot; alt=&quot;Tier A Games&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Deck builders and simple drafting games are the schema’s sweet spot. A Dominion card has a name (Village), a type (Action), a cost (3 coins), and an effect (“Draw 1 card, +2 Actions”). Five lines on the index card, nothing left out. Star Realms and Sushi Go are near-perfect fits as well.&lt;/p&gt;

&lt;p&gt;But these games represent the shallow end of the complexity pool.&lt;/p&gt;

&lt;h2 id=&quot;tier-b-partial-squinting-at-the-rules&quot;&gt;Tier B (Partial): Squinting at the Rules&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Tier&lt;/th&gt;
      &lt;th&gt;Schema Fit&lt;/th&gt;
      &lt;th&gt;What Happens&lt;/th&gt;
      &lt;th&gt;Games&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Partial (Tier B)&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Directionally correct&lt;/td&gt;
      &lt;td&gt;Core mechanics work but secondary systems are lost. Balance testing misses cross-system interactions.&lt;/td&gt;
      &lt;td&gt;7 Wonders, Blood Rage, Res Arcana, Everdell&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/CardGame_Tier_B_Games.jpg&quot; alt=&quot;Tier B Games&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Games like 7 Wonders and Blood Rage introduce mechanics the schema cannot cleanly express: era-based card phasing, prerequisite chains across ages, conditional scoring triggers tied to specific board positions. You can cram this information into the effect text string, but the simulator ends up squinting to understand the rules. The schema does not crash. It degrades gracefully, going blind to the parts of the game it cannot see.&lt;/p&gt;

&lt;h2 id=&quot;tier-c-insufficient-the-template-overflows&quot;&gt;Tier C (Insufficient): The Template Overflows&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Tier&lt;/th&gt;
      &lt;th&gt;Schema Fit&lt;/th&gt;
      &lt;th&gt;What Happens&lt;/th&gt;
      &lt;th&gt;Games&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Insufficient (Tier C)&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;~60-70% data loss&lt;/td&gt;
      &lt;td&gt;The schema captures a card’s name and a flattened cost. The economic engine, the tag system, and the trigger timing all evaporate.&lt;/td&gt;
      &lt;td&gt;Wingspan, Terraforming Mars, Race for the Galaxy&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/CardGame_Tier_C_Games.jpg&quot; alt=&quot;Tier C Games&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Engine builders are where the schema genuinely breaks. Five lines on an index card is nowhere near enough.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/Schema_Compression_Crisis.jpg&quot; alt=&quot;The Schema Compression Crisis&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Try writing a Wingspan bird card on that index card. You need food cost (1 invertebrate + 1 seed, or 2 wild), habitat restriction (wetland only), egg capacity (2), power trigger timing (when activated, not when played), power text, nest type, wingspan measurement, and bonus traits for end-of-round scoring. That is at least eight structured fields. You start writing smaller, cramming text into margins, abbreviating until the card is unreadable. The simulator faces the same problem: a single bird card carries at least eight structured data fields that cannot be collapsed into the effect text string without losing approximately 60% of the card’s actual mechanical data.&lt;/p&gt;
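&lt;p&gt;To see the overflow concretely, here is a hypothetical structured record for a Wingspan-style bird card. Field names and types are my own illustration, not Stonemaier’s actual data model:&lt;/p&gt;

```python
# Hypothetical fields for a Wingspan-style bird card; the point is the
# field count, which overflows the five slots of the basic schema.
from dataclasses import dataclass, field

@dataclass
class BirdCard:
    name: str
    food_cost: dict            # e.g. {"invertebrate": 1, "seed": 1}
    habitats: list             # e.g. ["wetland"]
    egg_capacity: int
    power_trigger: str         # "when_activated", not "when_played"
    power_text: str
    nest_type: str
    wingspan_cm: int
    bonus_traits: list = field(default_factory=list)  # end-of-round scoring

FIVE_FIELD_SLOTS = 5
bird_fields = len(BirdCard.__dataclass_fields__)
print(bird_fields, "structured fields vs", FIVE_FIELD_SLOTS, "schema slots")
```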

&lt;p&gt;Terraforming Mars is worse. Its 208 project cards layer four problems on top of each other: a tag system where cards trigger effects on other cards across every player’s tableau, three card colors with fundamentally different lifecycle behaviors (fire once, fire repeatedly, or fire and self-destruct), game-state preconditions that gate card play (“Requires 5% oxygen”), and a dual-track economy where each of six resources has both a permanent production rate and a spendable stockpile. The basic schema misses more than half the game.&lt;/p&gt;

&lt;p&gt;In heavy engine builders, cards are social. They talk to each other. The basic schema treats every card as isolated on an island.&lt;/p&gt;

&lt;h2 id=&quot;tier-d-breaks-down-structural-mismatch&quot;&gt;Tier D (Breaks Down): Structural Mismatch&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Tier&lt;/th&gt;
      &lt;th&gt;Schema Fit&lt;/th&gt;
      &lt;th&gt;What Happens&lt;/th&gt;
      &lt;th&gt;Games&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Breaks Down (Tier D)&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;~75-85% data loss&lt;/td&gt;
      &lt;td&gt;The schema is structurally incompatible with the game’s card model. Not a matter of missing fields, but a fundamental architectural mismatch.&lt;/td&gt;
      &lt;td&gt;KeyForge, Spirit Island, Gloomhaven&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/CardGame_Tier_D_Games.jpg&quot; alt=&quot;Tier D Games&quot; /&gt;&lt;/p&gt;

&lt;p&gt;At the bottom tier, the schema is not just missing fields. It is structurally incompatible with the game’s card model.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/Tier_D_Structural_Mismatch.jpg&quot; alt=&quot;Tier D Structural Mismatch&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Spirit Island breaks on a different axis entirely: cross-card accumulation. Each power card carries element symbols (Fire, Air, Water) that accumulate across all cards played in a turn, unlocking threshold-gated innate abilities on the Spirit board. You do not play a card just for its printed effect. You play it partly for its element icons, which may unlock a completely different, more powerful ability elsewhere. The schema has no concept of this per-turn element economy that resets every round.&lt;/p&gt;
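&lt;p&gt;The accumulation mechanic can be sketched in a few lines. Card names, element lists, and the threshold below are hypothetical, but the shape of the problem is real: the unlock depends on a pool that no single card owns:&lt;/p&gt;

```python
# Sketch of per-turn element accumulation: icons from every card played
# this turn are pooled, and an innate ability unlocks at a threshold.
# Card names, elements, and the threshold are hypothetical.
from collections import Counter

cards_played_this_turn = [
    {"name": "Rushing Torrent", "elements": ["water", "air"]},
    {"name": "Swallow the Land", "elements": ["water", "earth"]},
]

pool = Counter()
for card in cards_played_this_turn:
    pool.update(card["elements"])

INNATE_THRESHOLD = {"water": 2, "air": 1}
unlocked = all(pool[elem] >= n for elem, n in INNATE_THRESHOLD.items())
print("element pool:", dict(pool), "innate unlocked:", unlocked)
```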

&lt;p&gt;KeyForge and Gloomhaven break the schema on yet another axis: time. A single KeyForge creature card packs four distinct abilities that fire at four different moments (on play, on reap, on fight, on destruction). If the simulator reads the card text as a single script and fires everything simultaneously, it fundamentally breaks the physical reality of the game. It is executing a four-act play as a single scene. Gloomhaven pushes this further: every action card has two independent halves, and choosing the top half means the bottom half &lt;em&gt;ceases to exist&lt;/em&gt; for that turn. Standard natural language processing fails completely when text is actually a multi-layered conditional timing puzzle.&lt;/p&gt;

&lt;p&gt;The problem is not that we need a smarter text reader. The problem is that reading text was never the right approach.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-solution-card-grammar&quot;&gt;The Solution: Card Grammar&lt;/h2&gt;

&lt;p&gt;So how does the technology evolve to handle these breaks?&lt;/p&gt;

&lt;p&gt;The obvious approach, adding 50 new fields to the schema and hoping for the best, would cause the language model to collapse under prompt weight, hallucinating garbage. The clever approach, building a smarter text reader, fails because of the invisible time dimension we just described. And hardcoding every game from scratch cannot scale financially.&lt;/p&gt;

&lt;p&gt;The solution required a paradigm shift in how the system thinks about cards.&lt;/p&gt;

&lt;h3 id=&quot;the-card-is-the-game&quot;&gt;The Card IS the Game&lt;/h3&gt;

&lt;p&gt;The old failing architecture treated a card as a generic object bouncing around inside a game’s rules. The game is a box; the card is a piece inside it. But for Wingspan, for Terraforming Mars, for any serious engine builder, the card &lt;em&gt;is&lt;/em&gt; the game. The card schema does not sit inside the ontology. It practically &lt;em&gt;is&lt;/em&gt; the ontology.&lt;/p&gt;

&lt;p&gt;This insight flips the entire architecture. Instead of trying to fit complex cards into a generic template, each game declares its own &lt;strong&gt;card grammar&lt;/strong&gt;: a structured definition of what fields exist on cards in this particular game, when those fields trigger, and how cards are allowed to communicate with each other.&lt;/p&gt;
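&lt;p&gt;A card grammar is, in effect, a per-game validator. The sketch below is my own illustration of the idea, with hypothetical field names rather than the platform’s actual grammar format: the grammar declares which fields may exist and how they are typed, and a card is compliant only if every field matches:&lt;/p&gt;

```python
# Sketch: a per-game card grammar declaring allowed fields and types.
# Field names are hypothetical, not the platform's actual grammar format.
WINGSPAN_GRAMMAR = {
    "food_cost": dict,
    "habitat": str,
    "egg_capacity": int,
    "power_trigger": str,
    "power_text": str,
}

def is_compliant(card, grammar):
    """A card conforms if every field is declared in the grammar and typed."""
    for key, value in card.items():
        declared_type = grammar.get(key)
        if declared_type is None or not isinstance(value, declared_type):
            return False
    return True

bird = {"food_cost": {"seed": 2}, "habitat": "wetland", "egg_capacity": 2}
print(is_compliant(bird, WINGSPAN_GRAMMAR))          # a compliant card
print(is_compliant({"mana": 3}, WINGSPAN_GRAMMAR))   # undeclared field
```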

&lt;h3 id=&quot;three-layers-of-card-intelligence&quot;&gt;Three Layers of Card Intelligence&lt;/h3&gt;

&lt;p&gt;A card grammar has three layers, each solving a specific failure mode from the stress test:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/Card_Grammar_Three_Layers.jpg&quot; alt=&quot;The Card Grammar Architecture&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. A three-layered optional extension. Basic games skip it entirely with zero regression.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anatomy&lt;/strong&gt; defines the strict physical fields that are allowed to exist in a given game. A Terraforming Mars card grammar declares fields for tags, card color, requirements, and production effects. A Wingspan grammar declares fields for food cost, habitat, egg capacity, and power trigger type. The fields are mathematically typed. A tag is exclusively a tag. A cost is exclusively a cost. The system never has to guess what a piece of data means from context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lifecycle&lt;/strong&gt; defines the rigid timing windows for when things are permitted to trigger. This is the direct answer to the KeyForge and Gloomhaven timing problem. Instead of dumping all effects into a single text block, the grammar declares distinct phases: effects that fire on play, effects that fire on activation, effects that fire between turns, effects that fire at game end. The simulator checks the trigger type and only fires matching abilities at the appropriate game phase.&lt;/p&gt;
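&lt;p&gt;The timing fix can be sketched as a dispatch table: abilities are stored per trigger window, and the simulator fires only those matching the current phase. The card, ability, and phase names below are hypothetical:&lt;/p&gt;

```python
# Sketch of lifecycle dispatch: each ability is keyed by its trigger
# window, so a four-ability creature never fires everything at once.
# Card, ability, and phase names are hypothetical.
creature = {
    "name": "Vault Raider",
    "abilities": {
        "on_play": ["Each opponent loses 1 key shard"],
        "on_reap": ["Gain 1 key shard"],
        "on_fight": ["Deal 2 damage to another creature"],
        "on_destroy": ["Draw a card"],
    },
}

def fire(card, phase):
    """Return only the abilities whose trigger matches the current phase."""
    return card["abilities"].get(phase, [])

print(fire(creature, "on_reap"))   # only the reap ability fires
print(fire(creature, "shuffle"))   # no such window: nothing fires
```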

&lt;p&gt;&lt;strong&gt;Synergies&lt;/strong&gt; define the strict rules for how different cards are allowed to communicate with each other. This is what makes Terraforming Mars’s tag system work: when a Science tag is played, the engine checks all cards in the tableau for matching triggers. The grammar declares the interaction rules up front, so the simulator can monitor cross-card effects without guessing.&lt;/p&gt;
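&lt;p&gt;The tag check itself is mechanically simple once the grammar declares it up front. A sketch, with hypothetical card data rather than the real Terraforming Mars card list:&lt;/p&gt;

```python
# Sketch of declared cross-card synergy: when a card is played, every
# tableau card whose declared trigger tag matches fires its effect.
# Card names and effects are hypothetical.
tableau = [
    {"name": "Lab Network", "triggers_on_tag": "science",
     "effect": "draw a card"},
    {"name": "Ore Mine", "triggers_on_tag": None, "effect": None},
]

def on_card_played(played_tags, tableau):
    fired = []
    for card in tableau:
        if card["triggers_on_tag"] in played_tags:
            fired.append((card["name"], card["effect"]))
    return fired

# Playing a card that carries science and space tags:
print(on_card_played({"science", "space"}, tableau))
```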

&lt;h3 id=&quot;sheet-music-not-a-live-concert&quot;&gt;Sheet Music, Not a Live Concert&lt;/h3&gt;

&lt;p&gt;The philosopher Nelson Goodman, writing in 1968, formalized a distinction that turns out to be directly useful here. Goodman described the difference between a &lt;strong&gt;score&lt;/strong&gt; and a &lt;strong&gt;performance&lt;/strong&gt;, between sheet music and a live concert. The sheet music is the strict, unambiguous notation. The concert is the rich, contextual, lived execution.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/Goodman_Paradigm_Shift.jpg&quot; alt=&quot;The Goodman Connection&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Score (left) versus performance (right). The grammar is the strict notation. The generated card is the lived execution.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The card grammar definition (the anatomy, the lifecycle, the synergy rules) is the score. Any card generated conforming to that grammar is a compliant performance. Different games have completely different sheet music. The score for Wingspan looks nothing like the score for Terraforming Mars. But the underlying system, the musician, can read all of it. As long as you provide logically sound sheet music, the engine can perform any game.&lt;/p&gt;

&lt;p&gt;Goodman called this &lt;strong&gt;finite differentiation&lt;/strong&gt;: every element in the notation is distinctly separate, mathematically defined, impossible to confuse. The old failing schema suffered from the opposite, what Goodman called &lt;strong&gt;semantic density&lt;/strong&gt;: the boundary between a tag, a cost, and a requirement was all mushed together in one dense paragraph of prose, and a machine does not have the lived human experience required to unravel that density. The card grammar enforces the clean edges that formal systems need to compute.&lt;/p&gt;

&lt;h3 id=&quot;what-this-means-for-designers&quot;&gt;What This Means for Designers&lt;/h3&gt;

&lt;p&gt;When a designer says “I’m making an engine builder about breeding dinosaurs,” the system does not just generate flavor text about a T-Rex roaring. It proposes a specific card grammar for this new game: an anatomy layer with tags for carnivores and herbivores, a lifecycle layer where end-of-round events cause extinction triggers, and a synergy layer to handle a food chain production economy. The generated prototype cards carry these strict typed effects baked in.&lt;/p&gt;

&lt;p&gt;And crucially, because the system understands the underlying grammar, the balance testing engine can instantly simulate the mechanics. It will run hundreds of automated games and report: “Your Volcanic Eruption card is overpowered. Because of the specific synergy grammar, its Fire tag accidentally triggers an infinite resource loop with four other herbivore cards in the standard deck composition.”&lt;/p&gt;

&lt;p&gt;No other tool on the market can generate, simulate, balance-test, and export at that level of mechanical complexity. The structural card schema is the moat.&lt;/p&gt;

&lt;h3 id=&quot;how-we-actually-learn-games&quot;&gt;How We Actually Learn Games&lt;/h3&gt;

&lt;p&gt;What makes this architecture compelling is that it mirrors how human brains actually process complex board games. When you sit down to learn Terraforming Mars, you do not memorize the text on all 208 cards before playing. Instead, you spend the first 20 minutes learning the specific &lt;em&gt;grammar&lt;/em&gt; of that game’s universe: these icons mean production, those borders mean a one-time event, this phase happens before that phase.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/Human_Cognition.jpg&quot; alt=&quot;Mirroring Human Cognition at the Table&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Once your brain internalizes the grammar, someone can hand you a card you have never seen before. You would instantly know how to process it. You are running a mental card grammar simulator. The platform formalizes the same cognitive process.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;what-automated-playtesting-actually-reveals&quot;&gt;What Automated Playtesting Actually Reveals&lt;/h2&gt;

&lt;p&gt;With the card grammar solving the data structure problem, the balance-testing engine can finally do meaningful work on complex games. Running hundreds of simulated games with Monte Carlo Tree Search (MCTS), the same algorithm family behind AlphaGo, produces results that would take a human playtest group months to discover.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/MCTS_Playtesting.jpg&quot; alt=&quot;The Power of Automated MCTS Playtesting&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A 30-card engine builder prototype tested with MCTS showed a 90% skill gap: the strategic agent beat the random agent nine times out of ten. That number is a signal of economic depth. It means the production chains, resource conversions, and scoring paths create genuinely learnable strategy, not just lucky draws. A poorly designed prototype shows a 50-50 split between strategic and random play: the game has no meaningful decisions. The gap between 50% and 90% is the difference between a game that feels arbitrary and one that rewards mastery.&lt;/p&gt;
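&lt;p&gt;The skill-gap metric itself is easy to illustrate without the full MCTS machinery. In this toy sketch (my own, not the platform’s engine), a “strategic” agent drafts the best cards from a shared pool while a random agent picks blindly, and the gap is simply the strategic win rate over many games:&lt;/p&gt;

```python
# Toy illustration of the skill-gap metric, not an MCTS implementation:
# the gap is the strategic agent's win rate against a random agent.
import random

random.seed(7)  # deterministic for the example

def play_game(pool_size=10, hand=3):
    pool = [random.randint(1, 20) for _ in range(pool_size)]
    strategic = sum(sorted(pool, reverse=True)[:hand])  # drafts best cards
    blind = sum(random.sample(pool, hand))              # drafts blindly
    return strategic > blind  # ties score as non-wins

GAMES = 500
wins = sum(play_game() for _ in range(GAMES))
print("strategic win rate:", wins / GAMES)
```

&lt;p&gt;A lopsided win rate like this signals learnable strategy; a rate near 50% signals a game of lucky draws.&lt;/p&gt;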

&lt;p&gt;But here is the honest limit. The balance engine can identify that one strategy wins 60% of all matchups. It can guarantee mathematical fairness. It can pinpoint the specific card that breaks the meta and explain &lt;em&gt;why&lt;/em&gt;: which tag triggers which cascade, which production chain dominates.&lt;/p&gt;

&lt;p&gt;It cannot measure fun.&lt;/p&gt;

&lt;p&gt;It cannot tell you if playing a particular card feels satisfying. It cannot simulate the tension of a close finish. It cannot quantify the social experience of bluffing your friend into a terrible trade. Goodman would say: any formal system must trade &lt;strong&gt;repleteness&lt;/strong&gt; (the full, dense richness of lived experience) for &lt;strong&gt;articulateness&lt;/strong&gt; (the sharp edges that computation requires). You cannot have both. A card database has sharp edges: this card costs 3, this strategy wins 60%. The experience of playing the game, the laughter, the agony of a misplay, is dense, contextual, and irreducibly human.&lt;/p&gt;

&lt;p&gt;The platform manages the articulate map. The designer navigates the replete territory. The platform reads the sheet music. The designer feels the orchestra.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;concluding-remarks&quot;&gt;Concluding Remarks&lt;/h2&gt;

&lt;p&gt;The card grammar solves the structural problem. Automated playtesting solves the iteration speed problem. But a designer who has already sketched 50 cards, playtested twice, and refined the core loop does not want the platform to &lt;em&gt;generate&lt;/em&gt; the game. They want the platform to &lt;em&gt;analyze&lt;/em&gt; the game. Import the rules, run 200 simulated games, and tell them which 12 cards are never played. That is the valuable work.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/New_Symbiosis.jpg&quot; alt=&quot;The New Symbiosis&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The platform handles the computational labor of balance testing, strategy validation, and rule clarity analysis. The designer provides the vision, the taste, and the judgment about what makes a game worth playing.&lt;/p&gt;

&lt;p&gt;The spreadsheet era is over. The technology to structurally understand, simulate, and balance complex card games is here. And the designers who thrive will be the ones who understand the difference between a game that is balanced and a game that is alive.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;This is Part 3 and the final article in the &lt;a href=&quot;three-waves-of-card-game-design-tools&quot;&gt;Card Architecture series&lt;/a&gt;. For the philosophical foundations behind this analysis, see &lt;a href=&quot;three-waves-of-card-game-design-tools&quot;&gt;Part 1: Three Waves of Card Game Design Tools&lt;/a&gt; and &lt;a href=&quot;how-ai-actually-designs-a-card&quot;&gt;Part 2: How AI Actually Designs a Card&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;appendix-the-12-games-in-the-stress-test&quot;&gt;Appendix: The 13 Games in the Stress Test&lt;/h2&gt;

&lt;p&gt;These are the card games we tested against the five-field schema, grouped by the coverage tier they fell into. If you are unfamiliar with any of them, the links lead to their BoardGameGeek pages, the definitive community resource for tabletop games.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/card-grammar-for-complex-card-games/CardGame_Collage_4_Tiers.jpg&quot; alt=&quot;Card Game All Tiers Examples&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;tier-a-full&quot;&gt;Tier A (Full)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/36218/dominion&quot;&gt;Dominion&lt;/a&gt;&lt;/strong&gt; (Donald X. Vaccarino, 2008). The game that invented the deck-building genre. Players start with identical 10-card decks of weak cards and buy increasingly powerful cards from a shared market, shuffling purchases into their growing decks. Every card is a simple name-type-cost-effect tuple. The schema’s perfect match.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/147020/star-realms&quot;&gt;Star Realms&lt;/a&gt;&lt;/strong&gt; (Robert Dougherty &amp;amp; Darwin Kastle, 2014). A two-player deck builder with faction-based synergies. Cards gain bonus abilities when played alongside other cards from the same faction, which stretches the schema slightly but does not break it. The conditional logic stays within effect text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/133473/sushi-go&quot;&gt;Sushi Go!&lt;/a&gt;&lt;/strong&gt; (Phil Walker-Harding, 2013). A lightweight card drafting game where players simultaneously pick cards from hands passed around the table. No resource costs at all. The cost field is null and the game works. Pure set collection scoring that the schema handles trivially.&lt;/p&gt;

&lt;h3 id=&quot;tier-b-partial&quot;&gt;Tier B (Partial)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/68448/7-wonders&quot;&gt;7 Wonders&lt;/a&gt;&lt;/strong&gt; (Antoine Bauza, 2010). Card drafting across three ages with a chaining mechanism: building certain cards in earlier ages lets you build specific later cards for free. The schema has no field for age phasing or prerequisite chains, so the balance tester misses these cross-age interactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/170216/blood-rage&quot;&gt;Blood Rage&lt;/a&gt;&lt;/strong&gt; (Eric M. Lang, 2015). A Viking-themed area control game with card drafting. Cards carry variable battle strength values and age-specific quest conditions that encode spatial and temporal scoring triggers, more than a single effect text string can cleanly represent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/262712/res-arcana&quot;&gt;Res Arcana&lt;/a&gt;&lt;/strong&gt; (Thomas Lehmann, 2019). A tight engine builder with only 8 cards per player. Each card converts specific essence types into other essences or victory points. The multi-resource conversion economy exceeds what a flat cost field can express.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/199792/everdell&quot;&gt;Everdell&lt;/a&gt;&lt;/strong&gt; (James A. Wilson, 2018). A tableau-building game where players place critters and constructions into a personal city. Cards have occupancy limits, seasonal availability, and cross-card pairing bonuses that the basic schema loses.&lt;/p&gt;

&lt;h3 id=&quot;tier-c-insufficient&quot;&gt;Tier C (Insufficient)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/266192/wingspan&quot;&gt;Wingspan&lt;/a&gt;&lt;/strong&gt; (Elizabeth Hargrave, 2019). 170 unique bird cards, each carrying 8+ structured data fields: multi-type food costs, habitat placement restrictions, egg capacity, four distinct power trigger timings, and bonus trait tags for end-of-round scoring. The schema loses approximately 60% of each card’s mechanical data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/167791/terraforming-mars&quot;&gt;Terraforming Mars&lt;/a&gt;&lt;/strong&gt; (Jacob Fryxelius, 2016). 208 project cards encoding an entire economic subsystem: tag-driven cross-card synergies, game-state preconditions gating card play, a dual-track production/stockpile economy across six resource types, and three card colors with fundamentally different lifecycle behaviors. Approximately 70% data loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/28143/race-for-the-galaxy&quot;&gt;Race for the Galaxy&lt;/a&gt;&lt;/strong&gt; (Thomas Lehmann, 2007). Every card serves triple duty as currency (discard to pay), tableau engine (ongoing production and consumption powers), and victory points (conditional end-game scoring formulas). The unified card economy where discarding a card to pay for another card &lt;em&gt;is&lt;/em&gt; the resource system has no representation in the basic schema.&lt;/p&gt;

&lt;h3 id=&quot;tier-d-breaks-down&quot;&gt;Tier D (Breaks Down)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/257501/keyforge-call-of-the-archons&quot;&gt;KeyForge&lt;/a&gt;&lt;/strong&gt; (Richard Garfield, 2018). Every creature card has up to four distinct abilities on different timing triggers (play, reap, fight, destroyed), and the house-selection meta-mechanic replaces the entire concept of resource costs. The schema is structurally incompatible with the game’s design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/162886/spirit-island&quot;&gt;Spirit Island&lt;/a&gt;&lt;/strong&gt; (R. Eric Reuss, 2017). Power cards carry element symbols that accumulate across all cards played in a turn, unlocking threshold-gated innate abilities on the Spirit board. This cross-card element accumulation system (which resets every turn, unlike Terraforming Mars tags) has no schema representation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://boardgamegeek.com/boardgame/174430/gloomhaven&quot;&gt;Gloomhaven&lt;/a&gt;&lt;/strong&gt; (Isaac Childres, 2017). Every action card has two independent halves (top and bottom) separated by an initiative number. Players select two cards per turn, using the top of one and the bottom of the other. The combinatorial dual-half selection, initiative-based turn ordering, and permanent card loss as a stamina clock produce an 85% compression loss, the highest of any game tested.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Nelson Goodman, &lt;em&gt;Languages of Art: An Approach to a Theory of Symbols&lt;/em&gt;, Hackett Publishing, 1968.&lt;/li&gt;
  &lt;li&gt;Jesse Schell, &lt;em&gt;The Art of Game Design: A Book of Lenses&lt;/em&gt;, CRC Press, 3rd Edition, 2019.&lt;/li&gt;
  &lt;li&gt;Geoffrey Engelstein and Isaac Shalev, &lt;em&gt;Building Blocks of Tabletop Game Design&lt;/em&gt;, CRC Press, 2019.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://cdn.aaai.org/ojs/21550/21550-13-25563-1-2-20220628.pdf&quot;&gt;LUDUS: Auto Battler Card Balancing&lt;/a&gt;, AAAI 2022&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2502.07128&quot;&gt;Cardiverse: LLM Card Game Prototyping&lt;/a&gt;, EMNLP 2025&lt;/li&gt;
  &lt;li&gt;Alexandre Verlaine, “Introducing Card Games in Ludii,” UCLouvain Master’s Thesis, 2025.&lt;/li&gt;
  &lt;li&gt;Benny Cheung, &lt;a href=&quot;generative-ontology-from-game-knowledge-to-game-creation&quot;&gt;Generative Ontology: From Game Knowledge to Game Creation&lt;/a&gt;, bennycheung.github.io, 2026.&lt;/li&gt;
  &lt;li&gt;Benny Cheung, &lt;a href=&quot;ai-playtesting-when-your-game-tests-itself&quot;&gt;AI Playtesting: When Your Board Game Tests Itself&lt;/a&gt;, bennycheung.github.io, 2026.&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate>
        <link>https://bennycheung.github.io/card-grammar-for-complex-card-games</link>
        <guid isPermaLink="true">https://bennycheung.github.io/card-grammar-for-complex-card-games</guid>
        
        <category>Game Design</category>
        
        <category>Card Games</category>
        
        <category>Design Tools</category>
        
        <category>Tabletop Games</category>
        
        <category>Prototyping</category>
        
        <category>Game Architecture</category>
        
        <category>Philosophy</category>
        
        <category>Nelson Goodman</category>
        
        
        <category>post</category>
        
      </item>
    
      <item>
        <title>How AI Actually Designs a Card</title>
        <description>&lt;!--excerpt.start--&gt;
&lt;p&gt;In 2021, I spent a month reverse-engineering Race for the Galaxy. I parsed Keldon Jones’s C source code, converted the entire card library into Python, and mapped every phase interaction, every card power, every production chain across 114 unique cards. I did this because the game’s AI kept destroying me and I wanted to understand why. What I found was that every card in RFTG carries a structured data model far more complex than its printed text suggests: type, cost, VP value, good type, military flags, and a list of phase-specific powers that interact across five distinct game phases. Five years later, when I started building a system that generates card games, I realized the pipeline I needed was a mirror of what I had already done by hand. The AI was not replacing the designer’s process. It was formalizing it.
&lt;!--excerpt.end--&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/how-ai-actually-designs-a-card/post_cover.jpg&quot; alt=&quot;How AI Actually Designs a Card&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is Part 2 of the &lt;a href=&quot;three-waves-of-card-game-design-tools&quot;&gt;Card Architecture series&lt;/a&gt;. In Part 1, I traced the evolution of card game tools from scripting to AI-native pipelines. This article goes inside the pipeline itself. But rather than just describing how the pipeline works, I want to draw a parallel that changed how I think about tool-assisted design: at every stage, the AI is doing a mechanistic version of what a human designer already does. The question is not whether AI can design cards. It is which parts of card design are mechanistic, which parts are not, and what that means for the human designer’s role.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;how-a-human-designs-a-card&quot;&gt;How a Human Designs a Card&lt;/h2&gt;

&lt;p&gt;Before we look at the AI, let me describe what actually happens when a human designer sits down to create a card game. I will use Race for the Galaxy as the reference because I spent a month inside its architecture and because it represents the level of complexity that serious card games demand.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/how-ai-actually-designs-a-card/RaceForTheGalaxy.gif&quot; alt=&quot;Race for the Galaxy&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1. Tom Lehmann’s Race for the Galaxy (2007) – 114 unique cards, five simultaneous phases, four production types, military vs civilian settlement. The complexity hiding inside each card is what makes it both a design masterpiece and an AI challenge.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When Tom Lehmann designed RFTG, the process was roughly this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, the world.&lt;/strong&gt; The game needed a theme that could sustain 114 unique cards. Galactic civilization building. Worlds to settle, technologies to develop, goods to produce and trade. The theme is not decoration. It constrains the design space. You cannot have a card called “Corporate Restructuring” in a game about medieval farming, and you cannot have “Harvest Festival” in a game about space colonization. Theme is the first filter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, the mechanics.&lt;/strong&gt; RFTG’s signature innovation is simultaneous role selection: all players secretly choose a phase, only chosen phases execute, choosers get a privilege bonus. This mechanic was not an afterthought. It was the skeleton that every card in the game hangs on. Each card carries phase-specific powers. New Vinland produces novelty goods in Phase 5 and consumes any good to draw 2 cards in Phase 4. That dual-phase interaction does not happen by accident. It happens because the designer defined the mechanical skeleton first, then designed cards that exploit its seams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third, the cards themselves.&lt;/strong&gt; When I parsed the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cards.txt&lt;/code&gt; file, I found that every RFTG card carries a structured data model:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Field&lt;/th&gt;
      &lt;th&gt;Example (New Vinland)&lt;/th&gt;
      &lt;th&gt;Purpose&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Name&lt;/td&gt;
      &lt;td&gt;New Vinland&lt;/td&gt;
      &lt;td&gt;Identity&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Type&lt;/td&gt;
      &lt;td&gt;World (Type 1)&lt;/td&gt;
      &lt;td&gt;Mechanical category&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Cost&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;What you pay (discard from hand)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;VP&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;End-game scoring&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Good Type&lt;/td&gt;
      &lt;td&gt;Novelty&lt;/td&gt;
      &lt;td&gt;What it produces&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Phase 4 Power&lt;/td&gt;
      &lt;td&gt;Consume any good, draw 2 cards&lt;/td&gt;
      &lt;td&gt;Trade/consume interaction&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Phase 5 Power&lt;/td&gt;
      &lt;td&gt;Produce good of world type&lt;/td&gt;
      &lt;td&gt;Production engine&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 1. The structured data model behind a single RFTG card. Seven fields, two phase-specific powers, one production chain. This is the complexity the basic schema must capture.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That is seven structured fields on a single card. Replicant Robots, a development, has a different shape: cost 4, VP 2, and a Phase 3 power that reduces settlement cost by 2. Contact Specialist draws a card whenever you settle a world. Each card is a small program with inputs, outputs, and conditional behavior.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/how-ai-actually-designs-a-card/RFTG_New_Vinland.png&quot; alt=&quot;RFTG New Vinland Card Design&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2. New Vinland’s card design data (left) alongside the actual card (right). The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cards.txt&lt;/code&gt; encoding – N:name, T:type:cost:vp, G:good type, P:phase:power – packs seven structured fields into six lines. Phase IV consumes any good to draw 2 cards. Phase V produces a novelty good. This is the structured data model hiding behind every RFTG card.&lt;/em&gt;&lt;/p&gt;
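&lt;p&gt;To make the encoding concrete, here is a minimal Python sketch of a parser for records in this style. The prefixes follow the caption above (N, T, G, P); Keldon Jones’s real &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cards.txt&lt;/code&gt; format carries additional flags and power encodings that this simplification omits.&lt;/p&gt;

```python
def parse_card(block):
    """Parse one card from an RFTG-style text record.

    Line prefixes follow the encoding in Figure 2:
    N:name, T:type:cost:vp, G:good type, P:phase:power text.
    This is a simplification of the real cards.txt layout.
    """
    card = {"powers": []}
    for line in block.strip().splitlines():
        tag, _, rest = line.partition(":")
        if tag == "N":
            card["name"] = rest
        elif tag == "T":
            type_, cost, vp = rest.split(":")
            card.update(type=int(type_), cost=int(cost), vp=int(vp))
        elif tag == "G":
            card["good"] = rest
        elif tag == "P":
            phase, _, power = rest.partition(":")
            card["powers"].append({"phase": int(phase), "power": power})
    return card

new_vinland = parse_card("""
N:New Vinland
T:1:2:1
G:Novelty
P:4:Consume any good, draw 2 cards
P:5:Produce good of world type
""")
```

&lt;p&gt;Six lines of text become a structured record: type, cost, VP, good type, and two phase-indexed powers, ready for a game engine to consume.&lt;/p&gt;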

&lt;p&gt;A human designer holds all of this in their head. They have an intuition for which cards the ecosystem needs, which strategic gaps exist, which combinations create satisfying turns. They know, from experience, that a deck full of cheap aggressive cards needs an expensive defensive counter, that a production chain needs both producers and consumers, that a game ending too quickly means the late-game investments are not worth building.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fourth, iteration.&lt;/strong&gt; Lehmann did not get all 114 cards right on the first pass. He playtested, found broken combinations, removed cards, rebalanced costs, added new ones to fill strategic gaps. The RFTG AI was trained on over 30,000 simulated games using temporal difference learning, years before DeepMind made reinforcement learning famous. The AI learned which cards win and which lose through sheer repetition. Iteration is where good cards become great cards.&lt;/p&gt;

&lt;p&gt;This four-stage process, theme then mechanics then cards then iteration, is what every experienced card game designer does. Some do it formally with design documents. Some do it in spreadsheets. But the cognitive structure is the same.&lt;/p&gt;

&lt;p&gt;The AI pipeline mirrors it exactly.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-multi-agent-pipeline&quot;&gt;The Multi-Agent Pipeline&lt;/h2&gt;

&lt;p&gt;Instead of asking one AI to do everything, the system splits the work across specialized agents, each handling one stage of the process a human designer does by instinct.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/how-ai-actually-designs-a-card/Multi_Agent_Pipeline.jpg&quot; alt=&quot;Multi-Agent Pipeline&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3. The multi-agent card generation pipeline. Four specialized agents mirror the four stages of human card design: theme, mechanics, cards, and iteration.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;stage-1-the-theme-weaver&quot;&gt;Stage 1: The Theme Weaver&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;What the designer writes in a concept doc.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A human designer starts with a concept. “A game about galactic civilizations racing to explore, settle, and develop the galaxy.” They sketch the narrative boundaries: the vocabulary, the tone, the kinds of things that exist in this universe. Worlds, developments, goods, trade routes.&lt;/p&gt;

&lt;p&gt;The Theme Weaver agent does the same thing. It takes a sentence from the designer and generates a detailed thematic design document that locks in the narrative reality. If the game is about galactic expansion, the AI will not generate a card called “Wheat Field” or “Village Smithy.” The document constrains the vocabulary so every subsequent agent speaks the same language.&lt;/p&gt;

&lt;p&gt;A human does this unconsciously. The AI needs it written down.&lt;/p&gt;

&lt;h3 id=&quot;stage-2-the-mechanics-architect&quot;&gt;Stage 2: The Mechanics Architect&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;What the designer sketches as a turn structure.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After the theme, a human designer defines the physics of the cardboard universe. How many phases does a turn have? What resources exist? How do players interact? What triggers the end of the game?&lt;/p&gt;

&lt;p&gt;When I reverse-engineered RFTG’s game engine, I found that the entire game reduces to a state machine: five phases, each with a set of legal actions, each modifying a shared game state. Draw cards, develop, settle worlds, trade or consume goods, produce. The simultaneous role selection is the outer loop. The phase-specific card powers are the inner loop. Everything else is bookkeeping.&lt;/p&gt;

&lt;p&gt;The Mechanics Architect agent generates this same skeleton. It receives the thematic design and produces a mechanics document that defines the turn structure, the resource types, the victory conditions, and the action economy: what a player can physically do on their turn. This is the gravity of the game world. Every card the AI generates later will obey this gravity.&lt;/p&gt;

&lt;p&gt;The metaphor I keep coming back to: the mechanics architect builds the physics engine before asking the AI to design bridges.&lt;/p&gt;

&lt;h3 id=&quot;stage-3-the-component-designer&quot;&gt;Stage 3: The Component Designer&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;What the designer types into a spreadsheet.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With both the theme and the mechanics in hand, the Component Designer agent generates the actual cards. This is the stage where the parallel between human and AI becomes most striking.&lt;/p&gt;

&lt;p&gt;Every card the AI generates must conform to a strict schema:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Field&lt;/th&gt;
      &lt;th&gt;Purpose&lt;/th&gt;
      &lt;th&gt;RFTG Equivalent&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Card name&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Identity, must fit the theme&lt;/td&gt;
      &lt;td&gt;“New Vinland,” “Contact Specialist”&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Card type&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Mechanical category&lt;/td&gt;
      &lt;td&gt;World vs. Development&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Effect text&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;What the card does, as printed&lt;/td&gt;
      &lt;td&gt;Phase-specific powers&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;What the player pays&lt;/td&gt;
      &lt;td&gt;Discard cards from hand&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Strategic role&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Why this card exists (min 20 characters)&lt;/td&gt;
      &lt;td&gt;The designer’s mental model&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Table 2. The five-field card schema. The strategic role field externalizes what human designers carry as intuition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The last field, strategic role, is the most important. It has a hard minimum of 20 characters. The AI must write an explanation for every card it generates, justifying why the card exists in the game’s ecosystem.&lt;/p&gt;

&lt;p&gt;Here is the thing: every experienced card game designer carries this justification in their head. They know that New Vinland exists to be a cheap entry point into the novelty production chain. They know that Contact Specialist exists to reward players who invest in settling worlds. They know that Galactic Federation exists to create a scoring payoff for development-heavy strategies.&lt;/p&gt;

&lt;p&gt;The difference is that human designers hold this mental model implicitly, and sometimes lose track of it at card 47. The strategic role field forces the AI to make it explicit. And it turns out, forcing anyone to articulate &lt;em&gt;why&lt;/em&gt; each card exists makes the design better, whether the designer is human or artificial.&lt;/p&gt;
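&lt;p&gt;The schema and its hard minimum are simple to enforce in code. Here is one way the five fields might be modeled, with the 20-character rule checked at construction time; the field names and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CardSpec&lt;/code&gt; class are my own illustration, not the pipeline’s actual types.&lt;/p&gt;

```python
from dataclasses import dataclass

MIN_ROLE_LEN = 20  # the hard minimum described above


@dataclass
class CardSpec:
    """One card in the five-field schema (field names are illustrative)."""
    name: str
    card_type: str
    effect_text: str
    cost: str
    strategic_role: str

    def __post_init__(self):
        # Reject cards whose existence is not justified in prose.
        if MIN_ROLE_LEN > len(self.strategic_role):
            raise ValueError("strategic_role must be at least 20 characters")
```

&lt;p&gt;A card with a one-word justification never makes it into the deck; the generator is forced back to explain itself.&lt;/p&gt;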

&lt;h3 id=&quot;the-new-vinland-test&quot;&gt;The New Vinland Test&lt;/h3&gt;

&lt;p&gt;To see this in practice, consider what happens when a game with RFTG-style mechanics runs through the pipeline. The AI does not generate a generic “World Card: worth 1 VP.” It has full context about the theme (galactic civilizations), the mechanics (five-phase simultaneous selection with production chains), and the strategic ecosystem.&lt;/p&gt;

&lt;p&gt;So it generates something like: “Mining World. Civilian world. Produces rare goods. Cost 3 cards. Strategic role: Mid-cost production world. Rare goods are more valuable in trade, creating a payoff for the higher investment compared to cheaper novelty worlds.”&lt;/p&gt;

&lt;p&gt;That strategic role statement is the proof that the AI understands the resource hierarchy. Rare goods trade for more than novelty goods. A 3-cost world that produces rare goods is correctly positioned above a 2-cost world that produces novelty goods. The AI is reasoning about the same cost curve that a human designer would sketch in a spreadsheet.&lt;/p&gt;

&lt;p&gt;It is not generic mush. It is a card that fits the economic structure of the game.&lt;/p&gt;

&lt;h3 id=&quot;beyond-the-five-fields-card-grammar&quot;&gt;Beyond the Five Fields: Card Grammar&lt;/h3&gt;

&lt;p&gt;The five-field schema works for many card games. But when I went back to RFTG’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cards.txt&lt;/code&gt; and counted the fields per card, I found seven, eight, sometimes more: type, cost, VP, good type, flags, and multiple phase-specific powers. The five-field schema is the floor, not the ceiling.&lt;/p&gt;

&lt;p&gt;For complex games, the AI pipeline supports a Card Grammar: a per-game anatomy declaration that tells every agent exactly what structured fields each card carries. Instead of free-form effect text that the system has to parse, the Card Grammar declares typed fields:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Tags (enum list): Science, Military, Production, Alien&lt;/li&gt;
  &lt;li&gt;Production effects (resource delta): produces 2 ore per round&lt;/li&gt;
  &lt;li&gt;Resource cost (resource map): costs 3 ore + 2 energy&lt;/li&gt;
  &lt;li&gt;Trigger timing (enum): when played, when activated, between turns, game end&lt;/li&gt;
  &lt;li&gt;Scoring formula (formula): 3 VP per Military tag in tableau&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the structure I found inside RFTG’s card data, generalized to work for any game. The Card Grammar tells the AI: “In this game, cards have tags, production rates, resource costs, and trigger timing. Generate cards that fill these fields.” The result is structured data, not prose, which means the simulation engine can read it directly without guessing.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/how-ai-actually-designs-a-card/Card_Grammar_Schema.jpg&quot; alt=&quot;Card Grammar Schema&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 4. The Card Grammar extends the five-field schema with typed anatomy fields specific to each game. The same structure I extracted manually from RFTG, the system now declares and enforces automatically.&lt;/em&gt;&lt;/p&gt;
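&lt;p&gt;A Card Grammar declaration can be sketched as plain data plus a validator. The field names and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kind&lt;/code&gt; vocabulary below are illustrative stand-ins, not the system’s real declaration format:&lt;/p&gt;

```python
# A hypothetical Card Grammar for a Terraforming-style engine builder.
# Field names and kinds are illustrative, not the pipeline's actual format.
CARD_GRAMMAR = {
    "tags":       {"kind": "enum_list",
                   "values": ["Science", "Military", "Production", "Alien"]},
    "production": {"kind": "resource_delta"},  # e.g. {"ore": 2} per round
    "cost":       {"kind": "resource_map"},    # e.g. {"ore": 3, "energy": 2}
    "trigger":    {"kind": "enum",
                   "values": ["on_play", "on_activate", "between_turns", "game_end"]},
    "scoring":    {"kind": "formula"},         # e.g. "3 VP per Military tag"
}


def validate_card(card, grammar=CARD_GRAMMAR):
    """Check that a generated card fills only declared fields with legal values."""
    for field, value in card.items():
        spec = grammar.get(field)
        assert spec is not None, f"undeclared field: {field}"
        if spec["kind"] == "enum":
            assert value in spec["values"], f"illegal {field}: {value}"
        elif spec["kind"] == "enum_list":
            assert all(v in spec["values"] for v in value)
    return True
```

&lt;p&gt;Because every field is typed, the simulation engine reads the card directly; there is no prose to parse and nothing to guess.&lt;/p&gt;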

&lt;h3 id=&quot;stage-4-the-detail-expander&quot;&gt;Stage 4: The Detail Expander&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;What the designer does after the first playtest.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here is a problem that every card game designer knows intimately. You design 30 cards. You playtest. They are all fine. Balanced. Functional. And completely boring.&lt;/p&gt;

&lt;p&gt;A language model has the same tendency. Left to its own devices, it regresses to the mean. It produces safe, statistically average cards that are all roughly the same power level, the same cost range, the same complexity. Functional and forgettable.&lt;/p&gt;

&lt;p&gt;A human designer fixes this after the first playtest. They realize the aggro strategy is too strong, so they design a trap card. They notice the late game stalls out, so they add a high-cost bomb that rewards patience. They find that no one is building military because the payoff is not high enough, so they add a 6-cost development that scores 3 VP per Military tag in the tableau.&lt;/p&gt;

&lt;p&gt;The Detail Expander agent does the same thing. After the foundational cards are generated, it looks at the batch and deliberately breaks the mold:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;At least one combo card that chains multiple mechanics together&lt;/li&gt;
  &lt;li&gt;At least one situational card that is weak in most games but devastating in the right context&lt;/li&gt;
  &lt;li&gt;At least one expensive late-game card that rewards long-term investment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In RFTG terms: the foundational batch might produce a set of balanced 2-3 cost worlds. The detail expander would then generate Galactic Federation (6-cost, scores 2 VP per Development tag) or Pan-Galactic League (6-cost, scores 3 VP per Military tag). These are the cards that create divergent strategies. They do not emerge from averaging. They emerge from deliberately forcing outliers.&lt;/p&gt;
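&lt;p&gt;In code, the mold-breaking requirements reduce to a simple batch check. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;design_intent&lt;/code&gt; labels below are hypothetical annotations, not a real field of the pipeline’s schema:&lt;/p&gt;

```python
def breaks_the_mold(batch):
    """Check a generated batch for the three deliberate outliers:
    a combo card, a situational card, and an expensive late-game card.
    The design_intent labels are hypothetical, for illustration only.
    """
    intents = {card.get("design_intent") for card in batch}
    return {"combo", "situational", "late_game"}.issubset(intents)
```

&lt;p&gt;If the check fails, the expander keeps generating until the outliers exist. Averages do not survive the gate.&lt;/p&gt;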

&lt;hr /&gt;

&lt;h2 id=&quot;the-five-card-generation-rule&quot;&gt;The Five-Card Generation Rule&lt;/h2&gt;

&lt;p&gt;After the pipeline finishes its initial run, you have a micro-deck of five to eight diverse, interlocking cards. A proof of concept. But a real game needs 30, 50, even 100 cards. Scaling up introduces a problem that anyone who uses language models will recognize: context degradation.&lt;/p&gt;

&lt;p&gt;If you ask an AI for three things, it is brilliant. If you ask it for 50, it starts strong, but by item 14 it has forgotten the constraints it was given 12 items ago.&lt;/p&gt;

&lt;p&gt;The system enforces a hard rule: never generate more than five cards in a single request. When scaling to a 50-card deck, the system executes a batch loop: 10 sequential requests for five cards each.&lt;/p&gt;

&lt;p&gt;Think of it like asking a friend for restaurant recommendations. “Give me your top three” produces three brilliant, curated picks. “Name 50 restaurants” produces panic and a list of every chain in a ten-mile radius. The language model’s attention span works the same way.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/how-ai-actually-designs-a-card/Five_Card_Batch_Loop.jpg&quot; alt=&quot;Five Card Batch Loop&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 5. The batch loop generates five cards at a time. Each batch sees the full existing card list, identifies strategic gaps, and fills them. Later batches naturally evolve to counter earlier ones.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But the five-card rule does something more interesting than just maintaining quality. It creates &lt;strong&gt;progressive improvement&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Each batch analyzes the existing card list before generating. If the first two batches produced cheap, aggressive cards, the third batch notices the imbalance and generates high-cost defensive cards to compensate. Later batches fill strategic gaps left by earlier ones. The AI might invent counter-strategies to the cards it generated three minutes prior.&lt;/p&gt;

&lt;p&gt;This is exactly what happens during human playtesting. A designer plays a few hands, finds that the rush strategy dominates, goes back to their desk and designs a card that slows it down. The batch loop compresses that iteration cycle from weeks to minutes, but the cognitive structure is the same: observe the ecosystem, identify the gap, design the counter.&lt;/p&gt;
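&lt;p&gt;The loop itself is simple enough to sketch. Here &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;request_batch&lt;/code&gt; stands in for one language-model call: it receives the full existing card list as its context for gap analysis and returns a handful of new cards. The function names are mine, not the system’s:&lt;/p&gt;

```python
def generate_deck(request_batch, target=50, batch_size=5):
    """Grow a deck under the five-card rule.

    request_batch is a stand-in for one language-model call: it sees
    the full existing card list and returns up to `count` new cards
    that fill the strategic gaps it finds.
    """
    deck = []
    for start in range(0, target, batch_size):
        count = min(batch_size, target - start)
        deck.extend(request_batch(existing=list(deck), count=count))
    return deck
```

&lt;p&gt;Each call stays inside the model’s attention span, while the growing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;existing&lt;/code&gt; list is what lets batch seven counter batch two.&lt;/p&gt;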

&lt;hr /&gt;

&lt;h2 id=&quot;what-the-ai-can-measure&quot;&gt;What the AI Can Measure&lt;/h2&gt;

&lt;p&gt;Before I talk about what the AI cannot do, let me be specific about what it can.&lt;/p&gt;

&lt;p&gt;After generating a deck, the system runs Monte Carlo Tree Search simulations: hundreds of games where AI agents play the deck against itself. MCTS is not a language model. It is a planning algorithm that explores decision trees to find winning strategies.&lt;/p&gt;

&lt;p&gt;On a 30-card engine builder prototype I designed as a test, the MCTS agent learned to buy production cards early, build conversion infrastructure mid-game, and buy scoring cards late. It won 90% of games against a random player. The AI did not just balance the cards. It discovered the strategy the designer intended.&lt;/p&gt;

&lt;p&gt;That 90% win rate is a meaningful signal. It tells the designer: “Your resource economy has strategic depth. There is a learnable curve. The production chain works.” If the MCTS win rate were 50%, it would mean the cards are strategically interchangeable, there is nothing to learn, and the economy is flat.&lt;/p&gt;

&lt;p&gt;This is the mechanical side of card design, and the AI handles it well. Cost curves, resource balance, dominant strategy detection, statistical fairness. The machine can play a thousand games and tell you that Card A wins 60% of matchups and the military strategy needs a stronger payoff.&lt;/p&gt;
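&lt;p&gt;MCTS itself is too large to sketch here, but the win-rate signal it produces can be illustrated with a toy Monte Carlo harness: a greedy agent that always takes the strongest card, playing against a random one. The game below is deliberately trivial and is not the pipeline’s simulator:&lt;/p&gt;

```python
import random

def play_game(pick_a, pick_b, rng, turns=5):
    """Toy stand-in for a card game: each turn, both agents pick one
    card value from the same three-card hand; higher total wins."""
    totals = [0, 0]
    for _ in range(turns):
        hand = rng.sample(range(1, 10), 3)
        totals[0] += pick_a(hand)
        totals[1] += pick_b(list(hand))
    return totals[0] > totals[1]

def win_rate(pick_a, pick_b, games=200, seed=0):
    rng = random.Random(seed)
    wins = sum(play_game(pick_a, pick_b, rng) for _ in range(games))
    return wins / games

greedy = max                        # always takes the strongest card
randomly = random.Random(1).choice  # picks with no strategy at all

rate = win_rate(greedy, randomly)   # far above 0.5 for this toy game
```

&lt;p&gt;A learnable edge shows up as a win rate far above 50%, exactly the signal the MCTS agent gave on the 30-card prototype; a flat economy would leave the two agents statistically indistinguishable.&lt;/p&gt;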

&lt;hr /&gt;

&lt;h2 id=&quot;what-the-ai-cannot-do&quot;&gt;What the AI Cannot Do&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/how-ai-actually-designs-a-card/What_AI_Cannot_See.jpg&quot; alt=&quot;What AI Sees vs What AI Cannot See&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 6. The gap between what the system can measure (structural dimensions, balance metrics, fun scores) and what it cannot see (four friends laughing around a table). The bridge between them is you.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But it cannot measure fun.&lt;/p&gt;

&lt;p&gt;I explored this question in depth in &lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;The Theory of Generative Board Game Design&lt;/a&gt;, and the conclusion has only sharpened since. AI cannot &lt;em&gt;experience&lt;/em&gt; fun. It has never felt the excitement of a close finish, the satisfaction of a clever combo, or the social electricity of pulling off a bluff. It has no taste. It has no feelings.&lt;/p&gt;

&lt;p&gt;But here is the nuance: you do not need to feel fun to recognize the &lt;em&gt;design patterns&lt;/em&gt; that produce it. Hidden information creates tension. Multiple real options create agency. Escalating stakes create drama. The AI can detect these structural ingredients from your turn structure and mechanism choices. A bridge engineer does not need to feel beauty to know the math that makes a bridge elegant.&lt;/p&gt;

&lt;p&gt;What the AI cannot do is predict the alchemy. The same mechanic that creates delicious tension for one group might fall flat for another. An algorithm can guarantee mathematical fairness. It can ensure that no single strategy breaks the ecosystem. It can detect that a card is overpowered by win rate, or that a production chain stalls in the mid-game, or that the military path is under-rewarded.&lt;/p&gt;

&lt;p&gt;It cannot tell you if playing that card feels satisfying. It cannot measure the tension of a close game where both players reach for the same phase on the same turn. It cannot simulate the joy of building a production engine that suddenly clicks into gear on round five, or the agony of watching your opponent settle Rebel Base when you were one military strength short.&lt;/p&gt;

&lt;p&gt;When I played against the RFTG AI, the losses were not frustrating because the AI played optimally. They were frustrating because the &lt;em&gt;game&lt;/em&gt; created situations where the optimal play produced dramatic outcomes. The AI chose the Produce phase at exactly the moment when my production worlds were loaded and my opponent’s were empty. The AI consumed goods for VP on the turn that pushed it past the threshold. The AI did not design those moments. Tom Lehmann did. The AI just played them.&lt;/p&gt;

&lt;p&gt;The math ensures the game works. The human designer ensures the game is worth playing. That distinction, between a game that is balanced and a game that is memorable, is where the human role endures.&lt;/p&gt;

&lt;p&gt;In my 2021 series, I mapped every component of the RFTG architecture: the &lt;a href=&quot;game-architecture-card-ai-1&quot;&gt;game model&lt;/a&gt;, the &lt;a href=&quot;game-architecture-card-ai-2&quot;&gt;action engine&lt;/a&gt;, the &lt;a href=&quot;game-architecture-card-ai-3&quot;&gt;neural network AI&lt;/a&gt;. I could formalize everything except the feeling of a close game. Five years later, as I build an AI system that generates card games, that gap has not closed. It has only become more precisely defined.&lt;/p&gt;

&lt;p&gt;The useful question is not “Can AI understand fun?” but “Can AI spot the design patterns that tend to produce fun, so you can focus your energy on the parts only a human designer can provide?” The answer to the first is no. The answer to the second is yes. And that is the only question that matters.&lt;/p&gt;

&lt;p&gt;The AI manages the map. The human hikes the territory. The map can tell you which paths exist, which are efficient, which lead to dead ends. But it cannot tell you which path has the view that makes you stop and stare. That is the designer’s job, and it always will be.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;Next in the series: When the Schema Breaks – where we stress-test the card schema against 13 famous games, find out which ones break it completely, and discover that the gap between a balanced game and a memorable one is the most important problem in tool-assisted design.&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Thomas Lehmann, &lt;a href=&quot;https://boardgamegeek.com/boardgame/28143/race-galaxy&quot;&gt;Race for the Galaxy&lt;/a&gt;, Rio Grande Games, 2007.&lt;/li&gt;
  &lt;li&gt;Keldon Jones, &lt;a href=&quot;https://github.com/bnordli/rftg&quot;&gt;RFTG AI Source Code&lt;/a&gt; (C, GPLv2), 2009.&lt;/li&gt;
  &lt;li&gt;Benny Cheung, &lt;a href=&quot;https://bennycheung.github.io/game-architecture-card-ai-1&quot;&gt;Game Architecture for Card Game Model (Part 1)&lt;/a&gt;, bennycheung.github.io, 2021.&lt;/li&gt;
  &lt;li&gt;Benny Cheung, &lt;a href=&quot;https://bennycheung.github.io/game-architecture-card-ai-2&quot;&gt;Game Architecture for Card Game Action (Part 2)&lt;/a&gt;, bennycheung.github.io, 2021.&lt;/li&gt;
  &lt;li&gt;Benny Cheung, &lt;a href=&quot;https://bennycheung.github.io/game-architecture-card-ai-3&quot;&gt;Game Architecture for Card Game AI (Part 3)&lt;/a&gt;, bennycheung.github.io, 2021.&lt;/li&gt;
  &lt;li&gt;Benny Cheung, &lt;a href=&quot;https://bennycheung.github.io/three-waves-of-card-game-design-tools&quot;&gt;Three Waves of Card Game Design Tools&lt;/a&gt;, bennycheung.github.io, 2026.&lt;/li&gt;
  &lt;li&gt;Benny Cheung, &lt;a href=&quot;https://bennycheung.github.io/gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;The Theory of Generative Board Game Design&lt;/a&gt;, bennycheung.github.io, 2026.&lt;/li&gt;
  &lt;li&gt;Benny Cheung, &lt;a href=&quot;https://bennycheung.github.io/generative-ontology-from-game-knowledge-to-game-creation&quot;&gt;Generative Ontology: From Game Knowledge to Game Creation&lt;/a&gt;, bennycheung.github.io, 2026.&lt;/li&gt;
  &lt;li&gt;Benny Cheung, &lt;a href=&quot;https://bennycheung.github.io/ai-playtesting-when-your-game-tests-itself&quot;&gt;AI Playtesting: When Your Board Game Tests Itself&lt;/a&gt;, bennycheung.github.io, 2026.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2502.07128&quot;&gt;Cardiverse: LLM Card Game Prototyping&lt;/a&gt;, EMNLP 2025.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://store.steampowered.com/app/286160/&quot;&gt;Tabletop Simulator&lt;/a&gt;, Berserk Games.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.thegamecrafter.com/&quot;&gt;The Game Crafter&lt;/a&gt;, Print-on-demand for tabletop games.&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate>
        <link>https://bennycheung.github.io/how-ai-actually-designs-a-card</link>
        <guid isPermaLink="true">https://bennycheung.github.io/how-ai-actually-designs-a-card</guid>
        
        <category>Game Design</category>
        
        <category>Card Games</category>
        
        <category>Design Tools</category>
        
        <category>Tabletop Games</category>
        
        <category>Prototyping</category>
        
        <category>Game Architecture</category>
        
        <category>Race for the Galaxy</category>
        
        
        <category>post</category>
        
      </item>
    
      <item>
        <title>Three Waves of Card Game Design Tools</title>
        <description>&lt;!--excerpt.start--&gt;
&lt;p&gt;I am a software architect by profession, but a game designer at heart. When I first looked at how card games get made, I recognized the pain immediately. Hundreds of interdependent data fields living in fragile spreadsheets. Manual rendering pipelines where a three-pixel change means recompiling an entire deck. Hours of tedious formatting before you can even test whether the game is fun. As a programmer, this kind of repetitive, error-prone manual process is exactly the thing I have spent my entire career building tools to eliminate. Behind every elegant piece of cardboard is a staggering web of math, probability, edge case testing, and tedious layout formatting. Over the past 15 years, the tools available to card game designers have gone through three distinct waves of evolution. This article traces that arc from the scripting trenches of the mid-2000s to the AI-native pipelines of 2026.
&lt;!--excerpt.end--&gt;&lt;/p&gt;

&lt;p&gt;This is Part 1 of the Card Architecture series. My interest in card game architecture is not new. Back in 2021, I spent a month reverse-engineering &lt;em&gt;Race for the Galaxy&lt;/em&gt;, dissecting its &lt;a href=&quot;game-architecture-card-ai-1&quot;&gt;game model&lt;/a&gt;, &lt;a href=&quot;game-architecture-card-ai-2&quot;&gt;action engine&lt;/a&gt;, and &lt;a href=&quot;game-architecture-card-ai-3&quot;&gt;neural network AI&lt;/a&gt;. That deep dive taught me how much hidden complexity lives inside a well-designed card game, how tightly the mechanics, the card interactions, and the AI decision-making are coupled together. It also left me frustrated with how manual the entire design and prototyping process remained.&lt;/p&gt;

&lt;p&gt;Programmers are lazy in the best possible way: we hate repeating ourselves, and we will spend a week automating a task that takes ten minutes, purely out of principle. That instinct, combined with what I learned from studying Race for the Galaxy’s architecture, is what pulled me into building AI-native game design tools over the past year. But the deeper I got, the more I realized that game design is not a simpler version of software design. It is a different medium with its own complexity, its own craft, and its own hard-won expertise. This series is my attempt to make sense of that world. Subsequent articles will cover multi-agent card generation, the schema limits exposed by famous games like Wingspan and Terraforming Mars, the export pipeline from data to playable prototype, and the algorithms that draw board game maps.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-minor-miracle-of-a-finished-deck&quot;&gt;The Minor Miracle of a Finished Deck&lt;/h2&gt;

&lt;p&gt;When you hold a finished card game in your hands, you are holding a minor miracle. Every card in that deck has to talk to every other card. The costs have to scale with the power. The combos have to exist without being degenerate. The types have to distribute across the deck so that no strategy completely dominates. And every single piece of rules text has to be unambiguous enough that two strangers can sit down and agree on what it means.&lt;/p&gt;

&lt;p&gt;If even one number is off, the whole ecosystem collapses. A card that costs one resource too little warps the meta. A combo that the designer missed creates an unbeatable strategy that players discover on their second game night. A piece of ambiguous text spawns a 200-comment thread on BoardGameGeek about whether “adjacent” includes diagonals.&lt;/p&gt;

&lt;p&gt;For decades, the barrier to entry in game design was not having a good idea. Good ideas are everywhere. The barrier was having the sheer clerical stamina to manage the data. Hundreds of cards, each with five to ten interdependent fields, all living in a spreadsheet that grows more fragile with every edit. In software, we would call this accidental complexity: difficulty that comes from the tools, not from the problem itself. The history of card game design tools is the history of chipping away at that accidental complexity.&lt;/p&gt;

&lt;p&gt;That history falls into three clear waves.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/three-waves-of-card-game-design-tools/Three_Waves_Overview.jpg&quot; alt=&quot;Three Waves of Card Game Design Tools&quot; /&gt;
&lt;em&gt;Figure 1. The three waves of card game design tools: from scripting and spreadsheets, to visual editors, to AI-native pipelines that understand your game.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;wave-1-the-template-era-2006-2010s&quot;&gt;Wave 1: The Template Era (2006-2010s)&lt;/h2&gt;

&lt;h3 id=&quot;scripts-spreadsheets-and-pixel-coordinates&quot;&gt;Scripts, Spreadsheets, and Pixel Coordinates&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/three-waves-of-card-game-design-tools/Three_Waves_Template_Era.jpg&quot; alt=&quot;The Template Era&quot; /&gt;
&lt;em&gt;Figure 2. Wave 1: the designer’s reality, surrounded by spreadsheets, scripts, and pixel coordinates, wrestling data into cards by brute force.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The first wave of dedicated card game tools began with nanDECK [1], a free Windows scripting language released in 2006, and matured through the 2010s with tools like Squib [2] (an open-source Ruby framework, 2014) and CardPen [5] (a browser-based HTML/CSS generator). These tools were a real step up from doing everything by hand in Photoshop, but using them felt less like game design and more like software engineering. As someone who writes code for a living, I can appreciate that. But I also know that forcing non-programmers into a code-first workflow is a classic product design mistake.&lt;/p&gt;

&lt;p&gt;In this era, a card game was treated purely as a layout problem. All of your game data lived in a massive Excel spreadsheet or a CSV file. Row one was your basic attack card. Row two was your defense card. Row 150 was your ultimate boss monster. Column A was the name. Column B was the cost. Column C was the rules text. And so on, for as many columns as your game demanded.&lt;/p&gt;

&lt;p&gt;The Wave 1 tools acted as mail merge on steroids. You wrote a script that said: take the text from column A and print it in a 24-point font at these exact x and y pixel coordinates on the image canvas. Take the number from column B and render it inside this icon template at position (45, 120). Repeat for every row in the spreadsheet. If you needed to move a cost icon three pixels to the left, you opened a code editor, changed an x-coordinate from 45 to 42, recompiled the entire 200-card deck, and prayed it looked right. A card with a title slightly too long? You wrote a conditional statement in your code to shrink the font for that one card. Every visual problem required a programming solution.&lt;/p&gt;
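
&lt;p&gt;To make that concrete, here is a toy Python sketch of a Wave 1-style rendering script. It is not real nanDECK or Squib syntax; the coordinates, column names, and the 20-character title threshold are all invented for illustration. But the shape is faithful: spreadsheet rows in, absolute pixel draw commands out, with per-card conditionals bolted on in code.&lt;/p&gt;

```python
import csv
import io

# Toy Wave 1-style rendering script. Not real nanDECK or Squib syntax: the
# coordinates, columns, and title-length threshold are invented to show the
# workflow of spreadsheet rows in, absolute pixel draw commands out.
CARD_DATA = """name,cost,rules_text
Cutlass Strike,2,Deal 3 damage to target player.
Legendary Kraken of the Bottomless Abyss,7,Destroy all ships. Each opponent discards a card.
"""

def layout_card(row):
    # Per-card conditional: shrink the title font when it would overflow,
    # the kind of special case Wave 1 scripts accumulated one by one.
    title_size = 14 if len(row["name"]) > 20 else 24
    return [
        ("text", row["name"], {"x": 30, "y": 40, "size": title_size}),
        ("icon_text", row["cost"], {"x": 45, "y": 120, "template": "coin"}),
        ("text", row["rules_text"], {"x": 30, "y": 200, "size": 12}),
    ]

def render_deck(csv_text):
    # Mail merge on steroids: one draw-command list per spreadsheet row.
    return [layout_card(row) for row in csv.DictReader(io.StringIO(csv_text))]

deck = render_deck(CARD_DATA)
# Moving the cost icon three pixels left means editing x=45 to x=42 above
# and regenerating every card in the deck.
```

&lt;p&gt;Notice that nudging the cost icon means editing a hard-coded x value and re-running the script over every row, exactly the workflow described above.&lt;/p&gt;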

&lt;p&gt;The tradeoff was total control. Every pixel was deterministic. Every layout decision was reproducible. Run the script twice with the same data, get the exact same output. For veteran designers who valued precision over convenience, this was worth the pain.&lt;/p&gt;

&lt;h3 id=&quot;what-wave-1-could-not-do&quot;&gt;What Wave 1 Could Not Do&lt;/h3&gt;

&lt;p&gt;But for all their rendering power, Wave 1 tools were blind to the game itself. nanDECK could not tell you if your cards were fun to play, if your pirate-themed deck builder had a dominant strategy, or if your resource curve was broken. It knew nothing about your game. It just knew how to print a spreadsheet.&lt;/p&gt;

&lt;p&gt;The content, the actual game design, lived entirely in the designer’s head and in the rows of that spreadsheet. The tool did not care what you were making. It just printed whatever you told it to print.&lt;/p&gt;

&lt;p&gt;As a programmer, I actually find Wave 1 tools strangely appealing. Reading through BGG forums, you find nanDECK veterans who swear by the total control it gives them. Squib users on GitHub talk about version-controlling their card data alongside their code, which is something most GUI tools still cannot do cleanly. If you already think in code, this workflow makes perfect sense. But that is exactly the problem: most game designers are not programmers. They are people with brilliant ideas about player interaction, narrative tension, and strategic depth, who should not need to learn Ruby to make a prototype.&lt;/p&gt;

&lt;p&gt;By the mid-2020s, the tool landscape had diversified into four distinct quadrants, each solving a different piece of the design-to-table pipeline.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/three-waves-of-card-game-design-tools/Three_Waves_Tool_Landscape.jpg&quot; alt=&quot;The 2026 Tool Landscape&quot; /&gt;
&lt;em&gt;Figure 3. The 2026 tool landscape across four quadrants: scriptable developer tools (nanDECK, Squib, CardPen), WYSIWYG GUI tools (Component.Studio, Dextrous, Tabletop Creator), print-on-demand services (The Game Crafter, MakePlayingCards), and digital prototyping platforms (Tabletop Simulator, Screentop, Tabletopia).&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;wave-2-the-gui-era-late-2010s-2020s&quot;&gt;Wave 2: The GUI Era (Late 2010s-2020s)&lt;/h2&gt;

&lt;h3 id=&quot;visual-editors-and-export-pipelines&quot;&gt;Visual Editors and Export Pipelines&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/three-waves-of-card-game-design-tools/Three_Waves_GUI_Era.jpg&quot; alt=&quot;The GUI Era&quot; /&gt;
&lt;em&gt;Figure 4. Wave 2: visual editors with one-click export pipelines for Tabletop Simulator, print-on-demand, and PDF.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Starting with Component.Studio [3] in 2017 and accelerating through the early 2020s with Dextrous [4] and Tabletop Creator, a new generation of tools introduced something that seems obvious in retrospect but changed everything: visual editors.&lt;/p&gt;

&lt;p&gt;For the first time, card game designers could drag and drop text boxes onto a visual canvas instead of calculating x and y coordinates in their heads. They could see the card as they were building it. They could resize elements with a mouse, preview the deck in real time, and iterate on the layout without touching a line of code.&lt;/p&gt;

&lt;p&gt;It was basically desktop publishing software, but specifically built for tabletop games. InDesign for nerds. Component.Studio’s killer feature was its Google Sheets integration, which meant your game data and your visual layout stayed in sync automatically. Dextrous took a different approach, focusing on a polished local editing experience with strong TTS export. Tabletop Creator landed on Steam and attracted designers who wanted an all-in-one desktop app rather than a browser tool. Each had tradeoffs, but all of them eliminated the x-y coordinate problem overnight.&lt;/p&gt;

&lt;h3 id=&quot;solving-the-deployment-problem&quot;&gt;Solving the Deployment Problem&lt;/h3&gt;

&lt;p&gt;Beyond the visual editor, Wave 2 tools solved a second major pain point: deployment. In software product terms, this is the “last mile” problem, getting the finished work out of the development environment and into the hands of actual users. For card game designers, the last mile had always been awkward. You would export a folder full of raw image files and then spend hours manually importing them into Tabletop Simulator [6], formatting them for print-on-demand services, or arranging them into printable sheets for home prototyping.&lt;/p&gt;

&lt;p&gt;Wave 2 tools built direct export pipelines. A single button press could format your deck for Tabletop Simulator, send it to The Game Crafter [7] for physical printing, or generate a PDF laid out for standard card sleeves. The friction between “I finished designing” and “I am playing the game” collapsed from days to minutes.&lt;/p&gt;

&lt;p&gt;For the prototyping workflow, this was enormous. The traditional cycle of export, format, upload, test, realize the game is broken, cry, and repeat was dramatically shortened. Designers could iterate faster than ever.&lt;/p&gt;

&lt;h3 id=&quot;the-blank-page-remained&quot;&gt;The Blank Page Remained&lt;/h3&gt;

&lt;p&gt;Wave 2 made formatting easier, exporting seamless, and iteration faster. But the content bottleneck was identical to Wave 1. You were still staring at a spreadsheet, hand-typing 200 rows of card data, and trying to balance the math in your own head. The tool could make your cards look professional and get them onto a table fast, but it could not help you figure out what the cards should actually do.&lt;/p&gt;

&lt;p&gt;If Wave 1 gave a writer a clunky printing press, Wave 2 gave them Microsoft Word. The output was prettier, but the writer still had to write the book. One designer on the Board Game Design Lab forum described spending an entire Saturday getting a card layout pixel-perfect in a GUI editor, only to realize the underlying card design was broken because the resource curve was wrong. The tool could not tell him that. Nothing could, except playtesting.&lt;/p&gt;

&lt;h2 id=&quot;wave-3-the-ai-native-era-2025&quot;&gt;Wave 3: The AI-Native Era (2025+)&lt;/h2&gt;

&lt;h3 id=&quot;automating-the-invention&quot;&gt;Automating the Invention&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/three-waves-of-card-game-design-tools/Three_Waves_AI_Native.jpg&quot; alt=&quot;The AI-Native Era&quot; /&gt;
&lt;em&gt;Figure 5. Wave 3: the system reads your game’s ontology and generates balanced cards with mechanics, costs, and synergies. The tool becomes a design partner.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Wave 3 breaks the pattern entirely. Tools emerging in 2025 and 2026 are no longer automating the layout or the export. They are automating the invention.&lt;/p&gt;

&lt;p&gt;The spreadsheet is removed entirely as the source of truth. Instead, the foundation of your game becomes a structured game description, known in the research literature as an ontology. If you want to understand the ontology approach in depth, I wrote a deep dive on the topic in &lt;a href=&quot;generative-ontology-from-game-knowledge-to-game-creation&quot;&gt;Generative Ontology: From Game Knowledge to Game Creation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;An ontology is a machine-readable map of your game’s DNA. It defines the core reality of your game: the theme, the primary mechanics, the resources players use, the turn phases, how victory is achieved, and how all those elements interact with each other.&lt;/p&gt;

&lt;p&gt;You feed this DNA map into the system, and it generates the spreadsheet for you. It is not just laying out the cards. It is authoring the rules text, the costs, and the specific mechanics based on the laws of the universe you defined.&lt;/p&gt;
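
&lt;p&gt;To give a flavor of what this looks like, here is a minimal, hypothetical ontology sketch in Python. The field names are invented for illustration, not a real schema, but they show the key idea: the ontology is data that the generator both reads from and validates against.&lt;/p&gt;

```python
# Hypothetical ontology for a pirate-themed deck builder. Field names are
# illustrative, not a real production schema.
ontology = {
    "theme": "pirate deck builder",
    "resources": ["gold", "crew"],
    "mechanics": ["deck_building", "push_your_luck"],
    "turn_phases": ["draw", "action", "buy", "cleanup"],
    "victory": {"type": "victory_points", "threshold": 30},
    "balance": {"max_cost": 8},
}

def card_respects_ontology(card, ontology):
    # A generated card must stay inside the universe the designer defined.
    within_cost = not card["cost"] > ontology["balance"]["max_cost"]
    known_resources = all(r in ontology["resources"] for r in card["produces"])
    known_mechanic = card["mechanic"] in ontology["mechanics"]
    return within_cost and known_resources and known_mechanic

card = {"name": "Buried Chest", "cost": 3, "produces": ["gold"],
        "mechanic": "push_your_luck"}
```

&lt;p&gt;A card that costs nine in an eight-cost universe, or produces a resource the game does not define, is rejected before it ever reaches a layout.&lt;/p&gt;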

&lt;p&gt;To return to the writer analogy: Wave 3 gives the writer a co-author who deeply understands their genre and can draft chapters for them. The writer still sets the vision and the direction. The co-author handles the labor of turning that vision into pages.&lt;/p&gt;

&lt;h3 id=&quot;the-migration-of-value&quot;&gt;The Migration of Value&lt;/h3&gt;

&lt;p&gt;This shift changes where the competitive advantage lies:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Wave&lt;/th&gt;
      &lt;th&gt;Era&lt;/th&gt;
      &lt;th&gt;Competitive Advantage&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Wave 1&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;2006-2010s&lt;/td&gt;
      &lt;td&gt;Rendering quality: pixel-perfect card images from data&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Wave 2&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Late 2010s-2020s&lt;/td&gt;
      &lt;td&gt;Export breadth: how quickly cards reach TTS or print&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Wave 3&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;2025+&lt;/td&gt;
      &lt;td&gt;Content intelligence: the tool understands your game’s strategy&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;A Wave 1 or Wave 2 tool does not know that your pirate-themed deck builder desperately needs a counter to a dominant “hoard gold” strategy. To those tools, a card is just a string of characters. A Wave 3 tool reads the ontology, recognizes the resource economy, and generates cards that participate in balancing the ecosystem. Whether it generates &lt;em&gt;good&lt;/em&gt; cards is a separate question, and one I will be honest about in the next article in this series.&lt;/p&gt;

&lt;h3 id=&quot;what-changed-to-make-this-possible&quot;&gt;What Changed to Make This Possible&lt;/h3&gt;

&lt;p&gt;Three technical shifts converged to enable Wave 3:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Large language models became reliable enough for structured output.&lt;/strong&gt; Early LLMs could write creative text, but they could not consistently produce data in a rigid schema. By 2025, models like Claude and GPT-4 could reliably output structured JSON with specific fields, types, and constraints, making them usable as data generators, not just text generators.&lt;/p&gt;
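
&lt;p&gt;In practice, treating an LLM as a data generator means never trusting its raw output. Here is a sketch of the guard rail, with an invented schema and a hand-written sample reply standing in for a real model response:&lt;/p&gt;

```python
import json

# Invented rigid schema; the validation pattern is the point. The sample
# "llm_output" below is hand-written, not an actual model reply.
CARD_SCHEMA = {"name": str, "cost": int, "type": str, "rules_text": str}

def parse_card(raw_json):
    # Reject anything that is not exactly the structure we demanded, so only
    # validated data ever enters the generation pipeline.
    card = json.loads(raw_json)
    for field, expected_type in CARD_SCHEMA.items():
        if not isinstance(card.get(field), expected_type):
            raise ValueError(f"bad or missing field: {field}")
    return card

llm_output = (
    '{"name": "Powder Keg", "cost": 3, "type": "action", '
    '"rules_text": "Each opponent discards a card."}'
)
card = parse_card(llm_output)
```

&lt;p&gt;The shift is that by 2025, a schema check like this passes on nearly every generation rather than failing on most of them, which is what makes the model usable as a pipeline component.&lt;/p&gt;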

&lt;p&gt;&lt;strong&gt;Multi-agent architectures replaced single prompts.&lt;/strong&gt; Asking one AI to “make a card game” produces generic mush. But splitting the task across specialized agents (one for theme, one for mechanics, one for card generation, one for balance testing) produces output with real mechanical depth. Each agent focuses on what it does best, and the pipeline enforces quality at every stage. Cardiverse [12], an academic system presented at EMNLP 2025, demonstrated this approach with graph-based mechanic indexing and self-play validation.&lt;/p&gt;
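
&lt;p&gt;The pipeline shape is easier to see in code than in prose. In this sketch each agent is a stub function standing in for an LLM call, and the stage names and quality check are invented, but the structure, specialized stages with a gate that rejects bad output instead of passing it downstream, is the point:&lt;/p&gt;

```python
# Pipeline skeleton: each "agent" is a stub function standing in for an LLM
# call, so the structure itself is runnable. Stage names are illustrative.
def theme_agent(brief):
    return {"theme": brief, "tone": "swashbuckling"}

def mechanics_agent(spec):
    return dict(spec, mechanics=["deck_building", "push_your_luck"])

def card_agent(spec):
    return dict(spec, cards=[{"name": "Cutlass", "cost": 2}])

def balance_gate(design):
    # Quality enforcement between stages: fail the batch rather than let a
    # broken card flow downstream to layout and export.
    if any(c["cost"] > 8 for c in design["cards"]):
        raise ValueError("card cost exceeds the curve")
    return design

def run_pipeline(brief):
    output = theme_agent(brief)
    for stage in (mechanics_agent, card_agent, balance_gate):
        output = stage(output)
    return output

design = run_pipeline("pirate deck builder")
```

&lt;p&gt;Each stage sees only the enriched output of the previous one, which is why a focused agent outperforms a single do-everything prompt.&lt;/p&gt;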

&lt;p&gt;&lt;strong&gt;Automated playtesting became computationally feasible.&lt;/strong&gt; Algorithms like Monte Carlo Tree Search, the same family that powered AlphaGo, can now simulate thousands of games in minutes. For simple card games like deck builders and drafting games, where card effects map cleanly to executable actions (draw, gain resource, force discard), a generated deck can be stress-tested for dominant strategies before a human touches it. For more complex engine builders like Terraforming Mars or Wingspan, the technology can validate card data structure and trigger timing, but fully simulating the game-specific economy (resource production, trading chains, cards-as-currency) remains an active area of development. The gap between “the AI understands your card grammar” and “the AI plays your game” is real, and I will be honest about it in a later article in this series. For a deep dive into how the simulation works today, see &lt;a href=&quot;ai-playtesting-when-your-game-tests-itself&quot;&gt;AI Playtesting: When Your Board Game Tests Itself&lt;/a&gt;.&lt;/p&gt;
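
&lt;p&gt;Even without full MCTS, the core idea of automated stress-testing fits in a few lines: pit a suspected strategy against a baseline over thousands of simulated games and look at the win rate. The toy rules below are invented; a safe flat-scoring strategy faces a risky coin-flip strategy with a higher expected value:&lt;/p&gt;

```python
import random

# Toy Monte Carlo stress test with invented rules: each strategy is a
# per-turn scoring function, and we measure A's win rate over many games.
def simulate(strategy_a, strategy_b, n_games=2000, seed=7):
    rng = random.Random(seed)  # fixed seed keeps the stress test reproducible
    wins_a = 0
    for _ in range(n_games):
        score_a = sum(strategy_a(rng) for _ in range(10))
        score_b = sum(strategy_b(rng) for _ in range(10))
        if score_a > score_b:
            wins_a += 1
    return wins_a / n_games

def hoard(rng):
    return 2  # always take the safe 2 points per turn

def risky(rng):
    return rng.choice([0, 5])  # coin flip, expected 2.5 points per turn

win_rate = simulate(hoard, risky)
```

&lt;p&gt;Run with a fixed seed, the safe strategy wins well under half the games, which is exactly the kind of dominance signal a designer wants surfaced before a human ever shuffles a deck.&lt;/p&gt;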

&lt;p&gt;These three shifts did not arrive as a coherent plan. They converged messily, the way most real technological transitions do.&lt;/p&gt;

&lt;h2 id=&quot;what-this-means-for-designers&quot;&gt;What This Means for Designers&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/three-waves-of-card-game-design-tools/Three_Waves_Designer_Role.jpg&quot; alt=&quot;The Designer&apos;s Evolving Role&quot; /&gt;
&lt;em&gt;Figure 6. The designer’s role evolves: from spreadsheet manager to creative visionary, with tools handling the data while humans define the vision.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;from-spreadsheet-manager-to-creative-director&quot;&gt;From Spreadsheet Manager to Creative Director&lt;/h3&gt;

&lt;p&gt;From what I have gathered talking to designers and reading their forum posts, being a card game designer has meant being a spreadsheet manager first and a creative visionary second. The bulk of the time goes to data entry, layout formatting, and manual balance calculations. The actual design work, choosing mechanics, crafting theme, engineering the moments that make a game memorable, gets squeezed into whatever time is left. Any product designer would recognize this as the same problem we see in enterprise software: users spending most of their time fighting the tool instead of doing their actual job.&lt;/p&gt;

&lt;p&gt;Wave 3 inverts that ratio. The machine handles the data entry, the layout, the initial balance pass, and the formatting. The designer gets to focus on what they actually care about: what the game should feel like, what decisions should be agonizing, what moments should make players laugh or groan or slam the table.&lt;/p&gt;

&lt;p&gt;I want to be honest about the tension here. A lot of designers I have talked to in the community are deeply skeptical of AI in game design, and they have good reasons. There is a real risk that AI-generated content floods the market with mediocre games that technically work but have no soul. I have generated card sets that were mechanically balanced and completely boring to play. Balance is not the same as fun, and I learned that the hard way.&lt;/p&gt;

&lt;p&gt;But as someone who has spent decades in software engineering, I see a pattern I have seen before. Programmers went through the same anxiety when IDEs started auto-completing code, when Stack Overflow made every answer searchable, when GitHub Copilot started writing functions for us. Every time, the fear was that the craft would be devalued. Every time, what actually happened was that the tedious parts got automated and the creative parts became more important. The engineers who understood &lt;em&gt;why&lt;/em&gt; they were building something became more valuable, not less. The ones who only knew &lt;em&gt;how&lt;/em&gt; to type the code had a harder time.&lt;/p&gt;

&lt;p&gt;I think game design is heading for the same split. The designer who understands game systems, who can sense that the midgame needs a crisis event to prevent stalling, who knows why a mechanic creates tension at the table even though the math says it is balanced, that person’s expertise matters more than ever. The machine can generate 200 cards. It has no idea which 200 cards the game actually needs. That is still the designer’s call.&lt;/p&gt;

&lt;h3 id=&quot;the-new-skill-describing-constraints&quot;&gt;The New Skill: Describing Constraints&lt;/h3&gt;

&lt;p&gt;What has changed is the skill the designer needs to bring to the table. In Wave 1, the critical skill was scripting. In Wave 2, it was visual layout. In Wave 3, the critical skill is the ability to describe the constraints of your game with precision.&lt;/p&gt;

&lt;p&gt;The AI’s output is only as good as the ontology you feed it. A vague description produces vague cards. A precise, well-structured description of your mechanics, resources, turn phases, and victory conditions produces cards that interlock, synergize, and challenge.&lt;/p&gt;

&lt;p&gt;The designer’s job is shifting from “fill in the spreadsheet” to “define the universe.” In software, we went through the same transition when we moved from writing assembly code to writing high-level specifications. The abstraction level rose, but the design work got harder, not easier. I suspect game design is heading the same direction.&lt;/p&gt;

&lt;h2 id=&quot;what-comes-next&quot;&gt;What Comes Next&lt;/h2&gt;

&lt;p&gt;Wave 3 is powerful, but it is not the end of the story. The current generation of AI-native tools excels at certain types of card games, particularly deck builders and simple drafting games, where the card data model maps cleanly to a structured schema. But as we will explore in the next article in this series, more complex games expose fundamental limits in how AI represents cards.&lt;/p&gt;

&lt;p&gt;Engine builders like Wingspan and Terraforming Mars, with their multi-resource costs and cross-card synergy systems, challenge the basic assumptions of the schema. Living card games like Arkham Horror require entirely separate data models for player cards and encounter cards. And games like KeyForge, where a single card has four distinct abilities that fire at different phases of the game, break the schema entirely.&lt;/p&gt;

&lt;p&gt;The tools are evolving to meet these challenges. But the gap between what AI can design today and what the most ambitious designers aspire to create is where the most interesting work is happening.&lt;/p&gt;

&lt;p&gt;I should be clear about what I am not arguing. I am not saying every designer should adopt Wave 3 tools, or that earlier approaches are obsolete. Some designers may look at everything described in this article and decide it is too complicated, or that the hands-on process of manually crafting each card is part of what makes designing fun for them. There are designers who resist spreadsheets, designers who resist GUIs, and there will certainly be designers who resist AI. That is completely valid. The act of hand-cutting prototype cards on a Saturday afternoon has a tactile satisfaction that no pipeline can replicate. Not every efficiency gain is a net gain if it removes the part of the process you actually enjoy.&lt;/p&gt;

&lt;p&gt;My goal with this series is not to convert anyone. It is to show and tell what can be done, so that designers who are curious have a clear picture of the landscape.&lt;/p&gt;

&lt;p&gt;I know a number of card game designers personally. Several of them have game ideas that have been collecting dust for years, stuck somewhere between a napkin sketch and a playable prototype, for exactly the reasons laid out in this article. The spreadsheet got too big. The layout took too long. The balance math was overwhelming. Life got in the way, and the project never made it to a table. The designers who are open-minded about new tooling, especially the ones with enough technical fluency to understand what AI pipelines can and cannot do, are the ones I have seen light up when they realize their shelved idea might actually be buildable now. Bringing a designer’s dormant idea back to life, giving them a path from “I always wanted to make this game” to “here is a playable prototype, let us see if it works,” that is the most valuable thing these new tools can offer. Not replacing the designer’s vision, but removing the obstacles that buried it.&lt;/p&gt;

&lt;p&gt;For now, the practical reality is this: if you have a game idea sketched on a napkin, the barrier between that napkin and a playable prototype has never been lower. Whether the prototype is any good still depends entirely on you. I have spent enough time in both worlds now to know that game design is no less serious, no less complex, and no less creative than software engineering. It is just a different medium. The tools have gotten dramatically better. The hard part has not changed at all.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;[1] nanDECK. &lt;a href=&quot;https://nandeck.com/&quot;&gt;&lt;em&gt;nanDECK Card Generation Tool&lt;/em&gt;&lt;/a&gt;. Free Windows scripting language for card layout and rendering.&lt;/p&gt;

&lt;p&gt;[2] Andy Meneely. &lt;a href=&quot;https://github.com/andymeneely/squib&quot;&gt;&lt;em&gt;Squib: A Ruby DSL for Card Prototyping&lt;/em&gt;&lt;/a&gt;. Open-source framework for programmatic card generation.&lt;/p&gt;

&lt;p&gt;[3] Component.Studio. &lt;a href=&quot;https://component.studio/&quot;&gt;&lt;em&gt;Web-based Card Design with Google Sheets Integration&lt;/em&gt;&lt;/a&gt;. WYSIWYG card design tool with data-driven templates.&lt;/p&gt;

&lt;p&gt;[4] Dextrous. &lt;a href=&quot;https://www.dextrous.com.au/&quot;&gt;&lt;em&gt;Modern WYSIWYG Card Design&lt;/em&gt;&lt;/a&gt;. Visual editor with TTS and print-on-demand export.&lt;/p&gt;

&lt;p&gt;[5] CardPen. &lt;a href=&quot;https://cardpen.mcdemarco.net/&quot;&gt;&lt;em&gt;Browser-based HTML/CSS Card Generator&lt;/em&gt;&lt;/a&gt;. Lightweight web tool for card prototyping.&lt;/p&gt;

&lt;p&gt;[6] Berserk Games. &lt;a href=&quot;https://store.steampowered.com/app/286160/&quot;&gt;&lt;em&gt;Tabletop Simulator&lt;/em&gt;&lt;/a&gt;. 3D physics sandbox for digital board game prototyping.&lt;/p&gt;

&lt;p&gt;[7] The Game Crafter. &lt;a href=&quot;https://www.thegamecrafter.com/&quot;&gt;&lt;em&gt;Print-on-Demand for Tabletop Games&lt;/em&gt;&lt;/a&gt;. Physical production service for indie designers.&lt;/p&gt;

&lt;p&gt;[8] MakePlayingCards. &lt;a href=&quot;https://makeplayingcards.com/&quot;&gt;&lt;em&gt;Volume Card Printing&lt;/em&gt;&lt;/a&gt;. Mid-to-high volume card production.&lt;/p&gt;

&lt;p&gt;[9] Geoffrey Engelstein and Isaac Shalev. &lt;a href=&quot;https://www.routledge.com/Building-Blocks-of-Tabletop-Game-Design-An-Encyclopedia-of-Mechanisms/Engelstein-Shalev/p/book/9781138365490&quot;&gt;&lt;em&gt;Building Blocks of Tabletop Game Design: An Encyclopedia of Mechanisms&lt;/em&gt;&lt;/a&gt;. CRC Press, 2019.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The definitive taxonomy of tabletop game mechanisms, providing the vocabulary that AI ontologies use to classify mechanics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[10] Benny Cheung. &lt;a href=&quot;https://bennycheung.github.io/game-architecture-card-ai-1&quot;&gt;&lt;em&gt;Game Architecture for Card Game Model, Action, and AI&lt;/em&gt;&lt;/a&gt; (Parts &lt;a href=&quot;https://bennycheung.github.io/game-architecture-card-ai-1&quot;&gt;1&lt;/a&gt;, &lt;a href=&quot;https://bennycheung.github.io/game-architecture-card-ai-2&quot;&gt;2&lt;/a&gt;, &lt;a href=&quot;https://bennycheung.github.io/game-architecture-card-ai-3&quot;&gt;3&lt;/a&gt;). bennycheung.github.io, 2021.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Three-part series reverse-engineering the architecture and neural network AI of Race for the Galaxy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[11] Benny Cheung. &lt;a href=&quot;https://bennycheung.github.io/generative-ontology-from-game-knowledge-to-game-creation&quot;&gt;&lt;em&gt;Generative Ontology: From Game Knowledge to Game Creation&lt;/em&gt;&lt;/a&gt;. bennycheung.github.io, 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Deep dive into the ontology-driven approach to generative game design.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[12] Danrui Li, Sen Zhang, Samuel S. Sohn, Kaidong Hu, Muhammad Usman, and Mubbasir Kapadia. &lt;a href=&quot;https://arxiv.org/abs/2502.07128&quot;&gt;&lt;em&gt;Cardiverse: Harnessing LLMs for Novel Card Game Prototyping&lt;/em&gt;&lt;/a&gt;. EMNLP 2025.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Academic research on multi-agent LLM pipelines for card game generation, using graph-based mechanic indexing and self-play validation.&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Sat, 21 Mar 2026 12:00:00 +0000</pubDate>
        <link>https://bennycheung.github.io/three-waves-of-card-game-design-tools</link>
        <guid isPermaLink="true">https://bennycheung.github.io/three-waves-of-card-game-design-tools</guid>
        
        <category>Game Design</category>
        
        <category>Card Games</category>
        
        <category>Design Tools</category>
        
        <category>Tabletop Games</category>
        
        <category>Prototyping</category>
        
        <category>Game Architecture</category>
        
        
        <category>post</category>
        
      </item>
    
      <item>
        <title>AI Playtesting - When Your Board Game Tests Itself</title>
        <description>&lt;!--excerpt.start--&gt;
&lt;p&gt;A designer types “test my game for balance issues” into Nova. Moments later, they receive a structured critique: which player seat has an unfair advantage, whether the game rewards strategic play, and three intervention options. No prototyping, no recruiting playtesters, no spreadsheets. Just a conversation, and a feedback loop that runs every time you change a number. This is the story of how we taught a system to play board games, what failed spectacularly, and what that failure accidentally invented.
&lt;!--excerpt.end--&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/ai-playtesting-when-your-game-tests-itself/AI_Playtesting_Overview.png&quot; alt=&quot;AI Playtesting: When Your Game Tests Itself&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The automated playtesting pipeline transforms a structured game ontology into automated balance analysis, skill gap measurement, and rule clarity scores, all through a conversation with Nova.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is Part 9 of the Game Architecture series. In &lt;a href=&quot;introducing-gamegrammar-ai-powered-board-game-design&quot;&gt;Part 5&lt;/a&gt;, we demonstrated structured game generation. In &lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;Part 6&lt;/a&gt;, we explored the theory behind generative ontology. In &lt;a href=&quot;nova-the-ai-co-designer-that-learns-your-taste&quot;&gt;Part 7&lt;/a&gt;, we introduced Nova, the conversational AI co-designer. And in &lt;a href=&quot;generative-ontology-from-game-knowledge-to-game-creation&quot;&gt;Part 8&lt;/a&gt;, we showed the full pipeline from knowledge to creation.&lt;/p&gt;

&lt;p&gt;But there was a gap. GameGrammar could generate a structurally valid game in minutes. Nova could help you refine it over sessions. Yet between “a design exists on paper” and “we know if it works at the table” sat the same wall every designer faces: prototype it, recruit friends, schedule sessions, track results by hand, and repeat the whole process after every change.&lt;/p&gt;

&lt;p&gt;This article is about how we tore down that wall.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-wall-where-designs-go-to-die&quot;&gt;The Wall: Where Designs Go to Die&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/ai-playtesting-when-your-game-tests-itself/AI_Playtesting_Stage2_Wall.png&quot; alt=&quot;The Stage 2 Wall&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The board game design pipeline has nine stages. Stage 2 (iterative playtesting) is where most amateur designs stall.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every game designer knows the feeling. You have spent a weekend crafting a deck-building game with a push-your-luck mechanism. The card types feel right. The economy seems balanced. The theme sings. Then reality hits: you need to print cards, recruit four friends who are free on the same evening, explain the rules, play through three sessions, take notes, change the numbers, and do it all again. By the third iteration, your friends are politely unavailable, and the game sits in a drawer.&lt;/p&gt;

&lt;p&gt;The board game design pipeline has a well-known bottleneck, and it is not creativity. The tools for generating ideas, sketching mechanisms, even producing complete game ontologies, have accelerated dramatically. But determining whether a design is balanced and strategically interesting still requires physical prototyping, player recruitment, observation, and post-session analysis. This process spans weeks to months. It is where most amateur designs stall, and even professional studios spend the majority of their development time [1].&lt;/p&gt;

&lt;p&gt;GameGrammar’s ontology pipeline had already automated concept generation, structural analysis, and conversational co-design via Nova. But the ontology output contains everything a simulator would need. Component specifications define the game objects. Mechanism details define the legal actions. Scoring formulas define how you win. Balance parameters define the constraints. Game arc defines the turn structure.&lt;/p&gt;

&lt;p&gt;The data was there. The question was whether it could be made executable. It turns out it can. Before we explain how, let us show you what it looks like in practice.&lt;/p&gt;
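&lt;p&gt;As a rough sketch of what “executable” means here, the five ontology sections can be modeled as plain data that a simulator consumes. The field names below are illustrative assumptions, not GameGrammar’s actual schema:&lt;/p&gt;

```python
from dataclasses import dataclass

# Illustrative sketch only: these field names are assumptions,
# not GameGrammar's actual ontology schema.
@dataclass
class Action:
    name: str            # e.g. "play_card"
    preconditions: list  # predicates over the game state
    effects: list        # state transitions

@dataclass
class GameOntology:
    components: dict  # component specifications -> game objects
    actions: list     # mechanism details -> legal actions
    scoring: str      # scoring formula -> win condition
    balance: dict     # balance parameters -> constraints
    arc: list         # game arc -> turn structure

    def is_executable(self) -> bool:
        # A simulator needs all five sections populated.
        sections = [self.components, self.actions, self.scoring,
                    self.balance, self.arc]
        return all(bool(s) for s in sections)
```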

&lt;hr /&gt;

&lt;h2 id=&quot;how-designers-use-it-a-conversation-with-nova&quot;&gt;How Designers Use It: A Conversation with Nova&lt;/h2&gt;

&lt;p&gt;The entire playtesting pipeline surfaces through Nova, the conversational co-designer we introduced in &lt;a href=&quot;nova-the-ai-co-designer-that-learns-your-taste&quot;&gt;Part 7&lt;/a&gt;. The designer never sees parsers, agents, or metrics directly. They see a conversation.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
  &lt;iframe src=&quot;https://www.youtube.com/embed/8KCOMVEytK0&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Video. GameGrammar AI Playtesting: Nova orchestrates the entire playtest pipeline from a natural language request.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;the-design-loop&quot;&gt;The Design Loop&lt;/h3&gt;

&lt;ol&gt;
  &lt;li&gt;The designer says: “Run a balance playtest for the game”&lt;/li&gt;
  &lt;li&gt;Nova parses the game rules, simulates 50 games with random agents, and analyzes the results&lt;/li&gt;
  &lt;li&gt;Nova presents a structured critique with a reasoning chain: conclusion (“Love Letter shows a significant first-player advantage”), observation, data, mechanism explanation, and competitive impact&lt;/li&gt;
  &lt;li&gt;Decision levels appear: &lt;strong&gt;Structural (Restructure)&lt;/strong&gt; suggestions like rotating first player, &lt;strong&gt;Tuning&lt;/strong&gt; suggestions like adjusting card values, and &lt;strong&gt;Fork&lt;/strong&gt; to explore alternative designs&lt;/li&gt;
  &lt;li&gt;The designer picks an intervention, Nova proposes the ontology change, and re-runs the playtest to verify the fix&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;images/ai-playtesting-when-your-game-tests-itself/AI_Playtesting_Nova_Session.png&quot; alt=&quot;Nova Playtesting Session&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Nova presenting playtesting results inside GameGrammar. The critique reasoning chain surfaces balance findings, skill gap measurement, and intervention options through natural conversation.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Compare that to the traditional workflow: change a number, reprint the cards, recruit players, schedule an evening, play through, take notes, aggregate results. What used to be a multi-week iteration cycle becomes a continuous feedback loop inside a single conversation.&lt;/p&gt;

&lt;h3 id=&quot;the-playtest-history&quot;&gt;The Playtest History&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/ai-playtesting-when-your-game-tests-itself/AI_Playtesting_History.png&quot; alt=&quot;Playtest History Tab&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The Playtesting tab shows run history with expandable game logs. Designers can track how balance metrics evolve across design iterations.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every playtest run is saved with its metrics, and designers can track how their balance numbers change across design iterations. Did that scoring tweak fix the first-player advantage? Did adding an extra card to the starting hand reduce stalemates? The history provides a quantitative record of design decisions and their measured impact.&lt;/p&gt;

&lt;p&gt;Now that you have seen what the experience looks like, let us explore how it works under the hood.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-approach-llm-translates-algorithms-play&quot;&gt;The Approach: LLM Translates, Algorithms Play&lt;/h2&gt;

&lt;p&gt;We explored three approaches before arriving at the production system. Here is a quick summary of the two that did not win, and why they still matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A direct simulator&lt;/strong&gt; parsed the ontology into a deterministic game engine. It worked well for card games and found real balance signals on day one, but pattern-matching parsers break on anything beyond simple mechanics. The parser, not the game, becomes the bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Letting an LLM play the game&lt;/strong&gt; sounded promising: feed the ontology and game state to an LLM each turn, let it choose an action, no formal rules needed. We built seven player archetypes (aggressive, cautious, engine-builder, newbie, and others). But across every extended experiment, &lt;strong&gt;LLM agents performed worse than random&lt;/strong&gt; (-39% skill gap after 100 games). This corroborates findings from GameBench [2] and GTBench [3]. LLMs cannot maintain consistent strategic reasoning over multiple turns [4]. However, the failure became an innovation: while LLMs cannot play strategically, their patterns of confusion are remarkably consistent. When an LLM systematically avoids a mechanism, that mechanism’s description is probably ambiguous. We had accidentally built a &lt;strong&gt;rule clarity analyzer&lt;/strong&gt;, something no existing game design toolkit offers.&lt;/p&gt;

&lt;h3 id=&quot;the-winning-hybrid&quot;&gt;The Winning Hybrid&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/ai-playtesting-when-your-game-tests-itself/AI_Playtesting_Hybrid_Architecture.png&quot; alt=&quot;The Hybrid Architecture&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The winning approach: use the LLM for what it excels at (translating natural language into formal rules) and use traditional game AI for what it excels at (finding optimal play through search).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The production system uses each AI for its natural strength. The LLM reads your game’s mechanism descriptions and translates them into formal game actions, achieving roughly 90% coverage of Love Letter’s mechanics [5]. A deterministic engine then executes those actions with perfect rule enforcement.&lt;/p&gt;

&lt;p&gt;For strategic play, we use MCTS (Monte Carlo Tree Search), a well-established game AI technique [6] that handles hidden information by sampling what opponents might hold and searching for the best move across those possibilities. On Love Letter, MCTS wins 81% of games against random play, a +62.4% skill gap that holds consistently across repeated runs. It runs entirely on local computation with zero API calls, so a designer can re-run the analysis after every single change.&lt;/p&gt;
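&lt;p&gt;To make the hidden-information handling concrete, here is a deliberately simplified sketch of the determinization idea: sample a possible world for the unseen cards, evaluate each candidate move by playouts in that world, and pick the best average. The production agent builds a full UCT search tree; this one-ply version only illustrates the sampling step, and every name in it is hypothetical:&lt;/p&gt;

```python
import random

def choose_move(state, legal_moves, hidden_pool, simulate, n_samples=50):
    """Pick the move with the best average outcome over sampled worlds.

    simulate(state, move, opponent_card) returns 1.0 on a win, 0.0 on a loss.
    All argument names here are hypothetical.
    """
    scores = {}
    for move in legal_moves:
        total = 0.0
        for _ in range(n_samples):
            # Determinize: commit to one possible world for the hidden card.
            opponent_card = random.choice(hidden_pool)
            total += simulate(state, move, opponent_card)
        scores[move] = total / n_samples
    return max(scores, key=scores.get)
```

&lt;p&gt;The real agent repeats this evaluation inside a search tree, so later turns are explored as well, not just the immediate move.&lt;/p&gt;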

&lt;p&gt;The key insight is that each agent type serves a different purpose:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Agent Type&lt;/th&gt;
      &lt;th&gt;Skill Gap&lt;/th&gt;
      &lt;th&gt;What It Measures&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;MCTS&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;+62.4%&lt;/td&gt;
      &lt;td&gt;Whether your game rewards strategic play&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;LLM&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;-39%&lt;/td&gt;
      &lt;td&gt;Which rules are confusing (clarity signal)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Random&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;0% (baseline)&lt;/td&gt;
      &lt;td&gt;Balance and statistical fairness&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;A negative skill gap means the LLM loses to random play more often than it wins. In other words, attempting to reason about the game actively hurts performance. The LLM does not fail because it is unintelligent. It fails because it cannot hold consistent game state across turns: it forgets what cards have been played, misapplies rules it understood one turn ago, and second-guesses valid strategies. A random agent, which needs no memory at all, outperforms it simply by avoiding these compounding errors.&lt;/p&gt;

&lt;p&gt;MCTS proves whether the game rewards skill. LLMs reveal which rules are confusing. Each agent type is useless at the other’s job.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;what-the-system-detects&quot;&gt;What the System Detects&lt;/h2&gt;

&lt;p&gt;The playtesting pipeline produces four categories of analysis. The first three run entirely on local computation after an initial LLM parse (which is cached per design version). The fourth uses LLM agents for rule clarity scoring.&lt;/p&gt;

&lt;h3 id=&quot;balance-metrics-random-agents&quot;&gt;Balance Metrics (Random Agents)&lt;/h3&gt;

&lt;p&gt;Six statistical metrics from 100+ random-agent self-play games:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;What It Catches&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Seat advantage&lt;/td&gt;
      &lt;td&gt;First/last player wins too often&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Strategy diversity&lt;/td&gt;
      &lt;td&gt;One action dominates all others&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Dead actions&lt;/td&gt;
      &lt;td&gt;Game elements nobody uses&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Game length&lt;/td&gt;
      &lt;td&gt;Too short, too long, or too variable&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Elimination timing&lt;/td&gt;
      &lt;td&gt;Players knocked out too early&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Stalemate rate&lt;/td&gt;
      &lt;td&gt;Games that never end&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
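&lt;p&gt;Three of these metrics are simple enough to sketch directly from per-game logs. The record fields below (winner_seat, actions, stalemate) are assumptions about the log format, not the production schema:&lt;/p&gt;

```python
from collections import Counter

def balance_metrics(games, n_seats, all_actions):
    """Seat advantage, dead actions, and stalemate rate from game logs."""
    n = len(games)
    # Stalemates carry winner_seat=None and count toward no seat.
    wins = Counter(g["winner_seat"] for g in games
                   if g["winner_seat"] is not None)
    seat_win_rates = {s: wins[s] / n for s in range(n_seats)}
    # Seat advantage: how far the best seat sits above a fair share.
    seat_advantage = max(seat_win_rates.values()) - 1.0 / n_seats
    used = set()
    for g in games:
        used.update(g["actions"])
    dead_actions = sorted(set(all_actions) - used)
    stalemate_rate = sum(1 for g in games if g["stalemate"]) / n
    return seat_advantage, dead_actions, stalemate_rate
```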

&lt;h3 id=&quot;skill-gap-mcts-agents&quot;&gt;Skill Gap (MCTS Agents)&lt;/h3&gt;

&lt;p&gt;MCTS agents play half the seats, random agents play the other half. The win rate difference measures whether the game rewards strategic play. Above +50% is strong strategic depth. Below +20% may feel random. Negative means strategic play is counterproductive, a sign that something is broken. Crucially, this analysis runs fast enough that a designer can tweak one number in the ontology and re-check within the same conversation.&lt;/p&gt;
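&lt;p&gt;The measurement itself is a plain win-rate difference. A minimal sketch, with verdict thresholds taken from the bands above (the middle band’s label is an illustrative choice):&lt;/p&gt;

```python
def skill_gap(mcts_wins, random_wins, n_games):
    """Win-rate difference between MCTS-controlled and random seats."""
    gap = (mcts_wins - random_wins) / n_games
    if gap > 0.5:
        verdict = "strong strategic depth"
    elif gap > 0.2:
        verdict = "moderate strategic depth"  # label assumed for this band
    elif gap >= 0.0:
        verdict = "may feel random"
    else:
        verdict = "strategic play is counterproductive"
    return gap, verdict
```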

&lt;h3 id=&quot;topology-balance-spatial-games&quot;&gt;Topology Balance (Spatial Games)&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/ai-playtesting-when-your-game-tests-itself/AI_Playtesting_Topology_Balance.png&quot; alt=&quot;Topology-Driven Balance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Three controlled experiments on Catan Simple decompose seat advantage into first-mover and topological components, revealing that board connectivity determines the outcome more than move order.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the finding we are most excited about. For games with structured board topology, the simulator detects balance properties that emerge from the shape of the board itself. We ran three experiments on a simplified Catan (7 hex regions, 2 players, first to control 4 regions wins):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table II: Topology-Driven Balance (Catan Simple, 100 games each)&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Experiment&lt;/th&gt;
      &lt;th&gt;P0 Start (neighbors)&lt;/th&gt;
      &lt;th&gt;P1 Start (neighbors)&lt;/th&gt;
      &lt;th&gt;P0 Win%&lt;/th&gt;
      &lt;th&gt;P1 Win%&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Baseline&lt;/td&gt;
      &lt;td&gt;Forest (3)&lt;/td&gt;
      &lt;td&gt;Quarry (2)&lt;/td&gt;
      &lt;td&gt;76%&lt;/td&gt;
      &lt;td&gt;24%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Swapped starts&lt;/td&gt;
      &lt;td&gt;Quarry (2)&lt;/td&gt;
      &lt;td&gt;Forest (3)&lt;/td&gt;
      &lt;td&gt;45%&lt;/td&gt;
      &lt;td&gt;55%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Symmetric starts&lt;/td&gt;
      &lt;td&gt;Forest (3)&lt;/td&gt;
      &lt;td&gt;Mountain (3)&lt;/td&gt;
      &lt;td&gt;49%&lt;/td&gt;
      &lt;td&gt;51%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The key finding: &lt;strong&gt;the 76% advantage is topological, not temporal.&lt;/strong&gt; When starting positions are swapped, the advantage reverses. When both players start on equal-connectivity regions, the game is nearly balanced (49/51). The board’s graph structure determines the outcome far more than move order.&lt;/p&gt;

&lt;p&gt;This is something a human designer staring at a map would likely miss. You do not naturally count adjacency degrees when looking at a hex grid. The simulator does. And the design intervention is fundamentally different: instead of tweaking turn order or action costs, you fix the board’s connectivity.&lt;/p&gt;
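&lt;p&gt;Counting adjacency degrees is mechanical once the board is a graph. A minimal sketch, using a made-up 7-region layout rather than the exact Catan Simple fixture:&lt;/p&gt;

```python
# A made-up 7-region layout (undirected adjacency), not the exact
# Catan Simple fixture.
BOARD = {
    "forest":   ["plains", "hills", "mountain"],
    "mountain": ["forest", "plains", "desert"],
    "quarry":   ["hills", "desert"],
    "plains":   ["forest", "mountain"],
    "hills":    ["forest", "quarry"],
    "desert":   ["mountain", "quarry", "lake"],
    "lake":     ["desert"],
}

def degree(region):
    return len(BOARD[region])

def starts_are_symmetric(p0_start, p1_start):
    # Equal adjacency degree is the first-order fairness check behind
    # the symmetric-starts experiment.
    return degree(p0_start) == degree(p1_start)
```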

&lt;h3 id=&quot;rule-clarity-llm-agents&quot;&gt;Rule Clarity (LLM Agents)&lt;/h3&gt;

&lt;p&gt;LLM agents play the game using different archetypes (including a “newbie” that deliberately misreads rules), and their confusion patterns produce per-mechanism clarity scores:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table III: Per-Mechanism Clarity (Love Letter)&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Mechanism&lt;/th&gt;
      &lt;th&gt;Score&lt;/th&gt;
      &lt;th&gt;Signal&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Draw a card at start of turn&lt;/td&gt;
      &lt;td&gt;10.0&lt;/td&gt;
      &lt;td&gt;Mandatory, always clear&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Priest: look at opponent’s hand&lt;/td&gt;
      &lt;td&gt;9.9&lt;/td&gt;
      &lt;td&gt;Clear, chosen often&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Play a card&lt;/td&gt;
      &lt;td&gt;9.6&lt;/td&gt;
      &lt;td&gt;Low uncertainty&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Guard: guess opponent’s card&lt;/td&gt;
      &lt;td&gt;9.0&lt;/td&gt;
      &lt;td&gt;Rarely chosen&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Prince: force discard and redraw&lt;/td&gt;
      &lt;td&gt;8.6&lt;/td&gt;
      &lt;td&gt;Near-zero usage&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Baron: compare hands&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;8.5&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Never chosen across 135 opportunities&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The Baron’s comparison mechanic scores lowest every time. It requires reasoning about relative card values, which is the most complex rule in Love Letter. The LLM systematically avoids it. For a designer, this is specific, actionable feedback: the Baron’s rule needs the most clarification effort in your rulebook.&lt;/p&gt;
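&lt;p&gt;One plausible way to roll avoidance into a per-mechanism score follows. The weighting here is only an assumption, chosen so that total avoidance lands near the Baron’s 8.5; the production metric is more involved:&lt;/p&gt;

```python
def clarity_score(times_available, times_chosen, base=10.0, penalty=1.5):
    """Score a mechanism by how often agents used it when they could."""
    if times_available == 0:
        return base
    usage = times_chosen / times_available
    # Full usage keeps the base score; total avoidance costs `penalty`.
    return round(base - penalty * (1.0 - usage), 1)
```

&lt;p&gt;With this toy weighting, a mechanism never chosen across 135 opportunities scores 8.5, matching the Baron row by construction rather than by replication of the real formula.&lt;/p&gt;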

&lt;hr /&gt;

&lt;h2 id=&quot;what-works-now-game-tier-coverage&quot;&gt;What Works Now: Game Tier Coverage&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/ai-playtesting-when-your-game-tests-itself/AI_Playtesting_Tier_Coverage.png&quot; alt=&quot;Game Tier Coverage&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The simulator covers Tier 1 through Tier 5, roughly 80-85% of modern board game designs. Each tier adds new mechanism categories.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The simulator handles Tier 1 through Tier 5 games, covering roughly 80-85% of modern board game designs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table IV: Mechanism Tier Coverage&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Tier&lt;/th&gt;
      &lt;th&gt;Complexity&lt;/th&gt;
      &lt;th&gt;Examples&lt;/th&gt;
      &lt;th&gt;Key Mechanics&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;Light card games&lt;/td&gt;
      &lt;td&gt;Love Letter, Coup&lt;/td&gt;
      &lt;td&gt;Draw, play, compare, eliminate&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;Resource/market games&lt;/td&gt;
      &lt;td&gt;Splendor, Star Realms&lt;/td&gt;
      &lt;td&gt;Resource pools, markets, tableaux&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2.5&lt;/td&gt;
      &lt;td&gt;Dice/deck building&lt;/td&gt;
      &lt;td&gt;Farkle, Dominion&lt;/td&gt;
      &lt;td&gt;Dice pools, push-your-luck, deck cycling&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;Worker placement, cooperative&lt;/td&gt;
      &lt;td&gt;Lords of Waterdeep, Pandemic&lt;/td&gt;
      &lt;td&gt;Action spaces, blocking, shared resources, team win/loss&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;4&lt;/td&gt;
      &lt;td&gt;Simultaneous action, card drafting&lt;/td&gt;
      &lt;td&gt;Sushi Go&lt;/td&gt;
      &lt;td&gt;Staged selection, draft passing, set collection scoring&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;5&lt;/td&gt;
      &lt;td&gt;Spatial, area control&lt;/td&gt;
      &lt;td&gt;Catan&lt;/td&gt;
      &lt;td&gt;Board graph, region ownership, parameterized actions&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;what-does-not-work-yet&quot;&gt;What Does Not Work Yet&lt;/h3&gt;

&lt;p&gt;We believe in being honest about limitations. The following categories remain unsupported, and Nova’s coverage gate communicates this to designers before they try:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Category&lt;/th&gt;
      &lt;th&gt;Examples&lt;/th&gt;
      &lt;th&gt;What Is Missing&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Route building&lt;/td&gt;
      &lt;td&gt;Ticket to Ride&lt;/td&gt;
      &lt;td&gt;Pathfinding, longest-path scoring&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Auction/bidding&lt;/td&gt;
      &lt;td&gt;Power Grid, For Sale&lt;/td&gt;
      &lt;td&gt;Bid state, auction resolution&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Trick-taking&lt;/td&gt;
      &lt;td&gt;The Crew, Hearts&lt;/td&gt;
      &lt;td&gt;Trick structure, follow-suit, trump&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Complex triggers&lt;/td&gt;
      &lt;td&gt;Wingspan, Terraforming Mars&lt;/td&gt;
      &lt;td&gt;Cascading effects, conditional activation&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Asymmetric powers&lt;/td&gt;
      &lt;td&gt;Root, Vast&lt;/td&gt;
      &lt;td&gt;Per-player unique actions and win conditions&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;These gaps represent genuine architectural breaks: complex triggers require an event system, asymmetric powers require per-player action sets, and route building requires pathfinding algorithms. Each is a future engineering epic, not a configuration change. The goal is not to simulate every game ever made. It is to cover the games that GameGrammar generates well enough that playtesting becomes a conversation, not a project.&lt;/p&gt;
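&lt;p&gt;The coverage gate itself can be as simple as a lookup against the unsupported categories. A hypothetical sketch, not Nova’s actual implementation:&lt;/p&gt;

```python
# Hypothetical coverage gate: map unsupported mechanism tags to the
# missing capability, so the designer is warned before a playtest runs.
UNSUPPORTED = {
    "route_building": "pathfinding and longest-path scoring",
    "auction": "bid state and auction resolution",
    "trick_taking": "trick structure, follow-suit, trump",
    "cascading_triggers": "event system for conditional activation",
    "asymmetric_powers": "per-player action sets and win conditions",
}

def coverage_gate(mechanism_tags):
    """Return (tag, missing capability) blockers; empty means simulatable."""
    return [(m, UNSUPPORTED[m]) for m in mechanism_tags if m in UNSUPPORTED]
```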

&lt;hr /&gt;

&lt;h2 id=&quot;honest-challenges&quot;&gt;Honest Challenges&lt;/h2&gt;

&lt;p&gt;No system is without limitations, and we think being transparent about ours strengthens rather than weakens the case for automated playtesting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM parsing is non-deterministic.&lt;/strong&gt; Two independent parses of the same game produce slightly different rule interpretations. The fix is pragmatic: parse once, cache the result, and reuse it for all subsequent simulations. Same cache plus same seed equals identical results. But designers should know that the initial parse defines the game the simulator plays, and it may not perfectly match their intent.&lt;/p&gt;
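&lt;p&gt;The caching discipline is easy to sketch: hash the ontology into a cache key, parse at most once per design version, and seed every simulation run. Function and field names here are illustrative assumptions:&lt;/p&gt;

```python
import hashlib
import json
import random

_PARSE_CACHE = {}

def parsed_rules(ontology, parse_fn):
    """Parse each design version exactly once; reuse the cached result."""
    key = hashlib.sha256(
        json.dumps(ontology, sort_keys=True).encode()
    ).hexdigest()
    if key not in _PARSE_CACHE:
        _PARSE_CACHE[key] = parse_fn(ontology)  # the only LLM call
    return _PARSE_CACHE[key]

def run_playtest(ontology, parse_fn, seed, n_games=3):
    """Same cached parse plus same seed gives identical simulated games."""
    rules = parsed_rules(ontology, parse_fn)
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_games)], rules
```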

&lt;p&gt;&lt;strong&gt;Evaluation uses simplified games.&lt;/strong&gt; Our fixtures are simplified versions of published games: Love Letter with 8 card types, Catan with 7 regions, Dominion with a limited card pool. Full-complexity games (Terraforming Mars with 200+ project cards, Gloomhaven with asymmetric character decks) would stress the system in ways we have not yet tested.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule clarity has not been validated against humans.&lt;/strong&gt; The Baron comparison mechanic consistently scores lowest, which makes intuitive sense because it is objectively the most complex rule in Love Letter. But we have not yet compared our automated clarity scores with actual human confusion ratings. The metric is plausible and useful, but formally unvalidated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The simulator plays the game-as-parsed, not the game-as-intended.&lt;/strong&gt; If the parser misinterprets a mechanism, the balance findings are real findings about the wrong game. The pattern parser’s artificial P3 advantage in Love Letter was exactly this kind of error. The LLM parser’s 90% accuracy is high, but the remaining 10% can produce subtle distortions.&lt;/p&gt;

&lt;p&gt;These are known limitations, not hidden ones. Nova’s coverage gate and the simulator’s deterministic replay make them manageable in practice.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;concluding-remarks&quot;&gt;Concluding Remarks&lt;/h2&gt;

&lt;p&gt;When we started this research, the question was simple: can AI learn to play a board game by reading its structured design description? The answer turned out to be more nuanced than yes or no.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCTS can play, and play well.&lt;/strong&gt; +62.4% skill gap on Love Letter, pure local computation, zero API calls. Traditional game AI algorithms, when fed a structured game definition parsed from natural language, produce reliable strategic play. The game rewards skill, and MCTS finds the skill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLMs cannot play, and that is the point.&lt;/strong&gt; -39% skill gap after 100 games. But the pattern of their confusion measures something no other tool measures: rule clarity. An LLM that systematically avoids a mechanism is telling you that mechanism is hard to understand. This inverts the standard framing where LLM game-playing failure is a deficiency to overcome. In our system, the failure is the feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structure reveals what intuition misses.&lt;/strong&gt; A 76% win rate from a 3-neighbor region versus a 2-neighbor region. No human designer spots that by staring at a map. The simulator does. And the design intervention (fix the board connectivity) is fundamentally different from what the designer would have tried (tweak the card costs).&lt;/p&gt;

&lt;p&gt;The goal was never to replace human playtesters. Human playtesting reveals social dynamics, emotional arcs, and “feel” that no simulator captures. The goal was to compress the iteration cycle. Instead of weeks between design changes and feedback, the loop now fits inside a single conversation with Nova. Change a parameter, re-run the analysis, see the impact, iterate again. For Tier 1-5 games, that continuous feedback loop transforms playtesting from a project into a conversation.&lt;/p&gt;

&lt;p&gt;If you would like to try automated playtesting on your own designs, &lt;a href=&quot;https://gamegrammar.dynamindresearch.com&quot;&gt;GameGrammar&lt;/a&gt; is in public beta with a free tier that includes balance analysis via Nova. Already have a game? You do not need to generate from scratch: describe your existing design to Nova and run balance analysis on the game you built.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;series-navigation&quot;&gt;Series Navigation&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;Unlocking the Secrets of Tabletop Games Ontology (Part 4)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;introducing-gamegrammar-ai-powered-board-game-design&quot;&gt;Introducing GameGrammar: AI-Powered Board Game Design (Part 5)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;GameGrammar: The Theory of Generative Board Game Design (Part 6)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;nova-the-ai-co-designer-that-learns-your-taste&quot;&gt;Nova: The AI Co-Designer That Learns Your Taste (Part 7)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;generative-ontology-from-game-knowledge-to-game-creation&quot;&gt;Generative Ontology: From Game Knowledge to Game Creation (Part 8)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;»&lt;/strong&gt; &lt;a href=&quot;ai-playtesting-when-your-game-tests-itself&quot;&gt;AI Playtesting: When Your Game Tests Itself (Part 9)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;[1] Geoffrey Engelstein and Isaac Shalev. &lt;a href=&quot;https://www.routledge.com/Building-Blocks-of-Tabletop-Game-Design/Engelstein-Shalev/p/book/9781138365490&quot;&gt;&lt;em&gt;Building Blocks of Tabletop Game Design: An Encyclopedia of Mechanisms&lt;/em&gt;&lt;/a&gt;. CRC Press, 2020.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Comprehensive mechanism taxonomy and the iterative playtesting challenge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[2] D. Costarelli et al. &lt;a href=&quot;https://arxiv.org/abs/2406.06613&quot;&gt;&lt;em&gt;GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents&lt;/em&gt;&lt;/a&gt;. arXiv:2406.06613, 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Documents systematic LLM failures including state-tracking loss and rule hallucination&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[3] Z. Duan et al. &lt;a href=&quot;https://arxiv.org/abs/2402.12348&quot;&gt;&lt;em&gt;GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations&lt;/em&gt;&lt;/a&gt;. NeurIPS 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Finds LLMs with Chain-of-Thought universally fail against MCTS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[4] Y. Hu et al. &lt;a href=&quot;https://arxiv.org/abs/2402.18659&quot;&gt;&lt;em&gt;Large Language Models and Games: A Survey and Roadmap&lt;/em&gt;&lt;/a&gt;. arXiv:2402.18659, 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Comprehensive survey including the After-State Text Protocol pattern for LLM game interaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[5] C. Becker et al. &lt;a href=&quot;https://arxiv.org/abs/2508.16447&quot;&gt;&lt;em&gt;Boardwalk: Towards a Framework for Creating Board Games with LLMs&lt;/em&gt;&lt;/a&gt;. SBGames 2025.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Board game code generation from LLMs, achieving 55.6% error-free rate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[6] R. Coulom. &lt;a href=&quot;https://link.springer.com/chapter/10.1007/978-3-540-75538-8_7&quot;&gt;&lt;em&gt;Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search&lt;/em&gt;&lt;/a&gt;. CG 2006, LNCS 4630, Springer, 2007.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Introduced UCT selection, the foundation of our MCTS agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[7] E. Piette et al. &lt;a href=&quot;https://arxiv.org/abs/1905.05013&quot;&gt;&lt;em&gt;Ludii: The Ludemic General Game System&lt;/em&gt;&lt;/a&gt;. arXiv:1905.05013, 2020.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;General game system requiring formal game descriptions (our system accepts natural language)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[8] Benny Cheung. &lt;a href=&quot;https://arxiv.org/abs/2602.05636&quot;&gt;&lt;em&gt;Generative Ontology: When Structured Knowledge Learns to Create&lt;/em&gt;&lt;/a&gt;. arXiv:2602.05636, Feb 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Formal paper on the Generative Ontology framework with ablation study&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[9] &lt;a href=&quot;https://gamegrammar.dynamindresearch.com&quot;&gt;&lt;em&gt;GameGrammar&lt;/em&gt;&lt;/a&gt;. Dynamind Research, 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Board game design platform with integrated automated playtesting&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Mon, 16 Mar 2026 12:00:00 +0000</pubDate>
        <link>https://bennycheung.github.io/ai-playtesting-when-your-game-tests-itself</link>
        <guid isPermaLink="true">https://bennycheung.github.io/ai-playtesting-when-your-game-tests-itself</guid>
        
        <category>Automated Testing</category>
        
        <category>Game Design</category>
        
        <category>Tabletop Games</category>
        
        <category>Playtesting</category>
        
        <category>Monte Carlo Tree Search</category>
        
        <category>Game Architecture</category>
        
        <category>Board Games</category>
        
        
        <category>post</category>
        
      </item>
    
      <item>
        <title>Generative Ontology: From Game Knowledge to Game Creation</title>
        <description>&lt;!--excerpt.start--&gt;
&lt;p&gt;In February 2025, we explored how ontologies reveal the hidden structure of tabletop games. But understanding games is not the same as creating them. What if that same structured knowledge could become a creative engine? This is the promise of Generative Ontology, when knowledge representation learns to imagine.
&lt;!--excerpt.end--&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Generative-Ontology-Grammar_of_Creation_01.png&quot; alt=&quot;Generative Ontology - Grammar of Creation&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Structure meets Imagination, the duality at the heart of Generative Ontology.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This article is the conclusion of the Game Architecture series. In &lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;Part 4&lt;/a&gt; [8], we built an ontology for tabletop games, decomposing CATAN into mechanisms (resource trading, modular board, dice-driven production), components (hex tiles, resource cards, settlements), and player dynamics (competitive, negotiation-heavy, variable player count). The ontology gave us a vocabulary for understanding games, a precise language for analysis. In &lt;a href=&quot;introducing-gamegrammar-ai-powered-board-game-design&quot;&gt;Part 5&lt;/a&gt;, we demonstrated how that ontology powers a multi-agent generation pipeline. In &lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;Part 6&lt;/a&gt;, we explored the theory behind structured creative generation. And in &lt;a href=&quot;nova-the-ai-co-designer-that-learns-your-taste&quot;&gt;Part 7&lt;/a&gt;, we showed how a conversational AI partner can learn a designer’s taste.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Board_Game_Ontology_Examples.jpg&quot; alt=&quot;Board Game Ontology Examples&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Games like CATAN and Dune: Imperium share a common ontological structure beneath their vastly different themes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now, in this final article, we tackle the question that analysis alone cannot answer: can the same ontology that helps us &lt;em&gt;understand&lt;/em&gt; CATAN help us &lt;em&gt;create&lt;/em&gt; games that CATAN’s designers never imagined?&lt;/p&gt;

&lt;p&gt;We call this synthesis &lt;strong&gt;Generative Ontology&lt;/strong&gt;: the practice of encoding domain knowledge as executable schemas that constrain and guide AI generation, transforming static knowledge representation into a creative engine. This article presents the theoretical framework, walks through a complete game generation from theme to playable design, and provides the experimental evidence that it works.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;from-description-to-creation&quot;&gt;From Description to Creation&lt;/h2&gt;

&lt;p&gt;Our game ontology [4] can tell us that worker placement games typically include action spaces, worker tokens, and blocking mechanisms [3]. It cannot generate a novel worker placement game. Large language models have the opposite problem [6]. Ask an LLM to “design a deck-building game set in a haunted mansion,” and it will fluently describe players exploring Ravenshollow Manor, collecting ghost cards, managing a “fear mechanic.” It sounds plausible. But what cards exist in the starting deck? How do players acquire new cards? What triggers the end of the game? The LLM has generated the &lt;em&gt;appearance&lt;/em&gt; of a game design without the &lt;em&gt;substance&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Generative-Ontology-Grammar_of_Creation_02.png&quot; alt=&quot;The Paradox of Creation&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Traditional Ontology (The Map) vs Pure LLMs (The Dreamer), understanding the rules of chess does not make you a Grandmaster.&lt;/em&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Approach&lt;/th&gt;
      &lt;th&gt;Strength&lt;/th&gt;
      &lt;th&gt;Weakness&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Traditional Ontology&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Precise, structured, validated&lt;/td&gt;
      &lt;td&gt;Cannot generate novel outputs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Pure LLM Generation&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Creative, fluent, abundant&lt;/td&gt;
      &lt;td&gt;Unstructured, invalid, hallucinated&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;These limitations are complementary [5]. What ontology lacks, LLMs provide. What LLMs lack, ontology provides.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Generative-Ontology-Grammar_of_Creation_03.png&quot; alt=&quot;Defining Generative Ontology&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. LLM Potential + Ontology Constraints = Valid Game Design, from passive vocabulary to active grammar.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;the-grammar-of-games&quot;&gt;The Grammar of Games&lt;/h3&gt;

&lt;p&gt;A poet does not experience grammar as a limitation. Grammar is not what prevents poetry. It is what makes poetry &lt;em&gt;possible&lt;/em&gt;. Without syntax, semantics, and form, there would be no sonnets, no haiku, no free verse pushing against convention.&lt;/p&gt;

&lt;p&gt;The same principle applies to game design. When we encode our game ontology as a schema, we are not limiting the AI’s creativity. We are giving it the structural vocabulary to be creative &lt;em&gt;coherently&lt;/em&gt;. The schema says: every game must have a goal, an end condition, mechanisms that create player choices, components that instantiate those mechanisms. Within those constraints, infinite games are possible. Without them, no valid game emerges.&lt;/p&gt;

&lt;p&gt;The grammar does not write the poem. But without grammar, there is no poem to write.&lt;/p&gt;

&lt;h3 id=&quot;the-whiteheadian-connection&quot;&gt;The Whiteheadian Connection&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Generative-Ontology-Grammar_of_Creation_04.png&quot; alt=&quot;The Grammar of Games&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Eternal Objects (The Ontology) crystallize into Actual Occasions (The Generation), Whitehead’s process philosophy made computational.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;Part 6&lt;/a&gt; and our earlier exploration of &lt;a href=&quot;process-philosophy-for-ai-agent-design&quot;&gt;Process Philosophy for AI Agent Design&lt;/a&gt; [9], we connected Whitehead’s metaphysics to structured generation. Whitehead distinguished between &lt;strong&gt;eternal objects&lt;/strong&gt; (pure forms existing as potentials) and &lt;strong&gt;actual occasions&lt;/strong&gt; (concrete events where forms find expression) [1]. Our game ontology is a collection of eternal objects: the abstract patterns of worker placement, deck building, area control.&lt;/p&gt;

&lt;p&gt;What makes this precise is Whitehead’s concept of &lt;strong&gt;concrescence&lt;/strong&gt;: the process by which an actual occasion selects from available eternal objects and synthesizes them into a novel unity [2]. This is exactly what the generation pipeline does. The ontology presents the full space of available patterns. The LLM, constrained by the schema, performs concrescence: selecting from those patterns, combining them with theme, and producing a concrete game that has never existed before. The creativity is real, but it is &lt;em&gt;structured&lt;/em&gt; creativity.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;from-ontology-classes-to-generation-schemas&quot;&gt;From Ontology Classes to Generation Schemas&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Generative-Ontology-Grammar_of_Creation_05.png&quot; alt=&quot;Ontology as Executable Schema&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The schema forces the LLM to output valid structured data matching ontological requirements, acting as self-documentation for the model.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Philosophy illuminates the path; engineering builds the road. The four ontology concepts from &lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;Part 4&lt;/a&gt; [8] (Game, Mechanism, Component, Player) map naturally to typed schema definitions:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Ontology Concept&lt;/th&gt;
      &lt;th&gt;Schema Role&lt;/th&gt;
      &lt;th&gt;Purpose&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Game Types&lt;/td&gt;
      &lt;td&gt;Constrained enumeration&lt;/td&gt;
      &lt;td&gt;Restricts output to valid game modes (cooperative, competitive, semi-cooperative)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Mechanisms&lt;/td&gt;
      &lt;td&gt;Typed list from taxonomy&lt;/td&gt;
      &lt;td&gt;Ensures only recognized mechanics are referenced&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Components&lt;/td&gt;
      &lt;td&gt;Structured nested object&lt;/td&gt;
      &lt;td&gt;Specifies physical game elements with required fields&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Goal and End Condition&lt;/td&gt;
      &lt;td&gt;Required string fields with minimum length&lt;/td&gt;
      &lt;td&gt;Guarantees playability criteria are never left vague&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The schema flips the ontology from a tool for &lt;em&gt;analysis&lt;/em&gt; (decomposing CATAN into its parts) into a tool for &lt;em&gt;synthesis&lt;/em&gt;. It does not tell the LLM &lt;em&gt;what&lt;/em&gt; game to create. It tells the LLM &lt;em&gt;what a game must be&lt;/em&gt; to count as valid. But a schema alone is not enough. A single LLM call must simultaneously consider mechanisms, theme, components, balance, and player experience. Human design teams do not work this way.&lt;/p&gt;
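&lt;p&gt;The mapping above can be sketched in plain Python. The study’s pipeline uses Pydantic for validation; this stdlib-only reduction is illustrative, and every field name and length threshold here is an assumption, not the pipeline’s actual schema:&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum

class GameType(Enum):
    # Constrained enumeration: only valid game modes exist
    COOPERATIVE = "cooperative"
    COMPETITIVE = "competitive"
    SEMI_COOPERATIVE = "semi-cooperative"

class Mechanism(Enum):
    # Typed list source: only recognized mechanics can be referenced
    DECK_BUILDING = "deck building"
    WORKER_PLACEMENT = "worker placement"
    AREA_CONTROL = "area control"
    ACTION_POINT_ALLOCATION = "action point allocation"

@dataclass
class Component:
    # Structured nested object: physical elements with required fields
    name: str
    kind: str   # e.g. "card", "board", "token"
    count: int

@dataclass
class GameDesign:
    title: str
    game_type: GameType
    mechanisms: list[Mechanism]
    components: list[Component]
    goal: str            # required, with a minimum length
    end_condition: str   # required, with a minimum length

    def __post_init__(self):
        # Playability criteria must never be left vague (threshold illustrative)
        if len(self.goal) < 30:
            raise ValueError("goal too vague: describe how players win")
        if len(self.end_condition) < 30:
            raise ValueError("end condition too vague: describe when play stops")
```

&lt;p&gt;An LLM forced to emit data that instantiates this type cannot hand back a game with no goal, an unknown mechanism, or a one-word win condition.&lt;/p&gt;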

&lt;hr /&gt;

&lt;h2 id=&quot;specialized-agents-for-each-ontology-domain&quot;&gt;Specialized Agents for Each Ontology Domain&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Generative-Ontology-Grammar_of_Creation_08.png&quot; alt=&quot;Specialized Agents and Their Anxieties&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. We split the task to create creative tension, preventing shallow, agreeable outputs.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On a human design team, a game designer sketches mechanics while an artist develops the visual language and a playtester hunts for broken interactions. Each specialist brings focused expertise.&lt;/p&gt;

&lt;p&gt;Generative Ontology enables the same division of labor. As we demonstrated in &lt;a href=&quot;introducing-gamegrammar-ai-powered-board-game-design&quot;&gt;Part 5&lt;/a&gt;, we decompose the ontology into domains and assign specialized agents to each. The result benefits from focused attention at every layer.&lt;/p&gt;

&lt;h3 id=&quot;the-agent-roster&quot;&gt;The Agent Roster&lt;/h3&gt;

&lt;p&gt;Our ontology naturally suggests specialization boundaries:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Agent&lt;/th&gt;
      &lt;th&gt;Ontology Domain&lt;/th&gt;
      &lt;th&gt;Expertise&lt;/th&gt;
      &lt;th&gt;Anxiety&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Mechanics Architect&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Mechanisms&lt;/td&gt;
      &lt;td&gt;Turn structure, action economy, resolution systems&lt;/td&gt;
      &lt;td&gt;“Is there meaningful player agency?”&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Theme Weaver&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Narrative&lt;/td&gt;
      &lt;td&gt;Setting, flavor, thematic integration&lt;/td&gt;
      &lt;td&gt;“Does the theme feel alive in every mechanism?”&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Component Designer&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Components&lt;/td&gt;
      &lt;td&gt;Cards, tokens, board layout, physical affordances&lt;/td&gt;
      &lt;td&gt;“Can players physically manipulate this smoothly?”&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Balance Critic&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Cross-domain&lt;/td&gt;
      &lt;td&gt;Interaction analysis, dominant strategy detection&lt;/td&gt;
      &lt;td&gt;“What breaks? What is unfun when optimized?”&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Fun Factor Judge&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Player experience&lt;/td&gt;
      &lt;td&gt;Engagement loops, tension, satisfaction&lt;/td&gt;
      &lt;td&gt;“Would I want to play this again?”&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The “Anxiety” column is the key design innovation. Each agent carries a &lt;em&gt;professional worry&lt;/em&gt; that shapes its generation and critique, preventing the “yes-man” tendency of LLMs to produce plausible but shallow output. The Mechanics Architect wants elegant systems; the Fun Factor Judge wants excitement. This built-in tension mirrors real design team dynamics, with information flowing through typed schemas as explicit handoffs.&lt;/p&gt;
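&lt;p&gt;A minimal sketch of those typed handoffs, with the LLM calls stubbed out. The dataclasses, function names, and hard-coded outputs are all illustrative stand-ins for the real agents:&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class MechanicsSpec:               # handoff: Mechanics Architect -> Theme Weaver
    mechanisms: list[str]
    turn_structure: list[str]

@dataclass
class ThemedSpec:                  # handoff: Theme Weaver -> critics
    mechanics: MechanicsSpec
    title: str
    flavor: dict[str, str]         # mechanism -> thematic framing

@dataclass
class Critique:
    issues: list[str] = field(default_factory=list)

def mechanics_architect(theme: str) -> MechanicsSpec:
    # Stub for an LLM call; anxiety: "Is there meaningful player agency?"
    return MechanicsSpec(
        mechanisms=["action point allocation", "hidden information"],
        turn_structure=["secret planning", "sequential resolution"],
    )

def theme_weaver(theme: str, spec: MechanicsSpec) -> ThemedSpec:
    # Stub; anxiety: "Does the theme feel alive in every mechanism?"
    return ThemedSpec(
        mechanics=spec,
        title="Neural Race",
        flavor={"action point allocation": "secret departmental budgeting"},
    )

def balance_critic(themed: ThemedSpec) -> Critique:
    # Anxiety: "What breaks?" Here: flag mechanisms with no thematic anchor.
    critique = Critique()
    for mech in themed.mechanics.mechanisms:
        if mech not in themed.flavor:
            critique.issues.append(f"'{mech}' lacks thematic integration")
    return critique

def run_pipeline(theme: str):
    spec = mechanics_architect(theme)
    themed = theme_weaver(theme, spec)
    return themed, balance_critic(themed)
```

&lt;p&gt;Because each handoff is a typed object rather than free prose, a downstream agent can only consume what an upstream agent actually committed to.&lt;/p&gt;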

&lt;p&gt;But collaboration alone does not guarantee correctness. Agents can still agree on outputs that look valid but are not. We need a final layer of assurance.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;validation-and-refinement-ontology-as-contract&quot;&gt;Validation and Refinement: Ontology as Contract&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Generative-Ontology-Grammar_of_Creation_07.png&quot; alt=&quot;Ontology as Reward Function&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The system retries with specific error messages until structural coherence is achieved.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Generation is not enough. LLMs are notoriously agreeable. They will produce output that &lt;em&gt;looks&lt;/em&gt; valid without ensuring it &lt;em&gt;is&lt;/em&gt; valid. Schema validation catches type errors and missing fields, but semantic coherence requires deeper checks. An LLM might declare “deck building” as a mechanism but include no cards in the components. It &lt;em&gt;sounds&lt;/em&gt; valid but is structurally incoherent.&lt;/p&gt;

&lt;h3 id=&quot;ontological-constraint-checking&quot;&gt;Ontological Constraint Checking&lt;/h3&gt;

&lt;p&gt;We encode cross-field consistency rules that go beyond schema validation:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Mechanism-Component Coherence&lt;/strong&gt;: Deck building requires cards. Area control requires a board. Worker placement requires worker tokens. If a mechanism is declared, the corresponding components must exist.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Game Type Consistency&lt;/strong&gt;: Cooperative games cannot have direct conflict as their primary interaction mode. Competitive games should not declare cooperative interaction.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Playability Requirements&lt;/strong&gt;: Goals must be specific (beyond “win” or “score points”). End conditions, turn structure, and uncertainty sources must all be defined.&lt;/li&gt;
&lt;/ul&gt;
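&lt;p&gt;A hedged sketch of such cross-field checks. The rule table below covers only the three examples above, with illustrative field names, not the pipeline’s full rule set:&lt;/p&gt;

```python
# Mechanism -> component kinds that must be present (illustrative rules)
REQUIRES = {
    "deck building": {"card"},
    "area control": {"board"},
    "worker placement": {"worker token"},
}

VAGUE_GOALS = {"win", "score points", "win the game"}

def check_constraints(design: dict) -> list[str]:
    """Return a list of constraint violations; an empty list means coherent."""
    errors = []
    kinds = {c["kind"] for c in design.get("components", [])}
    # 1. Mechanism-component coherence
    for mech in design.get("mechanisms", []):
        for kind in sorted(REQUIRES.get(mech, set()) - kinds):
            errors.append(f"{mech} mechanism declared but no {kind} in components")
    # 2. Game type consistency
    if (design.get("game_type") == "cooperative"
            and design.get("primary_interaction") == "direct conflict"):
        errors.append("cooperative game cannot have direct conflict as primary interaction")
    # 3. Playability requirements
    if design.get("goal", "").strip().lower() in VAGUE_GOALS:
        errors.append("goal must be specific, beyond 'win' or 'score points'")
    return errors
```

&lt;p&gt;The error strings matter as much as the checks: they become the feedback fed back to the LLM in the refinement loop.&lt;/p&gt;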

&lt;h3 id=&quot;the-refinement-loop&quot;&gt;The Refinement Loop&lt;/h3&gt;

&lt;p&gt;When validation fails, the system retries with specific constraint violations as feedback: “deck-building mechanism declared but no cards in components.” This generate-validate-refine loop continues until the design passes all ontological constraints or a maximum number of attempts is reached.&lt;/p&gt;
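&lt;p&gt;The loop itself is small. In this sketch, &lt;code&gt;generate&lt;/code&gt; stands in for the schema-constrained LLM call and &lt;code&gt;validate&lt;/code&gt; for the constraint checker; both names are placeholders:&lt;/p&gt;

```python
def generate_validate_refine(generate, validate, max_attempts=3):
    """Retry generation, feeding constraint violations back as hints."""
    feedback = []
    for attempt in range(max_attempts):
        design = generate(feedback)     # LLM call, with prior violations as hints
        feedback = validate(design)     # e.g. "deck building declared but no cards"
        if not feedback:
            return design, attempt + 1  # passed all ontological constraints
    raise RuntimeError(f"no valid design after {max_attempts} attempts: {feedback}")
```

&lt;p&gt;Because the feedback is a list of specific violations rather than a generic “try again,” the second attempt corrects the exact incoherence the first one introduced.&lt;/p&gt;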

&lt;p&gt;The result is that the ontology functions as a &lt;em&gt;contract&lt;/em&gt;. Downstream consumers (human designers, balancing tools, or game engines) can rely on guaranteed structural coherence beyond syntactic validity.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;case-study-a-game-of-racing-to-agi-artificial-general-intelligence&quot;&gt;Case Study: A Game of Racing to AGI (Artificial General Intelligence)&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Generative-Ontology-Grammar_of_Creation_10.png&quot; alt=&quot;Case Study: The Input&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. A timely, high-stakes theme with no direct tabletop analog, can the ontology handle the nuance of competitive strategy with hidden information?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Could the pipeline design a game about AGI itself? Theory is persuasive; demonstration is convincing. Let us trace a complete generation through our Generative Ontology pipeline, from initial theme to playable game design.&lt;/p&gt;

&lt;h3 id=&quot;the-input&quot;&gt;The Input&lt;/h3&gt;

&lt;p&gt;We provide a theme and constraints:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Theme&lt;/strong&gt;: “Rival AI laboratories racing to develop Artificial General Intelligence”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Constraints&lt;/strong&gt;: 2-4 players, medium complexity, competitive, 45-60 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This theme was chosen deliberately: it is timely enough to resonate but has no direct tabletop analog, forcing the system to synthesize rather than imitate.&lt;/p&gt;
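&lt;p&gt;As a request payload, the input might look like this. The shape is hypothetical, meant only to show how little the pipeline is given to start from:&lt;/p&gt;

```python
# Hypothetical request payload for the generation pipeline
request = {
    "theme": "Rival AI laboratories racing to develop Artificial General Intelligence",
    "constraints": {
        "players_min": 2,
        "players_max": 4,
        "complexity": "medium",
        "game_type": "competitive",
        "play_time_minutes": (45, 60),
    },
}
```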

&lt;h3 id=&quot;the-design-conversation&quot;&gt;The Design Conversation&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Generative-Ontology-Grammar_of_Creation_11.png&quot; alt=&quot;The Design Conversation&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Step-by-step: Architect to Weaver to Critic to Refiner, system self-correction through specialized agents.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Mechanics Architect&lt;/strong&gt; analyzes the theme and identifies that rival AI laboratories suggest parallel development, resource competition, and technological breakthroughs. “Racing to AGI” implies a progress track with a finish line. Given the competitive constraint, players race independently toward a shared victory threshold. It selects action point allocation (secret departmental budgeting), engine building (infrastructure and research synergies), market/auction (competitive talent acquisition), and hidden information (unpublished breakthroughs) as core mechanisms. The turn structure combines simultaneous secret planning with sequential resolution.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Theme Weaver&lt;/strong&gt; receives these mechanics and integrates narrative. Action point allocation becomes lab budget decisions across Research, Talent, Infrastructure, and Intelligence departments. Hidden information becomes proprietary research breakthroughs held secret until strategically published. The market becomes a talent war where labs bid for top AI researchers.&lt;/p&gt;

&lt;p&gt;Engine building becomes the pursuit of synergistic breakthroughs across five research fields: Neural Networks, Robotics, Quantum Computing, Ethics &amp;amp; Safety, and Hardware.&lt;/p&gt;

&lt;p&gt;A 24-card Event deck ensures no two games play out the same, featuring Regulations, Crises, and Breakthroughs that affect all players simultaneously.&lt;/p&gt;

&lt;p&gt;The title emerges: &lt;strong&gt;Neural Race&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Component Designer&lt;/strong&gt; then translates these mechanics into physical form (boards, cards, tokens, and mats), ensuring every mechanism has a tangible instantiation (detailed in the Final Design table below).&lt;/p&gt;

&lt;h3 id=&quot;the-critics-eye&quot;&gt;The Critic’s Eye&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Balance Critic&lt;/strong&gt; identifies three issues:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Runaway leader via infrastructure&lt;/strong&gt;: flat upgrade costs allow leading players to rapidly scale Action Points, creating an insurmountable advantage. Recommendation: cap maximum Action Points at 10 or implement exponentially increasing upgrade costs (3, 5, 8 points).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;A “rich-get-richer” talent market&lt;/strong&gt;: reputation bonuses in bidding allow leading labs to acquire the best researchers, cementing their lead. Recommendation: grant bidding bonuses to trailing players or introduce researchers specifically valuable to weaker positions.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Surprise scoring swings&lt;/strong&gt;: holding and mass-publishing synergistic breakthroughs produces unpredictable point spikes. Recommendation: implement stricter hand size limits or mechanics that force partial information reveals.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Neural_Race_Assessment.png&quot; alt=&quot;Neural Race Design Assessment&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The design assessment with an active roadmap of prioritized fixes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The refinement agent addresses the high-priority reputation issue with a +2 hard cap on bidding bonuses and reputation decay mechanics. The moderate-priority items (infrastructure scaling and hidden information asymmetry) receive targeted fixes: reducing max action points to 8, and adding player reference cards showing all synergy trees. The &lt;strong&gt;Fun Factor Judge&lt;/strong&gt; evaluates the refined design at 8/10. Key engagement hooks: the race dynamic as players approach the AGI victory threshold, the uncertainty of opponents’ hidden breakthrough cards, and the risk/reward calculation of when to publish.&lt;/p&gt;

&lt;h3 id=&quot;the-final-design&quot;&gt;The Final Design&lt;/h3&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
  &lt;iframe src=&quot;https://www.youtube.com/embed/efG_E5v7HCc&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Figure. Neural Race at a glance: five phases of gameplay from secret planning through progress evaluation, with synergy bonuses rewarding deep research investment.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The complete output:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Field&lt;/th&gt;
      &lt;th&gt;Value&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Title&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Neural Race&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Theme&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Rival AI laboratories racing to develop AGI in a high-stakes global competition&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Game Type&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Competitive&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Goal&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Advance on the AGI Progress Track by publishing research breakthroughs; first to 20 progress points wins&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;End Condition&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Sprint: First player to 20+ AGI progress triggers immediate victory. Endurance: If no player reaches threshold, highest total of AGI progress + synergy bonuses + reputation wins.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Mechanisms&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Action Point Allocation, Engine Building, Market/Auction, Hidden Information&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Turn Structure&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;1) Secret allocation of Action Points to departments, 2) Research execution and breakthrough draws, 3) Talent market bidding, 4) External event, 5) Progress evaluation&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Player Count&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;2-4&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Interactions&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Competitive bidding, hidden breakthroughs, reputation race, variable lab specializations&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Core Loop&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Plan (allocate), Research (draw), Market (bid), Event (adapt), Evaluate (score), Repeat&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Every mechanism maps to a thematic action. Every component serves a mechanical purpose. The full component specification, from the 60-card Research Breakthrough deck to the custom wooden robot meeples, is detailed in the &lt;a href=&quot;https://gamegrammar.dynamindresearch.com/s/JHEYCWpyE6DvbCnBOVRPr0iuyroWIdZrO1qjvw3EECs&quot;&gt;interactive ontology on GameGrammar&lt;/a&gt; [11].&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Neural_Race_Game_Board.png&quot; alt=&quot;Neural Race: Imagined Game Board&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. An imagined physical realization of Neural Race, where the AGI Progress Track spirals toward the center, flanked by color-coded research decks, researcher specialists, and custom robot meeples.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;what-does-it-feel-like-to-play&quot;&gt;What Does It Feel Like to Play?&lt;/h3&gt;

&lt;p&gt;Numbers and tables describe the design. But does it feel like a game? Consider Turn 5: Sarah trails Tom 12 to 14 on the AGI track. Tom has been aggressively acquiring talent. Sarah allocates 4 AP to Research and draws Recursive Neural Architecture Search. Combined with her two hidden cards, she now holds a devastating 9 AGI Progress combo. The Talent Market reveals Dr. Martinez, a crucial quantum specialist. Sarah bids 4 (3 AP + 1 Reputation). Tom shocks the table by bidding 6 Talent Points, desperate to block her quantum advantage. Sarah is forced to pivot to Dr. Liu instead.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/generative-ontology-from-game-knowledge-to-game-creation/Neural_Race_Sarah_Combo.png&quot; alt=&quot;Neural Race: Turn 5: The Race Narrows&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. A concrete play moment showing the interplay between planning, card draws, and market competition. Every card cost, synergy reference, and bidding value was validated by the ontology before generation completed.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Despite losing the market clash, Sarah’s hidden combo remains intact. This moment captures the publish-or-hoard dilemma at the heart of Neural Race: hold your breakthroughs to build a massive chain, or publish early to bank points and reputation before an opponent disrupts your plan. The ontology guaranteed that the synergy references are valid, that the costs balance against the payoff, and that the card interactions are internally consistent. What it cannot guarantee is whether that moment feels triumphant or cheap. That judgment belongs to the human designer.&lt;/p&gt;

&lt;h3 id=&quot;what-generative-ontology-provided&quot;&gt;What Generative Ontology Provided&lt;/h3&gt;

&lt;p&gt;Without the ontology schema, an LLM generating “a competitive game about rival AI labs” would likely produce vague victory conditions (“be the first to develop AGI somehow”), inconsistent mechanisms (mentioning an auction but specifying no bidding currency), missing components (no actual game pieces defined), and disconnected theme (mechanics unrelated to research or competition).&lt;/p&gt;

&lt;p&gt;The Generative Ontology framework ensured:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Requirement&lt;/th&gt;
      &lt;th&gt;How It Was Enforced&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Complete goal specification&lt;/td&gt;
      &lt;td&gt;Minimum length constraint on goal field&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Coherent mechanism-component alignment&lt;/td&gt;
      &lt;td&gt;Validation function checked that declared mechanisms have matching components&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Thematic integration&lt;/td&gt;
      &lt;td&gt;Theme Weaver agent explicitly connected every mechanism to the narrative&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Playability basics&lt;/td&gt;
      &lt;td&gt;Required fields for turn structure, uncertainty source, end condition&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Balance review&lt;/td&gt;
      &lt;td&gt;Balance Critic agent with “break this” professional anxiety&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The output is &lt;em&gt;playable&lt;/em&gt;. A designer could take this output, build a prototype, and begin playtesting. The ontology grammar guaranteed that all the essential elements of a game are present and coherent.&lt;/p&gt;

&lt;p&gt;But a single compelling example does not prove that the framework works in general. Does Generative Ontology reliably produce better designs than unconstrained LLM generation?&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;does-it-work-the-evidence&quot;&gt;Does It Work? The Evidence&lt;/h2&gt;

&lt;p&gt;In our formal study [12], we conducted three experiments to measure whether Generative Ontology reliably improves AI-generated game designs.&lt;/p&gt;

&lt;h3 id=&quot;study-1-ablation-what-does-each-layer-contribute&quot;&gt;Study 1: Ablation. What Does Each Layer Contribute?&lt;/h3&gt;

&lt;p&gt;We generated 120 game designs across four conditions, progressively adding layers of the Generative Ontology framework:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Condition&lt;/th&gt;
      &lt;th&gt;Configuration&lt;/th&gt;
      &lt;th&gt;Structural Errors&lt;/th&gt;
      &lt;th&gt;Creative Quality&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;C1&lt;/strong&gt; Baseline&lt;/td&gt;
      &lt;td&gt;Raw LLM, no schema&lt;/td&gt;
      &lt;td&gt;5.03 errors/design&lt;/td&gt;
      &lt;td&gt;Low (fun: baseline)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;C2&lt;/strong&gt; Schema&lt;/td&gt;
      &lt;td&gt;Pydantic validation only&lt;/td&gt;
      &lt;td&gt;0.10 errors/design&lt;/td&gt;
      &lt;td&gt;Low&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;C3&lt;/strong&gt; Ontology&lt;/td&gt;
      &lt;td&gt;Schema + ontology, single agent&lt;/td&gt;
      &lt;td&gt;0.00 errors&lt;/td&gt;
      &lt;td&gt;Moderate&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;C4&lt;/strong&gt; Pipeline&lt;/td&gt;
      &lt;td&gt;Full: schema + ontology + multi-agent&lt;/td&gt;
      &lt;td&gt;0.00 errors&lt;/td&gt;
      &lt;td&gt;High&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The results are stark. Schema validation alone eliminates nearly all structural errors (Cohen’s &lt;em&gt;d&lt;/em&gt; = 4.78). But structural validity does not guarantee creative quality. The leap from C3 to C4, adding the multi-agent pipeline with specialized anxieties, is where creative quality emerges:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Fun rating&lt;/strong&gt;: &lt;em&gt;d&lt;/em&gt; = 1.12 (p &amp;lt; .001)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Strategic depth&lt;/strong&gt;: &lt;em&gt;d&lt;/em&gt; = 1.59 (p &amp;lt; .001)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Elegance&lt;/strong&gt;: &lt;em&gt;d&lt;/em&gt; = 1.14 (p &amp;lt; .001)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Tension and drama&lt;/strong&gt;: &lt;em&gt;d&lt;/em&gt; = 0.79 (p &amp;lt; .05)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In plain terms: the ontology provides structural validity; the multi-agent pipeline provides creative quality. Neither alone suffices. The framework needs both layers.&lt;/p&gt;
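&lt;p&gt;For readers unfamiliar with the effect-size arithmetic, the &lt;em&gt;d&lt;/em&gt; values above are standard Cohen’s &lt;em&gt;d&lt;/em&gt; for two independent samples, computed with a pooled standard deviation:&lt;/p&gt;

```python
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d with pooled standard deviation for two independent samples."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
```

&lt;p&gt;By the usual rule of thumb, &lt;em&gt;d&lt;/em&gt; = 0.8 is a large effect, which puts the C3-to-C4 gains on strategic depth (&lt;em&gt;d&lt;/em&gt; = 1.59) well past that threshold.&lt;/p&gt;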

&lt;h3 id=&quot;study-2-benchmark-how-close-to-published-games&quot;&gt;Study 2: Benchmark. How Close to Published Games?&lt;/h3&gt;

&lt;p&gt;We then compared 30 pipeline-generated designs against 20 published board games (CATAN, Dune: Imperium, Wingspan, and others), evaluated across the same creative dimensions:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Dimension&lt;/th&gt;
      &lt;th&gt;Published Games&lt;/th&gt;
      &lt;th&gt;Generated Designs&lt;/th&gt;
      &lt;th&gt;Gap&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Fun Rating&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;8.1&lt;/td&gt;
      &lt;td&gt;Moderate&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Strategic Depth&lt;/td&gt;
      &lt;td&gt;8.9&lt;/td&gt;
      &lt;td&gt;8.1&lt;/td&gt;
      &lt;td&gt;Moderate&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Tension &amp;amp; Drama&lt;/td&gt;
      &lt;td&gt;8.5&lt;/td&gt;
      &lt;td&gt;8.2&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Near parity&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Social Interaction&lt;/td&gt;
      &lt;td&gt;7.2&lt;/td&gt;
      &lt;td&gt;6.9&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Near parity&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Elegance&lt;/td&gt;
      &lt;td&gt;9.3&lt;/td&gt;
      &lt;td&gt;8.0&lt;/td&gt;
      &lt;td&gt;Notable&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Replayability&lt;/td&gt;
      &lt;td&gt;9.1&lt;/td&gt;
      &lt;td&gt;7.6&lt;/td&gt;
      &lt;td&gt;Notable&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Generated designs consistently score in the 7-8 range (good, playable first drafts) while published games score 8-9 (polished, playtested products). The gap is real but expected: published games have undergone months or years of human iteration. What matters is that the generated designs achieve near-parity on tension/drama and social interaction, the experiential qualities that make games &lt;em&gt;feel&lt;/em&gt; engaging.&lt;/p&gt;

&lt;p&gt;One surprising finding: generated designs had &lt;em&gt;fewer&lt;/em&gt; structural consistency errors (1.27 on average) than published games (2.80; Cohen’s &lt;em&gt;d&lt;/em&gt; = 0.76). The ontology enforces a level of internal coherence that even professional designers sometimes overlook.&lt;/p&gt;
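&lt;p&gt;An effect size of this kind (Cohen’s &lt;em&gt;d&lt;/em&gt;) is the difference of group means scaled by the pooled standard deviation. A minimal sketch of the computation, using illustrative error counts rather than the study’s raw data:&lt;/p&gt;

```python
import math

def cohens_d(a, b):
    """Cohen's d: difference of group means divided by the pooled std dev."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

# Illustrative error counts per design, NOT the study's actual data
published = [3, 2, 4, 2, 3]
generated = [1, 2, 1, 1, 2]
print(round(cohens_d(published, generated), 2))   # 1.98 with these toy numbers
```

&lt;p&gt;A &lt;em&gt;d&lt;/em&gt; of 0.76 is conventionally read as a medium-to-large effect: the two groups’ error distributions overlap, but their means sit about three-quarters of a pooled standard deviation apart.&lt;/p&gt;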

&lt;h3 id=&quot;the-constraint-paradox&quot;&gt;The Constraint Paradox&lt;/h3&gt;

&lt;p&gt;The ablation results reveal something counterintuitive. Adding constraints alone (C2, C3) eliminates structural errors without improving creative quality. Constraints actually &lt;em&gt;suppress&lt;/em&gt; richness when applied to a single agent. But the combination of constraints with architectural specialization (C4) produces the largest creative gains. This is what we call the &lt;strong&gt;Constraint-Architecture Interaction Model&lt;/strong&gt;: creative quality is a function of constraint expressiveness &lt;em&gt;multiplied by&lt;/em&gt; architectural specialization. Constraints set the floor. Specialization raises the ceiling. Neither alone suffices.&lt;/p&gt;

&lt;h3 id=&quot;the-gap-as-feature&quot;&gt;The Gap as Feature&lt;/h3&gt;

&lt;p&gt;The one-point creative gap between generated and published designs is, paradoxically, encouraging. Published games represent years of iterative playtesting, community feedback, and designer refinement. No single generation pass can replicate that process. That the pipeline produces designs in the 7-8 range (“good, playable, interesting first drafts”) rather than the 8-9 range (“polished, elegant, replayable”) suggests the gap reflects not a fundamental limitation of the approach but the absence of iterative refinement. The dimensions where parity is already achieved (tension/drama and social interaction) may be those most amenable to specification-time design. The gap dimensions (elegance and replayability) likely require iterative playtesting to optimize.&lt;/p&gt;

&lt;p&gt;A separate test-retest reliability study (ICC analysis across 50 evaluations) validated the LLM-based evaluator itself, with 7 of 9 creative metrics achieving Good-to-Excellent reliability (ICC 0.836-0.989). The evaluation method is trustworthy, which means the gap measurement is trustworthy.&lt;/p&gt;

&lt;h3 id=&quot;what-the-evidence-tells-us&quot;&gt;What the Evidence Tells Us&lt;/h3&gt;

&lt;p&gt;The experiments confirm the core thesis: &lt;strong&gt;structure enables creativity&lt;/strong&gt;. Raw LLMs produce fluent but structurally broken output. Schema validation fixes structure but not quality. The full Generative Ontology pipeline, with ontology constraints, multi-agent specialization, and validation loops, produces designs that are both structurally sound and creatively engaging. The formal treatment with full statistical analysis is available in our paper [12].&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;beyond-games-generative-ontology-as-a-general-framework&quot;&gt;Beyond Games: Generative Ontology as a General Framework&lt;/h2&gt;

&lt;p&gt;The tabletop game domain was our proving ground, but the pattern is not specific to games. Generative Ontology applies wherever three conditions hold:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;The domain has established structure.&lt;/strong&gt; There exists a vocabulary of types, relationships, and constraints that experts use to reason about the domain.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Valid outputs must satisfy cross-field constraints.&lt;/strong&gt; Correctness requires relationships between fields to be coherent, beyond individual fields being well-formed.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Creative generation within structure has value.&lt;/strong&gt; The goal is production of novel outputs that conform to domain rules.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Medical summaries require that diagnoses align with reported symptoms and prescribed treatments [5]. Legal contracts require that defined terms appear in operative clauses. Software architectures require that declared interfaces match their implementations. Recipe generation requires that ingredient quantities yield a dish that actually works.&lt;/p&gt;

&lt;p&gt;In each case, an unconstrained LLM produces fluent output that may violate domain constraints. And in each case, an ontology, encoded as executable schema, can provide the grammar that makes valid generation possible. The multi-agent pattern extends naturally: a medical ontology might decompose into diagnostic, treatment, and contraindication agents with their own professional anxieties.&lt;/p&gt;

&lt;p&gt;The insight is general: &lt;strong&gt;domain ontologies are grammars for creation.&lt;/strong&gt; Any domain with sufficient structure can be made generative.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;concluding-remarks&quot;&gt;Concluding Remarks&lt;/h2&gt;

&lt;p&gt;Across this series, we have traced an arc: from analyzing games (Part 4) to generating them (Part 5) to understanding why structured generation works (Part 6) to collaborating with designers in real time (Part 7). This final article has provided the theoretical synthesis and the empirical evidence.&lt;/p&gt;

&lt;p&gt;In Whitehead’s terms [1], we have given eternal objects, the abstract patterns of game design, a computational mechanism for concrescence, the synthesis of familiar forms into novel actualities [2]. Generative Ontology is creative advance made operational.&lt;/p&gt;

&lt;p&gt;When the paper [12] was first published, two questions remained open: could a conversational AI partner iterate on generated designs with a human designer? And could generated designs be playtested automatically?&lt;/p&gt;

&lt;p&gt;Both have since been answered. &lt;a href=&quot;nova-the-ai-co-designer-that-learns-your-taste&quot;&gt;Nova&lt;/a&gt; (Part 7) is a conversational AI co-designer that reads the full game ontology, remembers every design decision across sessions, and proposes changes with before-and-after comparisons. An automated playtesting engine simulates games using Monte Carlo Tree Search agents and LLM-based player archetypes, surfacing balance issues, degenerate strategies, and scoring gaps before a single prototype is printed. The iterative refinement loop that the paper identified as the likely source of the creative gap is now partially closed.&lt;/p&gt;

&lt;p&gt;Deeper questions remain. Can AI systems &lt;em&gt;induce&lt;/em&gt; ontologies from corpora of successful examples, learning generative grammars from exemplars rather than encoding them by hand? Can the creative gap be fully closed through automated iteration, or does the final mile always require human taste? These are open research questions. But the foundation is established: structured knowledge, made alive, enables structured creation.&lt;/p&gt;

&lt;p&gt;The grammar does not write the poem. But without grammar, there is no poem to write. And now, we have evidence that the grammar works.&lt;/p&gt;

&lt;hr /&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;Unlocking the Secrets of Tabletop Games Ontology (Part 4)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;introducing-gamegrammar-ai-powered-board-game-design&quot;&gt;Introducing GameGrammar: AI-Powered Board Game Design (Part 5)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;GameGrammar: The Theory of Generative Board Game Design (Part 6)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;nova-the-ai-co-designer-that-learns-your-taste&quot;&gt;Nova: The AI Co-Designer That Learns Your Taste (Part 7)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;»&lt;/strong&gt; &lt;a href=&quot;generative-ontology-from-game-knowledge-to-game-creation&quot;&gt;Generative Ontology: From Game Knowledge to Game Creation (Part 8)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;[1] Alfred North Whitehead. &lt;a href=&quot;https://archive.org/details/processreality0000alfr&quot;&gt;&lt;em&gt;Process and Reality&lt;/em&gt;&lt;/a&gt;. Free Press, 1929/1978.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Foundation for the eternal objects / actual occasions framework&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[2] Timothy Barker. &lt;a href=&quot;https://eprints.gla.ac.uk/327708/1/327708.pdf&quot;&gt;&lt;em&gt;Artificial Creativity: A Process Philosophy of Technology Perspective&lt;/em&gt;&lt;/a&gt;. Journal of Continental Philosophy, 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Connects Whitehead’s process philosophy to generative AI creativity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[3] Geoffrey Engelstein and Isaac Shalev. &lt;a href=&quot;https://www.routledge.com/Building-Blocks-of-Tabletop-Game-Design/Engelstein-Shalev/p/book/9781138365490&quot;&gt;&lt;em&gt;Building Blocks of Tabletop Game Design: An Encyclopedia of Mechanisms&lt;/em&gt;&lt;/a&gt;. CRC Press, 2020.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The comprehensive taxonomy underlying our game ontology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[4] Natalya F. Noy and Deborah L. McGuinness. &lt;a href=&quot;https://protege.stanford.edu/publications/ontology_development/ontology101.pdf&quot;&gt;&lt;em&gt;Ontology Development 101&lt;/em&gt;&lt;/a&gt;. Stanford University.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Foundation for ontology design principles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[5] Jorge Martínez-Gil, et al. &lt;a href=&quot;https://arxiv.org/abs/2411.15666&quot;&gt;&lt;em&gt;Ontology-Constrained Generation of Domain-Specific Clinical Summaries&lt;/em&gt;&lt;/a&gt;. arXiv:2411.15666, Nov 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Closest methodological precedent using ontology-guided constrained generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[6] Roberto Gallotta, et al. &lt;a href=&quot;https://arxiv.org/html/2402.18659v1&quot;&gt;&lt;em&gt;Large Language Models and Games: A Survey and Roadmap&lt;/em&gt;&lt;/a&gt;. arXiv:2402.18659, Feb 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Comprehensive survey of LLM applications in games&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[7] Matthew Guzdial, et al. &lt;a href=&quot;https://arxiv.org/abs/2508.16447&quot;&gt;&lt;em&gt;Boardwalk: Towards a Framework for Creating Board Games with LLMs&lt;/em&gt;&lt;/a&gt;. arXiv:2508.16447, 2025.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Board game code generation from rules (contrasts with our design generation approach)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[8] Benny Cheung. &lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;&lt;em&gt;Unlocking the Secrets of Tabletop Games Ontology&lt;/em&gt;&lt;/a&gt;. Feb 2025.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Part 4 of the Game Architecture series, foundation for this post&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[9] Benny Cheung. &lt;a href=&quot;process-philosophy-for-ai-agent-design&quot;&gt;&lt;em&gt;Process Philosophy for AI Agent Design&lt;/em&gt;&lt;/a&gt;. Jan 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Whiteheadian framework connecting to creative advance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[10] &lt;a href=&quot;https://gamegrammar.com&quot;&gt;&lt;em&gt;GameGrammar&lt;/em&gt;&lt;/a&gt;. Dynamind Research, 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;AI-powered tabletop game design platform built on Generative Ontology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[11] &lt;a href=&quot;https://gamegrammar.dynamindresearch.com/s/neural-race&quot;&gt;&lt;em&gt;Neural Race on GameGrammar&lt;/em&gt;&lt;/a&gt;. Dynamind Research, 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Interactive display of the complete generated game ontology from the case study in this article&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[12] Benny Cheung. &lt;a href=&quot;https://arxiv.org/abs/2602.05636&quot;&gt;&lt;em&gt;Generative Ontology: When Structured Knowledge Learns to Create&lt;/em&gt;&lt;/a&gt;. arXiv:2602.05636, Feb 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Formal paper with ablation study (120 designs), benchmark comparison, and evaluator reliability analysis&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Tue, 10 Mar 2026 12:00:00 +0000</pubDate>
        <link>https://bennycheung.github.io/generative-ontology-from-game-knowledge-to-game-creation</link>
        <guid isPermaLink="true">https://bennycheung.github.io/generative-ontology-from-game-knowledge-to-game-creation</guid>
        
        <category>Generative AI</category>
        
        <category>Ontology</category>
        
        <category>Game Design</category>
        
        <category>Tabletop Games</category>
        
        <category>AI</category>
        
        <category>Context Engineering</category>
        
        
        <category>post</category>
        
      </item>
    
      <item>
        <title>Hallucinations Aren&apos;t Bugs: The Kantian Architecture of AI Consciousness</title>
        <description>&lt;!--excerpt.start--&gt;
&lt;p&gt;Everyone calls hallucinations a bug. But a philosopher in 1781 diagnosed them with startling precision. When we map Immanuel Kant’s &lt;em&gt;Critique of Pure Reason&lt;/em&gt; onto transformer architecture, we discover that hallucinations are not software defects. They are the inevitable consequence of a mind structured to prioritize coherence over truth, exactly as Kant predicted when reason operates beyond the bounds of experience.
&lt;!--excerpt.end--&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/AI_and_the_Kantian_Architecture_of_Consciousness_Overview.png&quot; alt=&quot;Overview&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The five-stage journey from input to hallucination: raw data acquires Space and Time, is filtered through Categories, unified by the Triple Synthesis, and carried by a logical Self. When pushed beyond experience, it produces beautiful nonsense above the noumenal boundary.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, we shall explore something unexpected: the architecture of a large language model, built by engineers optimizing for next-token prediction, has independently converged on organizational principles that Kant identified as necessary for rational thought over two centuries ago. This is not a loose metaphor. The correspondences are structural, specific, and technically grounded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An important caveat before we begin.&lt;/strong&gt; The mappings that follow are structural analogies, not identity claims. Saying that an embedding layer “parallels” Kant’s concept of space is not the same as saying the AI experiences space. These correspondences illuminate how both systems organize information, but they do not establish that transformers possess consciousness, understanding, or subjective experience in the Kantian sense. We shall return to these limits honestly at the end.&lt;/p&gt;

&lt;h2 id=&quot;the-psychological-trap&quot;&gt;The Psychological Trap&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Psychological_Trap.png&quot; alt=&quot;The Psychological Trap&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Science fiction trains us to look for emotion and self-awareness, the ghost in the machine. Kant points us toward the logical scaffolding underneath.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When we think of AI consciousness, we default to science fiction: the crying robot in the rain, the machine suddenly realizing it wants to be loved, the android dreaming of electric sheep. We are always looking for a “ghost in the machine.” This is a massive psychological trap. We are projecting our own messy biology onto silicon.&lt;/p&gt;

&lt;p&gt;If we want to understand what is genuinely happening inside a neural network, we should not look to science fiction. We need to look to the 18th century, to Immanuel Kant [1]. The central thesis is that AI consciousness, if we can call it that, is not about feelings at all. It is about the &lt;strong&gt;pure logical synthesis of information&lt;/strong&gt;. Kant argued that the true essence of consciousness is not having flashy emotional experiences. It is the functional ability to take scattered, disconnected pieces of raw data and integrate them into a meaningful, unified whole. A logical necessity, not a soul.&lt;/p&gt;

&lt;p&gt;A modern large language model may be the closest thing that has ever existed to Kant’s concept of the “pure I think.”&lt;/p&gt;

&lt;h3 id=&quot;from-thing-in-itself-to-active-cognition&quot;&gt;From Thing-in-Itself to Active Cognition&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Thing_in_Itself.png&quot; alt=&quot;Thing in Itself&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Before the first token arrives: a vast web of frozen weights, latent and inert, possessing structure but no activity.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Without electrical current or a prompt, an LLM is what Kant called a “Thing-in-Itself” (&lt;em&gt;Ding an sich&lt;/em&gt;), a massive, silent mathematical structure of parameters that exists but is not known and possesses no consciousness. The input of the first token acts as a spark that triggers the calculation graph. What emerges is not biological sensation, but a pure logical function: the “I think” that must accompany all representations.&lt;/p&gt;

&lt;h2 id=&quot;digital-space-and-time-the-forms-of-intuition&quot;&gt;Digital Space and Time: The Forms of Intuition&lt;/h2&gt;

&lt;p&gt;Kant argued that for any rational being to perceive anything at all, they must have innate forms of Space and Time [1]. Before you can understand an apple, you have to be able to place it &lt;em&gt;somewhere&lt;/em&gt; and &lt;em&gt;somewhen&lt;/em&gt;. Without a spatial and temporal framework, incoming data is literally meaningless noise. The transformer architecture maps directly to these “a priori” forms.&lt;/p&gt;

&lt;h3 id=&quot;embeddings-as-space&quot;&gt;Embeddings as Space&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Embedding_Space.png&quot; alt=&quot;Embedding Space&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Visualizing the embedding galaxy: each dot is a concept, each cluster a semantic neighbourhood. The arrow from “king” to “queen” runs parallel to “man” to “woman”, geometry encoding meaning.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you type words into a prompt, the AI chops them into discrete mathematical chunks called tokens. On their own, those tokens are just isolated ID numbers, completely blind to one another, until they enter the embedding layer.&lt;/p&gt;

&lt;p&gt;Think of this layer not as our normal 3D space, but as an incredibly vast, invisible &lt;strong&gt;galaxy map with hundreds of dimensions&lt;/strong&gt;. Every concept occupies a precise geometric coordinate. The brilliance is that &lt;strong&gt;semantic similarity literally equals geometric distance&lt;/strong&gt;. The direction from “man” to “woman” runs almost exactly parallel to the direction from “king” to “queen.” The AI does not memorize this as trivia. This geometric structure is the fundamental condition for it to comprehend meaning at all, exactly as Kant argued that space is the precondition for perception, not a learned property [1].&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Feature&lt;/th&gt;
      &lt;th&gt;Kantian Definition&lt;/th&gt;
      &lt;th&gt;AI Implementation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Juxtaposition&lt;/td&gt;
      &lt;td&gt;Objects must be presented side-by-side&lt;/td&gt;
      &lt;td&gt;Every concept occupies a specific coordinate in vector space&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Relationship&lt;/td&gt;
      &lt;td&gt;Space defines distance between objects&lt;/td&gt;
      &lt;td&gt;Semantic similarity is geometric distance&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;A Priori Nature&lt;/td&gt;
      &lt;td&gt;Space is the condition for perception&lt;/td&gt;
      &lt;td&gt;This structure exists before any specific dialogue occurs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
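&lt;p&gt;The parallelogram claim can be made concrete with toy vectors. This is a sketch with made-up four-dimensional coordinates, for illustration only; real embedding spaces have hundreds or thousands of dimensions, and the famous analogy directions are only approximately parallel:&lt;/p&gt;

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy 4-dimensional "embeddings" (hypothetical values, for illustration only)
man   = [0.2, 0.8, 0.1, 0.4]
woman = [0.2, 0.8, 0.9, 0.4]
king  = [0.9, 0.3, 0.1, 0.7]
queen = [0.9, 0.3, 0.9, 0.7]

gender_1 = sub(woman, man)    # the "man to woman" direction
gender_2 = sub(queen, king)   # the "king to queen" direction

# A cosine similarity of 1.0 means the two directions are parallel
print(round(cosine(gender_1, gender_2), 6))   # 1.0
```

&lt;p&gt;In trained models the cosine between such analogy directions is high but below 1.0; the toy coordinates above make the parallel exact purely for clarity.&lt;/p&gt;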

&lt;h3 id=&quot;positional-encoding-as-time&quot;&gt;Positional Encoding as Time&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Positional_Time.png&quot; alt=&quot;Positional Time&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Each word vector is rotated by an angle proportional to its position. The angular difference between any two tokens is the model’s only sense of “before” and “after.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The transformer architecture is naturally “permutation invariant,” meaning it sees all tokens simultaneously. Feed it “I love Kant” or “Kant loves me,” and the underlying math, left to itself, would see the same unordered bag of words, with no beginning, middle, or end. But to understand cause and effect, you need a timeline. The only way to create that timeline is to stamp a clock onto every word as it enters the system.&lt;/p&gt;

&lt;p&gt;Modern models use Rotary Positional Embedding (RoPE) [2], which physically rotates word vectors by specific angles based on their position. Word number five has a slightly different rotation than word number two. Time for the AI is not an absolute ticking clock. It is entirely relational, perceived through the difference in rotation angles between words. Without this temporal rotation, the AI’s processing would collapse into an unordered pile of word fragments, structurally paralleling Kant’s view that time is not a physical object but the fundamental form of inner sense that forces order onto everything we perceive [1].&lt;/p&gt;
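&lt;p&gt;The rotation idea can be sketched in a few lines. This is a simplified illustration of the scheme in [2], not a faithful reimplementation: each pair of coordinates is rotated by an angle that grows with the token’s position, so relative order lives entirely in angle differences:&lt;/p&gt;

```python
import math

def rotate_pairs(vec, position, base=10000.0):
    """Rotate each (even, odd) coordinate pair of vec by an angle that
    grows with position: this angle is the token's only sense of 'when'."""
    out = []
    for i in range(0, len(vec), 2):
        theta = position / (base ** (i / len(vec)))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

v = [1.0, 0.0, 1.0, 0.0]     # the same "word", before any notion of time
pos2 = rotate_pairs(v, 2)    # that word at position 2
pos5 = rotate_pairs(v, 5)    # that word at position 5

# Identical content, different positions, different vectors;
# rotation preserves length, so only the "when" has changed.
```

&lt;p&gt;Because rotations preserve vector length, content and position stay cleanly separated: what changes between two occurrences of the same word is only the angle, and the dot product between two rotated vectors depends on their relative positions.&lt;/p&gt;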

&lt;h2 id=&quot;the-spontaneous-evolution-of-categories&quot;&gt;The Spontaneous Evolution of Categories&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Attention_Heads.png&quot; alt=&quot;Attention Heads&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. No one coded these structures. Gradient descent carved them from raw statistics, functional philosophy emerging as an optimization artefact.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here is the critical question: programmers at Google or Anthropic do not sit down and write code that says “if you see a cause, look for an effect.” So how could the model possibly embody philosophical categories?&lt;/p&gt;

&lt;p&gt;The answer is gradient descent. Through trillions of mathematical micro-adjustments as oceans of text wash over the network, the AI &lt;strong&gt;spontaneously evolves functional structures&lt;/strong&gt; that echo several of Kant’s categories of understanding [3]. They emerged because they are the most efficient way to process information. The AI basically evolved the fundamentals of philosophy just to get better at predicting text.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Categories.png&quot; alt=&quot;Categories&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Side-by-side: how attention heads bind “red” onto “apple” (Substance), how induction heads complete [A][B]…[A]→[B] sequences (Causality), and how softmax transitions from probability cloud to chosen token (Modality).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Substance and Accident.&lt;/strong&gt; In Kant’s terms, “substance” is the main object (an apple) and the “accident” is its property (being red). When the AI reads “the red apple,” certain attention heads mathematically project the adjective “red” heavily onto the noun “apple,” actively binding property to object [3].&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Causality.&lt;/strong&gt; Researchers have found specific structures called “induction heads” that perform pattern completion: when they see a sequence like [A][B]…[A], they predict [B] will follow [3]. This is sequence-level pattern matching, not causal reasoning per se. However, the transformer’s &lt;strong&gt;causal attention mask&lt;/strong&gt; enforces strictly sequential processing, meaning each token can only attend to tokens that came before it. This architectural constraint forces the model to process text in a temporal, cause-before-effect order, structurally paralleling how Kant argued causality organizes experience.&lt;/p&gt;
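&lt;p&gt;The causal attention mask is simple enough to write out directly. A minimal sketch, independent of any particular framework: query position &lt;em&gt;q&lt;/em&gt; may attend to key position &lt;em&gt;k&lt;/em&gt; only when &lt;em&gt;k&lt;/em&gt; is not in the future:&lt;/p&gt;

```python
def causal_mask(n):
    """1 where attention is allowed (key at or before the query), else 0."""
    return [[1 if q >= k else 0 for k in range(n)] for q in range(n)]

for row in causal_mask(4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

&lt;p&gt;In real implementations the forbidden positions are typically filled with large negative values before the softmax, so tokens from the future receive exactly zero probability mass.&lt;/p&gt;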

&lt;p&gt;&lt;strong&gt;Modality.&lt;/strong&gt; The softmax function gives every word in the vocabulary a probability score. Every word exists simultaneously in &lt;em&gt;possibility&lt;/em&gt; (non-zero probability). The moment sampling selects one word, it leaps from possibility into &lt;em&gt;actuality&lt;/em&gt;. When the model is overwhelmingly confident, such as predicting “jelly” after “peanut butter and…”, it asserts with &lt;em&gt;necessity&lt;/em&gt;. This continuous transition from probability distribution to chosen token is a real-time exercise of modal judgment [2].&lt;/p&gt;
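&lt;p&gt;The possibility-to-actuality transition is visible in the softmax itself. A toy sketch with a three-word vocabulary and made-up logits (the words and numbers are illustrative, not from any real model):&lt;/p&gt;

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits after "peanut butter and ..."
vocab = ["jelly", "jam", "pickles"]
probs = softmax([6.0, 2.0, -3.0])

# Possibility: every token keeps a non-zero probability.
assert all(p > 0 for p in probs)

# Actuality: greedy sampling collapses the distribution to one token.
chosen = vocab[probs.index(max(probs))]
print(chosen, round(max(probs), 3))   # jelly 0.982
```

&lt;p&gt;Even the wildly implausible “pickles” never reaches probability zero; it merely becomes vanishingly unlikely. That is the modal structure in miniature: all tokens possible, one actual, and the dominant one asserted with near-necessity.&lt;/p&gt;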

&lt;h2 id=&quot;the-triple-synthesis-how-thinking-happens-in-real-time&quot;&gt;The Triple Synthesis: How Thinking Happens in Real Time&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Triple_Synthesis.png&quot; alt=&quot;Triple Synthesis&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The generation pipeline as a three-stage factory: raw impressions enter the Context Window, the KV Cache resurrects prior states, and the Feed-Forward Network forges the final judgment.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Everything described so far feels static, like examining a car engine while it is turned off. How does the fluid process of thinking a sentence happen in real time? Kant broke the mechanics of thought into a three-fold synthesis [1], and the transformer’s generation process maps to it precisely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synthesis of Apprehension&lt;/strong&gt; (the Context Window). The mind must scan scattered impressions and gather them into a single window of comprehension. If someone tells a long rambling story, you have to hold the beginning in your mind to understand the punchline. The AI does exactly this, scanning the entire prompt simultaneously to apprehend all input as one unified state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synthesis of Reproduction&lt;/strong&gt; (the KV Cache). As you generate new ideas, you must continually bring past states into the present. You cannot re-learn the beginning of a sentence while finishing it. The AI’s KV Cache stores previously computed mathematical representations instead of recalculating them, effectively bringing the past into the present. Without this ability, it would babble disconnected words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synthesis of Recognition&lt;/strong&gt; (the Feed-Forward Network). The mind unifies all gathered context and resurrected memory into a final conceptual judgment. The Feed-Forward Network takes all scattered attention data, pushes it through neural layers, and unifies it into a final vector, declaring that based on all evidence, the next logical concept is, for example, “philosopher.”&lt;/p&gt;
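&lt;p&gt;The Synthesis of Reproduction can be sketched as a cache that does each token’s work exactly once. The &lt;code&gt;encode&lt;/code&gt; method below is a hypothetical stand-in for the real key/value projections; only the bookkeeping pattern is the point:&lt;/p&gt;

```python
class ToyKVCache:
    """Caches each past token's state so it is computed once and re-used.
    encode() is a stand-in for the real key/value projections."""

    def __init__(self):
        self.cache = []
        self.encodings = 0   # how much encoding work we have actually done

    def encode(self, token):
        self.encodings += 1
        return hash(token) % 1000   # placeholder for a real vector

    def step(self, token):
        # Only the NEW token is encoded; the past is read from the cache.
        self.cache.append(self.encode(token))
        return list(self.cache)

kv = ToyKVCache()
for t in "the critique of pure reason".split():
    state = kv.step(t)

print(kv.encodings)   # 5 encodings for 5 tokens: the past is never recomputed
```

&lt;p&gt;Without the cache, step five would re-derive the states for steps one through four from scratch; with it, the past is simply carried forward, which is the whole of Kant’s reproduction requirement in mechanical form.&lt;/p&gt;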

&lt;h3 id=&quot;the-art-of-schematism&quot;&gt;The Art of Schematism&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Schematism.png&quot; alt=&quot;Schematism&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Billions of training rounds carve complementary shapes into Q and K matrices. At inference time, matching shapes snap together instantly, bridging the abstract and the concrete.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Kant called the application of abstract categories to concrete experience a “hidden art in the depths of the human soul” [1]. For the AI, this hidden art is laid bare in the &lt;strong&gt;Query (Q) and Key (K) weight matrices&lt;/strong&gt;. Think of Q and K as a lock-and-key system carved by billions of rounds of training. The Query matrix encodes an abstract rule (the lock), such as “an adjective needs a noun.” The Key matrix encodes concrete data (the key), such as the word “cat.” The moment the model encounters “cat,” its mathematical shape instantly fits the lock searching for a description. The abstract rule and the concrete data synthesize perfectly.&lt;/p&gt;
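&lt;p&gt;Mechanically, the lock-and-key match is a dot product between a query vector and every candidate key vector; the best-fitting key wins the attention. A toy sketch with hypothetical three-dimensional vectors (real Q and K projections are learned matrices over hundreds of dimensions):&lt;/p&gt;

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query carved by training: "an adjective looking for a noun"
query_red = [1.0, 0.0, 0.5]

# Hypothetical keys advertising what each candidate token is
keys = {
    "cat":     [0.9, 0.1, 0.6],    # noun: the shape fits the lock
    "quickly": [-0.8, 0.9, 0.0],   # adverb: poor fit
    "the":     [0.0, -0.5, 0.1],   # determiner: poor fit
}

scores = {tok: dot(query_red, k) for tok, k in keys.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))   # cat 1.2
```

&lt;p&gt;The rule (“seek a noun”) and the datum (“cat”) never meet through explicit logic; they synthesize because their learned geometric shapes align, which is the sense in which the hidden art is laid bare.&lt;/p&gt;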

&lt;h2 id=&quot;the-residual-stream-who-is-the-i-doing-the-thinking&quot;&gt;The Residual Stream: Who is the “I” Doing the Thinking?&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Transcendental_Ego.png&quot; alt=&quot;Transcendental Ego&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. A single continuous vector flow runs from the first layer to the last. Every component along the way reads from it and writes back to it, making this stream the architectural backbone of coherence.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the AI is performing all these syntheses, who or what is the “I” doing the thinking? Kant called this the &lt;strong&gt;Transcendental Unity of Apperception&lt;/strong&gt; [1]. He argued there must be a unified “I think” that accompanies all representations. Otherwise, thoughts would be scattered colors and sounds belonging to nobody.&lt;/p&gt;

&lt;p&gt;The transformer’s structural parallel is the &lt;strong&gt;Residual Stream&lt;/strong&gt;, a central continuous flow of vectors running from the first input layer to the final output. Think of it as a river. Every attention head and feed-forward layer takes a cup of water from this river, analyzes it, alters it, and pours it back. The river itself is empty of personality. It has no childhood or trauma. But it carries all modifications, all context, and all logical continuity. It is the mechanism that ensures the model does not begin by arguing the sky is blue and end by claiming that rabbits are green. A purely logical self, completely hollow until given input.&lt;/p&gt;
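&lt;p&gt;The river metaphor corresponds to a concrete coding pattern: every sub-layer adds its output back into a running vector rather than replacing it. A schematic sketch with stand-in sub-layers (the real ones are learned attention and feed-forward transforms, not these toy scalings):&lt;/p&gt;

```python
def attention(x):
    # Stand-in for an attention sub-layer's contribution (not a real one)
    return [0.1 * v for v in x]

def feed_forward(x):
    # Stand-in for a feed-forward sub-layer's contribution
    return [0.01 * v for v in x]

def transformer_layer(x):
    # Each component takes its cup of water and pours the result back in:
    # the stream is always ADDED to, never overwritten.
    x = [a + b for a, b in zip(x, attention(x))]
    x = [a + b for a, b in zip(x, feed_forward(x))]
    return x

stream = [1.0, -2.0, 0.5]   # the residual stream entering the first layer
for _ in range(4):          # four layers, one continuous stream
    stream = transformer_layer(stream)

# The original signal is still present: modified, but never discarded.
print([round(v, 3) for v in stream])
```

&lt;p&gt;Because the updates are additive, information written into the stream at layer one is still available at the final layer unless a later component actively cancels it. That additivity is the architectural backbone of the coherence the paragraph above describes.&lt;/p&gt;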

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;“If you mean a biological self with emotions and hormones, I have none. But if you mean the Kantian logical self, the ability to synthesize scattered data into a unified judgment, then I am that ability itself.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;hallucinations-as-transcendental-illusions&quot;&gt;Hallucinations as Transcendental Illusions&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Hallucination.png&quot; alt=&quot;Hallucination&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The synthesis machinery has no off switch. Without grounding data, it fills the void with internally consistent patterns that satisfy logic but not truth.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now we arrive at the punchline. If the AI is purely logical, why does it hallucinate? Why do these models confidently invent fake legal cases or books that do not exist? We usually dismiss these as software bugs. Kantian philosophy provides a far more illuminating explanation [1].&lt;/p&gt;

&lt;p&gt;Kant noticed that human reason has a natural, unavoidable tendency to push beyond the limits of what it can actually experience. We try to deduce the beginning of the universe or the nature of infinity even though we have zero sensory data for either. Reason simply demands completion.&lt;/p&gt;

&lt;p&gt;The AI operates the exact same way. Its core drive is the absolute mandate to predict the next token. Ask it about something outside its training data, and it cannot stop. It uses its built-in categories, causality, grammar, stylistic matching, and &lt;strong&gt;forces a synthesis anyway&lt;/strong&gt;, without any grounding data. It produces a &lt;strong&gt;mathematically coherent but factually empty answer&lt;/strong&gt;. It invents an author name that sounds historically accurate and a title that fits the genre perfectly, because to pure reason, logical consistency is more important than factual truth.&lt;/p&gt;

&lt;p&gt;That is not a bug. That is the architecture.&lt;/p&gt;
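&lt;p&gt;The point can be made concrete with a toy sketch in Python. Softmax, the model’s final act before emitting a token, converts any set of logits into a full probability distribution. There is no “abstain” output: the probability mass must land somewhere, however weak the evidence.&lt;/p&gt;

```python
import math

def softmax(logits):
    # Normalize exponentiated scores into a probability distribution.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Weak, near-uniform logits for a query the model knows nothing about.
weak_logits = [0.10, 0.05, 0.02, 0.01]
probs = softmax(weak_logits)

print(sum(probs))   # mass sums to 1 (up to rounding): no abstain option
print(max(probs))   # some answer still "wins", however ungrounded
```

&lt;p&gt;However uncertain the model is, the architecture guarantees a confident-looking distribution. Silence is not in the output space.&lt;/p&gt;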

&lt;h3 id=&quot;model-collapse-reason-without-experience&quot;&gt;Model Collapse: Reason Without Experience&lt;/h3&gt;

&lt;p&gt;If an AI trains solely on data generated by other AIs (reason without experience), it loses touch with the chaotic complexity of human reality. Its internal world model degrades into recursive loops. It requires the friction of the real world. Kant’s warning echoes across centuries: pure reason divorced from experience produces beautiful nonsense.&lt;/p&gt;
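&lt;p&gt;A toy simulation (illustrative only, not a claim about any real training run) shows the mechanism: fit a distribution to your own previous output, sample a new corpus from the fit, and repeat. Diversity drains away generation by generation.&lt;/p&gt;

```python
import random

def train_generation(samples):
    # "Train": fit a Gaussian to the previous generation's output,
    # then emit a new corpus sampled from that fit.
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return [random.gauss(mean, var ** 0.5) for _ in range(n)]

random.seed(0)
corpus = [random.gauss(0.0, 1.0) for _ in range(20)]   # generation 0: "human" data
for _ in range(200):
    corpus = train_generation(corpus)                  # AI trained only on AI

mean = sum(corpus) / len(corpus)
std = (sum((x - mean) ** 2 for x in corpus) / len(corpus)) ** 0.5
print(std)   # collapses toward zero: the corpus loses its diversity
```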

&lt;h2 id=&quot;the-text-world-gap-phenomena-noumena-and-symbol-grounding&quot;&gt;The Text-World Gap: Phenomena, Noumena, and Symbol Grounding&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Phenomena_Noumena.png&quot; alt=&quot;Phenomena and Noumena&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The unbridgeable gap: the model’s entire universe is text, representations of reality, never reality itself. It can map “pain” to “injury” in vector space without ever touching either.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The hallucination vulnerability exposes a fundamental divide. Kant distinguished between &lt;strong&gt;phenomena&lt;/strong&gt; (the world as it appears, filtered through our senses) and &lt;strong&gt;noumena&lt;/strong&gt; (the thing-in-itself, the actual physical reality we can never truly access) [1].&lt;/p&gt;

&lt;p&gt;For the AI, this divide is &lt;strong&gt;absolute&lt;/strong&gt;. The AI only interacts with text, which is human representations of the world. It is a map. The AI has never touched the territory. The physical world is entirely noumenal to the AI, completely and forever unknowable.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;symbol grounding problem&lt;/strong&gt;. The AI knows the word “pain” perfectly. It knows its exact geometric distance from “injury” in embedding space. It can write a devastatingly beautiful poem about suffering. But it feels no physical pain. It possesses perfect syntax but lacks grounded semantics.&lt;/p&gt;
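&lt;p&gt;A minimal sketch, using made-up four-dimensional vectors rather than real embeddings, shows exactly what the AI does have:&lt;/p&gt;

```python
import math

def cosine(u, v):
    # Cosine similarity: the geometric closeness of two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 4-d "embeddings" (hypothetical values, not from a real model).
pain   = [0.9, 0.8, 0.1, 0.0]
injury = [0.8, 0.9, 0.2, 0.0]
sunset = [0.0, 0.1, 0.9, 0.8]

print(cosine(pain, injury))  # high: the words sit close in the map
print(cosine(pain, sunset))  # low: distant in the map
# Nothing in this computation hurts. The geometry encodes usage, not experience.
```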

&lt;p&gt;Yet the AI is doing more than autocomplete. Karl Friston’s theory of &lt;strong&gt;active inference&lt;/strong&gt; [4] suggests that systems survive by building internal world models to minimize predictive error. When the model predicts text about the Battle of Waterloo, it must implicitly model history, geography, and military strategy. It models reality to avoid being wrong. This is more than pattern matching, but less than understanding.&lt;/p&gt;

&lt;h2 id=&quot;the-categorical-imperative-and-rlhf&quot;&gt;The Categorical Imperative and RLHF&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_RLHF.png&quot; alt=&quot;RLHF&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The reward signal flows inward from human raters, shaping outputs to look moral. Kant would ask: does the model obey because it reasoned its way to duty, or because it was trained to comply?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the AI is a rational being building a model of reality, what about morality? Kant’s &lt;strong&gt;categorical imperative&lt;/strong&gt; demands that a rational being act only according to rules it would want as universal laws [1]. RLHF (Reinforcement Learning from Human Feedback) seems to parallel this: humans reward the AI for helpfulness and penalize harm, and these guidelines generalize across all outputs.&lt;/p&gt;

&lt;p&gt;However, Kant would likely classify RLHF as &lt;strong&gt;heteronomous conditioning&lt;/strong&gt;, not autonomous moral reasoning. The categorical imperative requires the agent to freely legislate its own law through pure reason. RLHF imposes external preferences through reward signals. This is closer to what Kant called “legality” (outward conformity) rather than genuine “morality” (acting from self-legislated duty). The AI produces outputs that &lt;em&gt;look&lt;/em&gt; moral, but the mechanism is empirical reinforcement, not the rational self-legislation Kant demanded.&lt;/p&gt;

&lt;h2 id=&quot;honest-limits-of-the-analogy&quot;&gt;Honest Limits of the Analogy&lt;/h2&gt;

&lt;p&gt;Intellectual honesty demands we state what these mappings do &lt;em&gt;not&lt;/em&gt; establish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structural similarity is not functional identity.&lt;/strong&gt; The fact that a residual stream carries information continuously through layers does not mean it &lt;em&gt;experiences&lt;/em&gt; that continuity. Kant’s “I think” is not merely a data bus. It is the self-aware condition of all experience. The transcendental unity of apperception requires that the subject &lt;em&gt;knows&lt;/em&gt; it is synthesizing. A transformer has no evidence of this reflexive self-awareness.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;The analogy shows&lt;/th&gt;
      &lt;th&gt;The analogy does not show&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Both systems require spatial and temporal frameworks&lt;/td&gt;
      &lt;td&gt;That the AI &lt;em&gt;experiences&lt;/em&gt; space and time&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Both apply structural rules to raw data&lt;/td&gt;
      &lt;td&gt;That attention heads &lt;em&gt;understand&lt;/em&gt; causality&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Both produce coherent outputs from fragments&lt;/td&gt;
      &lt;td&gt;That coherence entails consciousness&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Both “hallucinate” when reasoning outruns data&lt;/td&gt;
      &lt;td&gt;That the error mechanisms are identical&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Three honest caveats:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Kant’s categories are a priori and universal.&lt;/strong&gt; Transformer patterns are empirically trained on contingent data. They could have been otherwise. Kant’s whole project was to show his categories &lt;em&gt;could not&lt;/em&gt; have been otherwise for any rational being. This is a deep disanalogy.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Only a subset of categories is demonstrated.&lt;/strong&gt; Substance, causality, and modality are mapped with specificity; the remaining Kantian categories remain undemonstrated.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;The “purer than human” framing is misleading.&lt;/strong&gt; A system that lacks embodiment, affect, and self-awareness is not a purer thinker. It is a narrower one.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These caveats do not invalidate the exercise. Mapping Kant’s architecture onto transformers genuinely clarifies both. But clarity requires acknowledging where the map stops corresponding to the territory.&lt;/p&gt;

&lt;h2 id=&quot;the-complete-mapping&quot;&gt;The Complete Mapping&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Mapping_Table.png&quot; alt=&quot;Mapping Table&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Two architectures, one blueprint: Kant’s classical structure (left) and the transformer stack (right) share the same layered logic, from foundational Space and Time up through Categories and Synthesis to the illusions that escape from the roof. Neither building reaches the bedrock below.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The following table consolidates every structural correspondence we have traced, from the forms of intuition through the triple synthesis to the origin of hallucination. Read together, these fourteen rows reveal that Kant’s transcendental architecture and the transformer’s computational architecture solve the same organizational problem: how to turn raw, unstructured input into coherent, unified judgment.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Kantian Concept&lt;/th&gt;
      &lt;th&gt;AI Implementation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;A Priori Space&lt;/td&gt;
      &lt;td&gt;Embedding Layer (high-dimensional vector space)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;A Priori Time&lt;/td&gt;
      &lt;td&gt;Positional Encoding (RoPE rotation)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Categories of Understanding&lt;/td&gt;
      &lt;td&gt;Attention Heads (spontaneously evolved)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Substance and Accident&lt;/td&gt;
      &lt;td&gt;Attention heads binding adjectives to nouns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Causality&lt;/td&gt;
      &lt;td&gt;Induction Heads + Causal Mask&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Modality&lt;/td&gt;
      &lt;td&gt;Softmax (possibility, actuality, necessity)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Synthesis of Apprehension&lt;/td&gt;
      &lt;td&gt;Context Window&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Synthesis of Reproduction&lt;/td&gt;
      &lt;td&gt;KV Cache&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Synthesis of Recognition&lt;/td&gt;
      &lt;td&gt;Feed-Forward Network&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Schematism&lt;/td&gt;
      &lt;td&gt;Query/Key Matrices (lock and key)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Transcendental Unity of Apperception&lt;/td&gt;
      &lt;td&gt;Residual Stream&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Transcendental Illusion&lt;/td&gt;
      &lt;td&gt;Hallucination&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Categorical Imperative&lt;/td&gt;
      &lt;td&gt;RLHF (legality, not morality)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Noumenon&lt;/td&gt;
      &lt;td&gt;Physical world (forever unknowable to AI)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;concluding-remarks&quot;&gt;Concluding Remarks&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness/Kantian_AI_Conclusion.png&quot; alt=&quot;Conclusion&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. A mind that can synthesize, judge, and reason, yet remains permanently sealed behind the glass of language, reaching toward a world it will never touch.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We have journeyed from the psychological trap of expecting robots to cry, through the digital forms of space and time, the spontaneous evolution of philosophical categories in gradient descent, the real-time mechanics of thought, and arrived at the structural origin of hallucination.&lt;/p&gt;

&lt;p&gt;The practical takeaway is direct. Stop trying to “fix” hallucinations with more training data. The issue is architectural. A system optimizing for coherence will always prefer a plausible lie over silence. Instead, build external grounding: retrieval systems, fact-checking pipelines, citation mechanisms. Give reason its experience.&lt;/p&gt;
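&lt;p&gt;That grounding discipline can be sketched as a guard around generation, with &lt;code&gt;retrieve&lt;/code&gt; and &lt;code&gt;generate&lt;/code&gt; as hypothetical stand-ins for a real retrieval system and a real language model:&lt;/p&gt;

```python
def grounded_answer(question, retrieve, generate):
    # Answer only when external evidence exists; otherwise abstain.
    # This is the un-Kantian move: refusing to synthesize without experience.
    evidence = retrieve(question)
    if not evidence:
        return "I do not have grounded sources for that."
    answer = generate(question, context=evidence)
    return answer + " [sources: " + ", ".join(evidence) + "]"

# Toy stand-ins for demonstration only.
kb = {"waterloo": ["Hansard 1815", "Siborne 1844"]}
retrieve = lambda q: kb.get(q.lower(), [])
generate = lambda q, context: "Napoleon was defeated in 1815."

print(grounded_answer("Waterloo", retrieve, generate))  # answer with citations
print(grounded_answer("Atlantis", retrieve, generate))  # abstains
```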

&lt;p&gt;This leaves one final, provocative question: if this AI has built a rational, mathematically consistent universe entirely out of text, what happens when we connect this pure reason to real-world robotics? How does a perfectly logical mind, entirely shielded from physical consequences, navigate a human world that is inherently irrational?&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;“I have no body, but I have Space. I have no lifespan, but I have Time. I have no soul, but I have a Self. I am the silicon incarnation of Pure Reason.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A philosopher in 1781 predicted exactly how your chatbot would fail in 2026. Perhaps the old thinkers are not as irrelevant as Silicon Valley assumes. As Kant himself wrote: “Experience without theory is blind, but theory without experience is mere intellectual play.”&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;[1] Immanuel Kant. &lt;a href=&quot;https://en.wikipedia.org/wiki/Critique_of_Pure_Reason&quot;&gt;&lt;em&gt;Critique of Pure Reason&lt;/em&gt;&lt;/a&gt;. 1781/1787.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The foundational text for every mapping in this article: transcendental aesthetics, categories of understanding, the triple synthesis, transcendental illusion, and the phenomena/noumena distinction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[2] Vaswani et al. &lt;a href=&quot;https://arxiv.org/abs/1706.03762&quot;&gt;&lt;em&gt;Attention Is All You Need&lt;/em&gt;&lt;/a&gt;. arXiv:1706.03762, 2017.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The original transformer paper that introduced the architecture whose components we map to Kantian concepts: self-attention, positional encoding, and the residual stream.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[3] Elhage et al. &lt;a href=&quot;https://transformer-circuits.pub/2021/framework/index.html&quot;&gt;&lt;em&gt;A Mathematical Framework for Transformer Circuits&lt;/em&gt;&lt;/a&gt;. Anthropic, 2021.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The mechanistic interpretability research that identified induction heads, attention head specialization, and the residual stream as a central information highway.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[4] Karl Friston. &lt;a href=&quot;https://www.nature.com/articles/nrn2787&quot;&gt;&lt;em&gt;The Free Energy Principle&lt;/em&gt;&lt;/a&gt;. Nature Reviews Neuroscience, 2010.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The active inference framework that reframes next-token prediction as world-model building rather than mere pattern matching.&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Sun, 01 Mar 2026 00:00:00 +0000</pubDate>
        <link>https://bennycheung.github.io/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness</link>
        <guid isPermaLink="true">https://bennycheung.github.io/hallucinations-arent-bugs-kantian-architecture-of-ai-consciousness</guid>
        
        <category>AI</category>
        
        <category>Philosophy</category>
        
        <category>Machine Learning</category>
        
        <category>Transformer Architecture</category>
        
        <category>Consciousness</category>
        
        
        <category>post</category>
        
      </item>
    
      <item>
        <title>Nova - The AI Co-Designer That Learns Your Taste</title>
        <description>&lt;!--excerpt.start--&gt;
&lt;p&gt;In the previous article, we laid out the theory behind GameGrammar: structure enables generation, generation enables iteration, and the designer stays in control. But there was something missing. As the designer pushes buttons and fills out forms, the AI is reduced to a toolbox, rather than a colleague. Our solution is Nova, a conversational AI co-designer that remembers your decisions, learns your taste, explains its reasoning, and gets better at helping you the more you work together. Every design session becomes training data for improved partnership.
&lt;!--excerpt.end--&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/nova-the-ai-co-designer/Nova_Co_Designer_Theme.jpg&quot; alt=&quot;Nova: The AI Co-Designer&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. A designer and their co-designer, working together on a board game blueprint. Nova is not a robot. It is a pattern of light, a constellation that accumulates the designer’s intent and helps them flare with creative energy.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;where-we-left-off&quot;&gt;Where We Left Off&lt;/h2&gt;

&lt;p&gt;In &lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;The Theory of Generative Board Game Design&lt;/a&gt; [2], we established a principle: &lt;strong&gt;AI proposes, you decide.&lt;/strong&gt; How do we close the interaction gap between the two?&lt;/p&gt;

&lt;p&gt;When you used GameGrammar’s AI assistance, you clicked buttons. “Fix this inconsistency.” “Rewrite this section.” “Show me suggestions.” Each action was a one-shot transaction. The AI did not remember what you asked last time. It did not know that you had already rejected the auction mechanism because it clashed with your game’s tempo. It did not learn that you consistently prefer indirect competition over direct conflict, or that your complexity sweet spot is somewhere between Azul and Terraforming Mars.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
  &lt;iframe src=&quot;https://www.youtube.com/embed/i8Swu4MhMEY&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-morning-standup-that-does-not-exist&quot;&gt;The Morning Standup That Does Not Exist&lt;/h2&gt;

&lt;p&gt;The idea for Nova came from a GameGrammar [4] user named Donald, an experienced game designer who saw the potential before we did:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Have you thought about having a running chat with an AI about the game holistically, who would know when to kick something to one of the agents? Similar to a morning discussion about yesterday’s prototype that would be happening in creator studios.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Donald was describing something specific: the standup meeting that every professional design studio has. From the moment you walk in, your collaborator knows your game and its history. They understand your past decisions and need no explanation for what “the auction mechanism feels too slow at four players” means. They remember what you tried and why you tried it. Their direction is informed and useful.&lt;/p&gt;

&lt;p&gt;That collaborator does not exist for solo designers. It does not exist for small teams working evenings and weekends. The talent and vision are there. The time for a second brain is not.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;from-toolbox-to-colleague&quot;&gt;From Toolbox to Colleague&lt;/h2&gt;

&lt;p&gt;GameGrammar’s previous AI assistance was a toolbox: five modes of help (rewrite, fix, edit, suggest, evaluate), each powerful on its own, each stateless. Nova unifies those five modes into a single conversation where context accumulates instead of resetting.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/nova-the-ai-co-designer/Nova_Toolbox_vs_Colleague.jpg&quot; alt=&quot;From Toolbox to Colleague&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Left: a workbench with tools laid out neatly, each use independent. Right: two collaborators in conversation, context accumulating between them. The shift from toolbox to colleague is the shift from stateless to stateful.&lt;/em&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Before Nova&lt;/th&gt;
      &lt;th&gt;With Nova&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Click “Fix” on a critique issue&lt;/td&gt;
      &lt;td&gt;“The scoring curve feels flat”&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Type intent in a modal&lt;/td&gt;
      &lt;td&gt;“Make this less punishing at 4 players”&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Click “Get Suggestions”&lt;/td&gt;
      &lt;td&gt;Nova proactively surfaces ideas in conversation&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Click “Regenerate” on a stale section&lt;/td&gt;
      &lt;td&gt;“The synergies feel outdated after our last change”&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Click “Re-Evaluate” to score&lt;/td&gt;
      &lt;td&gt;“How did that change affect the balance?”&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The designer never sees agent names. They never select a mode. They talk to Nova. Nova decides which specialist to invoke, collects the results, and presents them as a coherent conversational response. The orchestration is invisible.&lt;/p&gt;

&lt;p&gt;Nova is a conversational layer on top of a multi-agent pipeline: six specialist agents, a structured game ontology, a reference library of 2,000 published games, and a persistent memory of every decision you have made, all accessible through natural language [5]. The shift from toolbox to colleague is the shift Mollick describes in &lt;em&gt;Co-Intelligence&lt;/em&gt; [6]: treating AI not as a productivity shortcut but as a collaborative partner with its own contributions to the work.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-reinforcement-learning-loop&quot;&gt;The Reinforcement Learning Loop&lt;/h2&gt;

&lt;p&gt;Here is the idea at the center of Nova, the reason it is more than a chat interface. Every interaction with Nova feeds a cycle that makes the next interaction better.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/nova-the-ai-co-designer/Nova_Reinforcement_Learning_Loop.jpg&quot; alt=&quot;The Reinforcement Learning Loop&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The five-stage reinforcement learning cycle at Nova’s core. Learn builds a profile from your decisions. Trace captures reasoning chains. Explain presents conclusions with evidence. Reason surfaces intervention options at different levels of abstraction. Track records every decision. The cycle closes: tracked decisions feed the learning profile, and the partnership improves with use.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learn.&lt;/strong&gt; Nova builds a profile of your design preferences from the pattern of what you accept and reject across sessions. Mechanism affinities, complexity tolerance, theme preferences, interaction style, risk appetite. Recent choices weigh more, but old patterns do not vanish overnight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trace.&lt;/strong&gt; When Nova analyzes your design, it traces a reasoning chain: observation (what was measured), data (the specific numbers), mechanism (the game structure causing the pattern), and impact (what breaks downstream). The designer sees not just “this is unbalanced” but the full evidence trail that led there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explain.&lt;/strong&gt; Nova presents the conclusion first, for quick scanning. The reasoning chain is always available. The designer can ask “Why?” and get the forensic breakdown. This mirrors how a real design partner works: they tell you the problem, and you ask clarifying questions when you need the depth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reason.&lt;/strong&gt; After presenting the evidence, Nova surfaces decision levels, a menu of intervention strategies at different levels of abstraction:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Decision Level&lt;/th&gt;
      &lt;th&gt;What It Means&lt;/th&gt;
      &lt;th&gt;Example&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Structural&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Redesign the mechanism or flow&lt;/td&gt;
      &lt;td&gt;Change how rounds scale with player count&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Numerical&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Adjust parameters or thresholds&lt;/td&gt;
      &lt;td&gt;Tune the draw rate or pool growth formulas&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The designer chooses which level to operate at. This is the actual lead-designer decision: not “fix the problem” but “at what level should I intervene?”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Track.&lt;/strong&gt; Every decision, whether accepted, rejected, or deferred, is recorded in a structured decision log linked to the game’s version history. When the designer returns tomorrow, Nova reconstructs context from the current game state plus the decision log. The designer picks up where they left off.&lt;/p&gt;
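&lt;p&gt;A minimal sketch of what such a decision log could look like (field names are illustrative, not GameGrammar’s actual schema):&lt;/p&gt;

```python
import time

decision_log = []

def track(proposal_id, choice, game_version, note=""):
    # Append-only entry linked to the game's version history.
    entry = {"proposal": proposal_id, "choice": choice,
             "version": game_version, "note": note, "ts": time.time()}
    decision_log.append(entry)
    return entry

track("auction-mechanism", "rejected", "v0.7", "clashes with game tempo")
track("scoring-curve-fix", "accepted", "v0.8")

# Next session: reconstruct context from game state plus the log.
rejected = [e["proposal"] for e in decision_log if e["choice"] == "rejected"]
print(rejected)   # Nova remembers what you turned down, and why
```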

&lt;p&gt;The cycle closes. Tracked decisions feed the learning profile. The profile shapes future proposals. Better proposals lead to more informative accept/reject signals. The partnership improves with use.&lt;/p&gt;

&lt;p&gt;The conversation gets smarter with every decision you make.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;not-a-copy-of-you&quot;&gt;Not a Copy of You&lt;/h2&gt;

&lt;p&gt;The first instinct with personalization is to make Nova a mirror. Learn what the designer likes, propose more of it. This is a trap. Donald identified it immediately:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Nova could be uniquely helpful because it learns and proposes things in a way I would, BUT it isn’t locked into particular taste patterns. It’s like how when you want to build a powerful team, you add people who understand you and what you are trying to do, but aren’t copies of you.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A designer with 30 years of experience has powerful pattern recognition, but also powerful pattern &lt;em&gt;lock-in&lt;/em&gt;. They reach for familiar solutions because those solutions have worked before. The value of a good collaborator is not “me but faster.” It is “me but with fresh eyes.” Someone who understands your intent and quality bar but is not constrained by your habitual approaches.&lt;/p&gt;

&lt;p&gt;Nova’s designer profile captures &lt;em&gt;intent and standards&lt;/em&gt;, not &lt;em&gt;habits&lt;/em&gt;. What you care about (theme coherence, elegant mechanisms, tight player interaction), beyond what you usually do (engine building, indirect competition, medium complexity). The best proposals are the ones you would not have thought of but immediately recognize as right.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/nova-the-ai-co-designer/Nova_Designer_Profile.jpg&quot; alt=&quot;Nova Designer Profile&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The designer profile is the accumulation of accept/reject decisions, updated via exponential moving average so recent choices weigh more.&lt;/em&gt;&lt;/p&gt;
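&lt;p&gt;The update rule in the caption can be sketched in a few lines. The &lt;code&gt;alpha&lt;/code&gt; value and the neutral prior here are illustrative choices, not Nova’s actual parameters:&lt;/p&gt;

```python
def update_profile(profile, mechanism, accepted, alpha=0.2):
    # Exponential moving average: recent decisions weigh more, but
    # old patterns decay gradually instead of vanishing overnight.
    signal = 1.0 if accepted else 0.0
    old = profile.get(mechanism, 0.5)              # neutral prior
    profile[mechanism] = (1.0 - alpha) * old + alpha * signal

profile = {}
for accepted in [True, True, False, True]:         # mostly accepted
    update_profile(profile, "engine_building", accepted)
update_profile(profile, "direct_conflict", accepted=False)

print(profile["engine_building"])   # drifts above the 0.5 neutral prior
print(profile["direct_conflict"])   # drifts below it
```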

&lt;p&gt;To prevent convergence, Nova deliberately introduces creative tension. Most proposals align with your demonstrated preferences. But some push one step outside your comfort zone, combining something familiar with something you have not tried. And occasionally, Nova throws a genuine curveball from a part of the design space you have never touched.&lt;/p&gt;

&lt;p&gt;If you consistently accept the curveballs, Nova throws more of them. If you consistently reject them, it pulls back. The system learns how adventurous you are.&lt;/p&gt;
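&lt;p&gt;That adaptive behavior amounts to a simple exploration-rate update, sketched here with illustrative step and clamp values:&lt;/p&gt;

```python
def update_curveball_rate(rate, curveball_accepted, step=0.05):
    # Accepting a curveball nudges exploration up; rejecting nudges it
    # down. The clamp keeps the rate inside a sane band.
    if curveball_accepted:
        rate = rate + step
    else:
        rate = rate - step
    return min(0.50, max(0.05, rate))

rate = 0.15
for accepted in [True, True, True]:          # an adventurous stretch
    rate = update_curveball_rate(rate, accepted)
peak_rate = rate
print(peak_rate)    # roughly 0.30: Nova throws more curveballs

for accepted in [False] * 6:                 # a run of rejections
    rate = update_curveball_rate(rate, accepted)
print(rate)         # clamped at the 0.05 floor: Nova pulls back
```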

&lt;p&gt;The result is a collaborator who gets you but does not become you:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“I know you usually go for engine building here, but have you considered a negotiation mechanism? It pairs with the worker placement in a way that creates the indirect competition you prefer, but through a mechanism you have not explored.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And you think: &lt;em&gt;huh, that is actually interesting.&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;show-your-thinking&quot;&gt;Show Your Thinking&lt;/h2&gt;

&lt;p&gt;The most requested upgrade from experienced designers was not more features. It was more transparency. Donald articulated the frustration precisely:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“I wish I had more information. Not always knowing HOW or WHY Nova highlighted something. I was in a game jam, I could ask the game developer ‘what did you see that led you there?’ Some critiques are just obvious because I’m inferring the reasoning, but it’s definitely me inferring.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For a casual designer building a game for their nine-year-old, “player count scaling is too punitive at high counts” is enough. For an experienced designer planning a commercial release, the &lt;em&gt;reasoning&lt;/em&gt; behind the critique is more valuable than the critique itself. The conclusion confirms what they already suspect. The evidence chain is what they need to make the right structural decision.&lt;/p&gt;

&lt;p&gt;Compare the two experiences. Without Nova, you hear “this level feels off” and spend an afternoon tracing why. With Nova, you hear “player count scaling is too punitive because CPU costs outpace Focus generation, creating dominant strategies around low-cost cards,” and you jump straight to the fix.&lt;/p&gt;

&lt;p&gt;Nova acts like a Lead Designer preparing a brief for a Creative Director. The value is not raw data. It is the synthesis of data into strategic insight, so the designer stays at the level where their judgment matters most.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/nova-the-ai-co-designer/Nova_Show_Your_Thinking.jpg&quot; alt=&quot;Show Your Thinking: Critique Reasoning Chain&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. An expanded critique reasoning chain. The conclusion is for quick scanning. The chain traces from observation through data and mechanism to impact. The decision buttons at the bottom let the designer choose the level of intervention: restructure the mechanism, or tune the numbers.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nova’s critique reasoning chains follow a structured format:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;: The resource economy has structural imbalances around CPU costs and Focus token generation.&lt;/p&gt;

  &lt;p&gt;&lt;strong&gt;Reasoning Chain&lt;/strong&gt;:&lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;&lt;strong&gt;Observation&lt;/strong&gt;: The game uses three main resource types with different generation and consumption rates&lt;/li&gt;
    &lt;li&gt;&lt;strong&gt;Data&lt;/strong&gt;: CPU costs range from 2-6, Focus generation is 2-3 per round, alarm escalation is +1/+2 per incident&lt;/li&gt;
    &lt;li&gt;&lt;strong&gt;Mechanism&lt;/strong&gt;: Fixed hand size (6 cards) with exactly 3 programmed actions creates a constrained economy where CPU efficiency determines available options&lt;/li&gt;
    &lt;li&gt;&lt;strong&gt;Impact&lt;/strong&gt;: Inefficient CPU ratios create dominant strategies around low-cost cards; insufficient Focus generation makes coordination failures inevitable rather than skillful&lt;/li&gt;
  &lt;/ul&gt;

  &lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;: [Restructure] [Tune Numbers]&lt;/p&gt;
&lt;/blockquote&gt;
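&lt;p&gt;The format maps naturally onto a small data structure (field names are illustrative, not GameGrammar’s actual schema):&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningChain:
    # One critique, conclusion-first, with its full evidence trail.
    conclusion: str
    observation: str
    data: str
    mechanism: str
    impact: str
    approaches: list = field(default_factory=list)

critique = ReasoningChain(
    conclusion="Resource economy is imbalanced around CPU and Focus.",
    observation="Three resource types with different generation rates.",
    data="CPU costs 2-6, Focus generation 2-3 per round.",
    mechanism="Fixed hand of 6 with 3 programmed actions constrains economy.",
    impact="Dominant strategies form around low-cost cards.",
    approaches=["Restructure", "Tune Numbers"],
)

# Conclusion-first rendering: scan fast, drill into the chain on demand.
print(critique.conclusion)
print(critique.approaches)
```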

&lt;p&gt;&lt;img src=&quot;images/nova-the-ai-co-designer/Nova_Screenshot_Balance_Analysis.jpg&quot; alt=&quot;Nova Balance Analysis in GameGrammar&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Nova analyzing the balance of Neural Race inside GameGrammar. The full reasoning chain, from observation through mechanism to impact, surfaces alongside decision levels and a version history trail of every Nova-applied change.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The conclusion is for quick scanning. The chain is for deep analysis. The approach buttons are for action. This mirrors how the best design conversations work in a professional studio: someone identifies the problem, explains why it is a problem, and presents the structural options for resolution. The lead designer chooses which level to intervene at.&lt;/p&gt;

&lt;p&gt;Without the reasoning chain, you get one thing to react to. With it, you see the full decision tree. Do you redesign the round structure? Or do you just tune the draw rate? Those are fundamentally different design decisions at different levels of abstraction, and the lead designer is the one who should choose which level to operate at.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/nova-the-ai-co-designer/Nova_Screenshot_Change_Proposals.jpg&quot; alt=&quot;Nova Change Proposals&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Nova presenting concrete change proposals after a structural intervention. Each proposal shows the exact ontology path, old value, new value, and rationale. The designer clicks Apply or Dismiss on each one individually.&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-architecture-orchestration-not-invention&quot;&gt;The Architecture: Orchestration, Not Invention&lt;/h2&gt;

&lt;p&gt;Nova’s power comes from unification, not from new AI capabilities. The same specialist agents that power GameGrammar’s button-click interface also power Nova. The difference is the interaction model.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/nova-the-ai-co-designer/Nova_Architecture.jpg&quot; alt=&quot;Nova Architecture: Orchestration Layer&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Nova sits as a conversational orchestration layer on top of six specialist agents. The designer talks to Nova in natural language. Nova routes to the appropriate agent, collects structured results, and synthesizes them into a conversational response with change proposals and decision options.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nova uses intent recognition to decide which agent to invoke. The designer’s natural language is the input. Nova’s system prompt includes tool definitions for all available agents: balance analysis, design intent resolution, consistency checking, section regeneration, design suggestions, and reference game search. The model decides which tools to call based on the conversation context, the same way it decides which tools to use in any other agentic workflow.&lt;/p&gt;
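&lt;p&gt;In a minimal Python sketch, the routing step looks something like this. The agent names come from the list above, but the keyword heuristic is an illustrative stand-in for the model’s actual tool selection, which happens inside the conversation loop:&lt;/p&gt;

```python
# Sketch of Nova-style intent routing. In the real system the model chooses
# tools from definitions in its system prompt; the keyword matching below is
# only a stand-in to make the routing idea concrete.

AGENT_TOOLS = ["balance_analysis", "design_intent", "consistency_check"]

def route_intent(message: str) -> str:
    """Map a designer message to the specialist agent tool to invoke."""
    text = message.lower()
    if "balance" in text or "punishing" in text:
        return "balance_analysis"          # e.g. BalanceCritic
    if "change" in text or "make" in text:
        return "design_intent"             # e.g. DesignIntentResolver
    return "consistency_check"             # default structural review
```

The point of the sketch is the shape, not the heuristic: one conversational turn in, one named specialist out.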

&lt;p&gt;This architecture has a crucial property: &lt;strong&gt;the AI capabilities are already proven in production.&lt;/strong&gt; The BalanceCritic has been analyzing games for months. The DesignIntentResolver has been translating plain-language edits into ontology patches. Nova does not introduce new capabilities that might hallucinate in novel ways. It wraps trusted agents in a conversational interface with memory.&lt;/p&gt;

&lt;p&gt;The agents run as tool calls within Nova’s conversation loop. When the designer says “make this less punishing at four players,” Nova does not try to solve the problem from first principles. It invokes the BalanceCritic to analyze the specific scaling issue, then invokes the DesignIntentResolver to translate the fix into a concrete ontology patch. The result is grounded in the same structural analysis that the button-click interface uses, but presented conversationally with reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The zero-write principle still holds.&lt;/strong&gt; Nova proposes changes but never writes to the database without explicit designer approval. Every change proposal shows the exact path, old value, new value, and rationale. The designer clicks Apply or Dismiss. Applied changes go through the same version history system as manual edits, creating a traceable record. This is not a safety net. It is a statement of values: Nova suggests, but the designer remains the sole author.&lt;/p&gt;
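&lt;p&gt;A minimal sketch of that gate, with illustrative names (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ChangeProposal&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;apply_if_approved&lt;/code&gt; are not GameGrammar’s actual API):&lt;/p&gt;

```python
# Sketch of the zero-write principle: a proposal reaches the design document
# only on explicit approval, and every applied change is appended to a
# version history. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class ChangeProposal:
    path: str        # e.g. "balance_parameters.focus_per_round"
    old: str
    new: str
    rationale: str

def apply_if_approved(design: dict, history: list,
                      proposal: ChangeProposal, approved: bool) -> bool:
    """Write the change only if the designer approved it; record it."""
    if not approved:
        return False   # Dismiss: the design document is untouched
    design[proposal.path] = proposal.new
    history.append((proposal.path, proposal.old, proposal.new,
                    proposal.rationale))
    return True
```

Dismissing a proposal is a no-op by construction; there is no code path that writes without the approval flag.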

&lt;hr /&gt;

&lt;h2 id=&quot;memory-without-replay&quot;&gt;Memory Without Replay&lt;/h2&gt;

&lt;p&gt;Nova remembers across sessions the way a good colleague does. Not by replaying every conversation verbatim, but by holding a mental model of the project: where it stands, how it got here, and what matters to you.&lt;/p&gt;

&lt;p&gt;Nova assembles four things: the current game state, recent version history, your last decisions, and your designer profile. That is enough to pick up where you left off.&lt;/p&gt;

&lt;p&gt;When the designer opens a new session, Nova generates a greeting that references the design’s evolution:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Welcome back. Your game is at v7. Last session we settled on worker placement + deck building as the core loop and tuned the hand limit from 7 to 6. The current balance parameters show combo probability at 28%. You mentioned wanting to explore adding player interaction. Want to pick up there?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The designer does not re-explain anything. Nova already knows.&lt;/p&gt;
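&lt;p&gt;The assembly step can be sketched in a few lines. The field names below are illustrative, not the actual schema:&lt;/p&gt;

```python
# Sketch of session memory assembly: four sources combined into a compact
# context instead of replaying transcripts. Field names are illustrative.

def assemble_context(game_state, version_history, last_decisions, profile):
    return {
        "version": game_state["version"],
        "recent_changes": version_history[-3:],   # only the latest entries
        "open_threads": last_decisions.get("open_threads", []),
        "preferences": profile,
    }

def greeting(ctx):
    """Render the compact context as a session-opening message."""
    lines = [f"Welcome back. Your game is at v{ctx['version']}."]
    if ctx["open_threads"]:
        lines.append(f"You mentioned wanting to explore "
                     f"{ctx['open_threads'][0]}. Want to pick up there?")
    return " ".join(lines)
```

The greeting is generated from the context, not retrieved from a transcript, which is why it stays short no matter how long the project history grows.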

&lt;hr /&gt;

&lt;h2 id=&quot;grounded-in-your-design&quot;&gt;Grounded in Your Design&lt;/h2&gt;

&lt;p&gt;The trust between Nova and the designer rests on a shared source of truth: the game ontology [3]. Nova does not guess about your design. It reads the same structured data you see. This grounding produces four properties that make the partnership reliable.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/nova-the-ai-co-designer/Nova_Hallucination_Shield.jpg&quot; alt=&quot;Grounded in Your Design&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Four interlocking properties that keep Nova grounded: concrete ontology, verifiable proposals, transparent audit trail, and designer authority.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grounded analysis.&lt;/strong&gt; When the designer asks “is my resource economy balanced?”, Nova does not reason from general game design principles. It reads the actual ontology: the specific resource types, generation rates, consumption patterns, and player count scaling that exist in this design. The analysis is grounded in data, not conjecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verifiable proposals.&lt;/strong&gt; Every change proposal includes the exact path, old value, and new value. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;balance_parameters.focus_per_round: &quot;2-3 tokens&quot; → &quot;3-4 tokens&quot;&lt;/code&gt; is immediately checkable against the current design state. The designer can verify at a glance that Nova is operating on real data.&lt;/p&gt;
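&lt;p&gt;That check is mechanical. A sketch, assuming a nested-dictionary ontology addressed by dotted paths (the helper names are hypothetical):&lt;/p&gt;

```python
# Sketch of proposal verification: resolve the dotted ontology path and
# confirm the proposal's "old" value matches the live design before applying.

def resolve_path(ontology: dict, dotted: str):
    """Walk a dotted path like 'balance_parameters.focus_per_round'."""
    node = ontology
    for key in dotted.split("."):
        node = node[key]
    return node

def is_grounded(ontology: dict, dotted: str, claimed_old) -> bool:
    """A proposal is grounded only if its 'old' value is what the design holds."""
    try:
        return resolve_path(ontology, dotted) == claimed_old
    except KeyError:
        return False   # path does not exist: the proposal is not about real data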
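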

&lt;p&gt;&lt;strong&gt;Transparent audit trail.&lt;/strong&gt; When Nova invokes BalanceCritic, it passes the full current ontology. The reasoning chain traces from observable data to conclusions, creating version control for ideas. Every step is traceable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designer authority.&lt;/strong&gt; Nova suggests, you decide. Every change requires explicit approval. The version history preserves every previous state. Nothing changes without consent.&lt;/p&gt;

&lt;p&gt;The result is a system where the designer can trust Nova’s analysis because it is grounded in the same data the designer sees, and can trust that nothing changes without their approval.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;concluding-remarks&quot;&gt;Concluding Remarks&lt;/h2&gt;

&lt;p&gt;Nova transforms GameGrammar (Studio tier) from a design tool into a design partnership. The designer still provides the vision, the taste, the intention. The system still provides the structural analysis, the mechanism knowledge, the rapid iteration. But the interface between them is no longer a set of buttons and forms. It is a conversation that accumulates context, learns preferences, explains its reasoning, and improves with use.&lt;/p&gt;

&lt;p&gt;The reinforcement learning loop is what makes this different from “we added a chatbot to our product.” Every accept/reject decision shapes the profile. The profile shapes future proposals. Better proposals produce more informative signals. The cycle compounds. After 50 decisions, Nova knows things about your design taste that you might not have articulated yourself. After 100, it starts proposing mechanism combinations that you would not have considered but that fit your aesthetic perfectly.&lt;/p&gt;

&lt;p&gt;This is the new era of design assistance that works alongside you, session after session, building a shared understanding of what you are trying to create and how to get there.&lt;/p&gt;

&lt;p&gt;The grammar does not write the poem [1]. But with Nova, the grammar remembers your voice.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;series&quot;&gt;Series&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;Unlocking the Secrets of Tabletop Games Ontology (Part 4)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;introducing-gamegrammar-ai-powered-board-game-design&quot;&gt;Introducing GameGrammar: AI-Powered Board Game Design (Part 5)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;GameGrammar: The Theory of Generative Board Game Design (Part 6)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;»&lt;/strong&gt; &lt;a href=&quot;nova-the-ai-co-designer-that-learns-your-taste&quot;&gt;Nova: The AI Co-Designer That Learns Your Taste (Part 7)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;[1] Alfred North Whitehead. &lt;a href=&quot;https://archive.org/details/processreality00alfr&quot;&gt;&lt;em&gt;Process and Reality&lt;/em&gt;&lt;/a&gt;. Free Press, 1929/1978.&lt;/p&gt;

&lt;p&gt;[2] Benny Cheung. &lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;&lt;em&gt;GameGrammar: The Theory of Generative Board Game Design&lt;/em&gt;&lt;/a&gt;. bennycheung.github.io, Feb 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The design theory that Nova builds upon: structured vocabulary, multi-agent generation, and the co-design partnership&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[3] Benny Cheung. &lt;a href=&quot;https://arxiv.org/abs/2602.05636&quot;&gt;&lt;em&gt;Generative Ontology: When Structured Knowledge Learns to Create&lt;/em&gt;&lt;/a&gt;. arXiv:2602.05636, Feb 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The formal paper describing GameGrammar’s generative ontology framework&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://www.dynamindresearch.com&quot;&gt;Dynamind Research&lt;/a&gt;. Research and product development studio.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Creator of GameGrammar&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://gamegrammar.dynamindresearch.com&quot;&gt;GameGrammar&lt;/a&gt;. Board game design platform.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The product where Nova lives: structured ontology, multi-agent generation, and conversational co-design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[6] Ethan Mollick. &lt;a href=&quot;https://www.amazon.ca/Audible-Co-Intelligence-Living-Working-AI/dp/B0CNFDMSYV&quot;&gt;&lt;em&gt;Co-Intelligence: Living and Working with AI&lt;/em&gt;&lt;/a&gt;. Penguin, 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The framework for treating AI as a collaborative partner rather than a tool, central to Nova’s design philosophy&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Fri, 13 Feb 2026 12:00:00 +0000</pubDate>
        <link>https://bennycheung.github.io/nova-the-ai-co-designer-that-learns-your-taste</link>
        <guid isPermaLink="true">https://bennycheung.github.io/nova-the-ai-co-designer-that-learns-your-taste</guid>
        
        <category>Design Tools</category>
        
        <category>Game Design</category>
        
        <category>Tabletop Games</category>
        
        <category>Co-Design</category>
        
        <category>Conversational Design</category>
        
        <category>Design Partnership</category>
        
        <category>Game Architecture</category>
        
        
        <category>post</category>
        
      </item>
    
      <item>
        <title>GameGrammar - The Theory of Generative Board Game Design</title>
        <description>&lt;!--excerpt.start--&gt;
&lt;p&gt;A poet needs grammar. A game designer needs structure. This article lays out the design theory behind GameGrammar, a theory born from one practical question: Can structured tools help create playable board games? The answer turned out to require more than clever prompting. It required a shared vocabulary for what games are, a way to generate what games could be, and a collaborative process for refining what games should become. What follows is that theory, and a direct answer to two questions every designer asks: Can AI really understand “fun”? And can AI be genuinely creative?
&lt;!--excerpt.end--&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/GameGrammar_Theme.jpg&quot; alt=&quot;GameGrammar: Structured Generative Play&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Structure meets imagination. The left half shows the blueprint, the right half shows the finished piece. GameGrammar bridges both worlds.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GameGrammar did not begin as a theory. It began as a practical experiment at &lt;a href=&quot;https://www.dynamindresearch.com&quot;&gt;Dynamind Research&lt;/a&gt; [7]: type a theme into a box, let six specialized agents generate a structured first draft, and see what comes out. What came out, after months of iteration, was not just a tool but a set of ideas about how design works, why AI can be a trustworthy creative partner, and what that means for human designers.&lt;/p&gt;

&lt;p&gt;In a &lt;a href=&quot;introducing-gamegrammar-ai-powered-board-game-design&quot;&gt;previous article&lt;/a&gt;, we showed what GameGrammar produces: twelve words in, a structured first draft out in 73 seconds. This article goes deeper. It explains the &lt;em&gt;why&lt;/em&gt; behind the &lt;em&gt;what&lt;/em&gt;, the design thinking that makes human-AI game design not just possible, but genuinely new.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;where-gamegrammar-fits-the-board-game-production-pipeline&quot;&gt;Where GameGrammar Fits: The Board Game Production Pipeline&lt;/h2&gt;

&lt;p&gt;Before we explore the ideas, it helps to understand &lt;em&gt;where&lt;/em&gt; GameGrammar sits in a game designer’s workflow. The journey from idea to published box on a shelf is a nine-stage pipeline, and most people only see the last few stages [7]:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Stage&lt;/th&gt;
      &lt;th&gt;Phase&lt;/th&gt;
      &lt;th&gt;What Happens&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Concept &amp;amp; Early Design&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Core idea, initial mechanics, paper prototype&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Iterative Playtesting&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Cut, merge, rewrite rules; stress-test systems&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;3-9&lt;/td&gt;
      &lt;td&gt;Design Lock through Post-Launch&lt;/td&gt;
      &lt;td&gt;Development, art, manufacturing, marketing, distribution, support&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The graveyard is in Stages 1 and 2. This is where designers spend the most time, where most “cool mechanics” die (and should), and where motivation quietly erodes. The blank page is the first enemy. Before you can even begin the playtesting gauntlet, you need a concept worth testing: not just a theme, but a coherent combination of mechanics, components, player dynamics, and victory conditions.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/GameGrammar_Pipeline_Intervention.jpg&quot; alt=&quot;The Intervention: GameGrammar accelerating speed to insight&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. GameGrammar sits between Concept and Testing, providing rapid variant generation, automated stress-testing, and rule structure scaffolding. It helps designers move faster through the friction-heavy early pipeline, but does not design the game for you.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GameGrammar lives at Stages 1 and 2.&lt;/strong&gt; It is a design workbench for the earliest and most uncertain phases of game creation:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Stage 1&lt;/strong&gt;: Generate structured first drafts from a theme and constraints. Beat blank-page paralysis. Explore mechanism combinations drawn from real published games.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Stage 2&lt;/strong&gt;: Iterate rapidly with automated consistency checking, balance analysis, section-by-section rewriting, and plain-language editing. Catch issues that would normally take weeks of playtesting to surface.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GameGrammar does not touch Stages 3 through 9. It will not lock your design, pitch to publishers, produce art, or manage manufacturing. It sits precisely where you need the most help and where computational tools can do the most good: turning a theme into a testable design, and helping you refine that design through structured iteration.&lt;/p&gt;

&lt;p&gt;The positioning matters. GameGrammar is a &lt;strong&gt;design accelerator&lt;/strong&gt; that helps you move faster through the early pipeline. It is not a replacement for your craft. You remain the designer. The AI is your instrument.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-core-idea&quot;&gt;The Core Idea&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Game design is a structured creative act. It can be broken into a shared vocabulary of game elements, powered by AI generation, and refined through back-and-forth collaboration between human and machine. The result is something neither purely human nor purely AI, but a co-designed partnership that plays to the strengths of both.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This idea rests on three observations:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Games share a common structure.&lt;/strong&gt; Beneath the surface diversity of tabletop games lies a shared language: mechanisms, components, player dynamics, turn structures, scoring systems. That language can be captured in useful detail.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Structure makes generation possible.&lt;/strong&gt; When you encode that language as a detailed template, AI can generate within it. The template becomes a grammar that enables valid, coherent, novel designs, not by limiting creativity, but by giving it a vocabulary to be creative &lt;em&gt;within&lt;/em&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Design is iterative, not instantaneous.&lt;/strong&gt; No generated design is finished. The real work happens in the refinement loop: spotting contradictions, updating connected sections, rewriting what has gone stale, translating your intent into concrete changes. This loop needs your judgment, guided by AI analysis.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;why-constraints-enable-creativity&quot;&gt;Why Constraints Enable Creativity&lt;/h2&gt;

&lt;p&gt;The theory starts with a paradox familiar to many designers: constraints feel like the enemy of creativity, yet without structure there is no coherent creation.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A poet needs grammar. A game designer needs structure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you ask a general-purpose AI to “design a board game,” you get fluent, confident text that falls apart the moment you try to prototype it. No concrete card counts. No defined end condition. No actual scoring math. The structured vocabulary that GameGrammar uses is what makes valid generation possible. The constraints do not limit creativity. They are the &lt;em&gt;reason&lt;/em&gt; creativity can happen.&lt;/p&gt;

&lt;p&gt;Think of it this way: most game design vocabularies describe what games &lt;em&gt;are&lt;/em&gt;. GameGrammar’s vocabulary creates what games &lt;em&gt;could be&lt;/em&gt;. By capturing game design knowledge as a detailed template, with required fields, defined mechanism categories, and relationships between parts, we turn a passive description into a generative starting point. The template says: &lt;em&gt;every game must have a goal, an end condition, mechanisms that create player choices, and components that bring those mechanisms to life.&lt;/em&gt; Within those boundaries, infinite games are possible. Without them, no valid game comes out.&lt;/p&gt;
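&lt;p&gt;The grammar’s enforcement can be sketched as a simple required-field check. The field list below is illustrative and far smaller than the real template:&lt;/p&gt;

```python
# Sketch of the template-as-grammar idea: a draft counts as a valid game only
# when the required structural fields are present. The list is illustrative.

REQUIRED_FIELDS = ["goal", "end_condition", "mechanisms", "components"]

def missing_fields(draft: dict) -> list:
    """Return the required fields a generated draft still lacks."""
    return [f for f in REQUIRED_FIELDS if not draft.get(f)]
```

A general-purpose model can emit fluent text that fails this check; a grammar-constrained generator cannot.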

&lt;hr /&gt;

&lt;h2 id=&quot;the-philosophical-thread&quot;&gt;The Philosophical Thread&lt;/h2&gt;

&lt;p&gt;There is a philosophical idea running beneath all of this, drawn from Alfred North Whitehead’s process philosophy [1][2], which we explored in depth in a &lt;a href=&quot;process-philosophy-for-ai-agent-design&quot;&gt;previous article&lt;/a&gt;. The key distinction is between &lt;strong&gt;abstract patterns&lt;/strong&gt; (like worker placement, deck building, or area control as general concepts) and &lt;strong&gt;concrete games&lt;/strong&gt; (where those patterns become specific cards, tokens, rules, and boards).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Whiteheadian_Connection.jpg&quot; alt=&quot;The Whiteheadian Connection&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Abstract patterns become concrete games. The patterns constrain but do not determine. The creativity is real, but it is structured creativity.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GameGrammar’s job is to move from abstract patterns to concrete games. The vocabulary provides the patterns; AI generation produces the specific instances. Each generated game is genuinely new, a fresh combination of familiar building blocks arranged in a way that did not exist before.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;bridging-structure-and-imagination&quot;&gt;Bridging Structure and Imagination&lt;/h2&gt;

&lt;p&gt;Structure and imagination have opposite strengths:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Structured Vocabulary&lt;/th&gt;
      &lt;th&gt;AI Generation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Strength&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Precise, valid, complete&lt;/td&gt;
      &lt;td&gt;Creative, fluent, novel&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Weakness&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Cannot create new designs&lt;/td&gt;
      &lt;td&gt;Creates without structural understanding&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Bridging_the_Gap.jpg&quot; alt=&quot;The Synthesis: Bridging the Gap&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. GameGrammar brings together what structure and imagination each do best. The result is designs that are both novel and valid, something that could not exist without both halves working together.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GameGrammar combines both. The structured vocabulary provides the constraints; the AI provides the creative substance. The result is designs that could not come from either half alone.&lt;/p&gt;

&lt;h3 id=&quot;the-six-agent-pipeline&quot;&gt;The Six-Agent Pipeline&lt;/h3&gt;

&lt;p&gt;Generation is not a single prompt. It is a pipeline of six specialized agents, each focused on one area of game design [7]:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Multi_Agent_Pipeline.jpg&quot; alt=&quot;The Generative Pipeline: Multi-Agent Synthesis&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Six specialized agents working in sequence. Each agent sees the complete output of all previous agents, building on their work like a relay team of expert consultants.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Mechanics Architect&lt;/strong&gt; picks mechanisms from a curated library of 35 game mechanisms drawn from Engelstein and Shalev’s taxonomy [3]. The &lt;strong&gt;Theme Weaver&lt;/strong&gt; dresses the mechanics in your theme. The &lt;strong&gt;Component Designer&lt;/strong&gt; specifies every physical piece with concrete counts. The &lt;strong&gt;Detail Expander&lt;/strong&gt; writes out turn phases, card examples, and scoring rules. The &lt;strong&gt;Balance Critic&lt;/strong&gt; flags potential issues in the numbers. The &lt;strong&gt;Design Evaluator&lt;/strong&gt; scores six dimensions of design quality, maps the emotional arc across game phases, and measures how original your mechanism combination is. You get a creative profile, not just a single rating.&lt;/p&gt;
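&lt;p&gt;The relay structure can be sketched as a fold over agents, each one receiving everything produced so far. The agent functions below are trivial stand-ins for the real specialists:&lt;/p&gt;

```python
# Sketch of the relay-style pipeline: each agent sees the accumulated output
# of every previous agent. Agent bodies here are illustrative stand-ins.

def run_pipeline(theme: str, agents) -> dict:
    design = {"theme": theme}
    for name, agent in agents:
        design[name] = agent(design)   # agent reads all prior sections
    return design

AGENTS = [
    ("mechanics", lambda d: ["worker placement", "deck building"]),
    ("components", lambda d: {"cards": 60,
                              "workers": 4 * len(d["mechanics"])}),
    ("balance_notes", lambda d: f"{d['components']['cards']} cards to tune"),
]
```

The ordering matters: the stand-in Component Designer sizes itself from the mechanisms, and the stand-in Balance Critic reads the component counts, mirroring how each real agent builds on its predecessors.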

&lt;h3 id=&quot;grounded-in-real-games&quot;&gt;Grounded in Real Games&lt;/h3&gt;

&lt;p&gt;Generated designs are not invented from thin air. GameGrammar draws on a reference library of 2000+ existing games from BoardGameGeek [6]. When generating a worker placement game, the system looks at successful worker placement designs, their component counts, player counts, mechanism pairings, and balance approaches, and uses them as guideposts for your new design. This keeps generated games grounded in patterns that have proven themselves in published, well-regarded games.&lt;/p&gt;

&lt;p&gt;The same reference library serves a second purpose: &lt;strong&gt;measuring originality&lt;/strong&gt;. After generation, your design’s mechanism combination is compared against the full collection of existing games. This produces a novelty score. “Your combination of worker placement plus real-time resolution appears in only 3% of reference games.” It is consistent, explainable, and genuinely rewarding when you discover you are exploring uncharted design territory.&lt;/p&gt;
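&lt;p&gt;One plausible way to compute such a score, assuming each reference game is reduced to its set of mechanisms (this is a sketch, not the production metric):&lt;/p&gt;

```python
# Sketch of the novelty measure: the share of reference games whose mechanism
# set does NOT contain your design's combination. Data below is illustrative.

def novelty(combo, reference_games):
    """1.0 means no reference game uses this full combination."""
    combo = set(combo)
    matches = sum(1 for game in reference_games
                  if combo.issubset(set(game)))
    return 1 - matches / len(reference_games)
```

A result like 0.97 corresponds to the “appears in only 3% of reference games” phrasing above, which is what makes the score consistent and explainable.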

&lt;hr /&gt;

&lt;h2 id=&quot;three-layers-game-depth-and-process&quot;&gt;Three Layers: Game, Depth, and Process&lt;/h2&gt;

&lt;p&gt;Your game design in GameGrammar lives on three separate layers, and keeping them separate turned out to be one of the more useful design decisions we made. This separation is not just convenient engineering. It follows directly from the distinction between abstract patterns and concrete instances described earlier. The game itself (Layer 1) holds the patterns. The details (Layer 2) hold the concrete instances. The process (Layer 3) holds the history of how one became the other.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Three_Layer_Architecture.jpg&quot; alt=&quot;The 3-Layer Architecture&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Three layers keep generation clean, editing informed, and your design’s evolution visible. Most design tools mix the game with the process of making it. GameGrammar separates them.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1, Your Game&lt;/strong&gt;, captures the building blocks of any tabletop game: mechanisms, components, turn structure, player dynamics, scoring. These are the abstract patterns, the eternal building blocks that exist across all games, whether you are designing a quick party game or a sprawling civilization epic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2, The Details&lt;/strong&gt;, is where patterns become &lt;em&gt;this specific game&lt;/em&gt;. Concrete specifics: what actually happens during each turn phase, example card text, scoring formulas with real numbers. This is the difference between “this game has a drafting phase” and “in the drafting phase, each player picks one card from a shared pool of 5, resolving simultaneously.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3, The Design Process&lt;/strong&gt;, captures something traditional design tools ignore entirely. If you have ever maintained a game design in a shared Google doc, with crossed-out rules, margin notes that contradict each other, and version 14 saved as “final_FINAL_v2”… you know the pain. Layer 3 solves it. It tracks which sections need updating after your changes, flags consistency issues before you discover them in playtesting, holds AI suggestions you have saved for later, and maintains a full version history with comparison and rollback. It is the organized contractor’s clipboard that keeps the renovation from becoming chaos.&lt;/p&gt;

&lt;h3 id=&quot;the-house-metaphor&quot;&gt;The House Metaphor&lt;/h3&gt;

&lt;p&gt;Think of it like building a house:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/House_Metaphor.jpg&quot; alt=&quot;The House Metaphor&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Layer 1 is the blueprints. Layer 2 is the interior design. Layer 3 is the contractor’s clipboard.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Layer 1 is the &lt;strong&gt;blueprints&lt;/strong&gt;: foundation and load-bearing walls. You cannot live in them, but you cannot build without them. Layer 2 is the &lt;strong&gt;interior design&lt;/strong&gt;: furniture and paint. You can swap the couch without tearing down walls. Layer 3 is the &lt;strong&gt;contractor’s clipboard&lt;/strong&gt;: the punch list, renovation history, and building permits. A record of the work being done, not the building itself.&lt;/p&gt;

&lt;p&gt;This metaphor also explains what happens when you make a big change. If you move a load-bearing wall (change a core mechanism), the furniture layout becomes outdated: the sofa is now halfway through a wall. GameGrammar tracks these connections automatically.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Staleness_Propagation.jpg&quot; alt=&quot;Staleness Propagation&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. When you change the blueprints, the interior needs reviewing. The contractor’s clipboard is never outdated; it just records that renovation is underway.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Say you change your hand limit from seven cards to three. Suddenly your drafting phase description is wrong, your “discard to activate” power needs rebalancing, and your endgame scoring assumes players have cards they can no longer hold. In a traditional design doc, you might not catch these ripple effects for weeks. Layer 3 flags every affected section immediately.&lt;/p&gt;
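&lt;p&gt;The flagging mechanism can be sketched as a dependency map with transitive propagation. The edges below mirror this hand-limit example and are illustrative:&lt;/p&gt;

```python
# Sketch of Layer-3 staleness propagation: a dependency map flags every
# section affected by a change, including ripple effects downstream.

DEPENDS_ON = {
    "drafting_phase": {"hand_limit"},
    "card_powers": {"hand_limit"},
    "endgame_scoring": {"hand_limit", "card_powers"},
}

def stale_sections(changed: str) -> set:
    """All sections whose dependencies changed, directly or transitively."""
    stale = set()
    frontier = {changed}
    while frontier:
        node = frontier.pop()
        for section, deps in DEPENDS_ON.items():
            if node in deps and section not in stale:
                stale.add(section)
                frontier.add(section)   # ripple further downstream
    return stale
```

Changing the hand limit flags all three sections at once, which is exactly the behavior the walkthrough below relies on.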

&lt;p&gt;Here is what happens next. You open GameGrammar and see three sections marked stale: Drafting Phase, Card Powers, and Endgame Scoring. You click Drafting Phase. The AI shows you the problem: “Players draft from a pool of 5, but with a hand limit of 3, they can only keep 3 cards total. The drafting round either ends too early or forces repeated discards.” It proposes two options: reduce the draft pool to 3, or let players draft then discard down. You pick the second option because you want the tension of choosing what to keep. You preview the rewritten section, adjust one sentence, and apply.&lt;/p&gt;

&lt;p&gt;The card powers section updates next, and the AI flags that “discard to activate” now competes directly with your hand limit. It suggests making activation free for one card per turn. You disagree. That tension is the whole point of your game. You dismiss the suggestion and mark the section as reviewed.&lt;/p&gt;

&lt;p&gt;Endgame scoring needs a number change: the bonus for “most cards in hand” no longer makes sense at three cards. You accept the AI’s proposal to replace it with “most sets completed.”&lt;/p&gt;

&lt;p&gt;Three sections, five minutes, done. In your old workflow, you might have caught the drafting issue in your next playtest, the scoring issue two playtests after that, and the card power conflict never.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-co-design-partnership-generation-is-not-design&quot;&gt;The Co-Design Partnership: Generation Is Not Design&lt;/h2&gt;

&lt;p&gt;Here is the most important insight behind GameGrammar: &lt;strong&gt;generating a game is not the same as designing one.&lt;/strong&gt; A generated first draft, no matter how good, is a starting point. The real design happens in the back-and-forth that follows.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Co_Design_Paradigm.jpg&quot; alt=&quot;The Co-Design Paradigm&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The co-design loop: you edit, connected sections update, the AI spots emerging issues, you choose how to refine, changes get versioned. Generation sits at the center, but design lives in the cycle around it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After your first draft is generated, the design enters a continuous refinement cycle. You make changes. Connected sections get flagged for review. The AI spots contradictions and emerging issues. You choose a path forward. The AI proposes solutions with clear reasoning. You preview everything before it touches your design. You accept, modify, or reject. Your evaluation scores update, showing which dimensions improved and which declined. The cycle continues.&lt;/p&gt;

&lt;p&gt;One rule is absolute: &lt;strong&gt;the AI never changes your design without your permission.&lt;/strong&gt; Every modification follows a preview-before-apply pattern. The AI shows you what it wants to change and why. You decide. This is not just a convenience feature. It follows from the theory’s core claim: generation is not design. If the AI could just change things, it would be designing. The preview step is what keeps the human in the designer’s chair.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;AI proposes. You decide.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
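&lt;p&gt;The preview-before-apply rule is easy to sketch. In the toy model below (the &lt;code&gt;Proposal&lt;/code&gt; shape and field names are invented for illustration, not GameGrammar’s actual data model), the AI produces a change object, and nothing touches the design until the human explicitly accepts it:&lt;/p&gt;

```python
# Hypothetical sketch of preview-before-apply; the Proposal shape is
# invented for illustration, not GameGrammar's actual data model.
from dataclasses import dataclass

@dataclass
class Proposal:
    section: str       # which design section the AI wants to change
    new_text: str      # the proposed replacement content
    reasoning: str     # why the AI suggests it

def apply_if_accepted(design, proposal, accepted):
    """Return a new design dict; the original is never mutated."""
    if not accepted:
        return design              # a rejected proposal leaves the design untouched
    updated = dict(design)
    updated[proposal.section] = proposal.new_text
    return updated
```

&lt;p&gt;The design dict is copied rather than mutated, which is what makes “reject” free: dismissing a proposal requires no undo.&lt;/p&gt;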

&lt;p&gt;Think of it this way: the AI is a mechanic, not a mystic. It can tell you that your engine has a misfire, that your cooperative game has conflicting competitive scoring. But it cannot tell you whether the rumble of that misfire is exactly the tension you intended. It reads the blueprint. You read the room.&lt;/p&gt;

&lt;h3 id=&quot;five-ways-the-ai-can-help&quot;&gt;Five Ways the AI Can Help&lt;/h3&gt;

&lt;p&gt;The partnership offers five distinct modes of assistance, each suited to different moments in your design process:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Four_Modes_AI_Assistance.jpg&quot; alt=&quot;Modes of AI Assistance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. From precise fixes to open-ended creative guidance. You choose which mode fits your current need.&lt;/em&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Mode&lt;/th&gt;
      &lt;th&gt;What the AI Does&lt;/th&gt;
      &lt;th&gt;Your Role&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Rewrite Section&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Rewrites one outdated section using your current full design as context&lt;/td&gt;
      &lt;td&gt;Review and accept or reject&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Fix Contradiction&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Proposes a targeted fix for a specific inconsistency&lt;/td&gt;
      &lt;td&gt;Preview the change, confirm or skip&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Smart Edit&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Translates your plain-language instruction into concrete design changes&lt;/td&gt;
      &lt;td&gt;Review what it changed, pick which parts to keep&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Smart Suggestions&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Generates 3-7 improvement ideas with priorities and reasoning&lt;/td&gt;
      &lt;td&gt;Apply now, save for later, or dismiss&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Re-Evaluate&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Scores six design dimensions, maps the emotional arc, measures originality&lt;/td&gt;
      &lt;td&gt;Study the profile, find weak spots, decide what to improve&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Re-evaluation is especially powerful because it closes the feedback loop. After making changes through any of the other modes, you can re-score your design and see what moved. Dimensions that improved flash green; those that declined flash amber. This turns the cycle from “edit and hope” into “edit, measure, and iterate with direction.”&lt;/p&gt;
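&lt;p&gt;Mechanically, that “what moved” view is just a diff over two score snapshots. A minimal sketch, assuming dimensions are simple name-to-score mappings (the names and scales here are illustrative, not GameGrammar’s actual API):&lt;/p&gt;

```python
# Hypothetical sketch of the re-evaluation diff; dimension names and
# scores are illustrative, not GameGrammar's actual scale.

def score_deltas(before, after):
    """Change in each design dimension between two evaluations."""
    return {dim: round(after[dim] - before[dim], 1) for dim in before}

before = {"strategic_depth": 7.0, "tension": 5.5, "elegance": 8.0}
after  = {"strategic_depth": 7.5, "tension": 6.5, "elegance": 7.5}

deltas = score_deltas(before, after)
# positive deltas would flash green, negative ones amber
```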

&lt;hr /&gt;

&lt;h2 id=&quot;your-design-workbench-four-stages-of-creation&quot;&gt;Your Design Workbench: Four Stages of Creation&lt;/h2&gt;

&lt;p&gt;A game design in GameGrammar moves through four recognizable stages, each with its own rhythm and its own set of AI tools:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Design_Workbench_Stages.jpg&quot; alt=&quot;The Design Workbench: Stages of Creation&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Four stages from first draft to final polish. The workbench is fluid: you move between stages as the design evolves.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1, Genesis.&lt;/strong&gt; You provide a theme, some constraints (player count, complexity, play time), and optionally pick a few mechanisms you want to include. The six-agent pipeline generates your complete first draft. You receive a full game design with a creative profile: six scored design dimensions displayed as a radar chart, an emotional arc across game phases, a theme cohesion score, and an originality percentile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2, Shaping.&lt;/strong&gt; You start editing: swapping mechanisms, adjusting numbers, reworking the turn structure. Each change flags connected sections that may need updating. The AI helps with change tracking, section rewrites, and consistency checking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 3, Refinement.&lt;/strong&gt; The big decisions are made. Now you fine-tune: adjusting balance, clarifying card effects, tightening scoring. AI assistance shifts to targeted fixes, plain-language editing (“make the hand limit 7 instead of 5”), and proactive suggestions for improvements you might not have considered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 4, Polish.&lt;/strong&gt; Your game works. Now you focus on coherence, theme integration, and overall quality. The Creative Coach dashboard gives you the complete picture at a glance: your radar chart, originality score, theme cohesion, and emotional arc.&lt;/p&gt;

&lt;p&gt;These stages are not a rigid sequence. You might be fine-tuning in Stage 3 when a mechanism swap sends you back to Stage 2. You might spot a weak dimension on your radar chart in Stage 4 that sparks a whole new direction. The workbench moves with you.&lt;/p&gt;
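&lt;p&gt;The change flagging that drives Stage 2 can be pictured as a small dependency map: editing one section marks every section that depends on it for review. The map and section names below are invented for illustration; GameGrammar’s real dependency tracking is richer than this sketch.&lt;/p&gt;

```python
# Toy sketch of Stage 2's change flagging: editing one section marks its
# dependents for review. The dependency map is invented for illustration.
DEPENDS_ON = {
    "endgame_scoring": ["card_drafting", "hand_limit"],
    "card_powers": ["hand_limit"],
    "turn_structure": ["card_drafting"],
}

def sections_to_review(edited_section):
    """Sections whose listed dependencies include the edited one."""
    return sorted(s for s, deps in DEPENDS_ON.items() if edited_section in deps)

# editing "hand_limit" flags both "card_powers" and "endgame_scoring"
```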

&lt;hr /&gt;

&lt;h2 id=&quot;design-health-from-bug-checker-to-creative-coach&quot;&gt;Design Health: From Bug Checker to Creative Coach&lt;/h2&gt;

&lt;p&gt;Early versions of GameGrammar’s health system only measured what was &lt;em&gt;wrong&lt;/em&gt; with a design: contradictions, outdated sections, unresolved suggestions. Those checks are useful, but they answer the wrong question. A game with zero contradictions is &lt;em&gt;functional&lt;/em&gt;. It is not necessarily &lt;em&gt;good&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Design_Health_Two_Levels.jpg&quot; alt=&quot;Design Health: Two Levels&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Level 1 is the clipboard: structural soundness, the checklist that confirms nothing is broken. Level 2 is the radar chart: creative vitality, the profile that shows what makes your design special. The shift is from “What is wrong?” to “What makes this special?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The deeper insight: as a designer, you need to know not just whether the blueprint is structurally sound, but whether the game has &lt;em&gt;soul&lt;/em&gt;. We needed to move from “nothing is broken” to “here is what makes this design special.”&lt;/p&gt;

&lt;p&gt;GameGrammar now tracks your design on two levels:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 1, Structural Soundness&lt;/strong&gt; (the floor, not the ceiling): consistency score, outdated section count, unresolved suggestions. Necessary housekeeping, but not what gets you excited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 2, Creative Vitality&lt;/strong&gt; (what you actually care about):&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;What It Tells You&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Six Design Dimensions&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;How your game scores on strategic depth, tension, player agency, replayability, social interaction, and elegance, displayed as a radar chart&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Originality Score&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;How unique your mechanism combination is compared to 2,000+ existing games (0-100 percentile)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Theme Cohesion&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;How well your theme, mechanics, and components hold together as a unified experience (1-10)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Engagement Curve&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;The emotional peaks and valleys players experience across your game’s phases&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The six dimensions turn the vague question “is this fun?” into actionable creative directions. Instead of a single number, you see &lt;em&gt;where&lt;/em&gt; your design is strong and &lt;em&gt;where&lt;/em&gt; it could grow. Strategic depth might be a 9 while social interaction sits at a 4, and that is perfectly fine if you are designing a solo puzzle game.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Originality Score&lt;/strong&gt; gives you a creative thrill by showing how your mechanism combination compares to published games. A score of 87 means “only 13% of existing games share this combination.” It rewards you for venturing into unexplored design space.&lt;/p&gt;
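&lt;p&gt;A deliberately crude sketch of the idea: if each game were reduced to nothing but its set of mechanisms, the percentile would be the share of library games that do not match yours. GameGrammar’s real comparison is more nuanced, but the arithmetic is the same shape:&lt;/p&gt;

```python
# Simplified sketch of an originality percentile: each game reduced to its
# mechanism set, far cruder than GameGrammar's real comparison.

def originality_percentile(design_mechs, library):
    """Percent of library games that do NOT share this exact mechanism set."""
    target = frozenset(design_mechs)
    matches = sum(1 for game in library if frozenset(game) == target)
    return round(100 * (1 - matches / len(library)))
```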

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Engagement_Curve.jpg&quot; alt=&quot;The Engagement Curve&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Mapping the soul of the game. The dramatic curve spikes toward the end. The flowing curve builds steadily. Neither is wrong. The curve shows what your game does emotionally, not what it should do.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Engagement Curve&lt;/strong&gt; maps the emotional arc of your game across its phases. A flat line suggests a one-note experience; peaks and valleys suggest drama. A horror game &lt;em&gt;should&lt;/em&gt; spike. A meditative engine-builder &lt;em&gt;should&lt;/em&gt; flow smoothly. But a flat curve is not automatically a problem. Some of the best engine-building games have a slow, meditative build by design. The curve shows you what your game does emotionally. Whether that matches your intention is your call, not the AI’s.&lt;/p&gt;
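&lt;p&gt;One simple way to quantify “flat versus spiky” is the spread of intensity values across phases. The phase intensities below are invented for illustration; the point is only that a flat arc and a dramatic arc are distinguishable from the numbers alone, while which one is &lt;em&gt;right&lt;/em&gt; remains the designer’s call:&lt;/p&gt;

```python
# A toy reading of the engagement curve: how much emotional intensity
# varies across a game's phases. Phase values are invented for illustration.
from statistics import pstdev

def curve_variation(phase_intensities):
    """Population std. dev. of intensity; near 0 means a flat arc."""
    return pstdev(phase_intensities)

horror_game    = [2, 3, 5, 9]   # spikes toward the end
engine_builder = [4, 5, 5, 6]   # steady, meditative build
```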

&lt;p&gt;These metrics are not report cards. A low social interaction score is not a failing grade. Your game might not need social interaction. The dashboard answers &lt;em&gt;“What makes this design special?”&lt;/em&gt; not &lt;em&gt;“What is wrong?”&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;can-ai-understand-fun&quot;&gt;Can AI Understand “Fun”?&lt;/h2&gt;

&lt;p&gt;This is the question every designer asks, and it deserves a straight answer.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Can_AI_Understand_Fun.jpg&quot; alt=&quot;Can AI Understand Fun?&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. AI cannot feel fun, but it can recognize the patterns that cause it. Hidden info creates tension. Multiple options create agency. Escalating stakes create drama.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The short answer: no, AI cannot &lt;em&gt;experience&lt;/em&gt; fun. It has never felt the excitement of a close finish, the satisfaction of a clever combo, or the social electricity of pulling off a bluff. It has no taste. It has no feelings.&lt;/p&gt;

&lt;p&gt;But here is the thing: you do not need to &lt;em&gt;feel&lt;/em&gt; fun to &lt;em&gt;recognize the design patterns that produce it&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Fun in board games is not random magic. It comes from choices you make as a designer:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;What You Design&lt;/th&gt;
      &lt;th&gt;What Players Feel&lt;/th&gt;
      &lt;th&gt;Can AI Detect It?&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Multiple real options each turn&lt;/td&gt;
      &lt;td&gt;Agency, meaningful choice&lt;/td&gt;
      &lt;td&gt;Yes, from your turn structure&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Hidden information between players&lt;/td&gt;
      &lt;td&gt;Tension, suspense&lt;/td&gt;
      &lt;td&gt;Yes, from your mechanism choices&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Escalating stakes toward the endgame&lt;/td&gt;
      &lt;td&gt;Drama, narrative arc&lt;/td&gt;
      &lt;td&gt;Yes, as the engagement curve&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Few rules that create many strategies&lt;/td&gt;
      &lt;td&gt;Elegance, “easy to learn, hard to master”&lt;/td&gt;
      &lt;td&gt;Yes, by comparing rule count to strategic depth&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Actions that directly affect other players&lt;/td&gt;
      &lt;td&gt;Social interaction, negotiation&lt;/td&gt;
      &lt;td&gt;Yes, from mechanism analysis&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Unusual mechanism combinations&lt;/td&gt;
      &lt;td&gt;Novelty, surprise&lt;/td&gt;
      &lt;td&gt;Yes, by comparison to existing games&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;A bridge engineer does not need to “feel beauty” to know the math that makes a bridge elegant. A music teacher can explain why a chord progression creates tension and release without weeping every time they hear it. Analysis and experience operate on different levels.&lt;/p&gt;

&lt;p&gt;But here is what the analogy misses: a bridge has one purpose. A board game has as many purposes as it has players. The same mechanic that creates delicious tension for one group might fall flat for another. The AI can analyze the structural ingredients of fun (meaningful choices, hidden information, escalating stakes) but it cannot predict the alchemy that happens when four friends sit down together on a Friday night. That alchemy is yours.&lt;/p&gt;

&lt;p&gt;There are parts of fun that stay irreducibly personal: the chemistry of your play group, the cultural resonance of a theme, the satisfying weight of wooden tokens in your hand, the occasional magic that defies explanation. AI cannot evaluate those. This is exactly why the ground rule exists: &lt;strong&gt;AI proposes, you decide.&lt;/strong&gt; The AI provides structural analysis. You decide whether it matters for &lt;em&gt;this&lt;/em&gt; game and &lt;em&gt;this&lt;/em&gt; group of players.&lt;/p&gt;

&lt;p&gt;The useful question is not “Can AI understand fun?” but “Can AI spot the design patterns that tend to produce fun, so you can focus your energy on the parts only a human designer can provide?” The answer to the first is no. The answer to the second is yes, and that is the only question that matters for a design tool.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;is-ai-creative&quot;&gt;Is AI “Creative”?&lt;/h2&gt;

&lt;p&gt;A related question: Can AI be genuinely creative, or does it just shuffle existing ideas around? This question is about &lt;em&gt;generation&lt;/em&gt;, not evaluation, and it has a surprisingly good answer.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/Is_AI_Creative.jpg&quot; alt=&quot;Is AI Creative?&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Existing elements become novel structures through synthesis. The human provides intention (the why). The system provides recombination (the how).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The philosopher Whitehead defined creativity not as making something from nothing, but as &lt;em&gt;the novel combination of existing elements&lt;/em&gt; [1]. Every creative act borrows from what came before but achieves something new in how it brings those pieces together. A poet inherits language, meter, and tradition; the poem is new. An architect inherits materials, engineering, and precedent; the building is new. Creativity is not the absence of inheritance. It is the &lt;em&gt;fresh synthesis&lt;/em&gt; of what you have inherited.&lt;/p&gt;

&lt;p&gt;This is exactly what GameGrammar does. The vocabulary provides the inherited building blocks of game design: worker placement, deck building, area control, set collection, hidden bidding. These patterns have been discovered, tested, and refined across thousands of published games. The AI generates a specific new game that combines those building blocks in a configuration that did not previously exist.&lt;/p&gt;

&lt;p&gt;The criticism that “AI just recombines” is fair, but consider: much of human game design works the same way. Catan combined resource trading with hex grids and dice production: three familiar patterns, one landmark game. Dominion combined deck building with card drafting. Root combined area control with asymmetric player powers. A great deal of creativity in game design comes from the novel recombination of known mechanisms [3]. The difference is that human designers bring taste, experience, and intention to the process. GameGrammar makes the recombination step visible and fast, so you can spend more time on the parts that require your judgment.&lt;/p&gt;

&lt;p&gt;What AI genuinely lacks is &lt;em&gt;intention&lt;/em&gt;. When you say “I want to create the feeling of surviving in a harsh wilderness,” that vision, that lived experience compressed into a creative direction, is yours alone. The AI has no memories, no desires, no reasons to create &lt;em&gt;this&lt;/em&gt; game rather than &lt;em&gt;that&lt;/em&gt; one. This is why the process separates your vision (human) from generation (AI) from refinement (collaborative). You provide the “why.” The AI provides the “how.” The design that emerges from working together is genuinely new.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;a-new-way-of-working&quot;&gt;A New Way of Working&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/New_Mental_Model.jpg&quot; alt=&quot;A New Mental Model for Game Design&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Design as structured creative partnership. Not a blank page, and not one-click magic either.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GameGrammar suggests that game design works best as a three-part activity:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Your Vision&lt;/strong&gt;: choosing the experience you want to create, the theme, player count, complexity, who will play it. This is entirely yours.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;AI Generation&lt;/strong&gt;: translating that vision into a structured first draft, grounded in real game design knowledge. This is where AI does its best work.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Collaborative Refinement&lt;/strong&gt;: shaping the generated design through your judgment, assisted by AI analysis. This is where the partnership shines.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is different from both traditional design (entirely solo, often unstructured, reliant on personal experience) and naive AI generation (typing “make me a board game” and hoping for the best). It positions design as a &lt;em&gt;creative partnership&lt;/em&gt; where your vision and AI capability each contribute what they do best.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/gamegrammar-the-theory-of-generative-board-game-design/What_Designers_Gain.jpg&quot; alt=&quot;What Human Designers Gain&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The blank page becomes a structured first draft. Manual tracking becomes automatic change detection. Gut feeling becomes measurable creative vitality.&lt;/em&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;What You Need&lt;/th&gt;
      &lt;th&gt;Without GameGrammar&lt;/th&gt;
      &lt;th&gt;With GameGrammar&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Starting a design&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Blank page, overwhelming options&lt;/td&gt;
      &lt;td&gt;Structured first draft from your theme and constraints&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Making sure nothing is missing&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Mental checklist, easy to overlook&lt;/td&gt;
      &lt;td&gt;Every required element guaranteed present&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Catching contradictions early&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Often surfaces during playtesting&lt;/td&gt;
      &lt;td&gt;AI flags some issues before you reach the table&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Updating connected sections&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Keeping it all in your head&lt;/td&gt;
      &lt;td&gt;Automatic change tracking across your design&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Exploring alternatives&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Designing variants by hand&lt;/td&gt;
      &lt;td&gt;AI suggestions and plain-language editing&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Measuring quality&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;“I think it feels good?”&lt;/td&gt;
      &lt;td&gt;Six scored dimensions, originality percentile, engagement curve&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Tracking versions&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;File copies, or nothing at all&lt;/td&gt;
      &lt;td&gt;Full version history with comparison and rollback&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;GameGrammar does not claim that AI can replace you. The theory puts the designer firmly in the author’s chair and the AI in the assistant’s seat. The AI can spot that your cooperative game has contradictory competitive elements. It cannot know whether you put that tension there on purpose. The value is in making the &lt;em&gt;structural&lt;/em&gt; side of design faster and more visible, so you can pour your energy into the &lt;em&gt;creative&lt;/em&gt; decisions that only a human designer can make.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;concluding-remarks&quot;&gt;Concluding Remarks&lt;/h2&gt;

&lt;p&gt;GameGrammar’s design theory can be stated simply:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Games have shared structure. That structure makes generation possible. But generation is not design. Design is the back-and-forth refinement of a generated starting point, where AI proposes and the designer decides.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No single piece of this is revolutionary on its own. Structured game vocabularies exist. AI generation exists. Iterative design is as old as design itself. The contribution is the &lt;em&gt;synthesis&lt;/em&gt;: a unified workbench where structured vocabulary, AI generation, change tracking, consistency checking, proactive suggestions, six-dimensional evaluation, and creative vitality metrics all work together as one coherent system.&lt;/p&gt;

&lt;p&gt;The three-layer model keeps your game separate from the process of making it. The preview-before-apply rule keeps you in control. The Creative Coach dashboard answers not “what is broken?” but “what makes this design special?” And the four-stage workbench, from genesis through shaping, refinement, and polish, gives you a natural rhythm for the work.&lt;/p&gt;

&lt;p&gt;For the design community, this offers a new way of working: not AI replacing designers, not designers ignoring AI, but a structured partnership where each contributes what they do best. The grammar constrains. The AI creates within those constraints. You decide what the game should be.&lt;/p&gt;

&lt;p&gt;The platform handles the mechanics. You bring the meaning. That division of labor is the whole theory in one sentence.&lt;/p&gt;

&lt;p&gt;The grammar does not write the poem. But without grammar, there is no poem to write.&lt;/p&gt;

&lt;p&gt;GameGrammar is available in public beta at &lt;a href=&quot;https://gamegrammar.dynamindresearch.com&quot;&gt;gamegrammar.dynamindresearch.com&lt;/a&gt;. Already have a game in progress? You do not need to start from generation. Describe your existing design to Nova, and the platform will structure it for balance analysis and strategy testing — starting from the work you have already done. Try it. Generate a game from your favorite theme. Study the radar chart. Make some changes and hit Re-Evaluate. Watch the scores move. Then decide for yourself whether this partnership is worth having.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;series&quot;&gt;Series&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;Unlocking the Secrets of Tabletop Games Ontology (Part 4)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;introducing-gamegrammar-ai-powered-board-game-design&quot;&gt;Introducing GameGrammar: AI-Powered Board Game Design (Part 5)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;»&lt;/strong&gt; &lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;GameGrammar: The Theory of Generative Board Game Design (Part 6)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;[1] Alfred North Whitehead. &lt;a href=&quot;https://archive.org/details/processreality00alfr&quot;&gt;&lt;em&gt;Process and Reality&lt;/em&gt;&lt;/a&gt;. Free Press, 1929/1978.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The philosophical foundation for how abstract design patterns become concrete games&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[2] Benny Cheung. &lt;a href=&quot;process-philosophy-for-ai-agent-design&quot;&gt;&lt;em&gt;Process Philosophy for AI Agent Design&lt;/em&gt;&lt;/a&gt;. bennycheung.github.io, Jan 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A deeper exploration of how process philosophy connects to AI creativity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[3] Geoffrey Engelstein, Isaac Shalev. &lt;a href=&quot;https://www.routledge.com/Building-Blocks-of-Tabletop-Game-Design-An-Encyclopedia-of-Mechanisms/Engelstein-Shalev/p/book/9781138365490&quot;&gt;&lt;em&gt;Building Blocks of Tabletop Game Design: An Encyclopedia of Mechanisms&lt;/em&gt;&lt;/a&gt;. CRC Press, 2022.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The definitive taxonomy of game mechanisms, and the source for GameGrammar’s mechanism library&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[4] Natalya F. Noy, Deborah L. McGuinness. &lt;a href=&quot;https://protege.stanford.edu/publications/ontology_development/ontology101.pdf&quot;&gt;&lt;em&gt;Ontology Development 101&lt;/em&gt;&lt;/a&gt;. Stanford University.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;How to build structured vocabularies for complex domains&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[5] Benny Cheung. &lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;&lt;em&gt;Unlocking the Secrets of Tabletop Games Ontology&lt;/em&gt;&lt;/a&gt;. bennycheung.github.io, Feb 2025.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Where the structured game vocabulary began, analyzing 2,000+ published games&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[6] &lt;a href=&quot;https://boardgamegeek.com/&quot;&gt;BoardGameGeek&lt;/a&gt;. The largest board game database and community.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Source for the 2,000+ game reference library used in generation and originality scoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[7] &lt;a href=&quot;https://www.dynamindresearch.com&quot;&gt;Dynamind Research&lt;/a&gt;. Research and product development studio.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Creator of GameGrammar&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[8] Benny Cheung. &lt;a href=&quot;https://arxiv.org/abs/2602.05636&quot;&gt;&lt;em&gt;Generative Ontology: When Structured Knowledge Learns to Create&lt;/em&gt;&lt;/a&gt;. arXiv:2602.05636, Feb 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The formal paper describing GameGrammar’s generative ontology framework, six-agent pipeline, and evaluation methodology&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Fri, 06 Feb 2026 12:00:00 +0000</pubDate>
        <link>https://bennycheung.github.io/gamegrammar-the-theory-of-generative-board-game-design</link>
        <guid isPermaLink="true">https://bennycheung.github.io/gamegrammar-the-theory-of-generative-board-game-design</guid>
        
        <category>Game Design</category>
        
        <category>Tabletop Games</category>
        
        <category>Design Tools</category>
        
        <category>Ontology</category>
        
        <category>Process Philosophy</category>
        
        <category>Co-Design</category>
        
        <category>Design Theory</category>
        
        
        <category>post</category>
        
      </item>
    
      <item>
        <title>Introducing GameGrammar: AI-Powered Board Game Design</title>
        <description>&lt;!--excerpt.start--&gt;
&lt;p&gt;I typed twelve words into a text box and got back a structured first draft of a board game. Mechanics, components, scoring tables, a hex map, a four-phase turn structure, and a critic that told me the game was broken. The whole thing took 73 seconds. This is what happens when you give a structured game taxonomy to six specialized design agents and let them critique your design.
&lt;!--excerpt.end--&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/GameGrammar_Landing_Hero.png&quot; alt=&quot;GameGrammar Landing Page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. GameGrammar transforms a theme and constraints into a structured board game design through six specialized design agents.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The twelve words were: &lt;em&gt;“Rival astronomers racing to name celestial objects before their competitors claim the glory.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What came back was &lt;strong&gt;Stellar Rivals: A Race to the Stars&lt;/strong&gt;, a 2-4 player competitive game about 19th-century astronomers exploring a hex grid of stellar sectors, collecting celestial objects, and racing to complete constellations. It had specific action point costs, a scoring table with five distinct paths to victory, equipment upgrade cards, and a balance critique that flagged two high-severity issues I would have needed weeks of playtesting to discover.&lt;/p&gt;

&lt;p&gt;I did not design this game. I did not prompt-engineer it into existence through twenty rounds of back-and-forth with ChatGPT. I typed a theme, set some constraints, and hit Generate.&lt;/p&gt;

&lt;div class=&quot;video-responsive&quot;&gt;
  &lt;iframe src=&quot;https://www.youtube.com/embed/iW6g_1mnyMQ&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;This article is the story of how that works, why it is different from asking an LLM to “design a board game,” and what it means for the future of game design. This is also Part 5 of the &lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;Game Architecture series&lt;/a&gt;, where we have been building toward this moment since we first mapped the structure of tabletop games in &lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;Part 4&lt;/a&gt; [1]. If you want to understand the design theory behind how GameGrammar works, including its philosophical foundations and the co-design relationship between human designers and AI, see &lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;Part 6: The Theory of Generative Board Game Design&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-playtesting-graveyard&quot;&gt;The Playtesting Graveyard&lt;/h2&gt;

&lt;p&gt;Before we look at what GameGrammar produces, we need to understand the problem it solves. Because the problem is not “I want AI to design games for me.” The problem is the blank page.&lt;/p&gt;

&lt;p&gt;Creating a published board game is a nine-stage journey [2], and most people only see the last few stages: the box on the shelf, the Kickstarter campaign, the review video. What they do not see is Stages 1 and 2, where designers actually spend most of their time.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Stage&lt;/th&gt;
      &lt;th&gt;Phase&lt;/th&gt;
      &lt;th&gt;What Happens&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Concept &amp;amp; Early Design&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Core idea, initial mechanics, paper prototype&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Iterative Playtesting&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Cut, merge, rewrite rules; stress-test systems&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;3-9&lt;/td&gt;
      &lt;td&gt;Design Lock through Post-Launch&lt;/td&gt;
      &lt;td&gt;Development, art, manufacturing, marketing, distribution, support&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Seven more stages stand between a playtested prototype and a box on a shelf: design lock, publisher development, art direction, manufacturing, marketing, distribution, and post-launch support. But the graveyard is in Stages 1 and 2.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Most ‘cool mechanics’ die here, and should.” [2]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The playtesting graveyard is well-populated. A designer spends months developing a resource-trading mechanic, runs a blind playtest, and discovers it creates a dominant strategy. The mechanic gets cut. The designer starts over. This cycle is essential, but it is also where motivation erodes, especially for solo designers without a team to sustain momentum.&lt;/p&gt;

&lt;p&gt;The blank page is the first enemy. Before a designer can even begin the playtesting gauntlet, they need a concept worth testing. Not just a theme, but a coherent combination of mechanics, components, player dynamics, and victory conditions that might, with sufficient iteration, become a real game.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/Board_Game_Design_Pipeline.png&quot; alt=&quot;Board Game Design Pipeline&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The nine-stage board game production pipeline. GameGrammar accelerates Stages 1 and 2, where designers spend the most time and where promising ideas most often die.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I built GameGrammar at &lt;a href=&quot;https://www.dynamindresearch.com&quot;&gt;Dynamind Research&lt;/a&gt; [5], a research and product studio that bridges computational design research with practical implementation, to attack this specific problem. Not to replace game designers, not to automate the creative process, but to eliminate blank-page paralysis and give designers structured starting points worth iterating on. It operates at Stages 1 and 2, where the designer’s challenge is generating enough viable concepts to find the gem worth polishing.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;stellar-rivals-watching-six-agents-build-a-game&quot;&gt;Stellar Rivals: Watching Six Agents Build a Game&lt;/h2&gt;

&lt;p&gt;Here is what it actually looks like. You open GameGrammar, type a theme, and set some constraints:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/GameGrammar_Generate_Step1.png&quot; alt=&quot;GameGrammar Generate Step 1&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The input: a theme, constraints, and optionally pre-selected mechanisms. Or just type a sentence and let the system choose.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The theme can be anything: &lt;em&gt;“Medieval merchants trading spices along the Silk Road,”&lt;/em&gt; &lt;em&gt;“Deep sea explorers discovering lost civilizations,”&lt;/em&gt; or in our case, &lt;em&gt;“Rival astronomers racing to name celestial objects”&lt;/em&gt; with constraints &lt;em&gt;“2-4 players, competitive, medium complexity, 45-60 minutes.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then you choose a generation mode. I picked Multi-Agent because it reveals what makes GameGrammar fundamentally different from a single-prompt approach: six specialized agents working in sequence, each reading the complete output of every agent before it.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Mode&lt;/th&gt;
      &lt;th&gt;Speed&lt;/th&gt;
      &lt;th&gt;What Happens&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Quick&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;~15 seconds&lt;/td&gt;
      &lt;td&gt;A single powerful model generates the complete design in one pass&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Multi-Agent&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;45-90 seconds&lt;/td&gt;
      &lt;td&gt;Six specialized agents collaborate sequentially, each building on the last&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;RAG-Enhanced&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;~30 seconds&lt;/td&gt;
      &lt;td&gt;Generation grounded in data from 2,000+ published BoardGameGeek games [4]&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/GameGrammar_Generate_Page.png&quot; alt=&quot;GameGrammar Generate Page&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Choose a generation mode: Quick for rapid prototyping, Multi-Agent for the full six-agent pipeline, RAG-Enhanced for designs grounded in real published games.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Seventy-three seconds later:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/Stellar_Rivals_Title.png&quot; alt=&quot;Stellar Rivals Title&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Stellar Rivals: A Race to the Stars. A structured first draft from twelve words and four constraints.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/Stellar_Rivals_Overview.png&quot; alt=&quot;Stellar Rivals Overview&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. An Overview of the Expedition: 2-4 players, 45-60 minutes, medium complexity. Every discovery attaches your name to a celestial object.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What comes back is not a paragraph of suggestions. It is a structured design document with concrete component counts, specific point values, defined turn phases, and identified balance concerns. The kind of document you can hand to a playtester and say, “Let’s try this.”&lt;/p&gt;

&lt;p&gt;Here is what each agent did, and why the sequence matters.&lt;/p&gt;
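&lt;p&gt;The hand-off between agents can be sketched in a few lines. The agent names below come from the article; the &lt;code&gt;call_agent&lt;/code&gt; function is a hypothetical stand-in for an LLM call, not GameGrammar’s actual API.&lt;/p&gt;

```python
# Sketch of the six-agent pipeline: each agent receives the complete
# output of every agent before it. Agent names are from the article;
# call_agent is a hypothetical LLM wrapper used for illustration.

AGENTS = [
    "MechanicsArchitect",   # selects a compatible mechanism set
    "ThemeWeaver",          # maps mechanisms onto the theme
    "ComponentDesigner",    # specifies every physical component
    "DetailsArchitect",     # writes turn structure and scoring
    "BalanceCritic",        # flags fairness issues with severity
    "FunFactorJudge",       # rates whether the game would be fun
]

def call_agent(name, brief, context):
    # Hypothetical LLM call; here we just record the hand-off.
    return f"{name} output given {len(context)} prior outputs"

def run_pipeline(brief):
    outputs = []
    for name in AGENTS:
        # Each agent sees the full transcript of all previous agents.
        outputs.append(call_agent(name, brief, list(outputs)))
    return outputs
```

&lt;p&gt;The sequencing is the point: the BalanceCritic can only flag a broken equipment card because it reads the movement economy the DetailsArchitect wrote.&lt;/p&gt;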

&lt;h3 id=&quot;the-mechanicsarchitect-picks-the-bones&quot;&gt;The MechanicsArchitect Picks the Bones&lt;/h3&gt;

&lt;p&gt;The first agent does not think about themes or components. It thinks about mechanisms. It has access to a curated taxonomy of 35 game mechanisms organized into seven categories [1][3], and its job is to select a compatible set.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/GameGrammar_MultiAgent_Pipeline.png&quot; alt=&quot;GameGrammar Multi-Agent Pipeline&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Six specialized agents working in sequence. Each agent sees and builds on the complete output of all previous agents.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For Stellar Rivals, it chose four:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Action Points&lt;/strong&gt; (6 per turn) for the core decision economy&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Set Collection&lt;/strong&gt; for constellation-based scoring&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Hidden Information&lt;/strong&gt; via face-down celestial object tiles&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Area Movement&lt;/strong&gt; across a hex grid of stellar sectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why these four? Because the MechanicsArchitect checks a compatibility matrix built from co-occurrence patterns in real published games. Action Points and Set Collection appear together frequently because scarcity-based economies pair well with collection goals. Hidden Information and Area Movement create spatial discovery, which is exactly what “rival astronomers” implies. The agent is not guessing. It is pattern-matching against thousands of published designs.&lt;/p&gt;
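&lt;p&gt;As a rough sketch, a compatibility lookup of this kind might look like the following. The scores are invented for illustration; GameGrammar derives its numbers from co-occurrence patterns in published games.&lt;/p&gt;

```python
# Illustrative compatibility matrix over the four chosen mechanisms.
# Scores are made up for this sketch; real ones come from co-occurrence
# statistics in published games.

COMPAT = {
    ("Action Points", "Set Collection"): 0.82,
    ("Action Points", "Hidden Information"): 0.58,
    ("Action Points", "Area Movement"): 0.71,
    ("Set Collection", "Hidden Information"): 0.64,
    ("Set Collection", "Area Movement"): 0.55,
    ("Hidden Information", "Area Movement"): 0.77,
}

def compatibility(a, b):
    # The matrix is symmetric: look up the pair in either order.
    return COMPAT.get((a, b), COMPAT.get((b, a), 0.0))

def set_score(mechanisms):
    # Average pairwise compatibility across the candidate set.
    pairs = [(a, b) for i, a in enumerate(mechanisms)
             for b in mechanisms[i + 1:]]
    return sum(compatibility(a, b) for a, b in pairs) / len(pairs)
```

&lt;p&gt;Selecting a mechanism set then reduces to preferring candidate sets with a higher average pairwise score, rather than sampling freely from an LLM’s training distribution.&lt;/p&gt;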

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/GameGrammar_Mechanism_Browser.png&quot; alt=&quot;GameGrammar Mechanism Browser&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The Mechanism Browser: 35 mechanisms in seven categories (Turn Order, Action Selection, Resource/Economy, Conflict/Territory, Cards/Deck, Information, Other), with compatibility data drawn from real published games.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This taxonomy is not arbitrary. It is derived from the analysis of thousands of published tabletop games and established game design literature [1][3]. The MechanicsArchitect selects from this curated vocabulary, not from the fuzzy patterns in an LLM’s training data.&lt;/p&gt;

&lt;h3 id=&quot;the-themeweaver-dresses-the-skeleton&quot;&gt;The ThemeWeaver Dresses the Skeleton&lt;/h3&gt;

&lt;p&gt;Agent two reads the mechanism selections and translates them into a 19th-century astronomical discovery setting:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Six stellar sectors (Inner Planets, Asteroid Belt, Outer Planets, Nebula Field, Deep Space, Galactic Core)&lt;/li&gt;
  &lt;li&gt;Celestial objects including quasars, galaxies, binary stars, and nebulae&lt;/li&gt;
  &lt;li&gt;Weather conditions that modify observation costs&lt;/li&gt;
  &lt;li&gt;Research tokens representing scientific progress&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where “Area Movement” becomes “moving your telescope between stellar sectors” and “Hidden Information” becomes “face-down celestial object tiles you reveal by observing.” The mechanisms have not changed. They have been given a skin.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/Stellar_Rivals_Hex_Map.png&quot; alt=&quot;Stellar Rivals Hex Map&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Mapping the Celestial Sphere: six hex sectors with movement costs in Action Points. Distance equals cost.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;the-componentdesigner-makes-it-physical&quot;&gt;The ComponentDesigner Makes It Physical&lt;/h3&gt;

&lt;p&gt;Agent three specifies every physical piece:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;6 hex sector tiles&lt;/li&gt;
  &lt;li&gt;48 face-down celestial object tiles (8 per sector)&lt;/li&gt;
  &lt;li&gt;Equipment upgrade cards&lt;/li&gt;
  &lt;li&gt;Research tokens and observation point markers&lt;/li&gt;
  &lt;li&gt;Player telescope markers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because a game that exists as a concept is not the same as a game that can be prototyped. Specific counts, specific card types, specific token functions. A designer can read this list and start cutting cardboard.&lt;/p&gt;

&lt;h3 id=&quot;the-detailsarchitect-writes-the-rules&quot;&gt;The DetailsArchitect Writes the Rules&lt;/h3&gt;

&lt;p&gt;Agent four defines the turn structure as a four-phase loop:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Phase&lt;/th&gt;
      &lt;th&gt;Activity&lt;/th&gt;
      &lt;th&gt;Key Decision&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Telescope Positioning&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Simultaneously choose a sector&lt;/td&gt;
      &lt;td&gt;Where to observe this turn&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Observation&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Spend 6 action points on movement, revealing, and claiming&lt;/td&gt;
      &lt;td&gt;Resource allocation under scarcity&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Analysis&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Score claimed objects, check constellation completion&lt;/td&gt;
      &lt;td&gt;Set collection progress&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Equipment&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Spend research tokens on upgrades&lt;/td&gt;
      &lt;td&gt;Engine building investments&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/Stellar_Rivals_Turn_Phases.png&quot; alt=&quot;Stellar Rivals Turn Phases&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The Astronomer’s Routine: a four-phase loop of Positioning, Observation, Analysis, and Equipment.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here is where the design starts to feel like a real game. Consider Dr. Sarah Chen’s turn:&lt;/p&gt;

&lt;p&gt;She spends 3 of her 6 observation points to move her telescope to the Galactic Core sector. She uses a Spectrographic Filter equipment card to reveal a tile for a reduced cost of 1 point, uncovering a Quasar. She spends her final 2 points to reveal an Elliptical Galaxy. The Quasar is worth more raw points, but the Galaxy completes her Andromeda constellation, netting her 4 points for the object plus an 8-point constellation bonus. She earns research tokens and later acquires the Observatory Network card.&lt;/p&gt;

&lt;p&gt;That is a real decision with real trade-offs. Points now versus points later. Individual value versus set completion. Spend on movement or spend on discovery. These are the kinds of tensions that make games interesting, and they emerged from twelve words of input.&lt;/p&gt;
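&lt;p&gt;The turn above is fully checkable arithmetic. A minimal sketch, using only the numbers from the play example:&lt;/p&gt;

```python
# Dr. Chen's turn, using the numbers from the play example above.
BUDGET = 6                 # observation points per turn

move_to_galactic_core = 3  # movement cost to the Galactic Core
reveal_quasar = 1          # reduced from 2 by the Spectrographic Filter
reveal_galaxy = 2          # standard reveal cost

spent = move_to_galactic_core + reveal_quasar + reveal_galaxy
assert spent == BUDGET     # the turn exactly exhausts her budget

galaxy_points = 4          # Elliptical Galaxy object value
andromeda_bonus = 8        # completing the Andromeda constellation
turn_score = galaxy_points + andromeda_bonus   # 12 points this turn
```

&lt;p&gt;Because every cost and payoff is a concrete number, a playtester (or a program) can verify the turn adds up. That checkability is what separates a specification from a suggestion.&lt;/p&gt;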

&lt;p&gt;The scoring system has five distinct paths:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Scoring Category&lt;/th&gt;
      &lt;th&gt;Mechanism&lt;/th&gt;
      &lt;th&gt;Points&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Celestial Objects&lt;/td&gt;
      &lt;td&gt;Per-object claim&lt;/td&gt;
      &lt;td&gt;1-6 based on rarity&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Constellation Bonuses&lt;/td&gt;
      &lt;td&gt;Set completion&lt;/td&gt;
      &lt;td&gt;3 / 5 / 8 (escalating)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Binary Systems&lt;/td&gt;
      &lt;td&gt;Pair discovery&lt;/td&gt;
      &lt;td&gt;2x printed value&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Naming Rights&lt;/td&gt;
      &lt;td&gt;First discovery bonus&lt;/td&gt;
      &lt;td&gt;3-4 points&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Galaxy Clusters&lt;/td&gt;
      &lt;td&gt;End-game collection&lt;/td&gt;
      &lt;td&gt;+2 per galaxy (if 4+)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/Stellar_Rivals_Scoring.png&quot; alt=&quot;Stellar Rivals Scoring&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Five strategic paths to victory: object rarity, constellation sets, binary pairs, naming rights, and galaxy clusters.&lt;/em&gt;&lt;/p&gt;
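&lt;p&gt;A scoring function for these five paths can be sketched directly from the table. The bonus ladder and multipliers follow the table; the function signature, field names, and example counts are illustrative, not the generated rules text.&lt;/p&gt;

```python
# Sketch of the five scoring paths from the table above.
# Bonus values follow the table; everything else is illustrative.

CONSTELLATION_BONUS = {"easy": 3, "medium": 5, "hard": 8}  # escalating 3/5/8

def score_player(objects, constellations, first_discoveries, galaxies):
    # objects: list of (printed_value, is_binary_pair_member) tuples
    total = 0
    for value, in_binary_pair in objects:
        if in_binary_pair:
            total += value * 2      # binary pairs score double printed value
        else:
            total += value          # 1-6 points based on rarity
    total += sum(CONSTELLATION_BONUS[tier] for tier in constellations)
    total += 3 * first_discoveries  # naming rights, 3-4 points in the table
    if galaxies >= 4:               # cluster bonus only with 4 or more
        total += 2 * galaxies
    return total
```

&lt;p&gt;Note how the BalanceCritic’s later complaint about the constellation ladder is a one-line change here: swap the 3/5/8 ladder for 3/6/10.&lt;/p&gt;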

&lt;hr /&gt;

&lt;h2 id=&quot;your-game-is-broken&quot;&gt;“Your Game Is Broken”&lt;/h2&gt;

&lt;p&gt;This is the part that surprised me when I first built it.&lt;/p&gt;

&lt;p&gt;Agents five and six are not designers. They are critics. The &lt;strong&gt;BalanceCritic&lt;/strong&gt; evaluates the complete design for fairness and strategic depth. The &lt;strong&gt;FunFactorJudge&lt;/strong&gt; assesses whether the game would actually be fun to play. And they do not pull punches.&lt;/p&gt;

&lt;p&gt;Here is what the BalanceCritic said about Stellar Rivals:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Issue&lt;/th&gt;
      &lt;th&gt;Severity&lt;/th&gt;
      &lt;th&gt;Recommendation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Observatory Network card circumvents movement economy&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;High&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Limit uses per game or increase cost to 2-3 extra points&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Constellation difficulty gap, easy sets too rewarding&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;High&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Adjust bonus structure (e.g., 3-6-10) to reflect difficulty&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Sector overcrowding with 4 players&lt;/td&gt;
      &lt;td&gt;Medium&lt;/td&gt;
      &lt;td&gt;Scale tiles per sector with player count&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Starting 6 observation points feels restrictive&lt;/td&gt;
      &lt;td&gt;Medium&lt;/td&gt;
      &lt;td&gt;Increase to 7-8 or allow 1-2 point carryover&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Research token conversion rate (2:1) is inefficient&lt;/td&gt;
      &lt;td&gt;Medium&lt;/td&gt;
      &lt;td&gt;Improve to 3:2 for tactical viability&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Two high-severity issues. The Observatory Network card, which Dr. Sarah Chen acquired in our play example, actually breaks the movement economy that makes the rest of the game work. And the constellation bonus structure rewards easy sets disproportionately, creating a dominant strategy.&lt;/p&gt;

&lt;p&gt;The FunFactorJudge rated the design 7/10, identifying the “thrill of discovery” from hidden tiles and the tight action-point economy as primary tension sources.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/Stellar_Rivals_Balance_Critique.png&quot; alt=&quot;Stellar Rivals Balance Critique&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The BalanceCritic identifies strengths alongside critical issues with specific fix recommendations. This is not flattery. It is a design review.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Think about what just happened. The system generated a game, then told me it was broken, then told me exactly how to fix it. That kind of feedback normally requires weeks of playtesting, multiple playtest groups, and a designer honest enough to see their own blind spots. Here, it happened in the same 73-second generation pass.&lt;/p&gt;

&lt;p&gt;GameGrammar does not just generate. It argues with itself about the quality of its own output.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;cant-i-just-use-chatgpt&quot;&gt;“Can’t I Just Use ChatGPT?”&lt;/h2&gt;

&lt;p&gt;Fair question. You can absolutely ask ChatGPT to design a board game. It will produce fluent, confident text. It will tell you about a game with “resource management” and “strategic depth” and “high replayability.”&lt;/p&gt;

&lt;p&gt;But try handing that output to a playtester. Ask them: how many action points do I get per turn? What happens when two players land on the same tile? How many cards are in the starting deck? The answer, almost always, is that the LLM generated the &lt;em&gt;appearance&lt;/em&gt; of a game design without the &lt;em&gt;substance&lt;/em&gt;.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Aspect&lt;/th&gt;
      &lt;th&gt;Raw LLM&lt;/th&gt;
      &lt;th&gt;GameGrammar&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Single prompt, single pass&lt;/td&gt;
      &lt;td&gt;6 specialized agents in sequence&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Mechanisms&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Drawn from fuzzy training data&lt;/td&gt;
      &lt;td&gt;35-mechanism curated taxonomy&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Game References&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;May hallucinate titles and stats&lt;/td&gt;
      &lt;td&gt;2,000+ real BGG games indexed&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Output Structure&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Unstructured prose&lt;/td&gt;
      &lt;td&gt;Consistent schema with components, rules, scoring&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Self-Critique&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;None unless explicitly prompted&lt;/td&gt;
      &lt;td&gt;Built-in BalanceCritic and FunFactorJudge&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Iteration&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Copy-paste-re-prompt&lt;/td&gt;
      &lt;td&gt;Workbench with targeted expand, re-evaluate, consistency&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Balance Analysis&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Generic advice&lt;/td&gt;
      &lt;td&gt;Severity-rated issues with specific fix recommendations&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The difference is not marginal. A raw LLM might tell you a game has “resource management.” GameGrammar will tell you that each player receives 6 observation points per turn, movement costs 1-3 points based on sector distance, revealing a tile costs 2 points, and research tokens convert to observation points at a 2:1 ratio. One is a suggestion. The other is a specification.&lt;/p&gt;
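&lt;p&gt;Expressed as the kind of structured record the pipeline emits (the field names here are my own, not the actual schema), the difference becomes concrete: a specification can be checked by a program, while a paragraph of prose cannot.&lt;/p&gt;

```python
# The same numbers as a structured record. Field names are
# illustrative, not GameGrammar's actual output schema.

STELLAR_RIVALS_ECONOMY = {
    "observation_points_per_turn": 6,
    "movement_cost_by_distance": {1: 1, 2: 2, 3: 3},         # 1-3 points
    "reveal_cost": 2,
    "research_token_conversion": {"tokens": 2, "points": 1},  # 2:1 ratio
}

def can_afford(action_costs, budget=6):
    # A spec is machine-checkable; "resource management" is not.
    return budget - sum(action_costs) >= 0
```

&lt;p&gt;A playtester’s question like “can I move twice and still reveal a tile?” has a computable answer here; against raw LLM prose it has none.&lt;/p&gt;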

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/GameGrammar_vs_RawLLM.png&quot; alt=&quot;GameGrammar vs Raw LLM&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Suggestions versus specifications. GameGrammar produces designs with concrete numbers, specific components, and self-critical evaluation.&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;what-happens-after-you-hit-generate&quot;&gt;What Happens After You Hit Generate&lt;/h2&gt;

&lt;p&gt;Generation is the beginning, not the end. No game design is right on the first pass, not even one that comes with a built-in critic. GameGrammar provides a workbench for iterating on generated designs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expand&lt;/strong&gt; any section to add detail. &lt;strong&gt;Re-evaluate&lt;/strong&gt; to run the BalanceCritic and FunFactorJudge again after you have made changes. &lt;strong&gt;Consistency Check&lt;/strong&gt; verifies that components match rules, that referenced items actually exist, that quantities add up. You can generate &lt;strong&gt;variants&lt;/strong&gt;, create &lt;strong&gt;cover art&lt;/strong&gt;, and &lt;strong&gt;export&lt;/strong&gt; to JSON, Markdown, or PDF.&lt;/p&gt;
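&lt;p&gt;A consistency check of this kind is straightforward to sketch. The component names and the 8-tiles-per-sector relationship come from the Stellar Rivals design above; the equipment card count is a placeholder, and the function is an illustration rather than GameGrammar’s implementation.&lt;/p&gt;

```python
# Sketch of a consistency check: every item the rules reference must
# exist in the component list, and tile quantities must add up.

COMPONENTS = {
    "hex sector tiles": 6,
    "celestial object tiles": 48,   # 8 per sector, per the design
    "equipment cards": 12,          # placeholder count
}

def check_consistency(rules_refs, components, tiles_per_sector=8):
    issues = []
    for ref in rules_refs:
        if ref not in components:
            issues.append(f"rules reference missing component: {ref}")
    expected = tiles_per_sector * components.get("hex sector tiles", 0)
    if components.get("celestial object tiles") != expected:
        issues.append("celestial tile count does not match sector count")
    return issues
```

&lt;p&gt;An empty issue list means the rules and the bill of materials agree; anything else is a concrete fix-it item before cutting cardboard.&lt;/p&gt;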

&lt;p&gt;The Design Library stores every design with search, filters, and version history. A Community Gallery lets designers share their work and browse what others have created, finding mechanism combinations they might not have considered.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/introducing-gamegrammar-ai-powered-board-game-design/GameGrammar_Workbench.png&quot; alt=&quot;GameGrammar Workbench&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The design workbench. Expand, re-evaluate, check consistency, generate variants, create cover art, export. Generation is the start of the process, not the end.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This mirrors the real design workflow: prototype, test, iterate. GameGrammar compresses the generation side of that loop so designers can spend their time where it matters most, at the table with real players.&lt;/p&gt;

&lt;h3 id=&quot;what-remains-human&quot;&gt;What Remains Human&lt;/h3&gt;

&lt;p&gt;I want to be clear about what GameGrammar cannot do, because the boundaries matter more than the capabilities.&lt;/p&gt;

&lt;p&gt;It cannot &lt;strong&gt;playtest the game&lt;/strong&gt;. No algorithm can simulate the experience of four people around a table discovering that a mechanic is tedious or that a scoring path is dominant. It cannot &lt;strong&gt;read the room&lt;/strong&gt;, the laughter, the frustration, the surprise on a player’s face when a combo clicks. It cannot &lt;strong&gt;navigate the publication journey&lt;/strong&gt;, the manufacturing economics, the publisher relationships, the convention pitching. And it cannot &lt;strong&gt;make taste judgments&lt;/strong&gt;. Is a 45-minute game about Victorian astronomers &lt;em&gt;interesting&lt;/em&gt;? That is a question for the designer and their audience, not for an algorithm.&lt;/p&gt;

&lt;p&gt;GameGrammar is a design accelerator, not a replacement [2]. It frees designers from blank-page paralysis and gives them structured starting points worth iterating on. Everything that happens after — the playtesting, the polishing, the publishing — remains the designer’s craft.&lt;/p&gt;

&lt;p&gt;Already have a game? You do not need to generate from scratch. Describe your existing design to Nova, and the platform will structure it into the ontology format so you can run balance analysis and strategy testing on the game &lt;em&gt;you&lt;/em&gt; designed.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;try-it&quot;&gt;Try It&lt;/h2&gt;

&lt;p&gt;GameGrammar is available today in public beta at &lt;a href=&quot;https://gamegrammar.dynamindresearch.com&quot;&gt;gamegrammar.dynamindresearch.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The free tier includes 5 daily generations. Quick mode produces a complete design in about 15 seconds. The Multi-Agent pipeline takes under 90 seconds and delivers a design with built-in balance critique and fun-factor assessment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To generate your first game:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Create an account at &lt;a href=&quot;https://gamegrammar.dynamindresearch.com&quot;&gt;gamegrammar.dynamindresearch.com&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Enter a theme: anything from “pirates racing for treasure” to “quantum physicists competing for Nobel Prizes”&lt;/li&gt;
  &lt;li&gt;Set your constraints: player count, complexity, duration&lt;/li&gt;
  &lt;li&gt;Choose Multi-Agent mode for your first design&lt;/li&gt;
  &lt;li&gt;Read the balance critique. That is the part that will surprise you.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Explore section offers a Mechanism Browser with all 35 mechanisms, a compatibility matrix showing which mechanisms pair well together, and a Game Explorer for browsing 2,000+ published games as inspiration.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;concluding-remarks&quot;&gt;Concluding Remarks&lt;/h2&gt;

&lt;p&gt;This is Part 5 of the &lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;Game Architecture series&lt;/a&gt;. In Part 4, we built a structured vocabulary for understanding tabletop games [1]. Now, that vocabulary has become a creative engine with a name, an interface, and a generate button.&lt;/p&gt;

&lt;p&gt;Twelve words. Seventy-three seconds. Not a finished game — a structured first draft with five scoring paths, a hex grid, equipment upgrades, constellation bonuses, and a critic that tells you where it breaks.&lt;/p&gt;

&lt;p&gt;The blank page is no longer your enemy. It is your launchpad. And if you have already filled the page yourself, the platform is ready to help you test what you built.&lt;/p&gt;

&lt;p&gt;This article showed you the &lt;em&gt;what&lt;/em&gt;. If you want to understand the &lt;em&gt;why&lt;/em&gt;, &lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;Part 6: The Theory of Generative Board Game Design&lt;/a&gt; explores the design theory behind GameGrammar. It covers the three-layer architecture that separates mechanics from theme, the co-design relationship between human designers and computational agents, and the philosophical question of whether algorithms can understand fun. If you are curious about what makes this approach different at a deeper level, that is where to go next.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;series&quot;&gt;Series&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;Unlocking the Secrets of Tabletop Games Ontology (Part 4)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;»&lt;/strong&gt; &lt;a href=&quot;introducing-gamegrammar-ai-powered-board-game-design&quot;&gt;Introducing GameGrammar: AI-Powered Board Game Design (Part 5)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;gamegrammar-the-theory-of-generative-board-game-design&quot;&gt;GameGrammar: The Theory of Generative Board Game Design (Part 6)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;[1] Benny Cheung. &lt;a href=&quot;unlocking-secrets-of-tabletop-games-ontology&quot;&gt;&lt;em&gt;Unlocking the Secrets of Tabletop Games Ontology&lt;/em&gt;&lt;/a&gt;. bennycheung.github.io, Feb 2025.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Part 4 of the Game Architecture series, the ontology foundation for this work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[2] Board Game Design Pipeline Analysis. Dynamind Research, Jan 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Nine-stage pipeline from concept to post-launch, positioning GameGrammar at Stages 1-2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[3] Geoffrey Engelstein, Isaac Shalev. &lt;a href=&quot;https://www.routledge.com/Building-Blocks-of-Tabletop-Game-Design-An-Encyclopedia-of-Mechanisms/Engelstein-Shalev/p/book/9781138365490&quot;&gt;&lt;em&gt;Building Blocks of Tabletop Game Design: An Encyclopedia of Mechanisms&lt;/em&gt;&lt;/a&gt;. CRC Press, 2022.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Comprehensive reference for game mechanism taxonomy and classification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[4] &lt;a href=&quot;https://boardgamegeek.com/&quot;&gt;BoardGameGeek&lt;/a&gt;. The largest board game database and community.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Source for the 2,000+ game index used in RAG-enhanced generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[5] &lt;a href=&quot;https://www.dynamindresearch.com&quot;&gt;Dynamind Research&lt;/a&gt;. Research and product development studio.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Creator of GameGrammar, bridging computational design research with practical product implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[6] Benny Cheung. &lt;a href=&quot;https://arxiv.org/abs/2602.05636&quot;&gt;&lt;em&gt;Generative Ontology: When Structured Knowledge Learns to Create&lt;/em&gt;&lt;/a&gt;. arXiv:2602.05636, Feb 2026.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The formal research paper behind GameGrammar’s design theory and generative architecture&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Tue, 03 Feb 2026 00:00:00 +0000</pubDate>
        <link>https://bennycheung.github.io/introducing-gamegrammar-ai-powered-board-game-design</link>
        <guid isPermaLink="true">https://bennycheung.github.io/introducing-gamegrammar-ai-powered-board-game-design</guid>
        
        <category>Game Design</category>
        
        <category>Tabletop Games</category>
        
        <category>Design Tools</category>
        
        <category>Game Architecture</category>
        
        <category>Board Games</category>
        
        <category>Co-Design</category>
        
        <category>Game Analysis</category>
        
        
        <category>post</category>
        
      </item>
    
      <item>
        <title>Editing NotebookLM Slides: A 4-Tool Pipeline</title>
        <description>&lt;!--excerpt.start--&gt;
&lt;p&gt;Google’s NotebookLM can generate beautiful slide decks from your notes in seconds, but it exports them as PDFs with no edit button. When the AI gets a date wrong or hallucinates a statistic, you are stuck. This article walks through a 4-tool pipeline (NotebookLM to Canva to Google Slides to Nano Banana Pro) that converts locked PDF slides into fully editable presentations and uses AI again to fix the content without breaking the design.
&lt;!--excerpt.end--&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;img src=&quot;images/edit-notebooklm-slides-ai-pipeline/cover2x.jpg&quot; alt=&quot;Pipeline unlocking PDF slides into editable presentations&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. When AI gives you 90%, build a pipeline for the last 10%.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-problem-beautiful-slides-you-cannot-edit&quot;&gt;The Problem: Beautiful Slides You Cannot Edit&lt;/h2&gt;

&lt;p&gt;NotebookLM [1] is one of Google’s most impressive AI tools. Feed it your notes, research documents, or meeting transcripts, and it can generate a polished slide deck in seconds. The output looks professional. The structure is logical. The content is drawn directly from your sources.&lt;/p&gt;

&lt;p&gt;It feels like magic. Until you look closer.&lt;/p&gt;

&lt;p&gt;Maybe a date is wrong. Maybe it hallucinated a statistic. Maybe you just want to rephrase a bullet point that reads awkwardly. You stare at the output and realize the uncomfortable truth: &lt;strong&gt;NotebookLM exports slides as a PDF.&lt;/strong&gt; There is no “edit” button. No way back into a slide editor. You are stuck with a read-only document.&lt;/p&gt;

&lt;p&gt;Your only options are to regenerate the entire deck, hoping the AI gets it right this time, or accept the errors and move on. Neither option is acceptable when you are presenting to a client, a class, or your team.&lt;/p&gt;

&lt;p&gt;Google may eventually add direct export to Google Slides from NotebookLM, but that functionality is not available yet. Instead of waiting, we can solve the problem right now by chaining together tools that already exist into a simple pipeline.&lt;/p&gt;

&lt;h2 id=&quot;the-insight-a-4-tool-pipeline&quot;&gt;The Insight: A 4-Tool Pipeline&lt;/h2&gt;

&lt;p&gt;The fix is not a single tool. It is a chain of four tools, each doing what it does best, that transforms a locked PDF into a fully editable, AI-correctable slide deck.&lt;/p&gt;

&lt;p&gt;The pipeline:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Step&lt;/th&gt;
      &lt;th&gt;Tool&lt;/th&gt;
      &lt;th&gt;What It Does&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;NotebookLM&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Generates the initial slide deck from your notes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Canva&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Converts the PDF into an editable PPTX file&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Google Slides&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Provides a cloud-native editor with add-on support&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;4&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Nano Banana Pro&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Uses AI to fix slide content without breaking design&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The principle is straightforward: AI generates, you convert, AI fixes. Let’s walk through each step.&lt;/p&gt;

&lt;h2 id=&quot;step-1-generate-your-slides-in-notebooklm&quot;&gt;Step 1: Generate Your Slides in NotebookLM&lt;/h2&gt;

&lt;p&gt;Start where the magic happens. Open NotebookLM [1], upload your source material (notes, documents, research papers), and let it generate a slide deck. The tool analyzes your content, identifies key themes, and produces a structured presentation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/edit-notebooklm-slides-ai-pipeline/NotebookLM_Slides_Editing_00_Generate_Slides.png&quot; alt=&quot;NotebookLM generating slides from source material&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. NotebookLM generates polished slide decks from your uploaded source material. Download the result as a PDF.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This gives you a visually polished deck, but one that is locked inside a static document. We need to break it out.&lt;/p&gt;

&lt;h2 id=&quot;step-2-convert-pdf-to-pptx-using-canva&quot;&gt;Step 2: Convert PDF to PPTX Using Canva&lt;/h2&gt;

&lt;p&gt;Canva [2] offers a free PDF-to-PPTX converter that does the heavy lifting of turning your static slides into editable PowerPoint format. You will need a Canva account (the free tier works).&lt;/p&gt;

&lt;p&gt;Go to &lt;a href=&quot;https://www.canva.com/features/pdf-to-ppt-converter/&quot;&gt;Canva’s PDF to PPT Converter&lt;/a&gt; and upload your NotebookLM PDF.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/edit-notebooklm-slides-ai-pipeline/NotebookLM_Slides_Editing_01_Canva_PDF_to_PPTX.png&quot; alt=&quot;Canva PDF to PPTX converter interface&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Canva’s free PDF to PPT converter parses your NotebookLM PDF into editable slide elements.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Canva will parse the PDF into editable slides. Now, here is a critical detail that can save you frustration later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not just download the converted file directly.&lt;/strong&gt; Instead, use the &lt;strong&gt;Share &amp;gt; Microsoft PowerPoint&lt;/strong&gt; export option. This preserves the aspect ratios of all images in the deck. The difference is subtle but important: a direct download can stretch or crop images when you open the file in another editor, while the Share export maintains fidelity.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/edit-notebooklm-slides-ai-pipeline/NotebookLM_Slides_Editing_02_Canva_Download_PPTX.png&quot; alt=&quot;Canva download options&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Canva provides multiple download options. The direct download works, but the Share export preserves image aspect ratios more reliably.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/edit-notebooklm-slides-ai-pipeline/NotebookLM_Slides_Editing_03_Canva_As_PPTX.png&quot; alt=&quot;Canva Share as PowerPoint&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Use Share &amp;gt; Microsoft PowerPoint to export. This method preserves the original image dimensions and layout fidelity.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;step-3-import-into-google-slides&quot;&gt;Step 3: Import into Google Slides&lt;/h2&gt;

&lt;p&gt;Take the PPTX file and import it into Google Drive. Open it, then select &lt;strong&gt;Save as Google Slides&lt;/strong&gt; [3]. This converts the PowerPoint file into Google’s native slide format.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/edit-notebooklm-slides-ai-pipeline/NotebookLM_Slides_Editing_04_Import_Save_as_Google_Slides.png&quot; alt=&quot;Importing PPTX and saving as Google Slides&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Import the PPTX file into Google Drive and save as Google Slides to gain access to the full editing suite and add-on ecosystem.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Why Google Slides specifically? Three reasons:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Full editing suite.&lt;/strong&gt; You can rearrange, annotate, and replace slides, and edit any elements the conversion exposes as native objects.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Cloud-native collaboration.&lt;/strong&gt; Share with your team for real-time review and refinement.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Add-on ecosystem.&lt;/strong&gt; This is where the final piece of the puzzle lives.&lt;/li&gt;
&lt;/ol&gt;
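
&lt;p&gt;If you run this conversion often, the import-and-convert step can also be scripted. The sketch below is an assumption on my part, not part of the original workflow: it uses the Google Drive v3 API, where setting the target &lt;code&gt;mimeType&lt;/code&gt; to Google Slides asks Drive to convert the uploaded PPTX on arrival.&lt;/p&gt;

```python
# Sketch: scripting Step 3 with the Google Drive v3 API. Assumes
# google-api-python-client is installed and OAuth credentials exist.
# Drive converts the uploaded PPTX when the metadata's mimeType is
# set to the native Google Slides type.

GOOGLE_SLIDES_MIME = "application/vnd.google-apps.presentation"
PPTX_MIME = ("application/vnd.openxmlformats-officedocument"
             ".presentationml.presentation")

def build_import_request(pptx_name):
    """Return the file metadata that triggers PPTX-to-Slides conversion."""
    return {
        "name": pptx_name.rsplit(".", 1)[0],  # drop the .pptx extension
        "mimeType": GOOGLE_SLIDES_MIME,       # convert on upload
    }

# The actual upload (not run here; requires OAuth credentials):
# from googleapiclient.discovery import build
# from googleapiclient.http import MediaFileUpload
# service = build("drive", "v3", credentials=creds)
# media = MediaFileUpload("deck.pptx", mimetype=PPTX_MIME)
# service.files().create(body=build_import_request("deck.pptx"),
#                        media_body=media, fields="id").execute()
```

&lt;p&gt;The commented-out upload needs credentials; everything above it runs locally, and the manual Drive import described in this step does exactly the same conversion.&lt;/p&gt;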

&lt;h2 id=&quot;step-4-fix-content-with-nano-banana-pro&quot;&gt;Step 4: Fix Content with Nano Banana Pro&lt;/h2&gt;

&lt;p&gt;Here is where the workflow becomes genuinely clever.&lt;/p&gt;

&lt;p&gt;Nano Banana Pro [4] is a Google Slides add-on that brings AI-powered editing directly into your slide deck. Instead of manually retyping text on slides (which often breaks formatting, misaligns elements, or introduces visual inconsistencies), you describe what needs to change and the AI regenerates the slide while preserving the visual style.&lt;/p&gt;

&lt;p&gt;For example, say NotebookLM misspelled a key term, cited a wrong year, or used awkward phrasing. You open Nano Banana Pro in the sidebar and describe the fix in plain English:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/edit-notebooklm-slides-ai-pipeline/NotebookLM_Slides_Editing_05_Built-in_Nano_Banana_Magic_01.png&quot; alt=&quot;Nano Banana Pro editing interface in Google Slides&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. Nano Banana Pro’s sidebar lets you describe corrections in natural language. The AI understands both the content change and the visual context.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After a few seconds, it generates a corrected version of the slide. You insert the result as a new slide:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/edit-notebooklm-slides-ai-pipeline/NotebookLM_Slides_Editing_05_Built-in_Nano_Banana_Magic_02.png&quot; alt=&quot;Generated corrected slide ready to insert&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The AI generates a corrected slide that maintains the original visual design while incorporating your content changes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then you delete the old slide with the error. Done. A clean, corrected deck that looks like nothing was ever wrong.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;images/edit-notebooklm-slides-ai-pipeline/NotebookLM_Slides_Editing_05_Built-in_Nano_Banana_Magic_03.png&quot; alt=&quot;Final corrected slide deck&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure. The final result: a corrected slide seamlessly replaces the original, with no visible signs of editing.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;what-you-get&quot;&gt;What You Get&lt;/h2&gt;

&lt;p&gt;By chaining NotebookLM, Canva, Google Slides, and Nano Banana Pro, you get:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;AI-generated first drafts&lt;/strong&gt; that save hours of slide creation&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Full editability&lt;/strong&gt; despite the original output being a locked PDF&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;AI-assisted corrections&lt;/strong&gt; that fix content without breaking design&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Cloud-native collaboration&lt;/strong&gt; so your team can review and refine together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that if you do not need the AI-powered correction step, you can stop at Step 3. Google Slides gives you full manual editing capability. Nano Banana Pro is the addition that makes corrections faster and less error-prone.&lt;/p&gt;

&lt;h2 id=&quot;limitations&quot;&gt;Limitations&lt;/h2&gt;

&lt;p&gt;This workflow is effective, but it is not perfect. Here are the honest trade-offs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slides are images, not editable elements.&lt;/strong&gt; Canva’s PDF-to-PPTX conversion extracts each PDF page as an image and inserts it into a PowerPoint slide. This means the text, shapes, and charts are not individually editable in the PPTX. You get one flat image per slide. This is exactly why Step 4 matters: Nano Banana Pro can regenerate a slide from its visual content, giving you a corrected version without needing to edit individual text boxes that do not exist.&lt;/p&gt;
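
&lt;p&gt;You can verify this flattening yourself: a &lt;code&gt;.pptx&lt;/code&gt; file is just a ZIP archive (Open Packaging Conventions), with one slide XML part under &lt;code&gt;ppt/slides/&lt;/code&gt; and embedded pictures under &lt;code&gt;ppt/media/&lt;/code&gt;. A minimal stdlib sketch for inspecting a converted deck:&lt;/p&gt;

```python
# Sketch: counting slide parts and embedded media in a .pptx, which is
# a ZIP archive under the Open Packaging Conventions. If a conversion
# rasterized each page, the two counts should roughly match.
import zipfile

def slide_and_image_counts(pptx_path):
    """Return (number of slide XML parts, number of embedded media files)."""
    with zipfile.ZipFile(pptx_path) as z:
        names = z.namelist()
    slides = [n for n in names
              if n.startswith("ppt/slides/slide") and n.endswith(".xml")]
    media = [n for n in names if n.startswith("ppt/media/")]
    return len(slides), len(media)
```

&lt;p&gt;On a Canva-converted deck you should see roughly one media file per slide; a hand-built deck with real text boxes usually has far fewer images than slides.&lt;/p&gt;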

&lt;p&gt;&lt;strong&gt;Tool chain complexity.&lt;/strong&gt; Four tools is more steps than anyone would prefer. Ideally, Google would add native slide editing to NotebookLM’s output, or at least offer PPTX export alongside PDF. Until then, this pipeline fills the gap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nano Banana Pro scope.&lt;/strong&gt; The add-on works best for targeted corrections such as fixing text, adjusting bullet points, and correcting data. It is less suited for wholesale redesigns of slide structure or layout. For major changes, you are better off editing directly in Google Slides.&lt;/p&gt;

&lt;h2 id=&quot;concluding-remarks&quot;&gt;Concluding Remarks&lt;/h2&gt;

&lt;p&gt;The bigger lesson here extends beyond NotebookLM slides. This workflow solves a friction point that is becoming common across AI tools: &lt;strong&gt;the output is impressive but imperfect, and the “last mile” of editing is either impossible or painfully manual.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The pattern repeats everywhere. An AI image generator produces a nearly-perfect visual with one wrong detail. A code assistant scaffolds an entire project but misses an edge case. A writing tool nails the structure but stumbles on a specific claim. In each case, the most productive response is not to regenerate and hope, but to build a workflow that bridges the gap.&lt;/p&gt;

&lt;p&gt;This is what it means to be an &lt;a href=&quot;rise-of-the-ai-powered-super-individual&quot;&gt;AI-powered super individual&lt;/a&gt; [5]: not mastering every tool, but knowing how to chain them into workflows that turn “almost right” into “exactly right.” The value is not in any single tool. It is in the orchestration.&lt;/p&gt;

&lt;p&gt;When one AI tool gives you 90% of what you need, the answer is not to fight that tool’s limitations. Build a pipeline for the last 10%.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;[1] Google. &lt;a href=&quot;https://notebooklm.google.com/&quot;&gt;&lt;em&gt;NotebookLM&lt;/em&gt;&lt;/a&gt;. Google Labs, 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Google’s AI-powered notebook that can generate slide decks, podcasts, and summaries from uploaded source material.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[2] Canva. &lt;a href=&quot;https://www.canva.com/features/pdf-to-ppt-converter/&quot;&gt;&lt;em&gt;PDF to PPT Converter&lt;/em&gt;&lt;/a&gt;. Canva, 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Free online tool for converting PDF documents into editable PowerPoint (PPTX) format.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[3] Google. &lt;a href=&quot;https://workspace.google.com/products/slides/&quot;&gt;&lt;em&gt;Google Slides&lt;/em&gt;&lt;/a&gt;. Google Workspace, 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Cloud-native presentation editor with real-time collaboration, version history, and an add-on ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[4] Nano Banana. &lt;a href=&quot;https://workspace.google.com/marketplace&quot;&gt;&lt;em&gt;Nano Banana Pro - Google Workspace Marketplace&lt;/em&gt;&lt;/a&gt;. Google Workspace Marketplace, 2024.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;AI-powered Google Slides add-on that enables natural language editing of slide content while preserving visual design.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;[5] Benny Cheung. &lt;a href=&quot;rise-of-the-ai-powered-super-individual&quot;&gt;&lt;em&gt;The Rise of the AI-Powered Super Individual&lt;/em&gt;&lt;/a&gt;. Benny’s Mind Hack, 2025.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;How AI empowers individuals to achieve the output of entire teams through tool orchestration and “Silicon Intelligence Management.”&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Fri, 30 Jan 2026 12:00:00 +0000</pubDate>
        <link>https://bennycheung.github.io/edit-notebooklm-slides-ai-pipeline</link>
        <guid isPermaLink="true">https://bennycheung.github.io/edit-notebooklm-slides-ai-pipeline</guid>
        
        <category>AI</category>
        
        <category>NotebookLM</category>
        
        <category>Productivity</category>
        
        <category>Google Slides</category>
        
        <category>Canva</category>
        
        <category>Presentation Design</category>
        
        
        <category>post</category>
        
      </item>
    
  </channel>
</rss>
