The test

The comparison used six screen types that game teams often ask image models to mock up: match-3 HUD, card battle, tower defense, SLG map, roguelike reward, and narrative choice. Each case was evaluated as production evidence, not as standalone art.

The controlled path required a SCREEN_BRIEF, LAYOUT_CONTRACT, VISUAL_STYLE_CONTRACT, IP_SIMILARITY_GATE, IMAGE_PROMPT_LOCKED, IMAGE_REVIEW, and implementation handoff before the output could pass.

What improved

  • Gameplay state became clearer: AP, timers, turn state, reward choice, and objective signals were easier to inspect.
  • Layouts had fewer decorative UI piles and more stable reading order.
  • Review scores became actionable because the output had explicit pass, revise, and reject criteria.
  • IP risk was handled before publication instead of after a polished but unsafe image already existed.
  • Developer handoff became more concrete because each accepted image carried screen intent and implementation notes.
Six accepted Skill-controlled screen references
The controlled workflow was judged on screen usefulness, not only visual polish.

Where direct output still helped

Direct generation was useful for quickly discovering an art direction or mood. The failure was narrower: without a brief, layout contract, and review gate, strong-looking images often hid unclear state, copied familiar interface patterns, or produced UI that looked complete but was hard to implement.

The IP gate is not optional

If a generated screenshot can be recognized as a commercial game at a glance, it is not safe public proof. The Skill treats that as a hard reject, even when the image is attractive. The goal is to produce original, reviewable references instead of familiar screenshots with different labels.

Safe IP gate sample set for generated game screen references
Outputs that pass the gate still need normal legal and production review. The gate only blocks obvious lookalike failures.

Practical conclusion

The controlled path was not a magic claim that every image becomes prettier. The useful claim is more specific: the outputs were more playable, easier to review, safer to publish, and easier to turn into implementation notes.