The Unreasonable Effectiveness of HTML as AI Output

I needed a presentation last week. Nothing fancy. A visual explainer with structured sections, icons, and a coherent design language. The kind of thing you’d normally wrestle into PowerPoint for an hour.

I tried the obvious path first. PPTX generation skills, AI slide makers, every tool that promises “just describe your slides and we’ll build the deck.” None of them worked. The layouts were broken. The styling was inconsistent. The icons clashed. Spacing was random. I’d spend more time fixing the output than I would have spent building slides from scratch.

Then I tried something different. I asked Claude Code to generate an HTML file. I paired it with a DESIGN.md that described the visual language I wanted: warm terracotta accents, structured grid layouts, consistent icon styling, cream backgrounds. The result was a polished, shareable visual document in under five minutes.

The difference wasn’t the model. It was the output format.

Why PPTX Fails and HTML Doesn’t

PowerPoint is a binary format with XML internals, proprietary rendering rules, and decades of layout engine quirks. When an AI agent generates PPTX, it’s writing to a format it can’t see. It produces XML slide definitions and hopes the rendering engine interprets them correctly. There’s no feedback loop. No way to validate the visual output.

HTML is the opposite. It’s text. An AI agent can reason about CSS properties, calculate spacing, compose SVG icons, and structure layouts in a medium it was trained on. The relationship between code and visual output is direct and predictable.
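
To make “direct and predictable” concrete, here is a minimal sketch (my own illustration, not output from any particular tool). Every visual decision is an explicit declaration the agent can read back and adjust:

<!-- Two-column grid: every value is visible in the source -->
<section style="display: grid; grid-template-columns: 1fr 1fr; gap: 24px">
  <div style="background: #FFFBF5; border-radius: 8px; padding: 16px">
    <h2 style="color: #8B4513">Left card</h2>
  </div>
  <div style="background: #FFFBF5; border-radius: 8px; padding: 16px">
    <h2 style="color: #8B4513">Right card</h2>
  </div>
</section>

The gap is 24 pixels because the code says 24 pixels. Nothing passes through an opaque rendering engine on its way to the screen.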

Format   Agent Visibility        Styling Control         Sharing           Iteration Speed
PPTX     Blind (binary output)   Limited, fragile        File attachment   Minutes per fix
PDF      Blind (binary output)   One-shot, no CSS        File attachment   Start over each time
HTML     Full (text-native)      CSS, complete control   URL link          Seconds per tweak

The key insight: AI agents produce better output in formats they can read. HTML is text. AI reads text. The quality gap between HTML output and binary format output is enormous because the agent can actually reason about what it’s producing.

What HTML Gives You That Slides Can’t

Thariq’s blog post on X captures this perfectly. HTML isn’t just “a web page.” It’s a universal rich-content container.

  • Tabular data with proper tables.
  • Styled layouts with CSS.
  • Code snippets with syntax highlighting via script tags.
  • Illustrations with inline SVG.
  • Interactions with JS and CSS.
  • Spatial data with positioned elements and canvases.
  • Responsive design across any screen size.
  • And the killer feature: sharing via a link.
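
Several of those capabilities fit in one small file. A hedged sketch with invented data, just to show the density:

<!doctype html>
<html>
<head>
  <style>
    body { font-family: serif; background: #FFFBF5 }
    table { border-collapse: collapse }
    th, td { border: 1px solid #d8c9b8; padding: 8px }
  </style>
</head>
<body>
  <!-- A proper data table, styled with CSS -->
  <table>
    <tr><th>Quarter</th><th>Revenue</th></tr>
    <tr><td>Q1</td><td>$10k</td></tr>
    <tr><td>Q2</td><td>$14k</td></tr>
  </table>
  <!-- An inline SVG bar chart: no image files, no chart library -->
  <svg width="120" height="60" role="img" aria-label="Revenue by quarter">
    <rect x="10" y="25" width="30" height="35" fill="#8B4513"/>
    <rect x="60" y="11" width="30" height="49" fill="#C65D3B"/>
  </svg>
</body>
</html>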

Compare this to PowerPoint:

  • Want a responsive layout? Not possible.
  • Want interactive elements? Hyperlinks and animations. That’s it.
  • Want to share without an attachment? Upload to OneDrive and pray the formatting holds.
  • Want consistent spacing? Manually align every element.

HTML handles all of this natively. And because CSS is text, an AI agent can iterate on spacing, colors, and typography with precision instead of guessing.

The DESIGN.md Multiplier

Raw HTML from an AI agent still suffers from the “average of averages” problem. Ask for “a nice presentation” and you get rounded cards, blue gradients, and generic sans-serif fonts. The agent’s default taste is the mean of its training data.

This is where DESIGN.md changes the equation. Instead of describing your visual preferences in every prompt, you write them once:

---
brandName: "Presentation"
primaryColor: "oklch(48% 0.12 45)"
backgroundColor: "#FFFBF5"
accentColor: "#8B4513"
borderRadiusScale: [4, 8, 12]
spacingScale: [4, 8, 16, 24, 32, 48]
fontFamilies:
  display: "Cormorant Garamond, serif"
  body: "Lora, serif"
---

The agent reads this before generating any HTML. Every element matches. No guessing. No “make the color warmer, no, warmer than that, okay, too warm.” The constraints are explicit and the agent follows them on the first pass.
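
In practice, the first thing a generated file tends to contain is those tokens translated into CSS custom properties. A sketch of the mapping (the variable names are my own convention, not something DESIGN.md prescribes):

:root {
  /* Tokens lifted straight from DESIGN.md */
  --color-primary: oklch(48% 0.12 45);
  --color-background: #FFFBF5;
  --color-accent: #8B4513;
  --radius-md: 8px;    /* from borderRadiusScale */
  --space-lg: 32px;    /* from spacingScale */
  --font-display: "Cormorant Garamond", serif;
  --font-body: "Lora", serif;
}

body { background: var(--color-background); font-family: var(--font-body) }
h1, h2 { font-family: var(--font-display); color: var(--color-accent) }

Once every rule references a variable, “make the accent warmer” becomes a one-line change that propagates through the entire document.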

The combination is what makes this work: HTML as the output format (because the agent can see what it’s producing) plus DESIGN.md as the constraint file (because the agent knows what “good” looks like for your specific use case).

The Workflow in Practice

Here’s what the actual process looks like:

1. Write a DESIGN.md (or grab one from getdesign.md)
2. Tell Claude Code: "Make an HTML file. Read DESIGN.md first.
   Create a presentation about [topic] with [sections]."
3. Open the HTML file in a browser.
4. Iterate: "Make the grid 3 columns", "Add icons to each card",
   "Increase spacing between sections"
5. Share the file or host it.

Each iteration takes seconds because the agent modifies text (HTML/CSS) and you see the result instantly in your browser. No export step. No rendering pipeline. No “did it interpret my slide master correctly?”
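
Here’s what one of those iterations actually touches. “Make the grid 3 columns” is, under the hood, a single edited declaration (the .cards class name is hypothetical):

/* Before */
.cards { display: grid; grid-template-columns: repeat(2, 1fr); gap: 24px }

/* After: "Make the grid 3 columns" */
.cards { display: grid; grid-template-columns: repeat(3, 1fr); gap: 24px }

Reload the browser tab and the change is visible. That’s the whole loop.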

For my presentation, the total time from first prompt to finished visual was under five minutes. Three iterations: initial layout, icon refinement, spacing adjustment. Done.

Beyond Presentations

Once you internalize this pattern, it applies everywhere:

Specs and planning documents. Instead of a flat Markdown file, generate an HTML document with collapsible sections, diagrams, and navigation. Information density goes up. Readability goes up.
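
Collapsible sections don’t even need JavaScript; native HTML handles them. A minimal sketch:

<details open>
  <summary>Phase 1: Data migration</summary>
  <p>Scope, owners, and rollback plan.</p>
</details>
<details>
  <!-- Collapsed by default: readers expand only what they need -->
  <summary>Phase 2: Cutover</summary>
  <p>Timeline and checklist.</p>
</details>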

Debugging and troubleshooting. Generate HTML reports with syntax-highlighted code, visual diffing, and interactive state exploration. Far more useful than text dumps.

Data analysis. Tables with sorting, charts with inline SVG, conditional formatting with CSS. No Jupyter export step.
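
Sorting, for example, is a dozen lines of vanilla JavaScript. A minimal sketch with invented data:

<table id="sales">
  <thead>
    <tr><th onclick="sortBy(0)">Region</th><th onclick="sortBy(1)">Units</th></tr>
  </thead>
  <tbody>
    <tr><td>North</td><td>42</td></tr>
    <tr><td>South</td><td>17</td></tr>
    <tr><td>West</td><td>29</td></tr>
  </tbody>
</table>
<script>
  // Re-append rows in sorted order; numeric sort when both cells parse as numbers.
  function sortBy(col) {
    const tbody = document.querySelector("#sales tbody");
    [...tbody.rows]
      .sort((a, b) => {
        const x = a.cells[col].textContent, y = b.cells[col].textContent;
        return isNaN(x) || isNaN(y) ? x.localeCompare(y) : x - y;
      })
      .forEach(row => tbody.appendChild(row));
  }
</script>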

Documentation. Interactive API references, live code examples, responsive layouts. Beats any static doc generator for one-off documents.

PRs and reports. Visual changelogs, architecture diagrams, annotated screenshots. Things that communicate better as rich documents than as plain text.

The common thread: every time you need AI output that’s visual, structured, or shareable, HTML is probably the right format. Not because it’s the most powerful rendering technology. But because it’s the one AI agents understand natively.

And Now: Video

This same logic extends further than static documents. HyperFrames, open-sourced by HeyGen, takes the thesis to its logical extreme: if HTML is the best agent-native format for visual output, then HTML is also the best agent-native format for video.

Programmatic video tools already exist. Remotion, Revideo, Motion Canvas. They all use React or similar frameworks under the hood. But they were designed for developers writing video code by hand. HyperFrames was designed specifically for AI agents writing video code.

The architecture is dead simple. Standard HTML, CSS, and JavaScript with a handful of data- attributes that define the video timeline:

<div id="root" data-composition-id="intro"
     data-width="1920" data-height="1080"
     data-start="0" data-duration="5">
  <div id="scene1" class="scene">
    <h1 class="title">HTML is Video</h1>
  </div>
</div>

GSAP drives the animation. data-start and data-duration control timing. Everything else is just web technology the agent already knows: CSS animations, SVG, Canvas, Three.js, Google Fonts. No wrappers. No framework to learn. No After Effects project file to reverse-engineer.
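
For a feel of what the agent writes, here is a hand-rolled sketch of animating that scene with plain GSAP (illustrative only; HyperFrames itself wires the data- attributes into the timeline):

<script src="https://cdn.jsdelivr.net/npm/gsap@3/dist/gsap.min.js"></script>
<script>
  // Title fades and slides in during the first second of the 5s composition,
  // then scales up slightly near the end.
  const tl = gsap.timeline();
  tl.from("#scene1 .title", { opacity: 0, y: 60, duration: 1 }, 0);
  tl.to("#scene1 .title", { scale: 1.1, duration: 0.5 }, 3.5);
</script>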

Liu Bin at HeyGen put it well: “What the symphony was to Beethoven, play was to Shakespeare. HTML is to agents.” LLMs were trained on billions of web pages. Millions of CSS and JavaScript animations. Hundreds of thousands of GSAP snippets. The web is the largest creative medium in their training data by orders of magnitude.

HeyGen discovered this the hard way. They tried letting agents build motion graphics through code for months. Early models weren’t reliable enough. Then Gemini 3 and Opus 4.5 landed, and the same agent pipeline suddenly produced consistent, high-quality output. The model didn’t need a video-specific format. It needed a format it already understood.

The pattern is the same one I hit with presentations:

Domain          “Proper” Tool            Agent-Native Format             Result
Presentations   PPTX generators          HTML + CSS                      Polished visuals in minutes
Video           After Effects, DaVinci   HTML + GSAP + data attributes   Full motion graphics from prompts
Documents       PDF generators           HTML + CSS                      Rich, shareable, interactive

One install command makes any Claude Code session a video editor:

npx skills add heygen-com/hyperframes

The friction goes to zero. Same principle as everything else in this post: stop forcing agents to learn human tools. Give them a text-based format they already understand natively, and the output quality jumps by an order of magnitude.

Why This Feels Different

There’s something Thariq captures in that blog that’s hard to articulate with just the technical argument: it’s joyful. More fun to create. More engaging to explore. More satisfying to share.

When an AI agent generates a PowerPoint file, you open it with dread. “What did it mess up this time?” When it generates an HTML file, you open it with curiosity. “Let’s see what it made.” The feedback loop is immediate, the iteration is fast, and the output is genuinely good on the first pass.

This matters because it changes how often you reach for the tool. If generating visual output is painful, you’ll only do it when forced. If it’s fast and fun, you’ll do it for everything. Internal proposals, architecture explainers, client deliverables, personal notes. The medium stops being an obstacle.

The Bottom Line

Stop fighting binary formats. AI agents produce dramatically better visual output in HTML because they can read what they write. Pair that with a DESIGN.md for brand constraints and you have a workflow that produces polished, shareable, responsive documents in minutes.

The “unreasonable effectiveness” isn’t really unreasonable at all. It’s just what happens when you let an AI agent work in its native medium instead of forcing it to write blind.


Using Claude Code for visual output? I’d love to hear what formats and workflows you’ve landed on. Reach out on LinkedIn.