Your Site Has a Doppelganger: Clean Markup in the Age of AI

by Nicole Sylvester

Your site has a second visitor.

For every human who visits your site, another visitor arrives at the same time, follows the same path, and reads the same words, but experiences something your human visitors never do.

It can't see your hero image, and your typography is invisible to it. The scroll animation that signals progression, the color that draws the eye: none of that exists in the version it encounters. It moves through a parallel site, stripped to the bones.

That visitor is an AI agent. And you can design for it.

Two Visitors, Two Views

When a human lands on your page, they get the full production. Visual hierarchy guides the eye, color signals what matters, animation shows state, and whitespace lets the content breathe. They feel the craft.

When the doppelganger lands on the same URL, it gets the skeleton. It reads the structural markup, the heading relationships, the labeled fields, the links and where they go. It reads what's written, not what's shown.

Consider this simple example of two versions of a card component. To a human, these two render identically:

<!-- Version A -->
<div class="card">
  <div class="card-title">Quarterly Report</div>
  <div class="card-body">A summary of Q1 performance...</div>
</div>

<!-- Version B -->
<div class="card">
  <h2 class="card-title">Quarterly Report</h2>
  <p class="card-body">A summary of Q1 performance...</p>
</div>

Same styling, identical rendered output. A human sees a bold title above a paragraph and knows instantly which is which. The size and weight say it before they've read a word.

The doppelganger can't see that. It reads the tags. In Version A, it finds three <div>s, and the only hint about which one is the title is a class name that carries no defined meaning, so it has to guess. In Version B, the <h2> makes a claim a machine can read: this is the heading of what follows. Now the agent can quote it as a title, attach the paragraph to the right topic, and get the card right.
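One way to picture the difference is to run both versions through a reader that ignores styling entirely. The sketch below uses Python's standard `html.parser` module. The `HeadingCollector` class and `headings_in` helper are illustrative names invented for this example, not how any particular agent works, and real agents are far more sophisticated than this.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) for every h1-h6 element, ignoring all styling."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def headings_in(html):
    """Return the headings a structure-only reader would find in the markup."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings


VERSION_A = """
<div class="card">
  <div class="card-title">Quarterly Report</div>
  <div class="card-body">A summary of Q1 performance...</div>
</div>
"""

VERSION_B = """
<div class="card">
  <h2 class="card-title">Quarterly Report</h2>
  <p class="card-body">A summary of Q1 performance...</p>
</div>
"""

print(headings_in(VERSION_A))  # prints []
print(headings_in(VERSION_B))  # prints [(2, 'Quarterly Report')]
```

Version A yields no headings at all, while Version B hands over a title the reader did not have to infer.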

Two visitors, one piece of content, two entirely different experiences of it.

Who Is the Doppelganger?

The doppelganger isn't one thing. It's a category, and it's growing fast.

Crawlers and LLM scrapers index your content to train models and fill knowledge bases. Unless you read your server logs, you'll never know they were there.

AI search and answer engines, like Google's AI Overviews, synthesize your content into answers they hand straight to users. Whether you get cited accurately, misrepresented, or skipped depends heavily on how cleanly your content is structured.

Agentic AI tools browse on a user's behalf, filling out forms, pulling information, completing tasks. They follow the semantic structure precisely, and ambiguity makes them guess.

All of these doppelgangers are already visiting your site. Most of them can't see your design, so what they find when they arrive is the part you control.

What the Doppelganger Reads

Semantic structure tells the doppelganger what each part of the page is. The difference between <nav>, <main>, <article>, <aside>, and a stack of unstyled <div>s is invisible to a human when good styling does the work. To the doppelganger, it's the difference between a labeled map and an unmarked room.

Heading hierarchy gives the doppelganger the shape of your content. Here's a hierarchy that looks fine on screen but is broken underneath:

<h1>Our Services</h1>
<h3>Web Design</h3>      <!-- skipped from h1 to h3 -->
<h2>Pricing</h2>         <!-- back up to h2 -->
<h4>Enterprise</h4>      <!-- skipped from h2 to h4 -->

A designer styled these for visual rhythm. To the doppelganger, "Web Design" is a sub-subsection with no subsection above it, and "Enterprise" sits at level four under "Pricing" with the level between them missing.

Compare:

<h1>Our Services</h1>
<h2>Web Design</h2>
<h2>Pricing</h2>
<h3>Enterprise</h3>

Same content, but with a coherent shape. Now the doppelganger knows what nests under what.
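Skipped levels are easy to catch mechanically. Here is a rough sketch of such a check in Python. The `heading_jumps` function is a hypothetical helper, and it leans on a regular expression that is only adequate for simple, well-formed snippets like the ones above; a production check would use a real HTML parser.

```python
import re

# Matches <h1>...</h1> through <h6>...</h6>; good enough for simple snippets only.
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>", re.IGNORECASE | re.DOTALL)


def heading_jumps(html):
    """Return (previous_level, level, text) for each heading that skips a level."""
    jumps = []
    previous = 0  # treat the start of the document as level 0, so h1 is expected first
    for match in HEADING_RE.finditer(html):
        level = int(match.group(1))
        text = match.group(2).strip()
        if level > previous + 1:
            jumps.append((previous, level, text))
        previous = level
    return jumps


BROKEN = """
<h1>Our Services</h1>
<h3>Web Design</h3>
<h2>Pricing</h2>
<h4>Enterprise</h4>
"""

FIXED = """
<h1>Our Services</h1>
<h2>Web Design</h2>
<h2>Pricing</h2>
<h3>Enterprise</h3>
"""

print(heading_jumps(BROKEN))  # prints [(1, 3, 'Web Design'), (2, 4, 'Enterprise')]
print(heading_jumps(FIXED))   # prints []
```

Moving back up the hierarchy, as from "Web Design" to "Pricing", is not flagged, because closing a subsection is legitimate. Only jumps that go deeper by more than one level are reported.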

Alt text and link text round out what the doppelganger can read. An image with no alt text is a gap; a decorative image with chatty alt text is noise; an informational image described for what it means is content, and knowing which is which still takes human judgment. Link text works the same way: "Download the 2025 accessibility audit report" tells the doppelganger where it's headed, while "click here" tells it nothing.
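Part of this can be audited automatically, though only the mechanical part. The sketch below, again in Python with the standard `html.parser` module, flags links whose text says nothing about the destination and images with no `alt` attribute at all. The `LinkAndImageAudit` class, the `audit` helper, the list of vague phrases, and the sample URLs are all assumptions made for illustration. Deciding whether an image is decorative or informational remains a human call.

```python
from html.parser import HTMLParser

# Assumption: a short illustrative list, not an exhaustive or standard one.
VAGUE_LINK_TEXT = {"click here", "here", "read more", "learn more", "more"}


class LinkAndImageAudit(HTMLParser):
    """Flags uninformative link text and images that lack an alt attribute."""

    def __init__(self):
        super().__init__()
        self.vague_links = []         # href of each link with vague text
        self.images_missing_alt = []  # src of each <img> with no alt attribute
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self._href = attributes["href"]
            self._text = []
        elif tag == "img" and "alt" not in attributes:
            # alt="" is treated as present: it marks an image as decorative.
            self.images_missing_alt.append(attributes.get("src", ""))

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split()).lower()
            if text in VAGUE_LINK_TEXT:
                self.vague_links.append(self._href)
            self._href = None


def audit(html):
    """Return (vague_links, images_missing_alt) for the given markup."""
    parser = LinkAndImageAudit()
    parser.feed(html)
    parser.close()
    return parser.vague_links, parser.images_missing_alt


SAMPLE = """
<a href="/reports/audit-2025">Download the 2025 accessibility audit report</a>
<a href="/reports/latest">Click here</a>
<img src="chart.png">
<img src="divider.png" alt="">
"""

vague, missing = audit(SAMPLE)
print(vague)    # prints ['/reports/latest']
print(missing)  # prints ['chart.png']
```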

Clean markup won't guarantee accurate AI output. LLMs read text differently than a parser walking your structure does, so the correlation is real but loose. What's not loose is the direction of it: agents that render pages, follow structured data, and parse feeds all respond to semantic structure. You can't control what a model finally says about your page. You can control how legible you make it, and legibility is the part that consistently helps.

What the Doppelganger Can't Read

For many agents, CSS doesn't exist at all. They fetch the raw HTML and never render it. Even the ones that do render rarely pull meaning from visual properties: knowing something is 48px and centered says nothing about whether it matters. Importance encoded only in style is fragile. Encoded in structure, it works for everyone.

Visual hierarchy doesn't reliably survive the trip to the semantic layer. What your eye reads as most important, from weight and placement, is mostly invisible to the doppelganger.

Animation and interaction are simply absent. The hover states, the scroll-triggered reveals, the entrance animations, the whole motion layer that guides a human's attention is no part of the doppelganger's experience.

Implied meaning is invisible. A human infers what a button does from its look, what a section holds from its place, and what matters from its prominence. None of that is written down anywhere. It's read off the surface. The doppelganger doesn't read surfaces. It reads what's written, and what's only implied isn't there.

How to Design for Both

The human and the doppelganger don't need two separate designs.

The visual layer is what human visitors experience: typography, color, spacing, motion, hierarchy. It's where most design work already lives.

The semantic layer is what the doppelganger experiences: structure, relationships, labels, landmarks, and reading order. This layer exists whether you design it or not. The only question is whether it was built on purpose.

These two layers have to tell the same story. When your visual design says "this is the most important thing on the page," your structure should say it too, through heading level, landmark region, or document position. When your visual hierarchy creates a section, your markup should mark one. Contradiction between the layers is where the doppelganger gets confused, misreads your content, or fails at what it came to do.

Designing the Visual Layer for Humans

This is the design work you already know.

The rule is simple: when something looks like a particular element, make it that element. If it looks like an <h2>, it should be an <h2>. If it looks like navigation, wrap it in a <nav>. Match the tag to the visual role instead of letting the two drift apart.

Interactive elements should look interactive and be interactive in the markup. A <button> that looks like a button works for humans and doppelgangers at once. A styled <div> dressed up as a button only works for humans.
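A quick scan can surface some of these dressed-up elements. The Python sketch below flags non-interactive tags that carry an inline `onclick` handler or `role="button"`. Both signals are assumptions chosen for illustration, as are the `FakeButtonFinder` and `fake_buttons` names. Click handlers attached from a script leave no trace in the markup, so a scan like this will miss many cases, and an element with `role="button"` may be a deliberate ARIA pattern worth reviewing, not necessarily a mistake.

```python
from html.parser import HTMLParser

# Elements that are interactive by default and need no flagging.
NATIVE_INTERACTIVE = {"a", "button", "input", "select", "textarea"}


class FakeButtonFinder(HTMLParser):
    """Collects (tag, class) for generic elements that behave like buttons."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag in NATIVE_INTERACTIVE:
            return
        attributes = dict(attrs)
        if "onclick" in attributes or attributes.get("role") == "button":
            self.suspects.append((tag, attributes.get("class", "")))


def fake_buttons(html):
    """Return elements worth reviewing as possible stand-ins for <button>."""
    finder = FakeButtonFinder()
    finder.feed(html)
    finder.close()
    return finder.suspects


# Hypothetical markup: the handler name and class are made up for this example.
SAMPLE = """
<div class="btn" onclick="submitOrder()">Place order</div>
<button type="submit" class="btn">Place order</button>
"""

print(fake_buttons(SAMPLE))  # prints [('div', 'btn')]
```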

Context should live in the content, not just the design. If a link, label, or heading only makes sense because of the visual design around it, put that meaning into the text itself.

Designing the Semantic Layer

Start with heading hierarchy: one <h1> per page, headings in order, no skipped levels, every heading describing what follows instead of chosen for visual effect.

Add landmark regions. Use <header>, <nav>, <main>, <footer>, and <aside> consistently, and keep important content inside labeled regions instead of floating in generic containers.
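To see which landmarks a page actually exposes, a small inventory is enough. The following Python sketch counts the five elements named above and reports the ones that never appear. `LandmarkCounter`, `missing_landmarks`, and the sample page are hypothetical, and a missing `<aside>` is often perfectly fine, so treat the output as a prompt for review, not a verdict.

```python
from html.parser import HTMLParser

LANDMARK_TAGS = ("header", "nav", "main", "aside", "footer")


class LandmarkCounter(HTMLParser):
    """Counts how many times each landmark element opens in the markup."""

    def __init__(self):
        super().__init__()
        self.counts = {tag: 0 for tag in LANDMARK_TAGS}

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1


def missing_landmarks(html):
    """Return the landmark elements that never appear, in a fixed order."""
    counter = LandmarkCounter()
    counter.feed(html)
    counter.close()
    return [tag for tag in LANDMARK_TAGS if counter.counts[tag] == 0]


# Hypothetical page: navigation and main content live in generic <div>s.
PAGE = """
<header><a href="/">Acme</a></header>
<div class="menu"><a href="/pricing">Pricing</a></div>
<div class="content"><h1>Pricing</h1></div>
<footer>Contact</footer>
"""

print(missing_landmarks(PAGE))  # prints ['nav', 'main', 'aside']
```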

Write link text that reads on its own. Every link names its destination. Never "click here."

Treat alt text as content. Describe informational images for what they mean, not just what they show, and reproduce any text inside an image verbatim.

Finally, check your reading order: read the page top to bottom with CSS disabled, then tab through its links and controls. Does it still make sense? Does importance still come through? Could someone grasp the page's purpose from the structure alone?
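The same check can be approximated in code by extracting text in source order, which is roughly what a reader that never applies CSS encounters. This Python sketch is a simplification under stated assumptions: `SourceOrderText` and `reading_order` are illustrative names, the sample page is invented, and the extraction skips only `<script>` and `<style>` content.

```python
from html.parser import HTMLParser


class SourceOrderText(HTMLParser):
    """Collects visible text chunks in the order they appear in the source."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace, drop empty runs
        if text and not self._skip_depth:
            self.chunks.append(text)


def reading_order(html):
    """Return the page's text in source order, ignoring any CSS reordering."""
    parser = SourceOrderText()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page: CSS moves the heading to the top visually,
# but in the source it comes after the paragraph it introduces.
PAGE = """
<style>
  main { display: flex; flex-direction: column; }
  .promo { order: -1; }
</style>
<main>
  <p class="details">Plans start with a free tier.</p>
  <h1 class="promo">Pricing</h1>
</main>
"""

print(reading_order(PAGE))  # prints ['Plans start with a free tier.', 'Pricing']
```

On screen the heading comes first. In the source, and so for the doppelganger, the paragraph arrives before the heading that is supposed to introduce it.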

You Might Be Thinking…

"Designing for agents will compromise the human experience."

It won't. Good semantic markup is invisible to human visitors when good visual design runs in front of it. A page with clean heading hierarchy, meaningful landmarks, and descriptive link text looks identical to one without any of that, as long as the visual design is doing its job. The semantic layer runs underneath, parallel and unnoticed.

Neglecting it doesn't make your visual design any better. It just leaves the doppelganger's version of your site, the one already being read, summarized, and acted on, undesigned.

"AI is scraping our content without permission. Why would I make it easier?"

Fair. The copyright and consent questions around AI training are nowhere near settled, and publishers have real grievances. If you've decided to lock agents out, behind an auth wall or through legal means, that's a defensible call.

But locking them out and serving them an undesigned semantic layer are two different things. Bad structure won't stop you from being scraped. It just means that when you are scraped, you get misrepresented. Designing the semantic layer isn't about doing AI any favors. It's about keeping control of how your work gets understood, quoted, and acted on by whatever machine shows up. And the doppelganger is already showing up. So the real question is whether it finds a version of your site you meant, or one you didn't.

"This is extra work for a niche use case."

It isn't extra, and it isn't niche. AI Overviews show up in a large share of Google searches. Tools like ChatGPT and Claude browse the web for users who ask them to research, summarize, and recommend. Agentic systems that navigate sites on their own are moving from experiment to mainstream.

The doppelganger is becoming one of the main ways your content gets encountered, not as a page someone lands on, but as a source something else reads and then speaks, summarizes, or acts on. The audience for the semantic layer isn't shrinking to a niche. It's growing to a scale the web hasn't seen before.

And the work isn't new. Semantic HTML, clean heading hierarchy, meaningful link text, structured data: none of it is a special AI accommodation. It's what well-built code has always looked like.

It Was Always Clean Code…

Designing for the doppelganger isn't a new discipline. It's the one good engineers have practiced all along.

A page with proper heading hierarchy, real landmarks, semantic elements, and descriptive link text isn't optimized for agents. It's just built correctly. The structure that lets a doppelganger understand your page is the same structure that makes your code maintainable, your markup legible, and your intent durable. Clean code is the version of your site that holds up when the visual layer is stripped away.

Conclusion

The visitor is new. The discipline isn't.

Build the page correctly, and you've already designed for the doppelganger, not as extra work, not as an AI accommodation, but as the thing clean code was always meant to be: a site that means what it says, in the markup and not just on the screen.

The doppelganger is this decade's version of a visitor that can't see your design. There will be others. Clean code was the right answer before any of them arrived, and it stays the right answer no matter what shows up next.

Two visitors. Two perspectives. One layer that was always worth building right.