TL;DR: If you want to appear in LLMs and AI Overviews, create well-structured, conversational content that appears in the top 10 search results. Keep headings in the right order, make those headings descriptive, use lists, and consider including semantic tags.
In early 2024, Google struck a deal with Reddit to pay them $60 million annually for AI training data access. A couple months later, Reddit’s search traffic exploded and has shown no signs of slowing down.
Data collected from 4 billion AI citations and 300 million LLM responses showed that, aggregated across all the major AI tools, Reddit is the #1 most-cited source.
Why?

Well, having an inside deal with Google certainly didn’t hurt.
But the type of content LLMs prefer is exactly the type of content present on Reddit: well-structured, conversational, expert, human, and helpful.
And ever since Google’s October 2025 update that functionally removed LLMs’ ability to reference anything past rank 10 in search results, it’s not a surprise that Reddit’s insane rise in the rankings has coincided with a similar rise in LLM mentions.
If we want to show up in AI queries more often, we need to learn from Reddit’s content.
We also need to SEO our way to top-10 rankings as much as possible, and given that Google Search Central itself says “Google’s ranking systems aim to reward original, high-quality content,” you’re more likely to get there if you’re not just copy-pasting stuff written by AI.
And hey, turns out this is the type of content humans like and are most likely to read anyway. So it’s a win-win.
Structure your content well, I beg you
Part of the reason Reddit is so desirable as an AI training dataset is its consistent, easy-to-navigate markdown structure. Stack Overflow and Wikipedia are two more examples of excellent, consistent formatting that show up all the time in AI results. We can learn from them, and recreate their strategies in our content.
To structure content in the ways AI likes (listed from most to least impactful):
- Keep headings in the right order
- Make those headings descriptive
- Use bulleted and numbered lists
- Consider including semantic tags
Further explanations below.
Keep headings in the right order
<h1> to <h2> to <h3>, and so on.
Getting anything out of order will cause LLMs some problems. Older models would just give up on reading a page if the tags jumped from <h1> to <h3> or something, but even current models will lose some comprehension if your headings are out of order.
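If you want to sanity-check your own pages, a heading-order audit is easy to script. Here's a minimal sketch using only Python's standard library; the sample HTML is illustrative, not from any real page:

```python
from html.parser import HTMLParser

class HeadingOrderChecker(HTMLParser):
    """Flags headings that skip a level, e.g. an <h3> directly after an <h1>."""

    def __init__(self):
        super().__init__()
        self.last_level = 0
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            level = int(tag[1])
            # Descending more than one level at a time breaks the hierarchy
            if self.last_level and level > self.last_level + 1:
                self.problems.append(f"<h{level}> follows <h{self.last_level}>")
            self.last_level = level

checker = HeadingOrderChecker()
checker.feed("<h1>Title</h1><h3>Oops, skipped h2</h3><h2>Fine</h2>")
print(checker.problems)  # ["<h3> follows <h1>"]
```

Run something like this over your templates once and you'll catch the skips that quietly degrade LLM comprehension.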
Make those headings descriptive
Make sure the headings you write match the content below them. This is standard best practice, so no big insights here. Don't get too clever with your headings if you want LLMs to parse and feature your content.
Use bulleted and numbered lists
LLMs prefer content that:
- Gives answers up front
- Is glanceable and skimmable
- Is formatted cleanly and clearly
- Uses shorter sections and sentences
Formatted lists accomplish all of these goals.
Consider including semantic tags
More context is needed for this one.
Researchers at Amazon produced an excellent study on what they call Tagging-Augmented Generation (TAG). Among the primary findings:
Tagging the context or even just adding tag definitions into QA prompts leads to consistent relative performance gains over the baseline – up to 17% for 32K token contexts, and 2.9% in complex reasoning question-answering for multi-hop queries requiring knowledge across a wide span of text.
Adding tags to your content makes LLMs perform better, at least as long as you use some discretion. If you’re the one using the LLM, prompting with those tags and their definitions further increases their accuracy.
Overusing tags brings the content from what the study calls a “Needle-in-a-Haystack” problem (the LLM is searching for a single useful piece of information surrounded by irrelevant information) to what I’m going to call a My-Haystack-is-Made-of-Needles problem (the LLM thinks everything is now useful and relevant and has to sort through it anyway).
Choose the 5-10 most relevant pieces of information, tag those, and keep it consistent across the content.
Here’s an example of what this could look like, applied to the first few paragraphs of this article:
<p>
In <date>early 2024</date>, <org>Google</org> struck a deal with
<org>Reddit</org> to pay them <quantity>$60 million annually</quantity>
for <concept>AI training data access</concept>.
A couple months later, <org>Reddit’s</org> search traffic exploded
and has shown no signs of slowing down.
</p>
<p>
Data collected from <quantity>4 billion AI citations</quantity> and
<quantity>300 million LLM responses</quantity> showed that,
aggregated across all the major AI tools, <org>Reddit</org> is the
<rank>#1 most-cited source</rank>.
</p>
<p>
And ever since <event>Google’s October 2025 update</event> that
functionally removed LLMs’ ability to reference anything past
<rank>rank 10</rank> in search results, it’s not a surprise that
<org>Reddit’s</org> insane rise in the rankings has coincided
with a similar rise in LLM mentions.
</p>
Please don’t do this manually. Your AI of choice or spaCy (an open-source natural language processing library for Python) can handle it.
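To make the idea concrete, here's a stdlib-only sketch of the tagging step. The entity-to-tag mapping is hand-picked for this example; in practice you'd let spaCy's named entity recognizer find the spans for you rather than maintaining a list, and the tag names just mirror the hypothetical ones above:

```python
import re

# Hand-picked entity -> tag mapping for illustration; with spaCy you'd
# derive these spans from doc.ents instead of listing them manually.
ENTITY_TAGS = {
    "Google": "org",
    "Reddit": "org",
    "early 2024": "date",
    "$60 million annually": "quantity",
}

def tag_entities(text: str) -> str:
    """Wrap each known entity in a semantic tag, longest matches first."""
    for entity in sorted(ENTITY_TAGS, key=len, reverse=True):
        tag = ENTITY_TAGS[entity]
        text = re.sub(re.escape(entity), f"<{tag}>{entity}</{tag}>", text)
    return text

print(tag_entities("In early 2024, Google struck a deal with Reddit."))
# In <date>early 2024</date>, <org>Google</org> struck a deal with <org>Reddit</org>.
```

Matching longest entities first avoids partially tagging a phrase (e.g. tagging a year inside a larger date span) before the full phrase gets its turn.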
It’s honestly a bit excessive to do tagging like this, but I guarantee you most of the other content on the internet won’t bother to implement it. Any differentiator we can get is helpful.
Write like a person
Remember AI in 2024? Maybe you don’t. AI capabilities are growing at a faster-than-exponential rate, and we’re getting new models seemingly every couple weeks now that blow the pants off (is that a phrase? Whatever) everything that came before.
A tool launched in late February 2026 set itself the task of determining what percentage of the internet is written by AI. At the time of writing, it sits at 13.4%. It’s anyone’s guess how reliable that number is: the accuracy of AI detection tools leaves a lot to be desired, and humans score even worse. But 13.4% of all tested content being AI-written, only a few years after LLMs burst onto the scene, doesn’t bode well for a human-written future, and it’s only getting harder to tell what’s AI and what’s not. The real number is likely higher.

I say all this because people at this point are craving authenticity. Nobody wants to read your blog that was copy-pasted from AI. Why would anyone spend time reading something on your site if they can just go to Gemini or ChatGPT and get the same thing in five seconds?
AI is only going to get better and more human-like. That tool linked above estimates that, based on current trends, we’re going to see an internet approaching 100% AI by around 2032. But we have a rocky road ahead of us, and in the meantime it’s a legitimate differentiator to actually write your content yourself. Like in the olden days. A few years ago.
We’re living in a relatively unique period in online content algorithms where what works for humans also happens to be what works for machines. If your writing is clear, engaging, distinctive, and contains information people can’t get anywhere else, you’re going to outperform your competitors who are copy-pasting from prompts.
You’ll also create things you’re proud of, and we think that counts for something too.
Say something, then prove it
Find high-quality, relevant sources, and then cite them. Ideally, include them as anchor links on relevant text, and/or add a citations list at the end of your articles.
Because you know what AI is really good at?
Lying. And doing it convincingly.
I default to thinking every stat AI gives me is made up. It can write generic, largely unsubstantiated blocks of text up to ~500 words, and anything else it tries to do (so far) leaves quite a lot to be desired.
You immediately stand out if you think back to your high school English classes and write up a bibliography.
Make it accessible
Accessibility audit data shows the same types of errors that impair screen reader navigation also weaken AI content comprehension. Screen readers and language models both depend on semantic structure to navigate content.
We at Silktide care deeply about accessibility, and frankly think it’s absurd that everything on the internet isn’t accessible by default. There’s way more to it than this, but here’s a quick rundown of some of the things you can implement that account for most of the accessibility issues we see.

Write alt text for every image
Images with detailed alt text generate more accurate AI descriptions than images without, according to AI performance benchmarks analyzing computer vision training effectiveness. This feeds AI models the contextual information they need to understand visual content during training.
Some real-world examples from the sites I’ve mentioned so far:
- Stack Overflow: Their policy of mandatory alt text for code screenshots makes their programming examples more reliable training data than GitHub repositories without image descriptions
- Reddit: Has a culture of captioning images (and then having other users leave further comments on those images) that creates exactly the kind of multimodal training examples that improve AI model accuracy for both text and visual tasks
- Wikipedia: Requires image descriptions, which serve the same purpose of creating structured visual-textual connections
We’re in an era of AI ingesting as much content as possible. If your image doesn’t have alt text, it won’t help with your AIO or SEO efforts.
Some rules of thumb for good alt text:
- Don’t use it on images that don’t convey any information or are purely visual padding; mark those as decorative instead.
- When possible, put the information in the written content as a description. Use tables for data shown in charts. The image then becomes decorative.
- Start with the most important information first.
- Describe the content and purpose of the image (e.g., “A bar chart showing revenue growth from 2020 to 2023 from fifteen million dollars to eighteen point five million dollars”).
- Don’t use phrases like “Image of” or “Picture of”.
Include ARIA labels
ARIA labels provide semantic context that helps LLMs understand interface relationships during web scraping for training data. Make sure you know what you’re doing when using them though – WebAIM’s incredible Million Report found “The more ARIA attributes that were present, the more detected accessibility errors could be expected” on a page.
The W3C’s accessibility guidelines incidentally created the ideal markup pattern for AI content comprehension by requiring descriptive, contextual markup that mirrors how language models process information. These labels are helpful, if and only if they’re implemented properly.
Accessibility scores predict AI citation accuracy better than PageRank
Sites scoring high on accessibility audits get cited more accurately in AI responses than sites with poor accessibility scores.

Skip links designed for keyboard navigation become AI crawling waypoints that help models understand content hierarchy during training data collection, for example, and landmark roles that guide visually impaired users through page sections provide the same structural scaffolding that improves AI model comprehension.
Descriptive heading structures that enable screen reader users to jump between topics create the exact parsing framework that improves AI model comprehension, as demonstrated in the OpenAI research on heading hierarchy impact. An analysis of AI-generated summary accuracy reveals that properly structured content consistently outperforms keyword-optimized but semantically flat content across multiple model architectures.
All that to say:
Accessibility already makes the web better for both you and your users, and there’s plenty of proof that it also makes it better for AI. So make your site accessible.
What if GEO was just SEO all along?
If our goal is to appear in generative engines, and those engines prefer human content that’s in the top 10 search rankings, aren’t we just doing SEO again?
A lot of hubbub has been created yet again about SEO being “dead” because of a new technology. I remember the early days when people would stuff keywords into the backgrounds of their websites, change the font color to match the background color, and game Yahoo rankings. Google’s PageRank launched, using backlinks to measure authority, and everyone said “SEO is dead!” Then it was social media, then the 2015 mobile-friendly update, then RankBrain, and now AI and zero-click searches. SEO is still alive and well.
Search engines aren’t going anywhere. They might change form, but I don’t think it’s likely that we’re going back to the world of webrings any time soon (call me when I start doing WebRing Optimization).
We as SEOs just need to adapt to a new paradigm. We’ve done it before. Keep making good content, keep knowing what you’re talking about, and keep presenting that information in a way that’s good for people. There’s infinite complexity if you want there to be, but really it’s just back to basics.
See you in your agentic chat window.
References:
- Devane, Declan, et al. “Comparison of AI-Assisted and Human-Generated Plain Language Summaries for Cochrane Reviews: A Randomised Non-Inferiority Trial (HIET-1) [Registered Report – Stage II].” Journal of Clinical Epidemiology, vol. 191, Mar. 2026, p. 112102, www.sciencedirect.com/science/article/pii/S0895435625004354, https://doi.org/10.1016/j.jclinepi.2025.112102.
- “Document.” www.sec.gov, www.sec.gov/Archives/edgar/data/1652044/000165204423000041/googexhibit991q12023.htm.
- Hanley, Margot, et al. Computer Vision and Conflicting Values: Describing People with Automated Alt Text, https://dl.acm.org/doi/10.1145/3461702.3462620.
- Leedy, Katelyn, et al. “Accessibility in the Age of Generative AI Web Based Builders: Evaluating Web Design Tools for Inclusive Practices.” Proceedings of the 28th International Academic Mindtrek, 7 Oct. 2025, pp. 304–314, https://doi.org/10.1145/3757980.3757989.
- Lin, Trevor, et al. “Evaluating the Accuracy of Advanced Language Learning Models in Ophthalmology: A Comparative Study of ChatGPT-4o and Meta AI’s Llama 3.1.” Advances in Ophthalmology Practice and Research, vol. 5, no. 2, May 2025, pp. 95–99, https://doi.org/10.1016/j.aopr.2025.01.002.
- Madfa, Ahmed A., et al. “Accuracy and Reliability of Manus, ChatGPT, and Claude in Case-Based Dental Diagnosis.” Frontiers in Oral Health, vol. 6, 8 Jan. 2026, pmc.ncbi.nlm.nih.gov/articles/PMC12823890/, https://doi.org/10.3389/froh.2025.1686090.
- Pal, Anwesan, et al. “Tagging-Augmented Generation: Assisting Language Models in Finding Intricate Knowledge in Long Contexts.” Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, 2025, pp. 2209–2220, aclanthology.org/2025.emnlp-industry.153/, https://doi.org/10.18653/v1/2025.emnlp-industry.153.
- “The Power of Scale in Machine Learning – Kempner Institute.” Kempner Institute, 18 Aug. 2025, kempnerinstitute.harvard.edu/news/the-power-of-scale-in-machine-learning/.
- Tong, Anna, et al. “Exclusive: Reddit in AI Content Licensing Deal with Google.” Reuters, 22 Feb. 2024, www.reuters.com/technology/reddit-ai-content-licensing-deal-with-google-sources-say-2024-02-22/.
- W3C. “Web Content Accessibility Guidelines (WCAG) 2.1.” W3.org, 6 May 2025, www.w3.org/TR/WCAG21/.
- Wang, Te-Hao, et al. “Evaluating GPT-4’s Visual Interpretation and Clinical Reasoning on Emergency Settings: A 5-Year Analysis.” Journal of the Chinese Medical Association, vol. 88, no. 9, 28 July 2025, pp. 672–680, https://doi.org/10.1097/jcma.0000000000001273.
- WebAIM. “WebAIM: The WebAIM Million – an Annual Accessibility Analysis of the Top 1,000,000 Home Pages.” Webaim.org, 2024, webaim.org/projects/million/.
