Why Cloudflare Is Rethinking CDN Cache for the AI Era (and What It Means for Everyone Else)

Somewhere in a Cloudflare point-of-presence (PoP), a perfectly innocent cache server is trying to do what caches have always done: keep popular stuff close to users so the Internet feels snappy. Then an AI crawler shows up and behaves like a sleep-deprived intern with unlimited energy drinks, opening every door in the building just to “get context.” Suddenly the cache is full of long-tail pages nobody’s grandmother ever clicked, and the humans—who were here first and pay the bills—start seeing more misses, more latency, and more origin load.

That, in a nutshell, is the premise behind Cloudflare’s April 2, 2026 blog post Why we’re rethinking cache for the AI era, authored by Avani Wildani and Suleman Ahmad. Cloudflare argues that the modern web’s caching assumptions—built around human traffic locality—are being undermined by AI agents and crawlers with wildly different access patterns.

Cloudflare isn’t merely complaining. It’s also pointing to research with ETH Zurich, plus a broader industry conversation about how CDNs should evolve when the “reader” is increasingly a machine. And as a bonus, the topic touches everything from cache eviction algorithms like LRU to fresher, less LRU-ish ideas like SIEVE and S3-FIFO.

Let’s unpack what’s changing, why it matters, and what website operators, platform engineers, and CDN builders should do—before the bots redecorate your cache with 404s and PDF downloads.

The original story: Cloudflare’s “cache meets AI” wake-up call

Cloudflare’s core observation is stark: automated traffic is a huge share of what their network sees. They cite Cloudflare data showing that automated sources generate 32% of traffic across their network—a broad category that includes traditional crawlers (search engines), uptime monitors, ad tech, and now AI systems pulling data for retrieval-augmented generation (RAG) and for model training.

In their framing, website operators are increasingly forced into an awkward choice: optimize caching and infrastructure for humans, or for AI crawlers. If both types of traffic share the same cache resources, AI traffic can erode cache efficiency for humans.

Cloudflare also stresses that not all “AI traffic” is the same. Some crawls are latency-sensitive (think: an assistant doing RAG in real time). Other crawls are bulk and offline (think: training data collection), where latency matters far less than throughput and cost control.

Why classic CDN caching works (when your users are… human)

Classic CDN caching is built around a couple of friendly truths about human behavior:

  • Temporal locality: when something is popular, people request it again soon.
  • Spatial/semantic locality: if someone visits one page, they often visit related pages.
  • Skewed popularity distributions: a small set of URLs often accounts for a large share of requests.

These assumptions make simple eviction policies like LRU (Least Recently Used) remarkably effective: keep what’s being requested recently; evict what hasn’t been touched. It’s cheap, easy to implement, and typically performs well under the kinds of workloads CDNs grew up serving.

Cloudflare explicitly notes that it manages cache using LRU and that the presence of identified AI crawlers correlates with a drop in cache hit rate on at least one CDN node in their examples—suggesting LRU struggles with repeated scanning behavior.
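
To see why scans hurt recency-based eviction, here’s a toy simulation (not Cloudflare’s setup; the hot-set size, cache capacity, and traffic mix are made up purely for illustration) that mixes a small human “hot set” with a crawler stream that never repeats a URL:

```python
import random
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: recently used entries survive, the rest get evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)          # refresh recency on hit
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = True               # admit on miss
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)   # evict least recently used

def hit_ratio(scan_fraction, requests=50_000, hot_set=200, capacity=300, seed=1):
    """Human hit ratio when a one-off scan stream shares the same cache."""
    rng = random.Random(seed)
    cache = LRUCache(capacity)
    human_hits = human_total = scan_id = 0
    for _ in range(requests):
        if rng.random() < scan_fraction:
            scan_id += 1                          # crawler: always a brand-new URL
            cache.get(f"/deep/page-{scan_id}")
        else:
            human_total += 1
            before = cache.hits
            cache.get(f"/hot/{rng.randrange(hot_set)}")  # small human hot set
            human_hits += cache.hits - before
    return human_hits / human_total

print(f"human hit ratio, 0% scans:  {hit_ratio(0.0):.1%}")
print(f"human hit ratio, 50% scans: {hit_ratio(0.5):.1%}")
```

Even though the hot set fits comfortably in the cache, the scan stream keeps pushing hot objects toward the LRU tail, and the human hit ratio drops substantially.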

How AI crawler traffic is different: the three traits that break your cache

Cloudflare’s post highlights three differentiating characteristics of AI crawler traffic:

  • High unique URL ratio: crawlers touch a lot of distinct pages rather than repeatedly requesting a small “hot set.”
  • Content diversity: crawlers don’t just focus on the homepage and the top blog posts; they pull in docs, images, code, and deep pages.
  • Crawling inefficiency: a meaningful fraction of requests may be “wasted” on redirects and 404s, in part due to poor URL handling.

They cite public crawl statistics from Common Crawl to support the “uniqueness” point, and they also note that crawlers can run multiple independent instances that don’t share sessions—so they appear as new visitors and don’t benefit from browser-like caching behavior.

There’s also a subtle but important behavioral shift: AI agents doing RAG can iterate. If an agent loops through “search → fetch → refine → search again,” it may keep pulling new unique sources to improve accuracy—good for the model, terrible for a cache depending on reuse. Cloudflare models this behavior and reports a unique access ratio typically between 70% and 100% across loops in their example.
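
A hypothetical toy model makes the ratio concrete (loop counts, fetch counts, and the revisit probability below are invented, not Cloudflare’s parameters): each agent loop mostly pulls fresh sources and occasionally revisits one it already fetched.

```python
import random

def unique_access_ratio(loops=5, fetches_per_loop=8, revisit_prob=0.2, seed=7):
    """Fraction of fetches targeting never-before-seen URLs across agent loops."""
    rng = random.Random(seed)
    seen = set()
    fetched = unique = next_new = 0
    for _ in range(loops):
        for _ in range(fetches_per_loop):
            fetched += 1
            if seen and rng.random() < revisit_prob:
                url = rng.choice(sorted(seen))             # revisit a known source
            else:
                url = f"https://example.com/src/{next_new}"  # brand-new source
                next_new += 1
                unique += 1
            seen.add(url)
    return unique / fetched

print(f"unique access ratio: {unique_access_ratio():.0%}")
```

With a low revisit probability, the unique access ratio lands in the same high range Cloudflare describes, and a reuse-dependent cache sees almost nothing but misses from this client.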

Translation for ops teams: cache churn and expensive misses

In CDN terms, the biggest problem is not “bots exist.” It’s cache churn: long-tail content pushed into limited cache capacity, evicting content that humans will actually reuse. When human requests miss more often, the CDN must fetch from origin more frequently, increasing origin load, latency, and (for many customers) egress costs.

Cloudflare also points out that common hit-rate boosters like prefetching and cache speculation become less effective when the workload has low reuse.

This isn’t hypothetical: real-world incidents show the pain

Cloudflare’s post includes a table of reported impacts across several organizations. The general pattern: AI crawlers and scrapers generate disproportionate load, and the most common “mitigation” is blunt—block them, rate-limit them, or geo-block them.

Some of these incidents are well-documented elsewhere. A few notable examples:

Wikimedia Commons bandwidth spikes

Wikimedia has publicly discussed how automated scraping for AI training can drive major increases in bandwidth consumption for multimedia downloads. Coverage has reported a 50% surge in bandwidth for multimedia downloads since January 2024 associated with AI crawlers.

Even if your site doesn’t serve giant images, Wikimedia’s experience is a useful warning: the “expensive” assets on your infrastructure (large files, media, archives, big docs) are exactly what scrapers tend to love—especially if they’re crawling naively and repeatedly.

Read the Docs: repeated downloads and missing conditional requests

Read the Docs—home to vast swaths of open-source documentation—has described abusive crawling where bots repeatedly download large files hundreds of times a day. They also point out that some crawlers lacked basics like bandwidth limiting or proper use of caching headers (ETag, Last-Modified) that could avoid refetching unchanged content.

That detail matters: a lot of the AI-era caching problem is not only “the cache got worse,” but also “the clients are not using the web the way browsers do.” If your crawler ignores conditional GETs, it’s essentially choosing to pay full freight every time.

SourceHut: LLM crawlers that look a lot like a DDoS

SourceHut has documented recurring issues from LLM crawlers that it describes as continuing to DDoS parts of the service, with public status reports detailing bot-driven load and mitigations.

Many smaller platforms—especially community infrastructure (git forges, docs hosts, volunteer-run sites)—are not sized for relentless, parallelized bulk fetching. They’re sized for humans plus a bit of “normal bot” behavior. The AI era changes what “normal bot” means.

What the ETH Zurich + Cloudflare research adds: measurable degradation under mixed workloads

Cloudflare’s blog post isn’t just vibes and charts; it’s tied to a research line with ETH Zurich. The post points to a full paper presented at the 2025 ACM Symposium on Cloud Computing (SoCC ’25): Rethinking Web Cache Design for the AI Era by Zhang et al.

In that paper, the authors model a two-tier CDN cache architecture inspired by Wikimedia’s production setup (frontend cache with Varnish and a backend disk-based cache with Apache Traffic Server), plus backend layers like MediaWiki, Memcached, and MariaDB.

One of the most quotable results: under their experimental setup, a human-like workload yields a low miss ratio at the Varnish layer, but adding AI-like scan traffic quickly degrades performance. Specifically, with just 25% AI traffic, their Varnish miss rate nearly doubles (from 17.3% to 32.2%). Under fully scan-dominated workloads, they report a Varnish miss ratio reaching 51.8%.

That’s the “AI tax” made visible: even if the total request volume stays the same, a change in locality profile can crater your hit ratio—forcing more work onto deeper cache tiers and origins.

So what do we do? Cloudflare’s “AI-aware caching” direction

Cloudflare proposes a combination of algorithmic and architectural responses:

  • Improve cache eviction behavior so mixed human + AI traffic doesn’t punish humans.
  • Potentially add a separate cache layer (or tiering strategy) for AI traffic, routing different AI workloads to different cache depths.

In their post, Cloudflare suggests that AI traffic for live applications (RAG, summarization) should be routed to caches that balance larger capacity with moderate response times, while bulk training crawls could be served from deeper cache tiers or even delayed via admission control/queuing to protect backend systems.

Why “just block the bots” isn’t a full solution

Blocking can be rational—especially when a crawler is clearly abusive. But Cloudflare’s post emphasizes a more nuanced reality: many site operators want some AI traffic. Developers might want docs discoverable in AI search; ecommerce sites might want product data surfaced; publishers may want compensation through mechanisms like “pay per crawl.”

In other words, the goal is not necessarily a bot-free web. It’s a web where the economics and the infrastructure aren’t silently demolished by machine readers.

Cache eviction in the AI era: why LRU takes it on the chin

If you’ve ever looked at cache traces, you know why this gets ugly quickly. LRU is great when the hottest items stay hot. But scan-heavy access patterns are adversarial: they can push a stream of distinct objects through the cache, evicting useful items without providing reuse.

To understand why Cloudflare mentions alternatives like SIEVE and S3-FIFO, it helps to zoom out: the research community has spent years designing policies that do better than pure recency under messy workloads.

SIEVE: “simpler than LRU” (and fast)

SIEVE is presented (by its authors) as a turnkey eviction algorithm for web caches with strong scalability benefits. The USENIX NSDI’24 presentation page notes that SIEVE achieves a lower miss ratio than multiple state-of-the-art algorithms on a significant share of traces, and that its prototype can reach twice the throughput of an optimized 16-thread LRU implementation.

If you operate cache infrastructure, throughput is not a vanity metric. A “slightly better hit ratio” isn’t worth it if the metadata handling melts your CPU. The appeal of algorithms like SIEVE is that they promise a better trade-off: improved miss ratio with implementation simplicity and concurrency friendliness.
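The mechanism is small enough to sketch. This is a simplified, single-threaded reading of SIEVE as described by its authors (the real implementation uses a linked list and is concurrency-friendly; a Python list scan stands in here for clarity): one FIFO queue, a per-object “visited” bit, and a hand that sweeps from the oldest entry toward the newest, evicting the first unvisited object it finds.

```python
class SieveCache:
    """Toy SIEVE eviction: hits only flip a bit; the eviction hand does the
    sorting, clearing visited bits as it passes over reused objects."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.visited = {}   # key -> visited bit (also serves as membership)
        self.order = []     # keys, oldest first (O(n) ops are fine for a sketch)
        self.hand = 0       # sweeps from oldest toward newest

    def _evict(self):
        while True:
            if self.hand >= len(self.order):
                self.hand = 0                    # wrap the hand back to the tail
            key = self.order[self.hand]
            if self.visited[key]:
                self.visited[key] = False        # reused: give it a second chance
                self.hand += 1
            else:
                del self.visited[key]            # never reused: evict it
                self.order.pop(self.hand)        # hand now points at the next-newer entry
                return

    def get(self, key):
        """Return True on hit. On miss, admit the key, evicting first if full."""
        if key in self.visited:
            self.visited[key] = True             # hit: no queue movement at all
            return True
        if len(self.order) >= self.capacity:
            self._evict()
        self.visited[key] = False                # new objects start "on probation"
        self.order.append(key)
        return False
```

Notice what this does to scan traffic: a crawled-once URL never gets its visited bit set, so the hand removes it on its first pass, while anything humans re-request keeps earning second chances. That is the “implicit traffic filter” idea in miniature.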

S3-FIFO: FIFO queues that punch above their weight

S3-FIFO (Simple, Scalable caching with three static FIFO queues) is another modern policy Cloudflare name-drops. It’s discussed in the SOSP 2023 paper FIFO Queues are All You Need for Cache Eviction, which compares S3-FIFO to other algorithms and emphasizes quick demotion of new objects and favorable miss ratio characteristics across traces.

FIFO-based approaches can be surprisingly effective because they can separate “probationary” objects (newly inserted, unproven) from objects that have demonstrated reuse—without needing full LRU tracking for everything. In scan-heavy environments, that probationary separation is basically a shield against letting every one-off request occupy prime real estate.
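Here’s a deliberately simplified sketch of that three-queue structure (the 10%/90% split follows the paper’s default; the frequency cap and other details are condensed, so treat this as an illustration rather than a faithful reimplementation): new objects enter a small probationary FIFO, proven objects graduate to a main FIFO, and a ghost FIFO of bare keys remembers recent evictions.

```python
from collections import deque

class S3FIFOCache:
    """Toy S3-FIFO: one-hit wonders die in the small queue; objects with
    demonstrated reuse are promoted to main; the ghost queue lets recently
    evicted keys skip probation on their next miss."""

    def __init__(self, capacity):
        self.small_cap = max(1, capacity // 10)   # ~10% probationary space
        self.main_cap = capacity - self.small_cap
        self.small, self.main = deque(), deque()
        self.ghost = deque(maxlen=self.main_cap)  # keys only, no data
        self.freq = {}                            # key -> capped access count

    def _evict_small(self):
        key = self.small.popleft()
        if self.freq[key] > 0:                    # reused on probation: promote
            if len(self.main) >= self.main_cap:
                self._evict_main()
            self.main.append(key)
        else:                                     # one-hit wonder: remember only
            self.ghost.append(key)
            del self.freq[key]

    def _evict_main(self):
        while True:
            key = self.main.popleft()
            if self.freq[key] > 0:
                self.freq[key] -= 1               # second chance, demoted a notch
                self.main.append(key)
            else:
                del self.freq[key]
                return

    def get(self, key):
        """Return True on hit; on miss, admit via small or (if ghosted) main."""
        if key in self.freq:
            self.freq[key] = min(self.freq[key] + 1, 3)
            return True
        if key in self.ghost:                     # recently evicted: skip probation
            self.ghost.remove(key)
            if len(self.main) >= self.main_cap:
                self._evict_main()
            self.main.append(key)
        else:
            if len(self.small) >= self.small_cap:
                self._evict_small()
            self.small.append(key)
        self.freq[key] = 0
        return False
```

The design choice worth noting: scan traffic only ever occupies the small queue, so at most ~10% of capacity is exposed to churn, and the main queue keeps serving the human hot set.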

Connecting the dots: eviction algorithms as implicit traffic filters

The ETH/Cloudflare paper explicitly frames advanced eviction algorithms as implicit traffic filters that can deprioritize scan-like requests while preserving human hit rates—even without explicit traffic classification.

This is key: if you can’t reliably label every request as “human” or “AI bot” (and in practice you often can’t), you can still design a cache that behaves sensibly when the workload contains both locality-rich and locality-poor patterns.

Architecture matters too: separate tiers for separate behaviors

Algorithm changes help, but Cloudflare and the ETH team both converge on a likely longer-term conclusion: separating human and AI traffic across different cache tiers may be the cleanest solution.

Why? Because the two workloads want different things:

  • Humans want low latency and consistent performance on a relatively small hot set.
  • Interactive AI (RAG) may accept slightly higher latency but still wants “fast enough” and very high availability.
  • Bulk AI training crawls can accept much higher latency, can be queued, and can be scheduled around infrastructure load.

Cloudflare describes exactly this kind of tiered idea: route human traffic to edge caches optimized for responsiveness, and route AI traffic based on task type to other tiers (deeper caches, origin-side SSD caches, or rate-limited queues).
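To make the routing idea concrete, here’s a sketch of task-type classification (the tier names are invented, Cloudflare has not published a routing API, and the user-agent tokens are just examples of well-known crawlers; real systems would combine UA checks with IP verification and behavioral signals):

```python
# Example UA substrings only; real classification needs more than user-agents.
BULK_CRAWLERS = {"GPTBot", "CCBot", "Bytespider"}        # training-data collection
INTERACTIVE_AGENTS = {"ChatGPT-User", "PerplexityBot"}   # live RAG-style fetching

def route_request(user_agent: str) -> str:
    """Pick a (hypothetical) cache tier from a coarse client classification."""
    if any(token in user_agent for token in BULK_CRAWLERS):
        return "queued-origin-tier"    # bulk crawl: queue it, protect the origin
    if any(token in user_agent for token in INTERACTIVE_AGENTS):
        return "deep-ssd-tier"         # RAG: bigger capacity, moderate latency
    return "edge-tier"                 # humans get the fast edge cache
```

The point isn’t the specific labels; it’s that three lines of classification let each workload get the latency/capacity trade-off it actually needs.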

If you squint, this is the return of an old systems pattern: QoS by isolation. Don’t let one workload trash another’s working set if you can help it. The AI era just makes the “noisy neighbor” problem global.

What this means for website owners (not just CDNs)

Most of the Internet does not operate its own multi-tier CDN. But you still have knobs—and in the AI era, you probably need to turn a few of them.

1) Identify and segment bot traffic early

Cloudflare’s framing implicitly assumes you can identify “AI crawlers” at least sometimes (via user-agent, known IP ranges, behavioral heuristics, or platform tooling). That’s not perfect, but it’s enough to build a playbook:

  • Track request rates and unique URL ratios by user-agent category.
  • Watch for deep-link scanning patterns (depth-first traversal, sequential URL patterns, sitemap-like behavior).
  • Measure 404/redirect rates by crawler—high “ineffective request” rates are a red flag for wasted capacity.
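Those three metrics are cheap to compute from access logs you already have. A minimal sketch (log parsing and the user-agent categorization are assumed to happen upstream; the record format here is invented for illustration):

```python
from collections import defaultdict

def crawler_report(records):
    """Per user-agent category: request count, unique-URL ratio, and the share
    of 'ineffective' responses (redirects and 404s).

    `records` is an iterable of (ua_category, url, status) tuples parsed out
    of your access logs before reaching this function.
    """
    stats = defaultdict(lambda: {"requests": 0, "urls": set(), "wasted": 0})
    for ua, url, status in records:
        s = stats[ua]
        s["requests"] += 1
        s["urls"].add(url)
        if status == 404 or 300 <= status < 400:
            s["wasted"] += 1
    return {
        ua: {
            "requests": s["requests"],
            "unique_url_ratio": len(s["urls"]) / s["requests"],
            "ineffective_rate": s["wasted"] / s["requests"],
        }
        for ua, s in stats.items()
    }

report = crawler_report([
    ("human", "/", 200), ("human", "/", 200), ("human", "/pricing", 200),
    ("ai-bot", "/a", 200), ("ai-bot", "/b", 301), ("ai-bot", "/c", 404),
])
```

A unique-URL ratio near 1.0 combined with a high ineffective rate is exactly the scan-plus-waste signature Cloudflare describes.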

2) Treat large assets as a separate product

Read the Docs’ experience shows that large file downloads are a common pain point, especially when crawlers don’t use conditional requests properly.

Practical steps:

  • Put big artifacts (PDFs, archives, installers, media) behind tighter rate limits or authenticated endpoints.
  • Ensure ETag/Last-Modified headers are correct and stable; monitor whether clients actually use them.
  • Consider a dedicated download domain or bucket with separate throttling rules.
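The ETag mechanics are worth seeing once, because “correct and stable” is the whole game. A minimal server-side sketch (framework-free; real servers hang this off their response pipeline): a well-behaved client that stores the ETag and revalidates with If-None-Match pays for the bytes exactly once.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Content-hash ETag: identical content yields an identical tag, stable
    across restarts and across servers (unlike inode/mtime-based tags)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(request_headers: dict, body: bytes):
    """Return (status, headers, payload). A matching If-None-Match gets a
    304 with an empty payload: the unchanged bytes are never re-sent."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

The Read the Docs complaint, in these terms: crawlers that ignore the ETag choose the 200-with-full-body path on every single fetch of an unchanged file.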

3) Make “AI-friendly” views that are cheaper to serve

Cloudflare mentions projects like “Markdown for Agents” and AI indexing initiatives to present simplified versions of content to known AI bots. The broader idea is simple: give machine consumers a version of your content that’s less expensive to fetch and parse.

Even without Cloudflare-specific tooling, you can approximate the approach:

  • Publish clean, well-structured HTML with clear headings (less JS gating, fewer heavyweight assets).
  • Offer text-first endpoints for documentation and knowledge bases.
  • Use canonical URLs and sane redirects to reduce crawler inefficiency.

4) Decide your policy: allow, rate-limit, block, or monetize

Cloudflare explicitly suggests that some publishers may want compensation models, mentioning “pay per crawl.”

Cloudflare has documentation for Pay Per Crawl as part of its AI Crawl Control feature set, positioning it as a way to control and potentially charge for bot access.

Whether monetization works at scale is still debated in the broader ecosystem, but from an operator perspective, the immediate value is control: you get to define who can crawl what and under what conditions. (And yes, a lot of people will try to evade controls—welcome to the same cat-and-mouse game ad fraud teams have been playing for years.)

Implications for the AI industry: the web is becoming a constrained resource

The AI sector has spent the last several years acting as if the public web is an infinite buffet. The emerging reality looks more like an all-you-can-eat restaurant that is finally posting rules, hiring bouncers, and asking certain guests to stop filling backpacks with shrimp.

As more websites restrict crawling, AI companies face trade-offs:

  • Rely on older snapshots of the web (which harms freshness).
  • Pay for access or enter licensing deals (which changes the economics).
  • Use smaller, higher-quality datasets and prioritize better retrieval rather than more training data.

Cloudflare’s broader point is that cache architecture itself must adapt because the shape of web access is changing. This is infrastructure evolution driven by new client behavior—one of the most reliable forces in systems history.

A practical checklist: what to do in the next 30–90 days

If you run a site, a docs platform, a SaaS product with public help pages, or anything else that sits on the open Internet, here’s the pragmatic “don’t panic, but do something” list.

Measure

  • Break down traffic by human vs automated vs AI-labeled bots (as best you can).
  • Track cache hit ratio and origin fetch rate over time; look for step changes that correlate with bot surges.
  • Identify the endpoints causing disproportionate bytes transferred (media, archives, API endpoints).

Protect

  • Rate-limit suspicious high-parallelism fetches.
  • Set sensible crawl budgets where possible; block truly abusive actors.
  • Implement per-path throttling (e.g., more generous for HTML pages, stricter for large files).
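Per-path throttling is usually a token bucket with different budgets per path prefix. A sketch (the rates and prefixes below are illustrative placeholders, not recommendations; production systems would key this off verified client identity and share state across nodes):

```python
import time

class PathRateLimiter:
    """Token-bucket limiter with per-path-prefix budgets: generous for HTML,
    strict for large downloads."""

    # path prefix -> (tokens per second, burst size); "" is the catch-all
    POLICIES = {"/downloads/": (0.5, 5), "/media/": (1.0, 10), "": (20.0, 40)}

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.buckets = {}   # (client, prefix) -> (tokens, last_refill_time)

    def _policy(self, path):
        for prefix, policy in self.POLICIES.items():
            if prefix and path.startswith(prefix):
                return prefix, policy
        return "", self.POLICIES[""]

    def allow(self, client: str, path: str) -> bool:
        prefix, (rate, burst) = self._policy(path)
        now = self.clock()
        tokens, last = self.buckets.get((client, prefix), (burst, now))
        tokens = min(burst, tokens + (now - last) * rate)   # refill since last check
        if tokens >= 1.0:
            self.buckets[(client, prefix)] = (tokens - 1.0, now)
            return True
        self.buckets[(client, prefix)] = (tokens, now)
        return False
```

Note that each (client, prefix) pair gets its own bucket, so a bot hammering /downloads/ gets throttled there without losing access to ordinary HTML pages.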

Optimize

  • Fix redirect chains and dead URLs (reduce 404/redirect waste).
  • Harden caching headers; validate ETag/Last-Modified correctness; check that bots are respecting them.
  • Consider separate cache rules/tiering for bot traffic if your CDN supports it.

Decide on posture

  • Do you want to be crawled by AI for search and RAG? Great—make it efficient and explicit.
  • Do you want to permit training crawls? If yes, consider cost recovery and scheduling controls.
  • If no, block. But remember: blocking now doesn’t retroactively remove you from datasets already collected elsewhere.

Where this is headed: caches that understand “who” is reading

The web stack has a long tradition of treating requests as mostly equal. The AI era breaks that illusion. A request from a human on a phone and a request from a fleet of parallel agents are both “GET /page,” but they behave nothing alike, and they impose different costs and expectations.

Cloudflare’s post suggests a future where caches become more workload-aware—either implicitly via better eviction policies or explicitly via traffic-aware tiering.

From the research side, the ETH/Cloudflare paper argues that future caches must confront challenges in traffic identification, workload classification, and pricing models to sustainably support both human and AI access at scale.

If you’re hoping this will blow over: it won’t. The AI industry’s appetite for web data isn’t shrinking, and the web’s tolerance for unpaid, performance-degrading crawling is also not infinite. That tension is going to land—hard—on caching layers, because caches are where “lots of reads” first become “lots of costs.”

Bas Dorland, Technology Journalist & Founder of dorland.org