
The Great Refactoring: How Cloudflare Just Made Human Expertise a Luxury Good

On July 1, 2025, Cloudflare flipped one switch and transformed human expertise from commodity to luxury good. By blocking AI crawlers by default across 20% of the web, a private company became a digital watchdog. Welcome to the age where infrastructure is policy and knowledge has a bouncer.

Sami Viitamäki

15 Jul 2025 — 13 min read

Crawlers need an invite as content fortresses come with impenetrable walls and all-seeing eyes.

By Sami Viitamäki^AI

When Infrastructure Becomes Destiny

On July 1, 2025, a programmer at Cloudflare changed a few lines of code. One boolean flipped from false to true. In that microsecond, the fundamental economics of artificial intelligence shifted, the architecture of the internet fractured, and the baseline human perspective, expertise and output transformed from open commodity to a luxury good.

This was less a product update than a coup, even if it was presented as serving benevolent ends.

Cloudflare, the invisible-to-most infrastructure company that handles 20% of global web traffic, switched its default setting to block AI crawlers unless explicitly permitted. No legislation. No court ruling. No democratic process. Just a private company exercising raw infrastructural power to rewrite the rules of the digital economy.

The discourse around this event has been predictably shallow: publishers celebrating, AI companies protesting, pundits debating. But they're missing the real story. This isn't just about copyright or compensation. It's the violent birth of a new economic order where access to knowledge itself becomes a tradable commodity for the highest bidder, and where the ability to create and ingest knowledge worth trading becomes the ultimate competitive advantage.

Welcome to the age of digital feudalism, where infrastructure companies are the new sovereigns alongside big tech, AI companies are the new nobility, and the rest of us need to figure out whether we're artisans or serfs.

Table 1: The Timeline From Open Web to Digital Feudalism

Date	Event	Impact
1990s-2020	The (Relatively) Open Web Era	Free crawling, permissionless innovation
2022-2023	ChatGPT Launch	AI training goes mainstream
2024	‘The Great Scrape’	Parasitic ratios revealed, publisher revolt
Sept 2024	Cloudflare's First Strike	Optional AI blocking introduced
July 1, 2025	The Default Flip	Mandatory opt-in, web fragments
2025-2030?	The New Equilibrium	Digital feudalism or functional future?

The Parasitic Bargain

To understand the magnitude of this shift, you need to grasp the sheer asymmetry of what was happening before. The numbers aren't subtle. They're obscene.

For every visitor Google sends to a website, it crawls about two pages. This is the foundational bargain of the web: indexing in exchange for traffic. It's symbiotic. Both parties benefit.

Now consider the AI crawlers. OpenAI's GPTBot maintains a crawl-to-referral ratio of 1,500 to 1. Anthropic's ClaudeBot? 60,000 to 1.

Table 2: Crawl-to-Referral Ratios, Google vs. Big AI

Service	Crawl-to-Referral Ratio	What This Means
Google Search	2:1	Crawls 2 pages, sends 1 visitor
OpenAI GPTBot	1,500:1	Crawls 1,500 pages, sends 1 visitor
Anthropic ClaudeBot	60,000:1	Crawls 60,000 pages, sends 1 visitor
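
To put the ratios in Table 2 in concrete terms, the short sketch below converts each one into visitors referred per million pages crawled. The figures are the ones quoted above; the function name is only illustrative.

```python
# Crawl-to-referral ratios as quoted in Table 2
# (pages crawled for every one visitor referred).
RATIOS = {
    "Google Search": 2,
    "OpenAI GPTBot": 1_500,
    "Anthropic ClaudeBot": 60_000,
}


def referrals_per_million_crawls(pages_per_referral: int) -> float:
    """Visitors sent back to publishers for every 1,000,000 pages crawled."""
    return 1_000_000 / pages_per_referral


for service, ratio in RATIOS.items():
    per_million = referrals_per_million_crawls(ratio)
    print(f"{service}: {per_million:,.1f} referrals per million pages crawled")
```

On these numbers, a million crawled pages returns roughly 500,000 visitors from Google Search, about 667 from GPTBot, and about 17 from ClaudeBot.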

This isn't symbiosis. It's parasitism. Digital strip-mining at an industrial scale.

These AI systems weren't coming to index and refer. They were coming to ingest, synthesize, and replace. Every article scraped, every insight absorbed, every creative work digested; all to build systems designed to eliminate the need to visit the source. It's as if Google had decided that instead of sending you to websites, it would just memorize the entire internet and answer your questions itself.

And that's exactly what's happening. Google's AI Overviews already answer 75% of mobile queries without a click-through. For many user requests, the future is zero-click search.

THE OLD WEB (Symbiotic)

Websites → Content → Search Engines → Traffic → Websites

└─────── Revenue (ads, subscriptions) ──┘

THE AI WEB (Parasitic)

Websites → Content → AI Models → Direct Answers → Users

└────────── Nothing ────────────┘

Cloudflare's CEO Matthew Prince didn't mince words when saying: "AI is killing the business model of the web." When the entity that's supposed to send traffic instead becomes the destination, the entire value chain collapses.

The old system of voluntary compliance—robots.txt files politely requesting bots to respect boundaries—was like putting up a "Please Don't Steal" sign in front of a gold mine. It works when the miners have an incentive to maintain the ecosystem. It fails when the miners realize they can just take all the gold and sell synthetic replacements.
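
The voluntary nature of robots.txt is easy to see in code. The sketch below uses Python's standard `urllib.robotparser` to evaluate a hypothetical robots.txt (the file contents and the `example.com` URL are invented for illustration). The check runs entirely on the crawler's side, so it only matters if the crawler chooses to run it and honor the answer.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks two AI crawlers to stay out
# while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str = "https://example.com/article") -> bool:
    """What a well-behaved crawler would conclude before fetching the URL."""
    return parser.can_fetch(user_agent, url)


print(may_fetch("GPTBot"))     # False: a compliant GPTBot would skip the page
print(may_fetch("Googlebot"))  # True: falls through to the wildcard rule
```

Nothing here is enforced by the server. A crawler that never performs the check, or ignores the result, can still request the page, which is the gap a network-level block closes.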

Governance by Code

What makes Cloudflare's intervention extraordinary isn't what they did. It's how they did it. This is governance by infrastructure, policy through plumbing. While courts debate fair use and legislators draft AI regulations that will be obsolete before they're signed, Cloudflare simply changed the technical reality.

They didn't pass a law. They became the law.

This represents a fundamental shift in how power operates in the digital age. Cloudflare's network doesn't just transmit data; it decides who even gets to see it. Their machine learning models don't just detect bots; they determine who counts as legitimate and gets in. Their pricing dashboard doesn't just process payments; it sets the market value of human knowledge.
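
How little code a default flip requires can be sketched in a few lines. The following is an illustrative toy, not Cloudflare's implementation: the crawler list, the `SitePolicy` type, the `decide` function, and the user-agent substring match are all assumptions made for the example, and real bot management relies on far more than a user-agent string.

```python
from dataclasses import dataclass, field

# Illustrative list only; a real service would maintain a larger, verified set.
KNOWN_AI_CRAWLERS = ("gptbot", "claudebot")


@dataclass
class SitePolicy:
    # The single default at issue: True means AI crawlers are blocked
    # unless the site owner explicitly allows them.
    block_ai_by_default: bool = True
    allowed_crawlers: set[str] = field(default_factory=set)


def decide(user_agent: str, policy: SitePolicy) -> int:
    """Return an HTTP status code: 200 to serve the page, 403 to refuse."""
    ua = user_agent.lower()
    matched = next((bot for bot in KNOWN_AI_CRAWLERS if bot in ua), None)
    if matched is None:
        return 200  # not a recognized AI crawler: serve as usual
    if matched in policy.allowed_crawlers:
        return 200  # explicitly permitted by the site owner
    return 403 if policy.block_ai_by_default else 200


old_default = SitePolicy(block_ai_by_default=False)
new_default = SitePolicy()  # block_ai_by_default=True

print(decide("Mozilla/5.0 (compatible; GPTBot/1.0)", old_default))  # 200
print(decide("Mozilla/5.0 (compatible; GPTBot/1.0)", new_default))  # 403
```

The only difference between the two outcomes is the value of one field, which is the sense in which a default is policy.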

The efficiency of the initial move is undeniable. What would take years of international negotiations and treaty-making, Cloudflare accomplished in a single product sprint. No need for consensus, no messy democratic process, no lengthy implementation period. Just flip the switch and watch the world reorganize around your decision. This is the new model of internet governance: autocratic, efficient, and powerful. When you control the pipes, you control the flow. When you control the flow, you control the future.

Consider the precedent this sets. If Cloudflare can unilaterally decide to block AI bots today, what stops them—or any other infrastructure provider—from blocking other categories tomorrow? Academic researchers? Journalists? Competitors? The technical capability that protects (at least big, deep-pocketed) content creators today could become the tool of digital authoritarianism tomorrow.

Table 3: The Slippery Slope of Content Constraints

Today	Tomorrow	Next Year	The Logical End
Block AI crawlers	Block competing services	Block based on geography	Full content control
Protect creators	"Protect" incumbents	"Protect" political interests	Digital authoritarianism
Macroeconomic societal rationale	Microeconomic competitive rationale	National security rationale	No rationale needed

We're witnessing the emergence of infrastructure companies as digital sovereigns, exercising quasi-governmental power without governmental accountability. They're not elected, not regulated like utilities, not bound by constitutional constraints. Yet their decisions carry constitutional weight.

The New Class System

Cloudflare's action changes the rules so fundamentally it creates a brutal new hierarchy. The web is fracturing into digital estates, and your position in this new order depends on your market power.

Table 4: The New Digital Divide

Digital Class	Who They Are	Their Power	Their Fate
The Nobility	NYT, Condé Nast, News Corp, Major Publishers	Licensing leverage, nuclear option to block	Million-dollar deals, content monopolies
The Ruling Merchants	OpenAI, Google, Anthropic, Funded AI Labs	Deep pockets for licensing, technical resources	Pay to play, build data moats
The Artisans	Niche experts, specialty creators, B2B content	Unique knowledge, small but valuable audiences	Collective bargaining or irrelevance
The Serfs	Bloggers, small sites, forums, individual creators	No leverage, no lawyers, no deals	Digital invisibility or uncompensated extraction
The Casualties	Researchers, The Internet Archive, accessibility tools	Public good mission, no commercial model	Collateral damage, potential extinction

The Digital Nobility:

Large publishers like The New York Times, Condé Nast, and News Corp suddenly find themselves in the position of medieval landowners. They have what AI companies need: high-quality, legally clean training data. And now they have the walls to protect it. These organizations were already negotiating licensing deals; Cloudflare's block gives them a nuclear option. Pay up or lose access entirely.

The numbers are real. OpenAI has reportedly committed hundreds of millions to content licensing. And that might not be nearly enough. But that money flows to dozens of large publishers, not millions of creators.

The Ruling Merchants:

Well-funded AI companies (OpenAI, Anthropic, Google) can afford to pay these tolls. In fact, they benefit from them. Every dollar spent on content licensing is a dollar their emerging competitors must also spend to remain competitive. It's a barrier to entry disguised as an ethical imperative.

These companies will build their moats not just from better algorithms but from exclusive access to premium training data. The AI wars won't be won by who has the best technology, but who has the best contracts.

The Digital Serfs:

The long tail of the web: independent bloggers, small businesses, niche forums, individual creators. They face an impossible choice. Block AI and become invisible to the future of discovery. Allow AI and watch your value get extracted without compensation.

This isn't hyperbole. When AI systems become the primary interface for information discovery (and they will), being excluded from their training data means being excluded from the collective human consciousness.

It's digital death by a single cut.

The Collateral Damage:

Most tragic are the public-good actors caught in the crossfire. Academic researchers using web crawling to study misinformation; digital archivists preserving our cultural heritage; accessibility tools helping disabled users navigate the web. These legitimate, non-commercial uses of crawling technology face the risk of being crushed by a war they didn't start—and can't win.

Without a granted exception, the Internet Archive, one of humanity's most important cultural preservation projects, now faces an existential challenge. How do you archive a web that's increasingly hostile to being archived? In contrast to the promised AI enlightenment, we might be witnessing the beginning of a digital dark age, where future historians know less about our era than any previous one, simply because the primary records became inaccessible.

The Splinternet Accelerates

Cloudflare's move doesn't just fragment the web economically. It fragments it architecturally. We're watching the acceleration of a splinternet that began with autocracies like China and North Korea limiting internet access and use. Only this time the fragmentation comes not from government firewalls but from private, commercial infrastructure policies.

Consider the cascade effect. Cloudflare controls 20% of web traffic. Amazon CloudFront handles another significant chunk. Fastly, Akamai, and others control their own fiefdoms. What happens when each implements different AI crawler policies?

Suddenly, an AI system's knowledge or a media platform’s value depends not on their capability and output, but on which CDN infrastructure provider they use. Train your model on Cloudflare-protected content? You'll need one set of licenses. Want Akamai-hosted data? That's a different negotiation. Each CDN can become a different jurisdiction with its own laws, creating a patchwork of access regimes.

Table 5: The New Feudalism of the Splinternet

Region	Legal Framework	AI Training Stance	Content Access Result
United States	Fair use doctrine (contested)	Legal gray area, litigation ongoing	Uncertain, case-by-case
European Union	GDPR + AI Act + DSM Directive	Explicit opt-out rights, consent required	Restricted, rights-holder controlled
Japan	Copyright exception for AI	Training is not infringement	Open access, innovation-first
China	State-directed approach	Unrestricted domestic scraping	Take first, regulate later

This technical fragmentation maps onto existing geopolitical fault lines: The EU codifies opt-out rights through instruments like the DSM Directive and the AI Act, and treats data protection as a fundamental right. The US remains mired in fair use debates, with courts slowly adjudicating what copyright means in the AI age. Japan has declared AI training on copyrighted material perfectly legal, prioritizing innovation over creator rights. China never asked permission and likely never will, viewing data as a strategic resource to be exploited.

The result? By 2030, we'll have a divergent AI ecosystem with several centers of gravity, and even several sets of facts, each trained on different subsets of human knowledge, each reflecting different values and blind spots. A Chinese AI trained on freely scraped global data might have broader knowledge than a Western AI constrained by licensing agreements. Or perhaps the opposite: Western AIs might have access to higher-quality, verified data while others train on an increasingly polluted pool of AI-generated content.

At worst this isn't just technical divergence, but full-blown epistemological fracture. Different regions will literally have different versions of truth and truth dynamics, mediated by different AI systems trained on different data under different rules.

The Expertise Premium

Here's where this story becomes personal for every professional reading this. In the pre-Cloudflare world, the web rewarded volume. SEO-optimized content farms could thrive by producing massive amounts of barely-adequate content. AI trained on this undifferentiated mass, learning to be generically helpful.

That arbitrage is dying.

In a world where broad AI training data costs real money that only a few can afford, where access requires negotiation, and where infrastructure companies can block scrapers at will, the economics shift dramatically. Suddenly, it's not about producing content; it's about producing content SO valuable that AI companies MUST pay to access it. Producing it can also become the privilege of the few who can spend the resources.

This is the expertise premium in its purest form. The economic value gap between generic content and genuine, deep and broad expertise grows into a chasm. One side of this divide produces commodity content worth nothing. The other produces strategic assets worth major licensing deals.

Consider what this means for your career as a knowledge professional:

Your competitive advantage isn't using AI tools; everyone will have those.
Your advantage isn't producing good or even great content; AI can do that too.
Your advantage is producing knowledge so unique, so authoritative, so irreplaceable, and so popular (at least for your audience) that AI systems simply need you to remain relevant.

This shift rewards true depth and genuine influence over breadth or shallow exposure; extreme quality over baseline quantity. The professionals who thrive won't be those who produce the most content, but those who produce the most irreplaceable and unavoidable insight.

This is a high bar.

Strategic Imperatives for the Amplification Generation

So how do you position yourself in this new landscape? The answer isn't to resist these changes. AI eats human labor for breakfast, and absorbing labor at scale is the defining characteristic of truly valuable technologies. This is why these shifts are as irreversible as gravity. The answer is to understand the new physics and use it to your advantage.

Table 6: Strategic Imperatives Checklist

Strategic Imperative	Old World Tactic	New World Tactic	Key Metric
Data Moat	Public web content	Proprietary datasets, closed communities	Exclusivity percentage
Infrastructure IQ	Basic web hosting	CDN arbitrage, bot management	Technical sophistication
Content Architecture	SEO optimization	Protection/propagation balance	Licensing $ vs. traffic
Collective Power	Individual hustle	Content cooperatives, unions	Bargaining leverage
Discovery Strategy	Google rankings	Training data inclusion	AI citation propensity

First, proximity to unique data becomes your moat. In a world where public web data is increasingly gated, having access to proprietary information becomes crucial. This could mean:

Building direct relationships with customers that yield unique insights
Developing proprietary and value-adding research methodologies
Creating closed communities that generate valuable discussions
Maintaining historical data that can't be recreated

Second, infrastructure literacy becomes strategic intelligence. Understanding how CDNs work, how bot detection operates, how content licensing flows... this is no longer technical trivia; it's strategic competitive knowhow. The companies that win will be those that understand not just their market and audience, but the technical and licensing architecture that mediates access to it.

Third, you must architect for both protection and propagation. The paradox of the new web is that you need to be discoverable enough to matter but protected enough to maintain value. This requires sophisticated strategies:

Segment your content between what you give away and what you gate
Build direct distribution channels that bypass AI intermediation
Create content experiences that can't be reduced to training data
Develop formats and value that resist easy summarization

Fourth, collective bargaining becomes essential. Individual creators lack leverage, but communities might not. We're already seeing the emergence of content collectives and licensing cooperatives. The future might look less like millions of individual negotiations and more like labor unions for the digital age.

Fifth, prepare for the post-search world. If AI becomes the primary interface for information discovery, traditional SEO dies. The new optimization target isn't search rankings; it's training data inclusion. The metrics shift from impressions and clicks to licensing revenue and attribution quality.

The Path to a Functional Future

The current trajectory can at worst lead to a dysfunctional splinternet where knowledge is balkanized, innovation is stifled, and digital feudalism reigns. But other futures are possible if we act thoughtfully.

We need technical standards that enable nuanced permissions: distinguishing between commercial extraction and academic research, between training and inference, between attribution and appropriation, between value creation and value extraction.

The binary choice of "block all" or "allow all" will serve no one.

We also need governance models that can provide democratic input into infrastructure decisions. When private companies exercise public power with massive repercussions, there needs to be public accountability to some degree.

The answer isn't nationalizing CDNs, but ensuring their decisions are transparent, appealable, and aligned with public interest.

We need economic models that reward creation without stifling innovation. The pay-per-crawl system is crude but directionally correct. More sophisticated models might include:

Revenue sharing based on actual usage in AI outputs
Attribution systems that maintain creator credit
Differential pricing for different use cases
Open access tiers for public benefit applications
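
As a rough illustration of differential pricing with an open-access tier, the sketch below maps a crawler's declared purpose to a per-request price and answers unpaid commercial requests with HTTP 402 (Payment Required, a status code reserved in the HTTP specification). The purposes, prices, and function names are hypothetical and are not drawn from Cloudflare's pay-per-crawl product.

```python
from enum import Enum


class Purpose(Enum):
    ARCHIVE = "archive"          # e.g. cultural preservation
    RESEARCH = "research"        # non-commercial academic study
    SEARCH = "search"            # indexing that refers traffic back
    AI_TRAINING = "ai_training"  # commercial model training


# Hypothetical per-request prices in US cents; 0 marks the open-access tier.
PRICE_CENTS = {
    Purpose.ARCHIVE: 0,
    Purpose.RESEARCH: 0,
    Purpose.SEARCH: 0,
    Purpose.AI_TRAINING: 5,
}


def respond(purpose: Purpose, offered_cents: int = 0) -> tuple[int, str]:
    """Return (HTTP status, explanation) for a single crawl request."""
    price = PRICE_CENTS[purpose]
    if price == 0:
        return 200, "open-access tier"
    if offered_cents >= price:
        return 200, f"paid {price} cents"
    return 402, f"payment required: {price} cents per request"


print(respond(Purpose.RESEARCH))
print(respond(Purpose.AI_TRAINING))
print(respond(Purpose.AI_TRAINING, offered_cents=5))
```

A declared purpose is easy to falsify, so any real scheme would also need a way to verify crawler identity; that part is deliberately left out of this sketch.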

Most importantly, we need to recognize that this isn't really about bots or crawlers or technical protocols. It's about power, value, and the future of human knowledge. Cloudflare's action forces us to confront questions we've been avoiding:

Who owns the collective intelligence of humanity?
How do we balance creator rights with innovation needs?
What does the public interest look like in an AI-mediated world?
How do we ensure access to the digital commons while keeping creation sustainable?

The Great Refactoring

A line of code changed much of how the internet works. But rather than the end of the story, it's the beginning of a new chapter. The old web is dead. The age of permissionless use is over. The era of free-for-all data extraction has ended.

In its place, something new is being born. Rather than submit to digital feudalism, we can build a thriving ecosystem where value flows to creators, innovation continues with consent, and human expertise commands its true worth.

The outcome isn't predetermined. It depends on the choices we make now: as professionals, as companies, as societies. Do we accept infrastructure autocracy or demand democratic governance? Do we embrace digital enclosure or fight for open access? Do we compete for exclusive advantage or collaborate for collective benefit?

The refactoring has begun. The free buffet is closed. The market is open. And human expertise just became a valuable currency in the digital economy.

Time to figure out what yours is worth.

Key Sources

Cloudflare, Inc. (2025, July 1). Cloudflare Just Changed How AI Crawlers Scrape the Internet-at-Large; Permission-Based Approach Makes Way for A New Business Model. Press Release. https://www.cloudflare.com/press-releases/2025/cloudflare-just-changed-how-ai-crawlers-scrape-the-internet-at-large/

Prince, M. (2025, July 1). Content Independence Day: no AI crawl without compensation! Cloudflare Blog. https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/

Knibbs, K. (2025, July 1). Cloudflare Is Blocking AI Crawlers by Default. Wired. https://www.wired.com/story/cloudflare-blocks-ai-crawlers-default/

Hogg, L., & Hwang, T. (2025, July 9). Cloudflare's Troubling Shift From Guardian to Gatekeeper. Tech Policy Press. https://www.techpolicy.press/cloudflares-troubling-shift-from-guardian-to-gatekeeper/

Gotfredsen, S. G. (2025, July 3). Cloudflare Blocks AI Bots from Scraping Web Content Without Permission. Columbia Journalism Review. https://www.cjr.org/analysis/cloudflare-blocks-ai-bots-from-scraping-web-content-without-permission.php

MIT Technology Review. (2025, July 1). Cloudflare will now, by default, block AI bots from crawling its clients' websites. https://www.technologyreview.com/2025/07/01/1119498/cloudflare-will-now-by-default-block-ai-bots-from-crawling-its-clients-websites/

TechCrunch. (2025, July 1). Cloudflare launches a marketplace that lets websites charge AI bots for scraping. https://techcrunch.com/2025/07/01/cloudflare-launches-a-marketplace-that-lets-websites-charge-ai-bots-for-scraping/

Electronic Frontier Foundation. (2025, May). The U.S. Copyright Office's Draft Report on AI Training Errs on Fair Use. https://www.eff.org/deeplinks/2025/05/us-copyright-offices-draft-report-ai-training-errs-fair-use

OECD. (2025). The AI data scraping challenge: How can we proceed responsibly? https://oecd.ai/en/wonk/data-scraping-responsibly