How to Deploy a Static Site to Cloudflare Pages Without the Gotchas (sitemap, robots.txt, and WASM headers)
Deploying a static site to Cloudflare Pages? Keep Google from silently skipping your pages (robots.txt/sitemap at the root), set the COOP/COEP headers WASM needs via the _headers file, and learn what to configure on day one.
You push a static site to Cloudflare Pages, it deploys on every push, and everything looks fine — until weeks later you notice Google never indexed half your pages, or a WASM demo that worked locally throws errors in production. Cloudflare Pages is genuinely excellent static hosting: free at the small scale a side project runs at, automatic deploys from GitHub pushes, global CDN built in. But its conventions are specific, and the gotchas hide in places you won't think to check until they bite.
Real example: these are the conventions that cost me more than a few failed deployments and a Google Search Console indexing incident while shipping VORA, a static meeting-transcription app. Here's each gotcha and how to avoid it.
Why Cloudflare Pages suits a pure static site
If your site is pure static files (HTML, CSS, and JavaScript with no build step, no server, no database), it's an ideal candidate for static hosting. The usual alternatives are GitHub Pages, Netlify, and Vercel. The one reason to reach for Cloudflare Pages specifically: its global CDN runs on Cloudflare's edge network, which has low latency in Asia-Pacific. If your primary users are in Korea and Japan, edge performance from Seoul and Tokyo can be measurably better than competitors', though it is worth measuring for your own audience.
Setup is straightforward: connect the GitHub repository, set the root directory, set the build output directory (or leave it empty for static sites with no build step). Push to main, and it deploys automatically. No configuration required for the happy path.
The trouble starts off the happy path.
Put sitemap.xml and robots.txt at the deployment root — or Google can't find them
Here's the gotcha that does the most quiet damage. robots.txt must be served from the exact path https://yourdomain.com/robots.txt — not from any subdirectory. Googlebot explicitly looks only at the root. Think of it like putting your apartment's doorbell on the third floor instead of at the front entrance: technically it's there, but nobody will ever find it. Similarly, sitemap.xml is expected at https://yourdomain.com/sitemap.xml by convention (though the actual path can be specified in robots.txt).
So if you follow normal file-organization instinct and tuck these into a subdirectory, Google Search Console reports the sitemap inaccessible and robots.txt not found.
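As a minimal sketch of the robots.txt side of this, the snippet below builds a permissive robots.txt that advertises the sitemap location through a Sitemap line, then parses it with Python's standard urllib.robotparser to confirm the directive is readable. The domain and the helper name build_robots_txt are placeholders for illustration, not taken from the VORA repo.

```python
from urllib.robotparser import RobotFileParser

# Placeholder domain for illustration; replace with your own.
SITE = "https://yourdomain.com"


def build_robots_txt(site: str, sitemap_path: str = "/sitemap.xml") -> str:
    """Return a permissive robots.txt that also advertises the sitemap URL."""
    lines = [
        "User-agent: *",
        "Allow: /",
        f"Sitemap: {site.rstrip('/')}{sitemap_path}",
    ]
    return "\n".join(lines) + "\n"


robots_text = build_robots_txt(SITE)

# Parse the generated text the way a crawler-side library would.
parser = RobotFileParser()
parser.parse(robots_text.splitlines())

sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if absent
allowed = parser.can_fetch("Googlebot", SITE + "/blog/post.html")
print(robots_text)
```

The generated text still has to be saved as robots.txt at the deployment root; the Sitemap line only tells crawlers where the sitemap is, it does not move robots.txt itself.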
Real example: I placed them in a subdirectory at first. Google Search Console reported both missing. Several weeks of indexing were potentially affected. Moving the files to the repository root (which becomes the Cloudflare Pages deployment root) immediately fixed it and Search Console validation went green — but the delay meant some blog posts weren't indexed for 2-3 weeks after publishing.
The rule (Cloudflare Pages root behavior): when no build output directory is set, the repository root is the deployment root. Every file you want served at a specific URL path must live at the corresponding path in the repository. sitemap.xml in the repo root → served at yourdomain.com/sitemap.xml. There is no implicit "public" or "dist" directory unless you configure one as the build output directory.
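One way to catch a misplaced file before it reaches production is a small pre-push check. The sketch below assumes the deployment root is a local directory you can point it at; the function names and the seo/ subdirectory in the demo are hypothetical.

```python
from pathlib import Path
import tempfile

REQUIRED_AT_ROOT = ("robots.txt", "sitemap.xml")


def missing_root_files(deploy_root: Path, required=REQUIRED_AT_ROOT) -> list[str]:
    """Return required files that are absent from the top level of deploy_root."""
    return [name for name in required if not (deploy_root / name).is_file()]


def misplaced_copies(deploy_root: Path, required=REQUIRED_AT_ROOT) -> list[str]:
    """Return relative paths of required files that sit in subdirectories."""
    found = []
    for name in required:
        for path in deploy_root.rglob(name):
            if path.parent != deploy_root:
                found.append(path.relative_to(deploy_root).as_posix())
    return sorted(found)


# Demo on a throwaway directory: robots.txt tucked into seo/, sitemap.xml at the root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "seo").mkdir()
    (root / "seo" / "robots.txt").write_text("User-agent: *\nAllow: /\n")
    (root / "sitemap.xml").write_text("<urlset/>")
    missing = missing_root_files(root)
    misplaced = misplaced_copies(root)

print("missing at root:", missing)
print("found in subdirectories:", misplaced)
```

Run against a real checkout, you would pass Path(".") (or your build output directory) and fail the push if missing_root_files returns anything.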
Set COOP/COEP headers for WASM with the _headers file
If you run multi-threaded WebAssembly (e.g. ONNX Runtime), you need specific HTTP response headers: Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp. These are necessary for SharedArrayBuffer, which ONNX Runtime needs for multi-threaded WASM execution.
On a normal web server you'd add these in your nginx or Apache config. On Cloudflare Pages, you configure custom headers through a special file called _headers placed in the deployment root. The format:
# _headers file (Cloudflare Pages)
/vad-test.html
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp
/hybrid-asr-test.html
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp
/sherpa-onnx-test.html
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp

This works, but only for the specific pages listed. And it surfaces a deeper conflict: any page that loads cross-origin resources (Google Fonts, Font Awesome CDN, Google AdSense) and also needs COEP headers faces a fundamental clash. With require-corp, every cross-origin subresource must either send Cross-Origin-Resource-Policy: cross-origin or be requested with CORS, and not every third-party resource does. In practice the choice is binary: either disable COEP (and lose SharedArrayBuffer) or drop the external resources that don't comply (and lose fonts and icons).
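Before turning COEP on for a page, it helps to know which third-party hosts that page pulls from. The following is a rough sketch using only the standard library: it lists the external hosts referenced by script, img, iframe, and stylesheet link tags. It only finds candidates to review; whether each one is actually blocked depends on the response headers it sends and how it is loaded. The sample markup, the cdn.example.net host, and the class name are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalResourceFinder(HTMLParser):
    """Collect hosts of subresources that a page loads from other origins."""

    def __init__(self, own_host: str):
        super().__init__()
        self.own_host = own_host
        self.external_hosts = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("script", "img", "iframe"):
            url = attr.get("src")
        elif tag == "link" and "stylesheet" in (attr.get("rel") or "").lower():
            url = attr.get("href")
        else:
            return  # not a tag that triggers a subresource fetch we track
        host = urlparse(url or "").netloc
        if host and host != self.own_host:
            self.external_hosts.add(host)


def external_hosts(html: str, own_host: str) -> list[str]:
    """Return the sorted external hosts referenced by the given HTML."""
    finder = ExternalResourceFinder(own_host)
    finder.feed(html)
    return sorted(finder.external_hosts)


# Illustrative page: two external resources, two local ones, one canonical link.
SAMPLE_PAGE = """
<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Inter">
<link rel="stylesheet" href="/css/site.css">
<link rel="canonical" href="https://example.com/lab/vad-test.html">
<script src="https://cdn.example.net/icons/all.min.js"></script>
<script src="/lib/ort.min.js"></script>
"""

hosts = external_hosts(SAMPLE_PAGE, own_host="yourdomain.com")
print(hosts)
```

Any host this reports is something to either vendor locally or verify in the browser's network panel once the headers are enabled.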
Real example: for the WASM-heavy lab pages I vendored the ONNX Runtime library locally (into the repo's lib/ directory) and accepted that those pages wouldn't load Google Fonts or Font Awesome. That's why the lab experiment pages have slightly different typography from the main site: functionality chosen over visual consistency, scoped to the experiment pages.
Mind the local-vs-production header gap (the server.py you'll see in some repos)
A subtle trap: the cross-origin isolation headers that _headers injects in production don't exist when you open an HTML file directly or serve it locally with a basic python3 -m http.server, so WASM threading works in production and breaks only in local testing. The common workaround is a tiny local dev server that injects the same COOP/COEP headers.
Real example: the VORA repo contains a server.py, a Python HTTP server that injects the cross-origin isolation headers, purely for local development of the WASM lab pages. In production (Cloudflare Pages) the _headers file handles it; server.py is purely a developer-experience tool, kept with a comment at the top saying it's for local development only.
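A local server of this kind can be very small. The sketch below is not the actual server.py from the VORA repo; it is a minimal stand-in built on Python's http.server that adds the two isolation headers to every response, followed by a short self-check that starts it on a free port, fetches one page, and shuts down. The names IsolatedHandler and make_server are my own.

```python
# Local development only: mirrors the headers that _headers sets in production.
import http.client
import tempfile
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds the cross-origin isolation headers."""

    def end_headers(self):
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()


def make_server(directory: str = ".", port: int = 8000) -> ThreadingHTTPServer:
    """Create a server for `directory`, bound to localhost only."""
    handler = partial(IsolatedHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Self-check: serve an empty temp directory on an ephemeral port, fetch once, stop.
with tempfile.TemporaryDirectory() as tmp:
    server = make_server(directory=tmp, port=0)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    conn.request("GET", "/")
    response = conn.getresponse()
    coop = response.getheader("Cross-Origin-Opener-Policy")
    coep = response.getheader("Cross-Origin-Embedder-Policy")
    response.read()
    conn.close()
    server.shutdown()
    server.server_close()

print(coop, coep)
```

For day-to-day use, replace the self-check with make_server(".", 8000).serve_forever() and open http://127.0.0.1:8000 in the browser. Binding to 127.0.0.1 keeps the server off the local network.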
What to configure on day one (instead of discovering it later)
If you're starting a Cloudflare Pages project today, set these up from the beginning rather than reacting later:
- Custom domain from day one. A project-named subdomain (like vora.vibed-lab.com) is functional but ties the URL to the project's old working name permanently; a custom domain decouples the URL from the project name.
- _headers file from day one. Adding cross-origin isolation headers reactively, only when WASM experiments need them, creates a confusing period where some pages work and others don't.
- sitemap.xml and robots.txt in the root from day one. Never put these anywhere except the deployment root; there is no valid reason to put them anywhere else.
- Preview deployments for testing. Cloudflare Pages automatically creates preview deployments for non-main branches. Use them: they catch deployment-specific issues (like the robots.txt placement) without affecting production. Deploying directly to main for every change is how those issues reach users first.
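For the _headers item in the list above, one option is to generate the file from a single list of pages instead of hand-copying the same two lines per page. This is a sketch under the assumption that every listed page needs the same isolation headers; the function name render_headers_file is hypothetical, and the page paths are the ones used earlier in this article.

```python
ISOLATION_HEADERS = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
}


def render_headers_file(paths, headers=ISOLATION_HEADERS) -> str:
    """Render one rule block per path: the path, then indented header lines."""
    blocks = []
    for path in paths:
        lines = [path] + [f"  {name}: {value}" for name, value in headers.items()]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


WASM_PAGES = ["/vad-test.html", "/hybrid-asr-test.html", "/sherpa-onnx-test.html"]
headers_text = render_headers_file(WASM_PAGES)
print(headers_text)
```

Writing headers_text to a file named _headers in the deployment root gives the same rules as the hand-written version, and adding a new WASM page becomes a one-line change to WASM_PAGES.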
Takeaway
Static hosting isn't complicated; the complications are in the details — where specific files must live, how security headers interact with CDN resources, how to configure builds that don't actually need building. These are solvable with a focused afternoon reading your hosting platform's actual documentation end to end — not YouTube tutorials, not Reddit threads. The gotchas are always in the footnotes, and the footnotes are where three weeks of indexing can quietly disappear.
If discoverability is your goal, pair this with the broader complete SEO guide — getting indexed is step one; getting ranked is the rest.
2026.02.21
Written by
Jay Lee
Korea-Licensed Pharmacist (#68652) · Senior Researcher
Korea University, College of Pharmacy (B.S. + M.S., drug delivery systems & industrial pharmacy). Building production-grade AI tools across medicine, finance, and productivity — without a CS degree. Domain expertise first, code second.