How to automatically serve JSON for any HTML page using html2json
This demo shows how to automatically provide a JSON representation of any HTML page on your site by integrating the html2json service into your CDN worker.
Instead of manually maintaining separate JSON endpoints, this approach generates structured JSON on the fly from standard Edge Delivery Services (EDS) pages. Any page can then be consumed headlessly without additional authoring effort.
How it works
- An out-of-the-box service, https://mhast-html-to-json.adobeaem.workers.dev, returns JSON for any EDS page. It can be used standalone, as shown in the examples below, or wired into your CDN.
- To wire it into your CDN so that appending .json to any page path returns its JSON, update your CDN worker as described under CDN Worker Integration.
Examples
- Full page (head + body): https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo/
- Content only (no head): https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo/?head=false
- Preview (aem.page): https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo/?preview=true
Query Parameters
| Parameter | Values | Description |
| --- | --- | --- |
| head | true (default) / false | Include or exclude the <head> metadata from the JSON output |
| preview | true / false (default) | Fetch from aem.page (preview) instead of aem.live (production) |
| compact | true / false (default) | Return a compact representation of the body content |
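To make the parameters concrete, here is a minimal sketch of a client that calls the service standalone. The names buildServiceURL and fetchPageJSON are hypothetical helpers introduced for illustration, not part of the service. The sketch assumes the URL layout shown in the examples above ({org}/{repo} followed by the page path).

```javascript
const SERVICE_ORIGIN = 'https://mhast-html-to-json.adobeaem.workers.dev';

// Hypothetical helper: builds a standalone html2json URL for a page.
const buildServiceURL = (org, repo, pagePath, options = {}) => {
  const url = new URL(`${SERVICE_ORIGIN}/${org}/${repo}${pagePath}`);
  // Only set parameters that were passed explicitly, so the service
  // defaults (head=true, preview=false, compact=false) apply otherwise.
  for (const name of ['head', 'preview', 'compact']) {
    if (options[name] !== undefined) {
      url.searchParams.set(name, String(options[name]));
    }
  }
  return url;
};

// Hypothetical helper: fetches and parses the JSON for a page.
// fetchImpl is injectable so the function can be tested without a network.
const fetchPageJSON = async (org, repo, pagePath, options = {}, fetchImpl = fetch) => {
  const resp = await fetchImpl(buildServiceURL(org, repo, pagePath, options), {
    headers: { accept: 'application/json' },
  });
  if (!resp.ok) {
    throw new Error(`html2json request failed with status ${resp.status}`);
  }
  return resp.json();
};

// Reproduces the "Content only" example URL from the Examples section.
console.log(buildServiceURL('scdemos', 'demo', '/', { head: false }).href);
```

Leaving a parameter out of the URL, rather than always sending all three, keeps the request identical to the plain examples when no option is needed.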
CDN Worker Integration
To enable html2json on your own site instead of calling the service URLs above, add the following two snippets to your CDN worker:
1. URL builder function (add near the top of your worker):
- Define the allowed query parameters: head, preview, compact
- Build the html2json service URL by stripping the .json extension from the request path to get the page path, and forward only the allowed query parameters.
- Update the base URL to point to your org/repo: https://mhast-html-to-json.adobeaem.workers.dev/{org}/{repo}{pagePath}. The snippet in Code Snippets hard-codes scdemos/demo.
2. Request handler snippet (add in your handleRequest function):
- When a GET request for a .json file returns a 404 from the origin, intercept it and fetch from the html2json service instead.
- The html2json response is cached at the edge by setting Cloudflare's cacheEverything option on the fetch call.
- If the html2json service returns a successful (2xx) response, it replaces the 404. Otherwise the original 404 is returned unchanged.
Code Snippets
URL Builder Function:
...
// html2json - start
// Query parameters forwarded to the html2json service; all others are dropped.
const HTML2JSON_QUERY_PARAMS = new Set(['head', 'preview', 'compact']);

const buildHTML2JSONURL = (requestURL) => {
  // Strip the .json extension to recover the path of the underlying HTML page.
  const pagePath = requestURL.pathname.replace(/\.json$/, '');
  // Replace scdemos/demo with your own {org}/{repo}.
  const html2jsonURL = new URL(
    `https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo${pagePath}`,
  );
  for (const [key, value] of requestURL.searchParams.entries()) {
    if (HTML2JSON_QUERY_PARAMS.has(key)) {
      html2jsonURL.searchParams.append(key, value);
    }
  }
  return html2jsonURL;
};
// html2json - end
const handleRequest = async (request, env, ctx) => {
  ...
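To see what the URL builder produces, the following sketch repeats the function so it runs on its own and applies it to a sample request. The request URL (www.example.com, /products/widget.json, and the utm_source parameter) is made up for illustration.

```javascript
const HTML2JSON_QUERY_PARAMS = new Set(['head', 'preview', 'compact']);

const buildHTML2JSONURL = (requestURL) => {
  const pagePath = requestURL.pathname.replace(/\.json$/, '');
  const html2jsonURL = new URL(
    `https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo${pagePath}`,
  );
  for (const [key, value] of requestURL.searchParams.entries()) {
    if (HTML2JSON_QUERY_PARAMS.has(key)) {
      html2jsonURL.searchParams.append(key, value);
    }
  }
  return html2jsonURL;
};

// Hypothetical incoming request: head is forwarded, utm_source is dropped.
const sampleRequestURL = new URL(
  'https://www.example.com/products/widget.json?head=false&utm_source=newsletter',
);
const sampleResult = buildHTML2JSONURL(sampleRequestURL);

console.log(sampleResult.href);
// https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo/products/widget?head=false
```

Forwarding only an allow-list of parameters keeps unrelated query strings, such as tracking parameters, out of the service URL.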
handleRequest Snippet:
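const handleRequest = async (request, env, ctx) => {
  // Copy of the incoming URL for html2json, kept separate from `url`
  // in case the rest of the worker modifies `url`.
  const requestURL = new URL(request.url);
  const url = new URL(request.url);
  ...
  // `req` and `extension` are defined in the elided worker code above.
  let resp = await fetch(req, {
    method: req.method,
    cf: {
      // cf doesn't cache html by default: need to override the default behavior
      cacheEverything: true,
    },
  });
  // html2json - start
  // Fall back to html2json only for GET requests to .json paths that the origin does not serve.
  if (request.method === 'GET' && extension === 'json' && resp.status === 404) {
    const html2jsonResp = await fetch(buildHTML2JSONURL(requestURL), {
      headers: {
        accept: 'application/json',
      },
      cf: {
        cacheEverything: true,
      },
    });
    // Keep the original 404 unless the service answered successfully.
    if (html2jsonResp.ok) {
      resp = html2jsonResp;
    }
  }
  // html2json - end
  resp = new Response(resp.body, resp);
  ...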
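The fallback decision can also be checked outside a worker. The sketch below extracts it into a hypothetical function, applyHTML2JSONFallback, which is not part of the original worker. It assumes that buildURL stands in for buildHTML2JSONURL, that fetchImpl stands in for the global fetch, and that a pathname ending in .json is equivalent to the worker's extension === 'json' check.

```javascript
// Hypothetical helper: returns the html2json response when the origin 404s
// on a GET for a .json path, and the origin response in every other case.
const applyHTML2JSONFallback = async ({ request, originResp, buildURL, fetchImpl }) => {
  const requestURL = new URL(request.url);
  const isEligible = request.method === 'GET'
    && requestURL.pathname.endsWith('.json') // assumed equivalent of extension === 'json'
    && originResp.status === 404;
  if (!isEligible) {
    return originResp;
  }
  const html2jsonResp = await fetchImpl(buildURL(requestURL), {
    headers: { accept: 'application/json' },
    cf: { cacheEverything: true },
  });
  // A failed service call must not mask the origin's 404.
  return html2jsonResp.ok ? html2jsonResp : originResp;
};

// Example with stubs: the origin 404s on /about.json and the stub service answers 200.
const demo = applyHTML2JSONFallback({
  request: { method: 'GET', url: 'https://www.example.com/about.json' },
  originResp: { status: 404, ok: false },
  buildURL: (requestURL) => new URL(
    requestURL.pathname.replace(/\.json$/, ''),
    'https://html2json.example',
  ),
  fetchImpl: async () => ({ status: 200, ok: true }),
});
demo.then((resp) => console.log(resp.status)); // logs 200
```

Injecting buildURL and fetchImpl keeps the sketch free of network access, so the three conditions (method, extension, origin status) can be exercised one at a time.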