How to automatically serve JSON for any HTML page using html2json

This demo shows how to automatically serve a JSON representation of any HTML page on your site by integrating the html2json service into your CDN worker.

Instead of manually maintaining separate JSON endpoints, this approach returns structured JSON on the fly from standard EDS pages. This enables headless consumption of any page without additional authoring effort.

How it works

Examples

Query Parameters

Parameter   Values                    Description
head        true (default) / false    Include or exclude the <head> metadata from the JSON output
preview     true / false (default)    Fetch from aem.page (preview) instead of aem.live (production)
compact     true / false (default)    Return a compact representation of the body content
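
The parameters above can be combined on a single request. The following is a minimal sketch of how a caller might append them to a service URL. The helper name withHTML2JSONParams and the page path /my-page are illustrative assumptions and not part of html2json; the host and the /scdemos/demo prefix are taken from the worker snippet in this demo.

```javascript
// Hypothetical helper (not part of html2json): appends the documented
// query parameters to a page URL on the service.
const withHTML2JSONParams = (pageURL, { head, preview, compact } = {}) => {
  const url = new URL(pageURL);
  const options = { head, preview, compact };
  for (const [key, value] of Object.entries(options)) {
    // Only set parameters the caller provided, so the service defaults
    // apply to everything else.
    if (value !== undefined) {
      url.searchParams.set(key, String(value));
    }
  }
  return url;
};

// "/my-page" is a placeholder path.
const example = withHTML2JSONParams(
  'https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo/my-page',
  { head: false, compact: true },
);
console.log(example.href);
// → https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo/my-page?head=false&compact=true
```

Leaving a parameter out keeps the URL free of it, so the documented default (for example head=true) is whatever the service applies.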

CDN Worker Integration

To enable html2json on your own site, rather than calling the service URLs above directly, add the following two snippets to your CDN worker:

1. URL builder function (add near the top of your worker):

2. Request handler snippet (add in your handleRequest function):

Code Snippets

URL Builder Function:

...
// html2json - start
const HTML2JSON_QUERY_PARAMS = new Set(['head', 'preview', 'compact']);
const buildHTML2JSONURL = (requestURL) => {
  const pagePath = requestURL.pathname.replace(/\.json$/, '');
  const html2jsonURL = new URL(
    `https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo${pagePath}`,
  );
  for (const [key, value] of requestURL.searchParams.entries()) {
    if (HTML2JSON_QUERY_PARAMS.has(key)) {
      html2jsonURL.searchParams.append(key, value);
    }
  }
  return html2jsonURL;
};
// html2json - end

const handleRequest = async (request, env, ctx) => {
...
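
To see what the URL builder produces, the function can be run on its own outside the worker. The sketch below repeats the builder so that it is self-contained; the incoming request URL (www.example.com, the /products/overview path, and the utm_source parameter) is a made-up example.

```javascript
// Copy of the URL builder from the snippet above, repeated so this
// example runs standalone.
const HTML2JSON_QUERY_PARAMS = new Set(['head', 'preview', 'compact']);
const buildHTML2JSONURL = (requestURL) => {
  // Strip the trailing ".json" to get the path of the underlying HTML page.
  const pagePath = requestURL.pathname.replace(/\.json$/, '');
  const html2jsonURL = new URL(
    `https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo${pagePath}`,
  );
  // Forward only the parameters html2json understands.
  for (const [key, value] of requestURL.searchParams.entries()) {
    if (HTML2JSON_QUERY_PARAMS.has(key)) {
      html2jsonURL.searchParams.append(key, value);
    }
  }
  return html2jsonURL;
};

// Hypothetical incoming request URL.
const incoming = new URL(
  'https://www.example.com/products/overview.json?compact=true&utm_source=newsletter',
);
const built = buildHTML2JSONURL(incoming);
console.log(built.href);
// → https://mhast-html-to-json.adobeaem.workers.dev/scdemos/demo/products/overview?compact=true
```

The .json suffix is removed and the unrelated utm_source parameter is dropped, so only the allow-listed parameters reach the service.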

handleRequest Snippet:

const handleRequest = async (request, env, ctx) => {
  const requestURL = new URL(request.url);
  const url = new URL(request.url);
 ...
  let resp = await fetch(req, {
    method: req.method,
    cf: {
      // cf doesn't cache html by default: need to override the default behavior
      cacheEverything: true,
    },
  });

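  // html2json - start
  // Fall back to html2json only when a GET request for a .json path got a
  // 404 from the origin. `extension` is assumed to be derived from the
  // request path in the elided part of the worker.
  if (request.method === 'GET' && extension === 'json' && resp.status === 404) {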
    const html2jsonResp = await fetch(buildHTML2JSONURL(requestURL), {
      headers: {
        accept: 'application/json',
      },
      cf: {
        cacheEverything: true,
      },
    });

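    // Use the html2json response only if it succeeded; otherwise keep the
    // original 404 from the origin.
    if (html2jsonResp.ok) {
      resp = html2jsonResp;
    }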
  }
  // html2json - end

  resp = new Response(resp.body, resp);
 ...
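
The fallback decision in the handler can also be tried outside the Workers runtime. The sketch below is a simplified stand-in, not the worker code itself: the function name fallbackToHTML2JSON, the injected fetchImpl parameter, the placeholder fallback URL, and the use of a .json path check in place of the worker's extension variable are all assumptions made for illustration. Plain objects stand in for Request and Response.

```javascript
// Hypothetical, simplified version of the fallback step. `fetchImpl` is
// injected so the logic can run without a network or the Workers runtime.
const fallbackToHTML2JSON = async (
  request,
  originResp,
  fallbackURL,
  fetchImpl = globalThis.fetch,
) => {
  const { pathname } = new URL(request.url);
  const isJSONRequest = pathname.endsWith('.json');
  // Only GET requests for .json paths that the origin answered with 404
  // are candidates for the fallback.
  if (request.method !== 'GET' || !isJSONRequest || originResp.status !== 404) {
    return originResp;
  }
  const html2jsonResp = await fetchImpl(fallbackURL, {
    headers: { accept: 'application/json' },
  });
  // Keep the origin response if the fallback did not succeed either.
  return html2jsonResp.ok ? html2jsonResp : originResp;
};

const runDemo = async () => {
  // Placeholder URL; in the worker this would come from buildHTML2JSONURL.
  const fallbackURL = new URL('https://html2json.example/org/site/my-page');
  const calls = [];
  // Fake fetch that records its calls and always succeeds.
  const fakeFetch = async (url, init) => {
    calls.push({ url: String(url), accept: init.headers.accept });
    return { ok: true, status: 200 };
  };
  const notFound = { ok: false, status: 404 };
  const get = { method: 'GET', url: 'https://www.example.com/my-page.json' };
  const post = { method: 'POST', url: 'https://www.example.com/my-page.json' };

  const replaced = await fallbackToHTML2JSON(get, notFound, fallbackURL, fakeFetch);
  const untouched = await fallbackToHTML2JSON(post, notFound, fallbackURL, fakeFetch);
  return {
    replaced: replaced.status,
    untouched: untouched.status,
    fetchCalls: calls.length,
  };
};

runDemo().then((result) => console.log(result));
// prints { replaced: 200, untouched: 404, fetchCalls: 1 }
```

Because the fallback only runs after the origin has returned a 404, any JSON resource that already exists at the origin is still served as before.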