Export Data from the Contentstack SaaS Headless CMS to Files

This blog post describes some considerations for exporting data from Contentstack to a file system, including both JSON and binary media. I have already implemented a static site generator for .NET. As time permits, I intend to extend it to generate the JSON and media files described in this post, especially if anyone shows interest or volunteers to help. That work would likely lead to additional considerations.

Jamstack solutions typically deploy static files to a content delivery network (CDN). When a browser requests a page, the CDN serves the corresponding file with no additional processing.

Jamstack solutions that use a headless CMS often use static site generation tools to create the files to deploy. Static site generators often use JavaScript frameworks such as React and Vue, but can use any technology for static HTML generation (I prefer ASP.NET Core Razor Pages).

Jamstack solutions often use client-side JavaScript to retrieve additional data at runtime, for example to consume JSON from the CMS. This allows dynamic features and content updates without rebuilding and redeploying the site, and suits client-side rendering.

Typically, the client requests JSON directly from the CMS. This approach has disadvantages. It exposes the CMS in use and its security tokens. It also tightly couples the content delivery solution to vendor-specific aspects of the CMS, which runs counter to the objectives of headless systems.

Alternatives are possible. For example, content delivery could access a system that holds a shadow copy of the CMS data rather than accessing the CMS directly. In selecting an approach, consider how frequently and how immediately content must be published, and how those requirements affect the solution.

To allow client-side rendering without the need for dynamic data from the CMS, solutions may be able to embed some or all of the required JSON in the HTML served in the first response to the visitor. Alternatively, solutions can export the JSON, for example to the same file system used by the CDN hosting the website. Solutions that require dynamic data from the CMS may be able to use search indexes to store the JSON.
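When embedding JSON in the first response, the generator must keep the content from breaking out of its script element. For example, a rich text field containing "</script>" would otherwise end the element early. The following is a minimal TypeScript sketch. It assumes the generator writes data into a <script type="application/json"> element. The function names and the cms-data id are hypothetical.

```typescript
// Serialize data so it can sit safely inside a <script> element.
// "<", ">" and "&" only occur inside JSON string literals, where escaping
// them as \u003c, \u003e and \u0026 is still valid JSON, so content such as
// "</script>" or "<!--" cannot be interpreted as markup.
function toEmbeddableJson(data: unknown): string {
  return JSON.stringify(data)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}

// Build the element that the static site generator writes into the page.
// The caller chooses the id (for example "cms-data", a hypothetical name).
function embedJsonScript(id: string, data: unknown): string {
  return `<script type="application/json" id="${id}">${toEmbeddableJson(data)}</script>`;
}

// Example: an entry whose rich text happens to contain a closing script tag.
console.log(embedJsonScript("cms-data", { title: "Home", body: "</script>" }));
// prints <script type="application/json" id="cms-data">{"title":"Home","body":"\u003c/script\u003e"}</script>
```

On the client, passing the element's textContent to JSON.parse restores the original values, because these are ordinary JSON string escapes.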

When exporting a small volume of JSON to files, the directory structure need not be complex. For example, a solution might store the .json files for each content type in a separate folder. A larger number of entries requires a more elaborate structure, such as one based on the URL paths of the entries.

Unfortunately, not all entries have URLs, but every entry has a globally unique identifier. For very large numbers of entries, create file system paths based on entry identifiers. For example, put the entry with identifier 123456789 in a file such as /1/2/3/123456789.json. If that entry has the URL /path/to/page, also write the same JSON to a file such as /path/to/page.json. JavaScript can then request the JSON for an entry by either its URL or its identifier.
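The scheme above can be sketched in TypeScript as a function that returns every path an exporter would write for one entry. The sketch assumes three single-character shard directories, as in the example. It also uses /index.json for the site root, which is a hypothetical convention. The function names are illustrative only.

```typescript
// Shard by the leading characters of the identifier,
// e.g. "123456789" -> "/1/2/3/123456789.json".
function idPath(entryId: string, depth = 3): string {
  const shards = entryId.slice(0, depth).split("");
  return `/${[...shards, entryId].join("/")}.json`;
}

// Mirror the entry URL, e.g. "/path/to/page" -> "/path/to/page.json".
// The site root "/" needs a file name; "/index.json" is a hypothetical choice.
function urlPath(url: string): string {
  const trimmed = url.replace(/\/+$/, "");
  return trimmed === "" ? "/index.json" : `${trimmed}.json`;
}

// Every path to which the exporter writes the same JSON for one entry.
function exportPaths(entryId: string, url?: string): string[] {
  const paths = [idPath(entryId)];
  if (url) {
    paths.push(urlPath(url));
  }
  return paths;
}

// returns ["/1/2/3/123456789.json", "/path/to/page.json"]
const pathsForPage = exportPaths("123456789", "/path/to/page");
console.log(pathsForPage);
```

Entries without URLs simply get the identifier-based file.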

The stored JSON should be vendor-neutral and optimized for content delivery, for example by inlining data from referenced and embedded entries and assets. When creating directory paths, ignore the blt prefix in Contentstack entry identifiers. Place the subdirectories under a primary directory such as /cms under the document root of the CDN. Implement a JavaScript library to centralize retrieval of JSON from the CDN by URL path or entry ID.
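The following is a minimal TypeScript sketch of such a library. It assumes the layout described above, under /cms with three shard directories. The class and method names are hypothetical. The HTTP call is injected so the path logic can be tested on its own. In a browser, the fetcher might be (path) => fetch(path).then((response) => response.json()).

```typescript
// Any function that retrieves and parses JSON for a path on the CDN.
type Fetcher = (path: string) => Promise<unknown>;

class CmsJsonClient {
  private readonly fetchJson: Fetcher;
  private readonly root: string;

  constructor(fetchJson: Fetcher, root = "/cms") {
    this.fetchJson = fetchJson;
    this.root = root;
  }

  // "blt123456789" -> "/cms/1/2/3/123456789.json" (the "blt" prefix is ignored).
  pathForId(entryId: string): string {
    const id = entryId.startsWith("blt") ? entryId.slice(3) : entryId;
    const shards = id.slice(0, 3).split("");
    return `${this.root}/${[...shards, id].join("/")}.json`;
  }

  // "/path/to/page" or "/path/to/page/" -> "/cms/path/to/page.json".
  pathForUrl(url: string): string {
    const trimmed = url.replace(/\/+$/, "");
    return `${this.root}${trimmed === "" ? "/index" : trimmed}.json`;
  }

  byId(entryId: string): Promise<unknown> {
    return this.fetchJson(this.pathForId(entryId));
  }

  byUrl(url: string): Promise<unknown> {
    return this.fetchJson(this.pathForUrl(url));
  }
}
```

Keeping path construction in one class means that a later change to the directory layout only touches this library, not every page that consumes the JSON.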

Assets

Exporting assets involves exporting both metadata and binary data for the media.

By default, exported entries reference the URLs of media assets hosted on Contentstack’s content delivery network. You may choose to export only text content and let Contentstack continue to host binary content.

Apply any required transformations, such as optimizations, when generating static files from images.
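If binary content is exported, one option is to let Contentstack's image service perform the transformations and then save the results as static files. The following TypeScript sketch assumes the Image Delivery API query parameters width, format and quality. Check the Contentstack documentation for the full list and supported values. The function and option names are hypothetical.

```typescript
// Requested transformations; only a few assumed parameters are modeled here.
interface ImageOptions {
  width?: number;
  format?: "webp" | "jpg" | "png";
  quality?: number;
}

// Build the URL of a transformed rendition for the exporter to download.
function transformedImageUrl(assetUrl: string, options: ImageOptions): string {
  const url = new URL(assetUrl);
  if (options.width !== undefined) url.searchParams.set("width", String(options.width));
  if (options.format !== undefined) url.searchParams.set("format", options.format);
  if (options.quality !== undefined) url.searchParams.set("quality", String(options.quality));
  return url.toString();
}

// Example with a hypothetical asset URL.
const rendition = transformedImageUrl(
  "https://images.contentstack.io/v3/assets/blt0stack/blt42abc/blt9ver/hero.jpg",
  { width: 800, format: "webp", quality: 80 },
);
console.log(rendition);
// prints https://images.contentstack.io/v3/assets/blt0stack/blt42abc/blt9ver/hero.jpg?width=800&format=webp&quality=80
```

Alternatively, the exporter could download the original and optimize it locally with an image processing library. That approach avoids making the build depend on Contentstack's image service.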

Consider creating a directory structure based on the media folder structure, which is only available through the content management API and not the content delivery API. Otherwise, consider creating a directory structure based on asset identifiers, but under /assets or /cms/assets rather than directly under the document root.

When retrieving data from file fields of entries, map Contentstack asset URLs to file paths.
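The following is a minimal TypeScript sketch of this mapping. It assumes that a file field returns an object with at least uid, url and filename properties. It also assumes that assets are stored under /cms/assets, in one directory per asset identifier without the blt prefix, and that file names are already safe to use in paths. The interface and function names are hypothetical.

```typescript
// Assumed subset of the JSON that a file field returns for an asset.
interface AssetField {
  uid: string;
  url: string;
  filename: string;
}

// Local path keyed by asset identifier, so files with the same name
// in different assets do not collide.
function assetFilePath(asset: AssetField, root = "/cms/assets"): string {
  const id = asset.uid.startsWith("blt") ? asset.uid.slice(3) : asset.uid;
  return `${root}/${id}/${asset.filename}`;
}

// Replace the CDN URL with the local path, keeping the original URL
// so the exporter knows where to download the binary from.
function localizeAsset(asset: AssetField): AssetField & { sourceUrl: string } {
  return { ...asset, url: assetFilePath(asset), sourceUrl: asset.url };
}

// Example with hypothetical identifiers.
const hero: AssetField = {
  uid: "blt42abc",
  url: "https://images.contentstack.io/v3/assets/blt0stack/blt42abc/blt9ver/hero.jpg",
  filename: "hero.jpg",
};
console.log(localizeAsset(hero).url); // prints /cms/assets/42abc/hero.jpg
```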

Consider rewriting references to Contentstack asset URLs in markup and other fields, preferably before storage as JSON.
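Rewriting URLs embedded in markup can be sketched with a regular expression. The sketch assumes asset URLs of the form https://images.contentstack.io/v3/assets/{stack}/{asset uid}/{version}/{filename}, or the assets.contentstack.io host for other files. It uses a hypothetical /cms/assets layout keyed by asset identifier. Other regions or custom domains would need a different pattern. The sketch also drops any query string, such as image transformation parameters, so apply transformations before storing the files.

```typescript
// Assumed URL shape; verify against the URLs your stack actually returns.
// Group 1 captures the asset uid (assumed to start with "blt"),
// group 2 the file name; an optional query string is matched and dropped.
const ASSET_URL =
  /https:\/\/(?:images|assets)\.contentstack\.io\/v3\/assets\/[^/\s"']+\/(blt[^/\s"']+)\/[^/\s"']+\/([^/?\s"'<>]+)(?:\?[^\s"'<>]*)?/g;

// Replace every matching asset URL in a markup string with a local path.
function rewriteAssetUrls(markup: string, root = "/cms/assets"): string {
  return markup.replace(ASSET_URL, (_match: string, uid: string, filename: string) =>
    `${root}/${uid.slice(3)}/${filename}`);
}

// Example with hypothetical identifiers.
const html =
  '<img src="https://images.contentstack.io/v3/assets/blt0stack/blt42abc/blt9ver/hero.jpg?width=800" alt="Hero">';
console.log(rewriteAssetUrls(html)); // prints <img src="/cms/assets/42abc/hero.jpg" alt="Hero">
```

Running this over rich text and other markup fields before writing the JSON means the client never needs to know about Contentstack URLs.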
