Data Access Points in the Contentstack SaaS Headless CMS

This blog post describes techniques that you can use to access data in the Contentstack SaaS headless CMS. The first two sections, Content Management API and Webhooks, describe facilities not intended for content delivery environments. The remaining sections describe Contentstack content delivery services, also called APIs.

Content Management API

The content management API is not intended for content delivery, but for importing, publishing, and otherwise manipulating data in the CMS. You can use the content management API for read and write access to the entire content management system, including all versions of work in progress and published content. The content management API uses security techniques that differ from those used for content delivery. Content management APIs may not provide the level of endpoint caching provided by the content delivery API.

Use the content management API to import content, automate processes, and otherwise integrate and extend the CMS.

Webhooks

You can configure the CMS to invoke URLs in external systems when events occur. Your webhook listeners parse a JSON payload that contains information about an event in the CMS. For example, you can configure webhooks around publishing to trigger cache clearing, content synchronization, search index updates, or other processing in your systems.

While publishing webhooks pass content in the JSON payload, I prefer to use them for signaling only, using the synchronization API to retrieve the data. For example, rather than taking action on every publishing event, use the final publishing webhook (or better yet, a Contentstack release webhook) to trigger a synchronization.
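
As a minimal sketch of the signaling-only approach, the listener below parses the JSON payload but uses it only to decide whether to trigger a synchronization. The `event` field and the event names are assumptions for illustration; check them against the payloads that your stack actually sends. The `trigger_sync` callback is hypothetical and stands in for whatever starts your synchronization process.

```python
import json
from typing import Callable

# Hypothetical event names; verify against real webhook payloads.
SYNC_TRIGGER_EVENTS = {"publish", "deploy"}


def handle_webhook(body: bytes, trigger_sync: Callable[[], None]) -> bool:
    """Treat the webhook as a signal only; return True if a sync was triggered."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # malformed or non-JSON request body
    if not isinstance(payload, dict):
        return False
    # Assumed field: "event" names what happened in the CMS.
    if payload.get("event") not in SYNC_TRIGGER_EVENTS:
        return False
    # Deliberately ignore any content carried in the payload; the
    # synchronization API is the source of truth for what changed.
    trigger_sync()
    return True
```

Because the handler never reads entry data from the payload, the same code works whether the signal comes from a single publishing event or from a release.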

Use webhooks to signal external systems but use the synchronization API when possible.

Content Delivery API

The content delivery API exposes the published textual content as JSON, including the schemas of the content types, all field values of all entries, and the metadata of all assets, including their URLs and folders. Contentstack manages a Content Delivery Network (CDN) for the data exposed by the content delivery API. Access to any content delivery API requires a stack identifier, a security token, and the name of a content delivery environment, such as staging or published.
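
To make the three required values concrete, here is a hedged sketch that assembles the URL and headers for a request for the entries of one content type. The base URL, the path layout, and the header names `api_key` and `access_token` are assumptions based on my understanding of the service; confirm them against the Contentstack documentation for your region before relying on them.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

# Assumed base URL; stacks in other regions may use a different host.
BASE_URL = "https://cdn.contentstack.io/v3"


def build_entries_request(
    api_key: str, delivery_token: str, environment: str, content_type_uid: str
) -> Tuple[str, Dict[str, str]]:
    """Return the URL and headers for listing entries of one content type."""
    query = urlencode({"environment": environment})
    url = f"{BASE_URL}/content_types/{content_type_uid}/entries?{query}"
    # Assumed header names for the stack identifier and security token.
    headers = {"api_key": api_key, "access_token": delivery_token}
    return url, headers
```

You could pass the result to any HTTP client, for example `urllib.request.Request(url, headers=headers)`.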

You can use the content delivery API at various points in the solution lifecycle, such as the build and deployment process for a website or mobile application, an application running in an environment that you control, and applications running in browsers, on mobile devices, or elsewhere. With content delivery, a main consideration is which APIs to use for which data in each phase of the application and its build process.

The content delivery API, unless explicitly restricted, returns all field values for every entry, making it convenient by default.

Most solutions use the content delivery API for content delivery. Some solutions use application services to generate HTML dynamically at runtime; others rely on static HTML exports and client technologies including Jamstack and mobile device APIs.

When you use the content delivery API to access values of file fields, Contentstack embeds the metadata of referenced assets into the representations of entries that reference those assets. You can pass parameters to the API to cause the same type of expansion for entries specified in reference fields.
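
The following sketch shows one way to build the query string that requests expansion of reference fields. The `include[]` parameter name reflects my understanding of the API and should be treated as an assumption, and the field identifiers `author` and `related_posts` are hypothetical.

```python
from typing import Iterable
from urllib.parse import urlencode


def entry_query_string(environment: str, reference_fields: Iterable[str] = ()) -> str:
    """Build a query string that asks the API to expand the given reference fields."""
    params = [("environment", environment)]
    # Assumed parameter: one "include[]" per reference field to expand.
    params += [("include[]", uid) for uid in reference_fields]
    return urlencode(params)


print(entry_query_string("staging", ["author", "related_posts"]))
```

Expanding references increases the size of each response, so request only the reference fields that the calling code actually uses.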

The content delivery API requires querying individual content types separately.

Synchronization API

The synchronization API optimizes access to large volumes of data from the content delivery environment. You can use the synchronization API to request data modified since a point in time or since the beginning of recorded time for a stack. You can then poll or otherwise trigger synchronization, which returns any data modified subsequently.

Rather than potentially embedding the same asset metadata in numerous entries, the synchronization API separates data for assets and entries, and supports parameters to control whether the API returns content, assets, both, or neither (deletion and potentially other events).

To increase reliability and reduce the impact of large operations on processing infrastructure, all APIs page their results when a response exceeds certain operational thresholds. While paging is relevant to content delivery and all other APIs, it applies particularly to the synchronization API, as most organizations have more entries than the default page size of 100.
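
A paging loop for synchronization might look like the sketch below. It assumes that each response carries an `items` list plus either a `pagination_token` (more pages remain) or a `sync_token` (this run is complete), and that an initial run passes `init=true`; verify those names against the synchronization API documentation. The `fetch_page` callable is hypothetical and stands in for the HTTP call, which keeps the loop testable without network access.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Page = Dict[str, Any]


def sync_all(
    fetch_page: Callable[[Dict[str, str]], Page],
    sync_token: Optional[str] = None,
) -> Tuple[List[dict], str]:
    """Collect every page of one synchronization run and return the next sync token."""
    # Start from a saved token if there is one, otherwise from the beginning.
    params = {"sync_token": sync_token} if sync_token else {"init": "true"}
    items: List[dict] = []
    while True:
        page = fetch_page(params)
        items.extend(page.get("items", []))
        if "pagination_token" in page:
            # More pages remain in this run; request the next one.
            params = {"pagination_token": page["pagination_token"]}
            continue
        # The final page carries the token to use for the next run.
        return items, page["sync_token"]
```

Store the returned token after each run and pass it to the next call so that later runs return only data modified since then.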

Use the synchronization API to export and synchronize data in external systems, such as to build or rebuild a search index, synchronize a database, or publish HTML or JSON files.

GraphQL

The GraphQL API accesses a content delivery environment.

The GraphQL API allows querying entries from multiple content types simultaneously.

Contentstack provides a browser-based user interface for defining GraphQL queries.

GraphQL queries explicitly specify the fields to retrieve, including asset metadata. This can make queries relatively verbose, but makes responses efficient by default.
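
As an illustration of querying several content types at once while naming every field, the sketch below builds a query string and request body. The `all_<content type>` root fields and the `items` wrapper reflect my understanding of the generated schema and are assumptions; the content types `blog_post` and `author` and their fields are hypothetical.

```python
import json
from typing import Dict, List


def build_query(fields_by_type: Dict[str, List[str]]) -> str:
    """Build one GraphQL query that selects explicit fields from several content types."""
    selections = []
    for uid, fields in fields_by_type.items():
        field_list = " ".join(fields)
        # Assumed schema convention: all_<content_type_uid> { items { ... } }
        selections.append("all_" + uid + " { items { " + field_list + " } }")
    return "query { " + " ".join(selections) + " }"


def build_request_body(fields_by_type: Dict[str, List[str]]) -> str:
    """Wrap the query in the JSON body that GraphQL endpoints conventionally accept."""
    return json.dumps({"query": build_query(fields_by_type)})


print(build_query({"blog_post": ["title", "url"], "author": ["title"]}))
```

The field lists show the trade-off directly: the query is longer than a content delivery API URL, but the response contains nothing that the caller did not request.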

GraphQL queries typically specify individual content type identifiers, meaning querying code likely has some knowledge of the content types and their structures, although content type schema data is available when appropriate.

As with all APIs, GraphQL queries are somewhat vendor-specific, meaning that migration from one CMS to another may require refactoring the queries themselves rather than just the calling code.

Image API

The name image API may be a bit misleading, as the content delivery APIs expose metadata about assets, including their URLs and alternate text, and the image API does not. The image API is simply the URLs of media assets in Contentstack, including query string parameters used primarily to manipulate images on Contentstack’s servers before caching them to its CDN, such as to apply optimizations or cropping. The image API does not require the security HTTP headers used by the content delivery APIs.
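
Because the image API is just a URL with query string parameters, a small helper is enough to use it. In this sketch the parameter names `width` and `format` are assumptions about what the service accepts, and the asset URL is a placeholder rather than a real Contentstack address; take real asset URLs from the content delivery API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def image_url(asset_url: str, **transforms: object) -> str:
    """Append image transformation parameters to an asset URL."""
    parts = urlsplit(asset_url)
    # Keep any parameters already on the URL, then add the new ones.
    params = parse_qsl(parts.query) + [(k, str(v)) for k, v in transforms.items()]
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder host and path; assumed parameter names.
print(image_url("https://images.example.com/assets/photo.jpg", width=300, format="webp"))
```

No security headers appear anywhere in the helper, which is why these URLs can go directly into `img` elements in a browser.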

Search

While the content delivery API provides query and search facilities, most web solutions involve a search engine to support features such as word stemming and hit highlighting. In some cases, it may be possible to store raw data from the content management system in search indexes, allowing retrieval not only of the search API’s representation of the data, but also of the raw JSON representation of an entry from Contentstack. Alternatively, entries from Contentstack can be flattened into a search index. Webhooks, polling, manual triggers, and other techniques can start synchronization to update or replace any number of search indexes.
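
The two indexing options can be combined, as in this minimal sketch that flattens an entry into dotted field names and also keeps the raw JSON in a single field. The `raw_json` field name and the sample entry are hypothetical, and a real search engine will impose its own schema and field naming rules.

```python
import json
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single level of dotted keys."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}.{index}" if prefix else str(index)))
    else:
        flat[prefix] = value
    return flat


def to_search_document(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Build a search document with flattened fields plus the raw entry JSON."""
    doc = flatten(entry)
    # Hypothetical field that stores the unmodified entry for later retrieval.
    doc["raw_json"] = json.dumps(entry, sort_keys=True)
    return doc
```

The flattened fields serve queries and filters, while the raw field lets a front end retrieve the entry exactly as the CMS represented it.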

Using an external search engine involves separate hosting and implementation costs.

The content delivery APIs require querying each content type separately, the synchronization APIs provide data about entries of all content types, and GraphQL requires explicit specification of content types. In contrast, a search operation accesses entries of all indexed content types and can easily filter results further by content type, search and query criteria, or other logic.

Accessing data directly from the search solution reduces knowledge requirements for front-end developers, who can just as easily access the data through the search APIs that they already use for other purposes.

Using a search index for content delivery maintains a runtime dependence on the search engine, but relative to using content delivery APIs, removes a runtime dependence on the content management system.

You can apply any logic to the data that you index, such as merging data from multiple systems or pre-rendering HTML fragments or even entire pages.

Think of the CMS as a UI for editing the JSON, abstracting the document database, and think of search as the JSON delivery mechanism supporting complex queries and searches. Using a search index for content delivery, particularly with a broker or other abstraction layer to control interactions between systems, facilitates upgrades and migration between systems. Suppose that all read-only systems access content delivery and other systems through a service broker that abstracts search, rather than accessing those systems directly. Then replacing the search engine requires only replacement of the service broker rather than changes to all other applications. Replacing any of the other systems affects only the implementation of the service broker API for the new system, including both indexing and access.
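
The broker idea can be sketched as a small interface with interchangeable backends. Every name here (`ContentSource`, `InMemorySearch`, `ContentBroker`) is hypothetical, and the in-memory backend only stands in for a real search engine client so that the example runs on its own.

```python
from typing import Dict, List, Optional, Protocol


class ContentSource(Protocol):
    """What the broker needs from any search backend."""

    def get(self, uid: str) -> Optional[dict]: ...

    def search(self, text: str) -> List[dict]: ...


class InMemorySearch:
    """Stand-in backend; a real one would wrap a search engine client."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def index(self, doc: dict) -> None:
        self._docs[doc["uid"]] = doc

    def get(self, uid: str) -> Optional[dict]:
        return self._docs.get(uid)

    def search(self, text: str) -> List[dict]:
        needle = text.lower()
        return [d for d in self._docs.values() if needle in str(d.get("title", "")).lower()]


class ContentBroker:
    """The only API that applications call; the backend can be swapped behind it."""

    def __init__(self, source: ContentSource) -> None:
        self._source = source

    def page_data(self, uid: str) -> dict:
        doc = self._source.get(uid)
        if doc is None:
            raise KeyError(uid)
        return doc

    def find(self, text: str) -> List[dict]:
        return self._source.search(text)
```

Applications depend only on `ContentBroker`, so moving to a different search engine means writing one new class that satisfies `ContentSource`.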
