In the context of managing a website, but likely in more general contexts as well, there are at least three common architectures for headless CMS:
- Browsers load static files from a web server in the content delivery tier; the files contain data exported from the CMS.
- Browsers load static files from a web server in the content delivery tier and consume content delivery APIs.
- Browsers load dynamically generated pages from an application server in the content delivery tier that consumes content delivery APIs.
Hybrid architectures are possible, such as serving generated files or dynamic pages that direct browsers to consume content delivery APIs.
An application server, such as ASP.NET Core, adds server-side intelligence to the web server between the browser and the CMS. Application servers are invaluable, with JavaScript options seemingly increasing in popularity. In general, the application server is a client to the headless CMS, invoking its REST APIs to consume JSON data, and a server to browsers, to which it sends HTML and other resources. In the case of webhooks, such as when the CMS publishes, the application server (arguably) functions as a server to the content management system, which technically invokes an endpoint on the application server.
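The dual role can be sketched in a few lines. This is a minimal illustration in TypeScript (given the popularity of JavaScript application servers), not vendor code; the entry shape and the `/delivery/entries/` URL are hypothetical, and the fetch function is injected so the CMS client can be stubbed.

```typescript
// Sketch: the application server is a client to the headless CMS (consuming
// JSON) and a server to browsers (sending HTML). All names are hypothetical.
type Entry = { id: string; title: string; body: string };

// fetchJson is injected so the CMS delivery client can be stubbed or swapped.
async function renderPage(
  id: string,
  fetchJson: (url: string) => Promise<Entry>
): Promise<string> {
  // Client role: consume JSON from the CMS content delivery API.
  const entry = await fetchJson(`/delivery/entries/${id}`);
  // Server role: send HTML and other resources to the browser.
  return `<article><h1>${entry.title}</h1><p>${entry.body}</p></article>`;
}
```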
Using an application server rather than static files suggests additional possible architectures, some of which may also apply without application servers, and some of which can be used with other techniques:
- Application server consumes synchronization APIs that expose data such as publication event history. This is generally appropriate when the content delivery tier caches (especially preloaded) data or otherwise processes large volumes of records, especially not within the context of a single page request (for example, as a batch or at application initialization).
- Content management server invokes webhooks that pass data to application servers, for example after publishing a record. This architecture is useful, for example, for updating application search indexes and application server caches.
- Application server responds to webhooks by putting data on an enterprise message bus, event streaming platform, or other system; subscribing applications respond to events or messages. For example, a subscribing application may pass the data received in the event or message to a search engine for indexing. This architecture insulates systems from each other and can support transactions, such as processing a batch of events or messages as a transaction. Consider techniques to replay event and message streams to bring up new applications.
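The last architecture can be sketched with an in-memory stand-in for the bus. This is an assumption-laden illustration: the event shape is invented, and a real deployment would use an enterprise message bus or event streaming platform rather than the toy `Bus` class here.

```typescript
// Sketch of decoupling via a message bus: the webhook handler only enqueues
// and returns; subscribing applications (e.g. a search indexer) react
// independently. The Bus class is an in-memory stand-in for a real bus.
type PublishEvent = { entryId: string; action: "publish" | "unpublish" };

class Bus {
  private subscribers: Array<(e: PublishEvent) => void> = [];
  subscribe(fn: (e: PublishEvent) => void) { this.subscribers.push(fn); }
  emit(e: PublishEvent) { this.subscribers.forEach(fn => fn(e)); }
}

// Webhook endpoint body: put the CMS event on the bus; do no work inline.
function handlePublishWebhook(bus: Bus, e: PublishEvent): void {
  bus.emit(e);
}

// Subscribing application: forward publish events to a search index.
const bus = new Bus();
const indexed: string[] = [];
bus.subscribe(e => { if (e.action === "publish") indexed.push(e.entryId); });

handlePublishWebhook(bus, { entryId: "a1", action: "publish" });
```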
Most websites include search solutions. Search involves two major programming interfaces: indexing, which updates the search index, and querying (searching), which reads from the search index.
For indexing, the publishing process in the content management system invokes webhooks on the application server. The application server passes the received data to the search indexing API.
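A minimal sketch of that indexing path follows, with the webhook payload shape assumed and a `Map` standing in for a real search engine's indexing API.

```typescript
// Sketch of the indexing path: a publish webhook from the CMS arrives at the
// application server, which upserts the payload into the search index.
// The Map is an in-memory stand-in for a real search engine.
type WebhookPayload = { id: string; fields: Record<string, unknown> };

const searchIndex = new Map<string, Record<string, unknown>>();

function onCmsPublish(payload: WebhookPayload): void {
  // Store the raw CMS JSON so later queries can return full entry data
  // without a second call to the content delivery API.
  searchIndex.set(payload.id, payload.fields);
}

onCmsPublish({ id: "article-1", fields: { title: "Headless CMS", path: "/a" } });
```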
For querying, the application server or browser invokes search APIs that return data identifying or representing records from the content management system. Developers might assume that they should use search APIs to query and content delivery APIs to retrieve data, but there is no need to access the content delivery API when using search; the search API can return the data indexed from the headless CMS in its raw JSON format. When accessing search, the application server or browser should retrieve all the data that it needs from the search index rather than retrieving record identifiers and then requesting those records from the content management system.
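To illustrate, here is a naive query over indexed documents. The point is not the matching logic (a real engine ranks and filters far better) but that the results already contain full entry data, so no follow-up request to the CMS is needed.

```typescript
// Sketch: querying reads everything it needs from the search index — the
// indexed documents carry the entry data, not just identifiers.
type SearchDoc = { id: string; title: string; body: string };

function query(index: SearchDoc[], term: string): SearchDoc[] {
  // Naive substring scan standing in for a real search engine query.
  return index.filter(d => d.title.includes(term) || d.body.includes(term));
}
```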
Even when not using functionality specific to search, developers may still want to use the search API to retrieve data indexed from the CMS rather than learning and accessing content delivery APIs directly, and especially rather than learning CMS vendor query and GraphQL specifics. The CMS content delivery API may be most appropriate for specific scenarios, such as batches that process large numbers of records without query criteria, or where all the needed criteria are known (for example, the content type identifiers and IDs or URL paths, and potentially language and other identifiers).
Additionally, the search engine can return data indexed from multiple sources beyond the CMS. In a typical scenario, along with the content management system, product information and inventory management systems feed one or more search indexes with records that all contain a matching identifier. Indexes and query APIs aggregate data from these sources into unified results with very high performance.
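A sketch of that aggregation, assuming `sku` as the shared identifier (the field name is invented for illustration):

```typescript
// Sketch: records from the CMS and a product information system share an
// identifier (here, a hypothetical `sku`), letting a query layer serve
// unified results that combine both sources.
type CmsRecord = { sku: string; description: string };
type PimRecord = { sku: string; price: number };

function mergeBySku(cms: CmsRecord[], pim: PimRecord[]) {
  const prices = new Map(pim.map(p => [p.sku, p.price] as const));
  // Join each CMS record with the matching product record by identifier.
  return cms.map(c => ({ ...c, price: prices.get(c.sku) }));
}
```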
In architecting a solution, identify existing CDNs that you can leverage, for example in front of the search and CMS content delivery APIs.
From a .NET Core perspective, optimally:
- The solution leverages the same entry model classes, whether reading data from the content delivery APIs or from the search index.
- The solution leverages a repository abstraction of a content management implementation behind an abstraction of a search implementation, using search APIs where appropriate and content delivery APIs where necessary.
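That layering might look like the following sketch: one entry model, one repository interface, and a search-backed implementation in front of a content delivery implementation. All type and member names are hypothetical, and the in-memory maps stand in for real APIs.

```typescript
// Sketch of the suggested layering: callers depend only on EntryRepository;
// the search implementation is preferred and delegates to content delivery
// where necessary. Names and shapes are invented for illustration.
interface ArticleEntry { id: string; title: string }

interface EntryRepository {
  getById(id: string): Promise<ArticleEntry | undefined>;
}

// Content delivery implementation: used where search cannot answer.
class DeliveryRepository implements EntryRepository {
  constructor(private store: Map<string, ArticleEntry>) {}
  async getById(id: string) { return this.store.get(id); }
}

// Search implementation: the preferred path; falls back on a miss.
class SearchRepository implements EntryRepository {
  constructor(
    private index: Map<string, ArticleEntry>,
    private fallback: EntryRepository
  ) {}
  async getById(id: string) {
    return this.index.get(id) ?? this.fallback.getById(id);
  }
}
```

Because both implementations return the same entry model, calling code is unaffected by which tier served the data.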
In addition to performance, consider costs. SaaS vendors vary pricing based on criteria such as storage volume, API call counts, and bandwidth utilized. If storing and retrieving data from the search index is considerably more expensive than retrieving the corresponding data from the CMS, then consider architectures that favor retrieving data from the CMS. Conversely, if costs are equivalent, consider always accessing the search index directly and minimize solutions specific to the CMS.
Given the various advantages of search and cost unknowns, it may be advisable to architect the solution around search APIs and then consider content delivery APIs as an alternative.
Consider the impact of publishing on any cached data. In addition to indexing or removing content, content management publishing webhooks should trigger eviction or clearing of data cached from search and any application servers or other stores.
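A minimal eviction sketch, with an in-memory `Map` standing in for whatever cache the application server or other store uses:

```typescript
// Sketch: the publish webhook evicts cached copies in addition to
// reindexing, so stale data does not outlive the publish.
const cache = new Map<string, unknown>();

function onPublishEvict(entryId: string): void {
  // Evict the entry itself; a real implementation would also evict pages,
  // fragments, and cached query results that included this entry.
  cache.delete(entryId);
}
```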
Note that none of these architectures is transactional. Consider the consequences of failures, such as publishing succeeding but indexing failing. Implement fallbacks, such as the ability to clear caches and rebuild indexes on command. Message and event queues best support transactions and, along with synchronization APIs, best support sequential and chronological processing.
Note the number of failure points in any architecture. Consider fallbacks, for example timeouts and reverting to content management in the case that search fails.
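One such fallback can be sketched as a generic helper that races the primary call against a timeout and reverts to a secondary call on failure. The helper name and signature are invented for illustration.

```typescript
// Sketch of a fallback: try search with a timeout, and revert to the
// content delivery API if search fails or is too slow.
async function withFallback<T>(
  primary: () => Promise<T>,   // e.g. the search API call
  fallback: () => Promise<T>,  // e.g. the content delivery API call
  timeoutMs: number
): Promise<T> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("timeout")), timeoutMs)
  );
  try {
    return await Promise.race([primary(), timeout]);
  } catch {
    // Primary failed or timed out: fall back to the secondary source.
    return fallback();
  }
}
```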
Based on some conversations and some .NET prototyping, I am just making this up as I go with no real-world experience. I would love to read anyone’s perspectives. Please comment on this blog post.