This blog post presents some considerations for implementing language fallback with headless content management systems. Language fallback refers to using field values from an alternate language version of an entry when a specific language version does not contain values for those fields, or using an alternate language version of an entry when there is no version of the entry for a specific language. Each CMS vendor implements language fallback differently, and in my experience no approach to language fallback can meet all relevant customer expectations, typically because it does not support the level of control required for certain use cases. Consult with the CMS and search vendors before implementing a language fallback solution. I suggest approaches that avoid vendor-specific language fallback features.
One of the first considerations for internationalization is whether to translate the entries or use separate entries for the different languages. If the information architecture (both URL structure and the data types at those URLs) is the same or relatively similar, it may be appropriate to translate entries, which is a far simpler solution. Using separate repositories for the different languages with fallback requires access to multiple repositories, mapping entry identifiers in one repository to entry identifiers in the other, and possibly addressing other complications. For this post, I assume a single repository and a single information architecture, where the URL structure is the same for each language and the entries associated with those URLs use the same structure for each language. Note that translating URL paths requires additional consideration. I know that it is inappropriate, but most solutions simply use the URLs appropriate for the default (fallback) language, which, for most of the marketing sites I’ve seen, is one of the major variants of English.
Note that fallback languages can have fallback languages. For example, Australian English could fall back to British English, which could fall back to American English, just to offend everyone involved. Mixed-language content is a concern even with a single level of fallback: when a visitor is browsing content in their language, they may not like the experience of clicking a link and finding content in a different language, especially mixed content on a page. If possible, when linking to entries that contain fallback values, the link itself should indicate that the target may contain content in a language other than the visitor’s preference.
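A chain of fallback languages can be resolved with a simple lookup loop. The following is a minimal TypeScript sketch, not any vendor's implementation; the language codes, the `fallbackOf` map, and the `fallbackChain` function are all hypothetical names chosen for illustration.

```typescript
// Hypothetical fallback configuration: each language names at most one fallback.
// The codes and the mapping are illustrative assumptions.
const fallbackOf: Record<string, string | undefined> = {
  "en-AU": "en-GB",
  "en-GB": "en-US",
  "fr-CA": "fr-FR",
};

// Returns the requested language followed by its fallbacks, in lookup order.
// Throws on a cycle so that a misconfiguration cannot loop forever.
function fallbackChain(language: string): string[] {
  const chain: string[] = [];
  let current: string | undefined = language;
  while (current !== undefined) {
    if (chain.indexOf(current) !== -1) {
      throw new Error("Fallback cycle detected at " + current);
    }
    chain.push(current);
    current = fallbackOf[current];
  }
  return chain;
}

// Australian English falls back to British English, then American English.
const australianChain = fallbackChain("en-AU"); // ["en-AU", "en-GB", "en-US"]
```

Because the result carries the whole chain, the calling code can also tell whether a value came from the visitor's language or from a fallback, which is what a link indicator would need.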
Beware that using the same content on multiple pages can impact SEO. Speaking of SEO, consider how URLs will identify languages, whether by using a subdomain, a token in the URL path, a query string parameter, or otherwise. I do not recommend cookies except to store a visitor’s preference so that the home page can redirect to the appropriate default language page.
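As one illustration of the URL path option, the sketch below treats the first path segment as the language token. The supported languages, the default language, and the `parseLocalizedPath` function are assumptions made for this example only.

```typescript
// Languages the site supports; hypothetical values for illustration.
const supportedLanguages = ["en-us", "en-gb", "fr-fr"];
const defaultLanguage = "en-us";

interface LocalizedPath {
  language: string;
  path: string; // the path with the language token removed
}

// Splits "/fr-fr/products/widget" into language "fr-fr" and path "/products/widget".
// Paths without a recognized language token are treated as the default language.
function parseLocalizedPath(urlPath: string): LocalizedPath {
  const segments = urlPath.split("/").filter((segment) => segment.length > 0);
  const first = (segments[0] ?? "").toLowerCase();
  if (supportedLanguages.indexOf(first) !== -1) {
    return { language: first, path: "/" + segments.slice(1).join("/") };
  }
  return { language: defaultLanguage, path: "/" + segments.join("/") };
}

const frenchWidget = parseLocalizedPath("/fr-fr/products/widget");
// frenchWidget.language is "fr-fr" and frenchWidget.path is "/products/widget"
```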
For context, I have a prototype ASP.NET Core Razor Pages solution that can retrieve data from the CMS or from a search index that contains a copy of that data, but of course any solution must also support JavaScript. Because I prefer the search approach, I will describe that first, which will clarify some of the challenges that apply without a search engine.
If a field that should support language fallback does not have a value, or if an entry does not have a version for a language, the search indexing logic should be able to retrieve the field values or the entire entry from the CMS in the fallback language(s) and store those values in the index. If both .NET and JavaScript access the search index rather than the CMS, then both get language fallback values by default with no extra logic. I suggest one index per language, which avoids the potential for forgetting to filter results from the search engine by language and allows separate index update/rebuild/replacement for each language. Something must map languages to the corresponding search indexes, typically using a naming convention in which the index name contains the language name and potentially an environment identifier if the search host contains indexes for multiple environments.
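To make the indexing approach concrete, here is a minimal TypeScript sketch of an index naming convention and of building the document for one language's index. The `EntryVersions` shape, the `<environment>-content-<language>` naming pattern, and both function names are assumptions for illustration rather than any CMS or search vendor's API.

```typescript
type Fields = Record<string, unknown>;

// All language versions of one entry, keyed by language code (assumed shape).
type EntryVersions = Record<string, Fields | undefined>;

// Hypothetical naming convention: <environment>-content-<language>, lowercased.
function indexNameFor(language: string, environment: string): string {
  return (environment + "-content-" + language).toLowerCase();
}

// Builds the document to store in the index for the first language in the chain.
// Each field takes its value from the first language in the chain that has one.
// This sketch treats undefined, null, and "" as "no value", which is an assumption.
// Returns undefined when no language in the chain has a version of the entry.
function buildIndexDocument(
  versions: EntryVersions,
  chain: string[],
  fieldNames: string[]
): Fields | undefined {
  const available = chain
    .map((language) => versions[language])
    .filter((version): version is Fields => version !== undefined);
  if (available.length === 0) {
    return undefined;
  }
  const result: Fields = {};
  for (const fieldName of fieldNames) {
    for (const version of available) {
      const value = version[fieldName];
      if (value !== undefined && value !== null && value !== "") {
        result[fieldName] = value;
        break;
      }
    }
  }
  return result;
}

const exampleVersions: EntryVersions = {
  "en-US": { title: "Color swatches", summary: "Pick a color." },
  "en-GB": { title: "Colour swatches" },
};
const australianDocument = buildIndexDocument(
  exampleVersions,
  ["en-AU", "en-GB", "en-US"],
  ["title", "summary"]
);
// australianDocument is { title: "Colour swatches", summary: "Pick a color." }
// and would be written to the index named by indexNameFor("en-AU", "prod").
```

Resolving fallback once at indexing time is what lets every consumer of the index skip fallback logic entirely.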
Without a search index, I recommend placing an API broker between the consuming applications (.NET and JavaScript) and the CMS, where that API broker performs the language fallback operations for clients. Otherwise, the solution seems to require corresponding language fallback logic in both .NET and JavaScript.
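A broker of this kind can be quite small. The TypeScript sketch below shows entry-level fallback only; the `CmsClient` interface is a hypothetical stand-in rather than a vendor SDK, and it is synchronous purely to keep the example short, whereas a real client would be asynchronous.

```typescript
interface Entry {
  id: string;
  language: string; // the language this version is actually written in
  fields: Record<string, unknown>;
}

// Minimal, hypothetical view of a CMS client.
interface CmsClient {
  getEntry(id: string, language: string): Entry | undefined;
}

interface BrokerResult {
  entry: Entry;
  isFallback: boolean; // true when the entry came from a fallback language
}

// The broker owns the fallback rules so that clients never implement them.
function createBroker(cms: CmsClient, chainFor: (language: string) => string[]) {
  return {
    getEntry(id: string, language: string): BrokerResult | undefined {
      for (const candidate of chainFor(language)) {
        const entry = cms.getEntry(id, candidate);
        if (entry !== undefined) {
          return { entry, isFallback: candidate !== language };
        }
      }
      return undefined;
    },
  };
}

// In-memory stand-ins for the CMS and the fallback configuration.
const storedEntries: Record<string, Entry | undefined> = {
  "home:en-US": { id: "home", language: "en-US", fields: { title: "Welcome" } },
};
const stubCms: CmsClient = {
  getEntry: (id, language) => storedEntries[id + ":" + language],
};
const broker = createBroker(stubCms, (language) =>
  language === "en-AU" ? ["en-AU", "en-GB", "en-US"] : [language]
);
const homeResult = broker.getEntry("home", "en-AU");
// homeResult.entry.language is "en-US" and homeResult.isFallback is true
```

Returning `isFallback` alongside the entry gives both .NET and JavaScript clients what they need to flag content that is not in the visitor's language.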
I do not know JavaScript, but for .NET specifically, custom attributes on the entry models and their properties can control fallback logic, for example whether a field should fall back or whether the entire entry should fall back.
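For JavaScript or TypeScript consumers, a plain policy object per model could play the role that attributes play in .NET. This is only a sketch under that assumption; `FallbackPolicy`, `articlePolicy`, and `applyPolicy` are hypothetical names, and the example handles a single fallback language.

```typescript
type Fields = Record<string, unknown>;

// Hypothetical per-model policy, the counterpart of attributes on a model.
interface FallbackPolicy {
  entryFallsBack: boolean; // use the fallback version when the requested one is absent
  fieldsThatFallBack: string[]; // fields that may borrow a value when missing
}

const articlePolicy: FallbackPolicy = {
  entryFallsBack: true,
  fieldsThatFallBack: ["heroImage", "summary"],
};

// Applies the policy to one requested version and one fallback version.
function applyPolicy(
  policy: FallbackPolicy,
  requested: Fields | undefined,
  fallback: Fields | undefined
): Fields | undefined {
  if (requested === undefined) {
    return policy.entryFallsBack ? fallback : undefined;
  }
  const result: Fields = { ...requested };
  if (fallback !== undefined) {
    for (const fieldName of policy.fieldsThatFallBack) {
      if (result[fieldName] === undefined && fallback[fieldName] !== undefined) {
        result[fieldName] = fallback[fieldName];
      }
    }
  }
  return result;
}

const mergedArticle = applyPolicy(
  articlePolicy,
  { title: "Bonjour" },
  { title: "Hello", summary: "Intro", heroImage: "hero.png" }
);
// mergedArticle is { title: "Bonjour", summary: "Intro", heroImage: "hero.png" }
```

Fields that are not listed in the policy, such as `title` here, never borrow a value, which keeps the behavior explicit for each model.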
For field fallback, beware of cases where null is a valid value for a field that should fall back. A completely separate architecture could use CMS UI extensions or webhooks to create or update alternate language versions or set field values from the fallback language. This has the advantage of being extremely explicit and granting the CMS user full control but has some complexities that I would prefer not to consider here. Such techniques may be appropriate for certain use cases, such as asking the user whether they want to override values in alternate language versions when they update the fallback language version.
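One way to handle the null caveat is to distinguish a field that is absent from one that is explicitly null. The TypeScript sketch below assumes the data makes that distinction visible, which depends on the CMS and is not guaranteed; `resolveField` is a hypothetical helper.

```typescript
type Fields = Record<string, unknown>;

// Assumption: an absent property means "not set, fall back", while an explicit
// null means "deliberately empty, do not fall back".
function resolveField(fieldName: string, requested: Fields, fallback: Fields): unknown {
  const hasOwnValue = Object.prototype.hasOwnProperty.call(requested, fieldName);
  return hasOwnValue ? requested[fieldName] : fallback[fieldName];
}

const requestedVersion: Fields = { subtitle: null };
const fallbackVersion: Fields = { subtitle: "Fallback subtitle", teaser: "Fallback teaser" };

const subtitle = resolveField("subtitle", requestedVersion, fallbackVersion); // null
const teaser = resolveField("teaser", requestedVersion, fallbackVersion); // "Fallback teaser"
```

If the CMS cannot represent the difference, a separate flag per field or per entry is probably needed instead.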
For some solutions, it may be appropriate for each language version to contain a checkbox to indicate whether that language version should appear even if all of its field values fall back. Some content is not relevant to all audiences.
There are certainly other possible approaches, such as using save webhooks or polling synchronization APIs to update language variants based on changes to their fallback languages, though any such approach could be tricky for some scenarios. One issue to consider is data duplication and hence synchronization challenges. I would try to avoid data duplication in the primary system, but indexing data creates a duplicate by default, and I do not object to that.
Personally, I would try to push back on a customer that wants language fallback, because customers often do not understand the implications, the results can appear unpredictable (especially if different CMS users have different expectations), fallback has negative effects on SEO and visitors, and I like to keep things as simple as possible.
If you have useful perspectives on this topic, please comment on this blog post.