Code Dump: .NET Contentstack Media Importer

This blog post describes a prototype .NET command line solution for importing media into the Contentstack SaaS headless content management system. The tool works, but it needs testing and refactoring. More importantly, it demonstrates relevant APIs and techniques for working with folder paths in Contentstack. You can use these techniques to create asset folder trees, to list existing folder paths, to import and update media and metadata, and to prune empty folders from folder trees.

Collecting Data to Import

This solution uses a CSV file to store metadata about the media to import. Using a spreadsheet simplifies metadata management before import.

The first row of the spreadsheet contains the following header columns:

  • FilePath: The full file system path to the file to import. Cells in this column contain values such as “C:\temp\images\file.png”.
  • Title: The title of the asset to create for the file.
  • Description: The description of the asset.
  • FolderPath: Contentstack asset folder path. Cells in this column contain values such as “folder/path/to/assets” to represent asset folder paths such as /folder/path/to/assets.
  • Tags: Pipe-separated list of tags to apply to the asset.

Create the spreadsheet with the header row and edit the list of files to import in the first column. You can use shell commands to generate a list. To place such a list in the Windows clipboard:

dir /s/b/a:-d c:\temp\images | clip

For Unix, something like this might work:

find /tmp/images -type f -print

For WSL, this works:

find /tmp/images -type f -print | clip.exe

You can perform a search and replace on this list to default the value of the FolderPath column to match the existing paths. For Unix and WSL, you can use something like the following to get a list of default values for the FolderPath column. The 6 is the first slash-delimited field to keep: cut counts the empty field before the leading slash as field 1, so -f6- removes the five leading slashes and everything between them (/mnt/c/temp/images/) from each path. To simplify cleanup while testing imports, you may want to reduce this number by one to create a root folder (in this case, named images) within which to create nested folders and import media. The sed expression then strips the file name, leaving only the folder path.

find /mnt/c/temp/images -type f -print | cut -d / -f6- | sed 's|/[^/]*$||'
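If counting path segments by hand feels error-prone, the same defaults can be computed relative to a named root folder. This is a minimal sketch in Python, not part of the .NET importer; the function name default_folder_path is hypothetical.

```python
from pathlib import PurePosixPath


def default_folder_path(file_path: str, root: str) -> str:
    """Return the folder of file_path relative to root, using forward slashes."""
    relative = PurePosixPath(file_path).relative_to(root)
    parent = relative.parent
    # A parent of "." means the file sits directly in root: no folder path.
    return "" if str(parent) == "." else parent.as_posix()


source = "/mnt/c/temp/images/animals/birds/peacock.jpeg"

# Import relative to the images folder.
print(default_folder_path(source, "/mnt/c/temp/images"))  # animals/birds

# Move the root up one level to keep "images" as a root folder in the CMS.
print(default_folder_path(source, "/mnt/c/temp"))  # images/animals/birds
```

Choosing the root by name rather than by field count makes it harder to be off by one when the source directory moves.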

Populate the spreadsheet with metadata and adjust folder paths. In the end, you should have something like this, with any number of lines:

FilePath,Title,Description,FolderPath,Tags
c:\temp\images\animals\birds\peacock.jpeg,Peacock,Pretty bird,animals/birds,animal|bird|peacock
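To make the file format concrete, here is a minimal sketch of reading such a file, written in Python rather than taken from the .NET importer. The MediaRow type and read_rows function are hypothetical names; only the column names come from the header row above.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class MediaRow:
    file_path: str
    title: str
    description: str
    folder_path: str
    tags: list[str]


def read_rows(handle) -> list[MediaRow]:
    """Parse import rows from an open CSV file or any iterable of lines."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append(
            MediaRow(
                file_path=record["FilePath"].strip(),
                title=record["Title"].strip(),
                description=record["Description"].strip(),
                # Treat "/animals/birds/" and "animals/birds" as the same path.
                folder_path=record["FolderPath"].strip().strip("/"),
                # Tags are pipe-separated; drop empty entries.
                tags=[t.strip() for t in record["Tags"].split("|") if t.strip()],
            )
        )
    return rows


SAMPLE = (
    "FilePath,Title,Description,FolderPath,Tags\n"
    "c:\\temp\\images\\animals\\birds\\peacock.jpeg,Peacock,Pretty bird,"
    "animals/birds,animal|bird|peacock\n"
)

rows = read_rows(io.StringIO(SAMPLE))
print(rows[0].folder_path, rows[0].tags)
```

Using a real CSV parser instead of splitting on commas matters as soon as a title or description contains a comma, because spreadsheets quote such cells on export.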

Importing Assets

Run the program against the spreadsheet. It should create any required folder paths, import the files as assets, and remove any empty folders. If you run it again after changing metadata, it should update the existing assets, but I have not gotten that logic working. Deleting and re-importing assets changes asset identifiers, so it is best to complete media imports before content references assets by ID or by path. Pruning empty folders is just a convenient thing to do with data collected during the import process and may be appropriate if the update process moved assets.
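Pruning can be reasoned about on in-memory data alone. The following is a minimal Python sketch, not the importer's .NET code. It assumes you already know each folder's parent and which folders directly contain assets; the function name empty_folders and the sample IDs are hypothetical.

```python
def empty_folders(folder_parents, asset_folder_ids):
    """Return ids of folders holding no assets, directly or in any descendant.

    folder_parents maps folder id -> parent id (None at the root).
    asset_folder_ids is the set of folder ids that directly contain assets.
    The result is ordered deepest first, so a child precedes its parent.
    """
    occupied = set()
    for folder_id in asset_folder_ids:
        # Mark the folder and every ancestor as in use.
        current = folder_id
        while current is not None and current not in occupied:
            occupied.add(current)
            current = folder_parents.get(current)

    def depth(folder_id):
        levels = 0
        while folder_parents.get(folder_id) is not None:
            folder_id = folder_parents[folder_id]
            levels += 1
        return levels

    candidates = [f for f in folder_parents if f not in occupied]
    return sorted(candidates, key=depth, reverse=True)


parents = {"a": None, "b": "a", "c": "a", "d": "c"}
print(empty_folders(parents, {"b"}))  # ['d', 'c']
```

Here folder a survives because its child b holds an asset, while c and its child d are both reported, with d first.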

How the Importer Works

In Contentstack, assets reference the folders that contain them, and a process cannot import or move an asset into a folder that does not exist. A folder can reference its parent folder, and cannot exist before its parent folder. A folder that does not reference a parent folder exists at the root of the folder tree.
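Because a folder cannot exist before its parent, any code that creates folders has to work from the top of the tree down. A minimal Python sketch (not the importer's .NET code; both function names are hypothetical) that expands folder paths into every folder they imply and orders them parents first:

```python
def ancestor_paths(folder_path: str) -> list[str]:
    """Expand "a/b/c" into ["a", "a/b", "a/b/c"]."""
    parts = [p for p in folder_path.split("/") if p]
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def creation_order(folder_paths) -> list[str]:
    """Return every implied folder path, shallowest first."""
    needed = set()
    for path in folder_paths:
        needed.update(ancestor_paths(path))
    # Sorting by depth guarantees a parent always precedes its children.
    return sorted(needed, key=lambda p: (p.count("/"), p))


print(creation_order(["animals/birds", "animals/fish/salt"]))
# ['animals', 'animals/birds', 'animals/fish', 'animals/fish/salt']
```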

The import process could create folders as needed while going through the list of assets, but I thought that it might be easier to create the folders in advance and then upload assets afterwards. To import assets into folders, the process needs to map folder paths to folder IDs, such as /folder/path/to/assets mapping to the ID of the assets folder. I wanted to be able to determine the ID of the assets folder at this path with minimal API calls, preferably not by retrieving the child of /folder named path, then its child named to, and finally the ID of its child named assets.

I implemented three very simple maps:

  • Folder IDs to Folder Names: For constructing folder paths when only the parent folder of an asset or folder is known.
  • Folder IDs to Parent IDs: For constructing folder paths.
  • Folder Paths to Folder IDs: Associate folder IDs with folder paths.
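As an illustration of how the three maps fit together, here is a minimal Python sketch using plain dictionaries. It is not the importer's .NET code, and the folder IDs (f1, f2, f3) and variable names are made up for the example.

```python
# Folder IDs to folder names.
folder_names = {"f1": "animals", "f2": "birds", "f3": "fish"}

# Folder IDs to parent IDs; None means the folder sits at the root.
folder_parents = {"f1": None, "f2": "f1", "f3": "f1"}

# Folder paths to folder IDs, filled in below.
folder_ids_by_path = {}


def path_for(folder_id):
    """Build a folder's path by walking parent links up to the root."""
    segments = []
    current = folder_id
    while current is not None:
        segments.append(folder_names[current])
        current = folder_parents[current]
    return "/".join(reversed(segments))


for folder_id in folder_names:
    folder_ids_by_path[path_for(folder_id)] = folder_id

# After one pass, resolving a path is a dictionary lookup, not a chain
# of requests for each level of the tree.
print(folder_ids_by_path["animals/birds"])  # f2
```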

The media importer populates these maps and then uses them as follows:

  • Map all folder paths in the spreadsheet to null in the map of folder paths.
  • Add folders that exist in the CMS to the two maps keyed by folder ID.
  • Update the map of folder paths based on that data from the CMS.
  • Update the maps keyed by folder ID while creating any required folders.
  • Import media.
  • Remove empty folders.
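The folder-related steps above can be sketched against in-memory data. This is a minimal Python sketch under stated assumptions, not the importer's .NET code: the uid, name, and parent_uid keys are an assumed shape for folder records retrieved from the CMS, and create_folder is a hypothetical callback standing in for whatever call actually creates a folder and returns its new ID.

```python
import itertools


def plan_and_create(sheet_paths, cms_folders, create_folder):
    """Return a path -> id map covering every folder path in the sheet."""
    names = {f["uid"]: f["name"] for f in cms_folders}
    parents = {f["uid"]: f.get("parent_uid") for f in cms_folders}

    # Every path the sheet needs, including ancestors, starts as None.
    ids_by_path = {}
    for path in sheet_paths:
        parts = [p for p in path.split("/") if p]
        for depth in range(1, len(parts) + 1):
            ids_by_path["/".join(parts[:depth])] = None

    # Walk parent links to compute the path of each existing folder.
    def path_for(uid):
        segments = []
        while uid is not None:
            segments.append(names[uid])
            uid = parents[uid]
        return "/".join(reversed(segments))

    for uid in names:
        ids_by_path[path_for(uid)] = uid

    # Create whatever is still None, shallowest first so parents exist.
    missing = sorted(
        (p for p, v in ids_by_path.items() if v is None),
        key=lambda p: p.count("/"),
    )
    for path in missing:
        parent_path, _, name = path.rpartition("/")
        parent_id = ids_by_path[parent_path] if parent_path else None
        new_id = create_folder(name, parent_id)
        names[new_id], parents[new_id] = name, parent_id
        ids_by_path[path] = new_id
    return ids_by_path


# Stand-in for the CMS: hands out ids and records what was requested.
counter = itertools.count(1)
created = []


def fake_create(name, parent_id):
    created.append((name, parent_id))
    return f"new{next(counter)}"


cms = [{"uid": "f1", "name": "animals", "parent_uid": None}]
result = plan_and_create(["animals/birds", "plants"], cms, fake_create)
print(created)  # [('plants', None), ('birds', 'f1')]
```

Only the two missing folders are created, and the existing animals folder is reused as the parent of birds.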

The mapping of filename extensions to MIME types is hard-coded in AssetUploader. The credentials for accessing Contentstack are hard-coded as well, both there and in HttpRequestProcessor.cs. I am trying to demonstrate Contentstack coding techniques; existing resources cover .NET configuration and coding techniques.
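For comparison, here is what the non-hard-coded versions of those two pieces might look like. This is a minimal sketch in Python rather than .NET; the environment variable names are hypothetical and are not something Contentstack or the importer defines.

```python
import mimetypes
import os


def mime_type_for(file_path: str) -> str:
    """Guess a MIME type from the file extension, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(file_path)
    return guessed or "application/octet-stream"


def read_credentials(env=os.environ) -> dict:
    """Read credentials from the environment instead of source code.

    CONTENTSTACK_API_KEY and CONTENTSTACK_MANAGEMENT_TOKEN are hypothetical
    variable names chosen for this example.
    """
    return {
        "api_key": env.get("CONTENTSTACK_API_KEY", ""),
        "management_token": env.get("CONTENTSTACK_MANAGEMENT_TOKEN", ""),
    }


print(mime_type_for("peacock.jpeg"))  # image/jpeg
print(mime_type_for("file.png"))  # image/png
```

Keeping credentials out of the source also makes it safer to share a code dump like this one.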

The code is here.
