Migrating from Ghost to Sanity CMS

December 16, 2024

How I Migrated Blog Content from Ghost to Sanity: A Step-by-Step Guide

Migrating content between content management systems can be a complex process, especially when dealing with platforms like Ghost and Sanity. In this guide, I document how I successfully migrated content from Ghost to Sanity, handling challenges like rich text formatting, embedded videos, pagination, and maintaining data integrity throughout the process.

Why Migrate from Ghost to Sanity?

Ghost is a powerful platform for publishing blogs, but as my projects evolved, I required a more flexible and headless CMS to integrate seamlessly into modern frameworks like Next.js. Sanity’s flexibility, GROQ-powered queries, and real-time editing capabilities made it an ideal choice for my needs.

Planning the Migration

Before starting, I outlined a clear plan for the migration:

  • Export Data from Ghost: Used Ghost's JSON export tool to export all posts, authors, tags, and metadata.
  • Set Up Sanity: Created a Sanity project and configured schemas for post, author, and category.
  • Script the Migration: Wrote a migration script in Node.js to handle data parsing and transformation.
  • Test and Debug: Validated the migration by cross-referencing the data between Ghost and Sanity.
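
The export step can be sketched roughly as follows. This is a minimal sketch, assuming the export file follows the `db[0].data` layout with `posts`, `tags`, and a `posts_tags` join table; the function name `readGhostExport` and the sample data are hypothetical and only there for illustration.

```javascript
// Sketch: pull posts out of a Ghost JSON export and attach tag names.
// Assumes the shape { db: [{ data: { posts, tags, posts_tags } }] }.
const readGhostExport = (exportJson) => {
  const data = exportJson?.db?.[0]?.data;
  if (!data) throw new Error("Unexpected export shape: missing db[0].data");

  const { posts = [], tags = [], posts_tags: postsTags = [] } = data;
  const tagNameById = new Map(tags.map((tag) => [tag.id, tag.name]));

  // Resolve each post's tags through the join table.
  return posts.map((post) => ({
    ...post,
    tags: postsTags
      .filter((link) => link.post_id === post.id)
      .map((link) => tagNameById.get(link.tag_id))
      .filter(Boolean),
  }));
};

// Hypothetical sample data, for illustration only.
const sample = {
  db: [
    {
      data: {
        posts: [
          { id: "p1", title: "Hello", slug: "hello", html: "<p>Hi</p>" },
          { id: "p2", title: "Untagged", slug: "untagged", html: "<p>Yo</p>" },
        ],
        tags: [{ id: "t1", name: "news" }],
        posts_tags: [{ post_id: "p1", tag_id: "t1" }],
      },
    },
  ],
};

console.log(readGhostExport(sample).map((post) => post.tags));
```

In a real script the object would come from something like `JSON.parse(fs.readFileSync(exportPath, "utf8"))` instead of an inline sample.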

Key Challenges and Solutions

  1. Parsing Rich Text (Portable Text)

Sanity uses Portable Text for rich text, which required parsing HTML from Ghost into a structured JSON format.

Challenge: Maintaining styles like bold, italic, and links.

Solution: A custom processChildren function in JavaScript parsed HTML nodes into Portable Text format. This function handled nested styles, inline marks, and list items while ensuring no data loss.

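// generateKey() is assumed to return a unique string for each item's _key.
const processChildren = async (node) => {
  const children = [];
  const markDefs = [];

  // Append one span, skipping empty strings.
  const pushSpan = (text, marks = []) => {
    if (text === "") return;
    children.push({ _type: "span", _key: generateKey(), text, marks });
  };

  for (const child of node.childNodes) {
    if (child.nodeType === 3) {
      // Text node: collapse whitespace runs instead of trimming, so the
      // spaces next to inline elements such as <b> or <a> survive.
      pushSpan(child.textContent.replace(/\s+/g, " "));
    } else if (child.nodeType === 1) {
      // Handle elements like <b>, <i>, <a>, etc.
      switch (child.nodeName) {
        case "B":
        case "STRONG":
          pushSpan(child.textContent, ["strong"]);
          break;
        case "I":
        case "EM":
          pushSpan(child.textContent, ["em"]);
          break;
        case "A": {
          // Links are annotations: the span's mark is the _key of a markDef.
          const href = child.getAttribute("href");
          const linkKey = generateKey();
          markDefs.push({ _type: "link", _key: linkKey, href });
          pushSpan(child.textContent, [linkKey]);
          break;
        }
        // Add cases for other elements like <u>, <code>, etc.
        default:
          // Unknown element: keep its text so no content is dropped.
          pushSpan(child.textContent);
      }
    }
  }

  return { children, markDefs };
};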
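
The excerpt above reads each element's text in one step, so a link inside bold text would come out with only one of the two marks. One way to handle nesting is to recurse and carry the active marks down the tree. The following is a sketch of that idea, not the exact function from my script; `collectSpans`, `TAG_TO_MARK`, and this `generateKey` are hypothetical names, and the code only assumes DOM-like nodes with `nodeType`, `nodeName`, `childNodes`, `textContent`, and `getAttribute`.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical key helper: any string that is unique within the array works.
const generateKey = () => randomUUID().replace(/-/g, "").slice(0, 12);

// Tag name -> Portable Text decorator mark.
const TAG_TO_MARK = { B: "strong", STRONG: "strong", I: "em", EM: "em", U: "underline" };

// Walk the tree, passing the marks collected so far down to each text node.
const collectSpans = (node, activeMarks = [], out = { children: [], markDefs: [] }) => {
  for (const child of node.childNodes) {
    if (child.nodeType === 3) {
      if (child.textContent !== "") {
        out.children.push({
          _type: "span",
          _key: generateKey(),
          text: child.textContent,
          marks: [...activeMarks],
        });
      }
    } else if (child.nodeType === 1) {
      let marks = activeMarks;
      if (TAG_TO_MARK[child.nodeName]) {
        marks = [...activeMarks, TAG_TO_MARK[child.nodeName]];
      } else if (child.nodeName === "A") {
        // A link adds a markDef and marks its descendants with that key.
        const linkKey = generateKey();
        out.markDefs.push({ _type: "link", _key: linkKey, href: child.getAttribute("href") });
        marks = [...activeMarks, linkKey];
      }
      collectSpans(child, marks, out);
    }
  }
  return out;
};
```

With this shape, `<strong><a href="…">docs</a></strong>` yields a single span whose marks contain both `"strong"` and the link's key.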

  2. Handling Embedded Videos

Ghost posts often include embedded YouTube or Vimeo videos using <iframe> tags.

Solution: I extended the processNode function to handle iframe elements and map them to Sanity’s embed type schema.

case "IFRAME": {
  const src = node.getAttribute("src");
  return {
    _type: "embed",
    _key: generateKey(),
    url: src,
    caption: node.getAttribute("title") || "Embedded Video",
  };
}
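
Iframe `src` values are not always clean absolute URLs; some are protocol-relative (`//www.youtube.com/embed/...`) or missing altogether. A small normalizing step before building the embed object can guard against that. This is a sketch under my own assumptions: `normalizeEmbed` and the `provider` field are hypothetical and not part of the embed schema described above.

```javascript
// Sketch: turn an iframe src into an absolute URL plus a provider label.
const normalizeEmbed = (src) => {
  if (!src) return null;

  // Protocol-relative URLs ("//host/path") need a scheme before parsing.
  const absolute = src.startsWith("//") ? `https:${src}` : src;

  let url;
  try {
    url = new URL(absolute);
  } catch {
    return null; // Not a parseable absolute URL; let the caller decide.
  }

  const host = url.hostname.replace(/^www\./, "");
  let provider = "other";
  if (["youtube.com", "youtube-nocookie.com", "youtu.be"].includes(host)) {
    provider = "youtube";
  } else if (["vimeo.com", "player.vimeo.com"].includes(host)) {
    provider = "vimeo";
  }

  return { url: url.href, provider };
};
```

Returning `null` for unusable values lets the caller skip the embed or log the post for manual review instead of writing a broken URL to Sanity.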

  3. Resolving Pagination

Some Ghost posts were paginated across multiple URLs.

Solution: I created a recursive function to fetch all pages and concatenate the HTML before parsing.

const axios = require("axios");
const { JSDOM } = require("jsdom");

const fetchPaginatedContent = async (url, accumulatedHTML = "") => {
  const response = await axios.get(url);
  // Passing the page URL lets jsdom resolve a relative rel="next" href.
  const dom = new JSDOM(response.data, { url });
  const nextLink = dom.window.document.querySelector('link[rel="next"]');

  accumulatedHTML += dom.window.document.body.innerHTML;

  if (nextLink && nextLink.href) {
    return fetchPaginatedContent(nextLink.href, accumulatedHTML);
  }
  return accumulatedHTML;
};
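
A recursive fetch like this never terminates if two pages point at each other through `rel="next"`. An iterative variant with a visited set and a page cap is one way to guard against that. The sketch below is an alternative shape, not what my script used: `collectPages` and `maxPages` are hypothetical, and the page fetcher is injected so the loop logic can be exercised without network access.

```javascript
// Sketch: follow "next" links iteratively, stopping on a repeat or a cap.
// fetchPage(url) is assumed to resolve to { html, nextUrl }.
const collectPages = async (startUrl, fetchPage, maxPages = 50) => {
  const seen = new Set();
  const parts = [];
  let url = startUrl;

  while (url && !seen.has(url) && seen.size < maxPages) {
    seen.add(url);
    const { html, nextUrl } = await fetchPage(url);
    parts.push(html);
    url = nextUrl; // A missing nextUrl ends the loop.
  }

  return parts.join("");
};
```

In practice `fetchPage` would wrap the HTTP request and the `link[rel="next"]` lookup, returning the body HTML and the resolved next URL.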

  4. Sanitizing Portable Text

To ensure valid Portable Text objects, I implemented a sanitization function to remove invalid marks and ensure proper structure.

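// Decorator marks are plain strings with no markDefs entry, so they must be
// allowed explicitly; adjust this list to match the decorators in your schema.
const DECORATOR_MARKS = ["strong", "em", "underline", "code", "strike-through"];

const sanitizePortableText = (blocks) => {
  return blocks.map((block) => {
    if (!block.children) return block;
    const markDefs = block.markDefs || [];
    // Return a copy rather than mutating the input block.
    return {
      ...block,
      markDefs,
      children: block.children.map((child) => ({
        ...child,
        // Keep decorators and any annotation key with a matching markDef.
        marks: (child.marks || []).filter(
          (mark) =>
            DECORATOR_MARKS.includes(mark) ||
            markDefs.some((def) => def._key === mark)
        ),
      })),
    };
  });
};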

Final Migration Script

After addressing these challenges, I finalized a migratePosts script that:

  • Downloaded images and uploaded them to Sanity.
  • Parsed HTML into Portable Text.
  • Created or updated documents in Sanity using Sanity’s client.

const migratePosts = async (posts) => {
  for (const post of posts) {
    const parsedBody = await parseHTMLToPortableText(post.html, post.slug);
    const sanitizedBody = sanitizePortableText(parsedBody);

    const postData = {
      // createOrReplace requires an _id. Deriving it from the slug is one
      // option; it makes re-runs overwrite the same document.
      _id: `post-${post.slug}`,
      _type: "post",
      title: post.title,
      slug: { current: post.slug },
      body: sanitizedBody,
      publishedAt: post.published_at,
    };

    await client.createOrReplace(postData);
  }
};
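
To cross-reference the result against the Ghost export, comparing slugs on both sides is a cheap first check. The sketch below assumes the Sanity slugs have already been fetched, for example with a GROQ query such as `*[_type == "post"].slug.current` passed to `client.fetch`; the helper name `diffSlugs` is hypothetical.

```javascript
// Sketch: report slugs that are missing from Sanity or that exist only there.
const diffSlugs = (ghostPosts, sanitySlugs) => {
  const expected = new Set(ghostPosts.map((post) => post.slug));
  const migrated = new Set(sanitySlugs);

  return {
    missing: [...expected].filter((slug) => !migrated.has(slug)),
    unexpected: [...migrated].filter((slug) => !expected.has(slug)),
  };
};
```

An empty `missing` list only shows that every post arrived; body content and embeds still need spot checks in the Studio.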

Results

  • Seamless Migration: All Ghost posts, including rich text, images, embedded videos, and metadata, were successfully migrated to Sanity.
  • Improved Flexibility: Sanity’s real-time editing and integration with Next.js significantly improved my content workflow.

Lessons Learned

  • Plan Thoroughly: Anticipate potential challenges like embedded media and pagination before starting.
  • Iterate and Test: Validate each part of the migration pipeline with real-world data.
  • Future-Proofing: Write reusable functions to handle other types of content or CMS platforms.

Migrating from Ghost to Sanity was a rewarding experience, showcasing the power of custom scripting to overcome CMS limitations. If you’re considering a similar migration, I hope this guide helps you streamline the process!