Validating XML sitemaps in Node.js

How to use XML Schema (XSD) to validate your sitemap.xml in Node.js

March 12, 2017 - 3 minute read -
web seo xml node javascript

Introduction

Sitemaps are a way to tell search engines which pages on your site should be crawled and how often. They are written in XML, and if a sitemap is not well-formed and valid, search engines may not be able to use it to crawl your content. In this post I will demonstrate how to use XML schemas to validate sitemap XML files. For more information about sitemaps and sitemap index files see the sitemap documentation.
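To make the format concrete, here is a minimal sketch of what a sitemap contains, written as a small generator. The helper names (escapeXml, buildSitemap) and the example URL are hypothetical and not part of any library. The only fixed parts are the urlset namespace and the loc and changefreq elements from the sitemap protocol.

```javascript
// Escape the five characters that must be entity-encoded in XML text.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a minimal sitemap document from a list of { loc, changefreq } entries.
// Hypothetical helper for illustration only.
function buildSitemap(entries) {
  const urls = entries
    .map(({ loc, changefreq }) =>
      '  <url>\n' +
      `    <loc>${escapeXml(loc)}</loc>\n` +
      `    <changefreq>${escapeXml(changefreq)}</changefreq>\n` +
      '  </url>')
    .join('\n');

  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    `${urls}\n` +
    '</urlset>\n';
}

console.log(buildSitemap([
  { loc: 'https://example.com/', changefreq: 'weekly' },
  { loc: 'https://example.com/search?q=a&page=2', changefreq: 'daily' },
]));
```

Note the escaping step: an unescaped ampersand in a URL is one of the easier ways to end up with a sitemap that is not well-formed.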

Walkthrough

First we need to download the XML schema (XSD) to validate against. It’s available from sitemaps.org; we’ll save it in its own folder as sitemap.xsd:

mkdir -p test/schemas
curl https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd > \
  test/schemas/sitemap.xsd

We’ll use libxmljs to perform the validation:

npm install libxmljs --save

The validation code will live in a file called sitemap.tests.js and we’ll assume our built sitemap is in the base of the project. Here’s the folder structure:

.
├── package.json
├── sitemap.xml
└── test
    ├── schemas
    │   └── sitemap.xsd
    └── sitemap.tests.js

In this example I’m reading the sitemap and schema from the local file system, but the same approach could easily be used to validate a sitemap that’s generated dynamically. Just call the endpoint and use libxmljs to parse and validate the response in the same way. Then you can assert on and report the results however you like.
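For the dynamic case, a rough sketch might look like the following. It assumes Node 18 or later (for the global fetch) and that libxmljs is installed. The function names fetchSitemap and validateSitemap, and the URL in the usage note, are hypothetical. The libxmljs calls used are parseXml, validate and validationErrors.

```javascript
const fs = require('fs');

// Fetch the sitemap body from a running server.
// Assumes the global fetch available in Node 18+.
async function fetchSitemap(url) {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Unexpected status ${res.status} for ${url}`);
  }
  return res.text();
}

// Validate an XML string against the XSD stored on disk.
// Returns { valid, errors } so the caller decides how to assert or report.
function validateSitemap(xml, schemaPath) {
  // Loaded here so that fetchSitemap can be used without the native addon.
  const libxmljs = require('libxmljs');
  const schema = libxmljs.parseXml(fs.readFileSync(schemaPath, 'utf8'));
  const doc = libxmljs.parseXml(xml); // throws if the XML is not well-formed
  const valid = doc.validate(schema);
  return { valid, errors: doc.validationErrors };
}
```

Usage could be as simple as calling validateSitemap(await fetchSitemap('http://localhost:3000/sitemap.xml'), 'test/schemas/sitemap.xsd') from inside an async test, where the localhost URL stands in for wherever your app serves its sitemap.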

Here’s sitemap.tests.js: