Introduction
Sitemaps tell search engines which pages on your site should be crawled and how often. They are written in XML, and if a sitemap isn’t well-formed and valid, search engines may not be able to process it. In this post I will demonstrate how to use an XML schema to validate sitemap XML files. For more information about sitemaps and sitemap index files, see the sitemap documentation.
Walkthrough
First we need to download the XML schema (XSD) to validate against. It’s available from sitemaps.org; we’ll save it in its own folder as sitemap.xsd:
mkdir -p test/schemas
curl https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd > \
test/schemas/sitemap.xsd
We’ll use libxmljs to perform the validation:
npm install libxmljs --save
The validation code will live in a file called sitemap.tests.js, and we’ll assume our built sitemap is in the base of the project. Here’s the folder structure:
.
├── package.json
├── sitemap.xml
└── test
    ├── schemas
    │   └── sitemap.xsd
    └── sitemap.tests.js
In this example I’m reading the sitemap and schema from the local file system; however, the same approach could easily be used to validate a sitemap that’s generated dynamically. Just call the endpoint, then use libxmljs to parse and validate the response in the same way as for a local file. You can then assert on and report the results however you like.
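For the dynamic case, a minimal sketch might look like the following. It is an illustration under stated assumptions rather than part of the walkthrough: the helper names validateSitemap and validateRemoteSitemap and the localhost URL are hypothetical, and it assumes Node 18 or later for the global fetch. The parser is passed in as an optional parameter (defaulting to libxmljs.parseXml) purely so the function can be exercised with a stub.

```javascript
const fs = require('fs');

// Validate sitemap XML text against XSD text.
// `parseXml` defaults to libxmljs.parseXml; the default is only evaluated
// when no parser is supplied, so a stub can be injected for testing.
function validateSitemap(sitemapXml, schemaXml, parseXml = require('libxmljs').parseXml) {
  const schemaDoc = parseXml(schemaXml);
  // With libxmljs, parsing normally throws if the XML is not well-formed.
  const sitemapDoc = parseXml(sitemapXml);
  const valid = sitemapDoc.validate(schemaDoc);
  const errors = valid
    ? []
    : sitemapDoc.validationErrors.map((e) => String(e.message).trim());
  return { valid, errors };
}

// Fetch a dynamically generated sitemap and validate it against a local XSD.
// Assumption: Node 18+ (global fetch); `url` is whatever endpoint serves the sitemap.
async function validateRemoteSitemap(url, schemaPath) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Fetching ${url} failed with status ${response.status}`);
  }
  const sitemapXml = await response.text();
  const schemaXml = fs.readFileSync(schemaPath, 'utf8');
  return validateSitemap(sitemapXml, schemaXml);
}

// Hypothetical usage (placeholder URL):
// validateRemoteSitemap('http://localhost:3000/sitemap.xml', 'test/schemas/sitemap.xsd')
//   .then(({ valid, errors }) => console.log(valid ? 'valid' : errors.join('\n')));
```

Returning a plain object with valid and errors keeps the result easy to assert on from any test runner.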
Here’s sitemap.tests.js: