Regression Testing a Sitemap

I was recently restructuring my website and was worried that my changes might break some links. Placing 301 redirects is fine, as long as you track down every link that changes. And tracking down which links changed is easy with a regression test. After all, that is the primary reason we write regression tests: to make sure things don't break when something changes!

Implementing the regression test

The basic idea is simple: fetch the current sitemap.xml file from my website and extract all links. For each link, check whether it returns a 200 response after following all redirects, since we want to verify that the redirects land on a proper page.
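To make the first step concrete, here is a minimal sketch of extracting the links from a sitemap without a third-party library, using PHP's built-in SimpleXML extension. It assumes a plain `urlset` sitemap in the sitemaps.org 0.9 namespace. The function name `extractSitemapUrls` is hypothetical, and unlike the sitemap library used in the actual test, this sketch does not follow sitemap index files.

```php
<?php
// Hypothetical helper: extract every <loc> URL from a sitemap document.
// Assumes a standard <urlset> sitemap using the sitemaps.org 0.9 namespace.
function extractSitemapUrls(string $xml): array
{
    $doc = new SimpleXMLElement($xml);
    // Register the sitemap namespace so XPath can address its elements.
    $doc->registerXPathNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    $urls = [];
    foreach ($doc->xpath('//sm:url/sm:loc') as $loc) {
        // Each <loc> holds one absolute page URL.
        $urls[] = trim((string) $loc);
    }
    return $urls;
}
```

The remaining step, requesting each URL and checking the final status code, is what the Mink-based test does.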

The test is just another PHPUnit test. To be precise, it's built on the shoulders of the awesome phpunit-mink extension, which allows writing PHPUnit tests with Mink. The sitemap is parsed using the vipnytt/sitemapparser library:

namespace FK\Tests\Regression;

use FK\Tests\Browser\AbstractBrowserTestCase;
use vipnytt\SitemapParser;

class DoNotDestroyLinksTest extends AbstractBrowserTestCase
{
    const BASEURL = 'http://www.fabian-keller.de';
    const SITEMAP = self::BASEURL . '/sitemap';
    const PERMITTED_CODES = [200];

    public function testAllPages_ofFormerSitemap_areAvailable()
    {
        // parse sitemap
        $parser = new SitemapParser();
        $parser->parseRecursive(self::SITEMAP);
        $urls = $parser->getURLs();
        $this->assertNotEmpty($urls, 'The sitemap did not contain any URLs.');

        // check that all urls are still working
        foreach ($urls as $url => $tags) {
            // go to slug
            $slug = str_replace(self::BASEURL, '', $url);
            $this->given_i_am_on($slug);

            // assert
            $code = $this->getSession()->getStatusCode();
            $this->assertContains($code, self::PERMITTED_CODES,
                sprintf("Existing slug '%s' has response code %s, but should have one of: %s", $slug, $code,
                    join(", ", self::PERMITTED_CODES)));
        }
    }
}
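The test derives the slug with `str_replace(self::BASEURL, '', $url)`, which assumes every URL in the sitemap starts with exactly `http://www.fabian-keller.de`. If an entry used `https://` or a different host, the absolute URL would pass through unchanged and the local URL built from it would be broken. A slightly more defensive alternative, sketched here as a hypothetical `urlToSlug` helper, keeps only the path and query string via PHP's `parse_url`:

```php
<?php
// Hypothetical helper: turn an absolute sitemap URL into the path (plus
// query string) to request on the local test web server, regardless of
// the scheme or host used in the sitemap entry.
function urlToSlug(string $url): string
{
    $parts = parse_url($url);
    // A bare host such as "https://example.com" has no path component.
    $slug = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $slug .= '?' . $parts['query'];
    }
    return $slug;
}
```

Inside the loop, `$slug = urlToSlug($url);` would then replace the `str_replace` call.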

As you might have noticed, the test relies on some extensions to the phpunit-mink library, which I have wrapped in AbstractBrowserTestCase because they are shared with the other browser-based tests for my website:

namespace FK\Tests\Browser;

use aik099\PHPUnit\BrowserTestCase;

abstract class AbstractBrowserTestCase extends BrowserTestCase
{
    public static $browsers = [
        [
            'driver' => 'goutte',
            // Defaults for this driver.
            'driverOptions' => [
                'server_parameters' => [],
                'guzzle_parameters' => [],
            ],
        ]
    ];

    protected function toUrl($path) {
        return sprintf("http://%s:%d/%s", WEB_SERVER_HOST, WEB_SERVER_PORT, ltrim($path, './'));
    }
    public function given_i_am_on($path) {
        $this->getSession()->visit($this->toUrl($path));
    }
    // ...
}

The WEB_SERVER_HOST and WEB_SERVER_PORT constants are supplied through environment variables, so the testing web server can be configured for each environment in which the tests run.
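One way to turn environment variables into these constants is a small bootstrap file loaded before the tests. This is a sketch under assumptions: the function name, the `localhost:8000` fallback, and loading it through PHPUnit's bootstrap setting are hypothetical, not taken from the original setup.

```php
<?php
// Hypothetical bootstrap: expose the test web server location as constants,
// reading WEB_SERVER_HOST and WEB_SERVER_PORT from the environment and
// falling back to assumed defaults when they are not set.
function defineWebServerConstants(string $defaultHost = 'localhost', int $defaultPort = 8000): void
{
    $host = getenv('WEB_SERVER_HOST');
    $port = getenv('WEB_SERVER_PORT');

    if (!defined('WEB_SERVER_HOST')) {
        define('WEB_SERVER_HOST', ($host !== false && $host !== '') ? $host : $defaultHost);
    }
    if (!defined('WEB_SERVER_PORT')) {
        // Only accept a purely numeric port; otherwise use the default.
        $isNumeric = $port !== false && preg_match('/^\d+$/', $port) === 1;
        define('WEB_SERVER_PORT', $isNumeric ? (int) $port : $defaultPort);
    }
}

defineWebServerConstants();
```

A CI job could then point the tests at its own server, e.g. `WEB_SERVER_HOST=127.0.0.1 WEB_SERVER_PORT=8080 vendor/bin/phpunit`.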

Hook up the CI Server

I have configured the CI server to run all unit tests before deploying the site to production, so if I break any links, the deployment will not start. While this sounds good at first, it has a major drawback: the test reads the sitemap from the live site, so if my website is down, has been hacked and altered, or the previous deployment went wrong, the test fails and I can no longer update my website through the existing pipeline. Fortunately, PHPUnit has us covered:

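// Inside DoNotDestroyLinksTest: skip the test when the CI flag is set.
protected function setUp()
{
    if (getenv('SKIP_SITEMAP_REGRESSION_TEST')) {
        $this->markTestSkipped('Skipping sitemap regression test.');
    }
    // Let BrowserTestCase run its own setup.
    parent::setUp();
}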

Just set the SKIP_SITEMAP_REGRESSION_TEST environment variable in your CI configuration and the sitemap regression test will be skipped.
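One subtlety: the check in `setUp()` relies on PHP truthiness, so any non-empty value other than `"0"` counts as set, and `SKIP_SITEMAP_REGRESSION_TEST=false` would still skip the test. If that matters in your pipeline, a stricter check could use PHP's `FILTER_VALIDATE_BOOLEAN`. The helper name `envFlagEnabled` is hypothetical:

```php
<?php
// Hypothetical helper: interpret an environment variable as a boolean flag.
// "1", "true", "on" and "yes" (case-insensitive) enable it; anything else,
// including an unset variable, leaves it disabled.
function envFlagEnabled(string $name): bool
{
    $value = getenv($name);
    if ($value === false) {
        return false; // variable is not set at all
    }
    return filter_var($value, FILTER_VALIDATE_BOOLEAN);
}
```

The condition in `setUp()` would then read `if (envFlagEnabled('SKIP_SITEMAP_REGRESSION_TEST'))`.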

Conclusion

It is super easy to add a regression test for your sitemap so that you don't break links when updating your website. Now go and set up a regression test for your sitemap as well. And if you don't have a sitemap yet, that is the first thing to set up!
