
Creating a sitemap with MediaWiki… and how to submit it to Google

Creating the map

MediaWiki has a built-in function to generate its own sitemap. The script is located in the maintenance folder and is called generateSitemap.php. It has to be run from the console (so you need the php-cli package). In order to have a regularly updated map, we chose to run its generation as a daily Cron job. The command is quite straightforward:

php /home/username/www/maintenance/generateSitemap.php --fspath="/home/username/www" --server="" --compress=no

NB: like most of the maintenance scripts, this script requires AdminSettings.php (in the wiki’s root folder) to be filled in properly. This file basically contains the login and password of a database user, and can be created from AdminSettings.sample.

  • /home/username/www/maintenance/generateSitemap.php is the absolute path to generateSitemap.php
  • --fspath="/home/username/www" is the path to the folder where you want to save the sitemaps. The script will generate quite a few files, as it creates one sitemap per namespace. Yet, if you want to submit it to Google, you must not place the sitemaps in a subfolder of the wiki (see below)
  • --server="" is optional (use it if auto-detection fails… or systematically if you prefer)
  • --compress=no disables sitemap compression. I don’t know if Google is able to read compressed sitemaps, so I disabled it (default is yes)
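
As an illustration of the daily Cron job mentioned above, here is a minimal sketch of a wrapper function around that command. The function name run_sitemap_job, the PHP_BIN variable and the /home/username/bin/sitemap.sh path are hypothetical names chosen for this example; they are not part of MediaWiki. The --server option is only passed when a value is given, so auto-detection is used otherwise.

```shell
# Hypothetical wrapper around generateSitemap.php (not shipped with MediaWiki).
# $1: wiki root folder, also used as the output folder (--fspath)
# $2: optional value for --server; leave empty to rely on auto-detection
# PHP_BIN: optional override for the php-cli binary (defaults to "php")
run_sitemap_job() {
    wiki_root=$1
    server=${2:-}
    script="$wiki_root/maintenance/generateSitemap.php"
    if [ -n "$server" ]; then
        "${PHP_BIN:-php}" "$script" --fspath="$wiki_root" --server="$server" --compress=no
    else
        "${PHP_BIN:-php}" "$script" --fspath="$wiki_root" --compress=no
    fi
}

# Example crontab entry (added with `crontab -e`), running the job every day
# at 03:00, assuming the function above is saved in /home/username/bin/sitemap.sh:
# 0 3 * * * . /home/username/bin/sitemap.sh && run_sitemap_job /home/username/www
```

Keeping the command in one function means the crontab line stays short, and the same function can be run by hand to check the output before relying on Cron.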

Submitting the map

To submit the sitemap to Google, first verify your site in Google Webmaster Tools. If your site returns soft 404 errors, you’ll need to choose the meta tag verification; to add the meta tag, edit your default skin (e.g. skins/MonoBook.php), search for “<head>” and add the tag somewhere after it. Then submit the sitemap index, which is named something like sitemap-index-[database name]-[table prefix].xml, and wait for it to be crawled (this should take a few minutes).

Fixing the errors

URL not allowed

URL not allowed
This url is not allowed for a Sitemap at this location.

This means that you placed your sitemap either at a deeper level than the pages you are listing (i.e. in a subfolder) or on another domain: a sitemap can only list URLs from the exact host it is served from. So just move the sitemap to an appropriate place (the very same domain, at a level higher than or equal to all the URLs listed).
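
To make the location rule concrete, here is a minimal sketch that checks whether a page URL may be listed in a sitemap served at a given address. The function name sitemap_may_list and the example.com URLs are hypothetical, and the check is a plain prefix comparison, which is a simplification of what Google actually verifies.

```shell
# Returns 0 if the page URL ($2) starts with the folder that holds the
# sitemap ($1), i.e. same scheme, same host, same or deeper level.
sitemap_may_list() {
    sitemap_url=$1
    page_url=$2
    base="${sitemap_url%/*}/"   # folder of the sitemap, with a trailing slash
    case "$page_url" in
        "$base"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage (hypothetical URLs):
# sitemap_may_list "http://example.com/sitemap-index.xml" "http://example.com/wiki/Main_Page"            # allowed
# sitemap_may_list "http://example.com/sitemaps/sitemap-index.xml" "http://example.com/wiki/Main_Page"   # not allowed
```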

Invalid URL

Invalid URL
We’ve detected that a Sitemap you’ve listed doesn’t include the full URL.

This is just a warning that shouldn’t prevent indexing. Yet for a nicer icon (valid instead of warning ;)) you may want to fix it. The warning appears because the sitemap index created by the script uses relative URLs instead of absolute ones. To fix it, you’ll have to edit the script:

  • open generateSitemap.php
  • search for function indexEntry
  • in this function, find "\t\t<loc>$filename</loc>\n" and add the path needed to make the URL complete: put the full URL of the folder that holds your sitemap index, followed by a slash, right before $filename (note the slash!)
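
If you would rather not edit the script, a different approach (not the fix described above) is to post-process the generated index. The sketch below assumes the index holds one <loc> entry per line with a bare file name inside; the function name absolutize_index and the example.com base URL are hypothetical. Entries that already contain a colon (i.e. already absolute) are left untouched, and the base URL must not contain "|" or "&".

```shell
# Prints the sitemap index ($1) with the base URL ($2) prepended to every
# relative <loc> entry. The original file is not modified.
absolutize_index() {
    index_file=$1
    base_url=${2%/}   # drop one trailing slash, a single one is added below
    sed -E "s|<loc>([^<:]*)</loc>|<loc>${base_url}/\1</loc>|" "$index_file"
}

# Usage (hypothetical file name and URL):
# absolutize_index sitemap-index-wiki.xml "http://example.com" > sitemap-index-fixed.xml
```

The drawback compared with editing indexEntry is that this step has to run after every regeneration, for instance right after the generation command in the Cron job.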

Regenerate your sitemap, then resubmit it to Google. All should be fine now… except if one of your maps has over 50k URLs (Google doesn’t accept this, you need to split them… and I don’t know how to do this :/)
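
To find out whether one of your maps is over that limit before submitting it, a rough check is to count the <loc> entries in each uncompressed file. This is only a sketch: the function names are hypothetical, and it counts occurrences of the <loc> tag rather than parsing the XML.

```shell
# Prints the number of <loc> entries found in the sitemap file $1.
count_sitemap_urls() {
    grep -o '<loc>' "$1" | wc -l | tr -d ' '
}

# Prints the count for file $1 and returns 1 if it exceeds 50000 URLs.
check_sitemap_size() {
    n=$(count_sitemap_urls "$1")
    if [ "$n" -gt 50000 ]; then
        echo "$1: $n URLs (over the 50000 limit)"
        return 1
    fi
    echo "$1: $n URLs"
}

# Usage, from the folder holding the generated files:
# for f in sitemap-*.xml; do check_sitemap_size "$f"; done
```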

Update (2011-02-12): after upgrading to MediaWiki 1.16, another issue seems to have appeared: the site domain name is replaced by “localhost” for some reason… I guess there is some setting to configure somewhere, but since I didn’t manage to find it here is some kind of fix:

  • open generateSitemap.php
  • search for function fileEntry
  • in this function, replace "\t\t<loc>$url</loc>\n" . with "\t\t<loc>".str_replace('localhost','',$url)."</loc>\n" .

Posted in Google, MediaWiki, web development.
