
Google sitemap software

Over the last two weeks, Google has gone through a very elaborate update that has seen some of my clients’ sites drop dramatically and others rise to the top of some very difficult categories.

This turmoil has focused my attention very clearly on Google and on anything that might help with Google ranking and indexing. One of those things is Google Sitemaps.

For the past several months, Google have offered webmasters and site owners a sitemap system in which you tell Google yourself which web pages to scan. You do this by generating an XML sitemap and uploading it to the root directory of your website (e.g. http://yourdomain.com/sitemap.xml.gz).
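For readers curious about what such a file contains, here is a minimal sketch in Python that writes a gzipped sitemap from a list of URLs. It assumes the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace from the sitemaps.org protocol (the early Google Sitemaps beta used an earlier, Google-specific schema version), and the `build_sitemap`/`write_sitemap` helpers and example URLs are hypothetical, not part of any of the tools discussed here.

```python
import gzip
import xml.etree.ElementTree as ET
from datetime import date

# Assumed namespace: the sitemaps.org 0.9 protocol (see lead-in).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, lastmod=None):
    """Return the sitemap XML for `urls` as UTF-8 bytes (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in query strings for us.
        ET.SubElement(entry, "loc").text = url
        if lastmod is not None:
            ET.SubElement(entry, "lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def write_sitemap(urls, path="sitemap.xml.gz", lastmod=None):
    """Write a gzip-compressed sitemap, ready to upload to the site root."""
    with gzip.open(path, "wb") as f:
        f.write(build_sitemap(urls, lastmod))
```

Calling `write_sitemap(["http://yourdomain.com/", "http://yourdomain.com/about.html"], lastmod=date.today())` produces a `sitemap.xml.gz` that would then be uploaded to the root directory.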

Many webmasters have found significant improvement in the indexing of their site by Google once they submitted their sitemap.

Google offers its own tool, a Python script that scans your directories and creates a sitemap. There are several problems with Google’s tool:

  1. It is difficult to install.
  2. It is difficult to configure (command-line).
  3. It scans your website locally.

There are two problems with local scans. First, your server may hold many files that you do not want to share with the world (files that are not linked from anywhere on the web). Second, a local scan doesn’t work for dynamic sites, whose pages are generated on request rather than stored as static .html and .htm files.
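The difference between a local scan and a crawl can be sketched in a few lines of Python. A crawler starts at the homepage and follows links, so it only sees pages that are linked from somewhere on the site, and it picks up dynamic URLs (such as `index.php?id=1`) as easily as static files. This is a minimal sketch under stated assumptions: the `crawl` function, the injected `fetch` callable and the toy site are hypothetical, and a real crawler would also need to respect robots.txt, limit its request rate, and so on.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first crawl of one host, following links from start_url.

    `fetch(url)` (hypothetical, injected) returns the page HTML as a string,
    or None on error. Only pages reachable by links are returned, so files
    that are never linked never end up in the sitemap.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # broken link or fetch error: skip it
        found.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _ = urldefrag(urljoin(url, href))
            # Stay on the same host; this also skips mailto: links.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return found
```

With a toy site stored in a dict, `crawl("http://example.com/", site.get)` returns the homepage and every page linked from it, including `index.php?id=1`, while an unlinked `secret.html` is never found.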

So the logical thing to do is to look for an alternative. There are three choices:

  1. ready-made online services
  2. programs to install on your own computer
  3. programs to install on your web server

There are any number of online services, most of which are scams to get innocent webmasters to link back to the service’s own domain (they force you to put a graphic and a link on your homepage or, in the worst case, on every page of your site). None of them offers enough configuration options to be a useful tool.

There are lots of paid tools out there, some better than others. But Google sitemap software will probably be like link checking: the best software is simple and free. Fortunately there are already two good free options out there, very different from each other.

One is a PHP tool called phpSitemapNG. It is much better than the name sounds. It can be installed and configured in about five minutes (very good for a PHP script), and the installation instructions are detailed and useful. phpSitemapNG offers a simple but useful configuration interface. Because it runs on your web server, it has two benefits. First, it doesn’t tie up your own computer or internet connection while it runs. Second, it can be scheduled to run automatically via a cron job (beyond the scope of this article).

Unfortunately, phpSitemapNG has a number of problems which may or may not affect your use of it. The URL filter does not seem to recognise wildcards, and the edit interface has trouble with websites of 1,000 pages or more. Both problems affect my own use: I need wildcards, and I have more than 1,000 pages to manage.
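To make the wildcard point concrete: with wildcards, a single pattern such as `*/print/*` can exclude thousands of URLs at once. Below is a minimal Python sketch using the standard `fnmatch` module; the `filter_urls` helper and the patterns are hypothetical and are not phpSitemapNG’s actual filter syntax.

```python
from fnmatch import fnmatchcase


def filter_urls(urls, exclude_patterns):
    """Drop every URL that matches at least one shell-style wildcard pattern.

    `*` matches any run of characters (including "/"), `?` matches exactly
    one character (write `[?]` for a literal question mark), and matching is
    case-sensitive, as URL paths usually are.
    """
    return [
        url for url in urls
        if not any(fnmatchcase(url, pattern) for pattern in exclude_patterns)
    ]
```

For example, the patterns `["*/print/*", "*sessionid=*", "*.pdf"]` would drop printer-friendly copies, session-ID duplicates and PDF files in one pass, leaving only the regular article pages.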

The second tool is called GSiteCrawler, written by a certain Johannes Mueller, who explains straightforwardly:

When I found out about the Google Sitemaps (www.google.com/webmasters/sitemaps, in beta), I needed a generator for my Websites. Seeing that there were no Windows-Based generators available (at the time ;)), I created my own.

And he did a very good job. The tabbed interface of GSiteCrawler is extremely elegant, and the program runs very quickly. Complicated configuration options are easy to manage, and groups of files can be changed in batch mode.

At the end of its run, GSiteCrawler will even upload the finished sitemap automatically.

There are still two reasons to prefer phpSitemapNG.

  1. Once configured, it can run automatically and be left alone.
  2. You don’t have a Windows computer.

For somebody with larger sites that phpSitemapNG cannot easily handle, and who uses Windows anyway, I would recommend giving GSiteCrawler a try. GSiteCrawler is also easier for non-technical people to manage. For smaller sites which are updated frequently, I would recommend taking the trouble to set up phpSitemapNG. phpSitemapNG should improve rapidly, as its source code is available under the GPL and it is constantly updated.

Now is the time to put up a sitemap. I had not bothered with Google Sitemaps right away, as there was no evidence whether they helped or not. Now that evidence exists. Moreover, the early Google sitemap software was not very good. Now there are two very good, free tools: GSiteCrawler and phpSitemapNG.

6 Comments

  1. Hi!

    I’m the author of phpSitemapNG; I found your website through the referrers of your visitors.

    Just a note on the wildcard problem: you can simply put the filename and/or directory that should be excluded in the list, without any wildcard.
    E.g. if you would like to exclude all URLs in the admin directory of domain.com, you would add admin/ to the “exclude directories” list. The same goes for filenames and for the keys of a URL.

    If you have more questions regarding this (or phpSitemapNG in general), just ask me and I’ll do my best to answer them.

    Best regards,
    Tobias

  2. Hey Tobias,

    Thanks for stopping by. Good to know about the exclude-directories issue.

    Drop by again when the performance issues are also under control. Then you will have a real winner.

    Best of luck with phpSitemapNG – a wonderful contribution to the web.

    Alec

  3. Steve

    Readers may like to note that many hosts have banned phpSitemapNG from their servers. Mine says ‘for reasons related to server stability and security’.

  4. Hey Steve,

    Thanks for the info.

    With a different PHP script (one for tracking spiders), I got into a lot of trouble with my own web host. The scripts were crippling the server (we have SSH access, so I could see what was going on), and I didn’t know they were ours.

    One should be very, very careful about what one uploads to one’s site. In most cases, one ends up on shared hosting, sharing a server with other customers. As on public transport, one should take care to bathe and dress in clean clothes for the comfort of the other passengers.

    Cheers, Alec

  5. utkarsh

    hi tobias,

    im using ur sitemap gen, it works like a charm!!! i just dont know how to set it up as a cron job.

    can u please tell me how to do that???
    kindly mail me at

    utkarsh2012 [at] gmail [dot] com

    thanks!!

  6. Hi,

    I started using phpSitemapNG. It’s taking a while to crawl my site, but it’s working just fine, and it’s very practical thanks to the easy installation and configuration. But I would like to be able to run it as a cron job. I haven’t gone through the code to see whether that’s feasible, so I hope you can give me some pointers on that.

    phpSitemapNG is a winner.

    best regards
