Search engine friendly URLs using Apache mod_rewrite

As promised, here is a brief technical overview of how to get those nice search friendly URLs using Apache mod_rewrite and .htaccess files. I have already discussed why human readable URLs are a good idea, but it really should be obvious to anyone who has a basic understanding of the way Google views page URLs when calculating page rank.

What are we trying to achieve?

We are going to take the example of a fictitious website that has a database-driven catalogue. We will assume for a moment that the page that handles the navigation of the catalogue is /catalog.php and that it accepts a category parameter and a page number parameter. So, for example, a typical URL might be:

/catalog.php?cat=34&page=2

This would show page 2 of the results for products in the category with id=34. This is a pretty common situation.

Making sure a plain language category name exists

Before we go any further it is going to be necessary to reference the category by a word, rather than a number. This might take a little corrective action, but without it you aren’t going to be able to use plain English URLs. In your database table for the categories, you need to make sure you have a field set aside for the plain language (URL-safe) category name. For example:

  • category name or title (as it would appear on the page) > Steve’s Savers
  • category URL-safe string > steves-savers
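With that extra field in place, the script can look a category up by its URL-safe string instead of its numeric id. Here is a minimal sketch of that lookup, written in Python with an in-memory SQLite table purely so that it runs on its own. The table name categories, the column names, the helper name and the pairing of id 34 with this category are all made up for illustration; your own schema will differ.

```python
import sqlite3

# Hypothetical schema: the numeric id, the display title,
# and the extra URL-safe field described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories ("
    "id INTEGER PRIMARY KEY, "
    "title TEXT NOT NULL, "
    "slug TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "INSERT INTO categories (id, title, slug) VALUES (?, ?, ?)",
    (34, "Steve’s Savers", "steves-savers"),
)


def category_id_for_slug(slug):
    """Return the numeric category id for a URL-safe string, or None."""
    row = conn.execute(
        "SELECT id FROM categories WHERE slug = ?", (slug,)
    ).fetchone()
    return row[0] if row else None


print(category_id_for_slug("steves-savers"))
print(category_id_for_slug("no-such-category"))
```

Making the URL-safe field unique matters here, because two categories sharing the same string could not be told apart from the URL alone.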

I’m probably teaching you to suck eggs, but you can convert any string to a URL-safe string by using something like:

function cleanUrl($title) // take a title, and turn it into a URL-safe string
{
  $title = strtolower($title);
  $title = preg_replace("/[^a-z0-9\s_+]/", '', $title);
  $url = preg_replace("/[\s_]{1,}/", '-', $title);
  return $url;
}

The above will just take a string, convert it to lower case, and strip out everything except a-z, 0-9, white-space, underscore and plus signs. It then replaces chunks of whitespace and underscores with a hyphen.
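If you want to check what those two replacements do to a given title, the same steps can be tried out quickly outside PHP. The following is a rough Python equivalent, not a byte-for-byte port (the name clean_url and the use of re.ASCII to approximate PHP's matching are my own choices).

```python
import re


def clean_url(title):
    """Approximate the cleanUrl() steps: lower-case, strip, hyphenate."""
    title = title.lower()
    # Drop everything except a-z, 0-9, whitespace, underscore and plus.
    title = re.sub(r"[^a-z0-9\s_+]", "", title, flags=re.ASCII)
    # Collapse each run of whitespace/underscores into a single hyphen.
    return re.sub(r"[\s_]+", "-", title, flags=re.ASCII)


print(clean_url("Steve’s Savers"))
print(clean_url("Fish & Chips"))
print(clean_url(" Sale "))
```

The last call is worth a look: leading and trailing spaces are turned into hyphens as well, so you may want to trim the title before converting it.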

On to the mod_rewrite redirection

OK, I’m just going to lay the .htaccess file on you, and then we’ll go through it for those who are interested.

RewriteEngine On
RewriteBase /
#
# External redirect to remove trailing slashes from /whatever/whatever/ etc.
# (the three redirect rules are skipped for existing files and directories)
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule .* - [S=3]
RewriteRule ^([^./]+)/$ http://www.yourdomain.co.uk/$1 [R=301,L]
RewriteRule ^([^./]+)/([^./]+)/$ http://www.yourdomain.co.uk/$1/$2 [R=301,L]
RewriteRule ^([^./]+)/([^./]+)/([^./]+)/$ http://www.yourdomain.co.uk/$1/$2/$3 [R=301,L]
#
# External redirect to canonical hostname
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.co\.uk$ [NC]
RewriteRule ^(.*)$ http://www.yourdomain.co.uk/$1 [R=301,L]
#
# Internal rewrite /something URLs to catalog.php script with a single query string parameter
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^./]+)?$ /catalog.php?cat=$1 [L]
#
# Internal rewrite /something/something URLs to catalog.php script with two query string parameters
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^./]+)/([^./]+)$ /catalog.php?cat=$1&page=$2 [L]

The first block removes the trailing slashes from our URLs. There is a lot written about whether this is a good thing or not, but in this example, we have decided that the canonical names of our virtual directories will be without a trailing slash. We need to standardise one way or the other to avoid duplicate entries in search engines.

The next block simply forces redirection to your website’s official fully qualified domain name. So, if someone tries to open http://yourdomain.co.uk, they will be redirected to http://www.yourdomain.co.uk. Again, this isn’t strictly necessary, but is a good idea, and is thrown in here as a bonus.
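To make the combined effect of these first two blocks easier to follow, here is a small Python simulation of the decision they make. It is only a sketch: the function name is invented, the three trailing-slash rules are condensed into one pattern, and the check that skips existing files and directories is left out.

```python
import re

CANONICAL = "http://www.yourdomain.co.uk"

# One to three dot-free path segments followed by a trailing slash
# (a condensed stand-in for the three trailing-slash rules).
TRAILING_SLASH = re.compile(r"^([^./]+(?:/[^./]+){0,2})/$")
# Same host test as the RewriteCond; IGNORECASE mirrors the [NC] flag.
CANONICAL_HOST = re.compile(r"^www\.yourdomain\.co\.uk$", re.IGNORECASE)


def redirect_target(host, path):
    """Return the URL to redirect to, or None if no redirect is needed."""
    # In .htaccess context the pattern sees the path without its leading slash.
    rel = path[1:] if path.startswith("/") else path
    match = TRAILING_SLASH.match(rel)
    if match:
        return f"{CANONICAL}/{match.group(1)}"
    if not CANONICAL_HOST.match(host):
        return f"{CANONICAL}/{rel}"
    return None


print(redirect_target("www.yourdomain.co.uk", "/steves-stuff/"))
print(redirect_target("yourdomain.co.uk", "/steves-stuff"))
print(redirect_target("www.yourdomain.co.uk", "/steves-stuff/2"))
```

The first two calls both end up at http://www.yourdomain.co.uk/steves-stuff, and the third needs no redirect at all, which is exactly the single canonical form we are after.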

Next we handle the rewrites for the /catalog.php page itself.

RewriteRule ^([^./]+)?$ /catalog.php?cat=$1 [L]

This is the real meat we are looking for. It will internally rewrite

/steves-stuff

to

/catalog.php?cat=steves-stuff

and likewise,

RewriteRule ^([^./]+)/([^./]+)$ /catalog.php?cat=$1&page=$2 [L]

will rewrite

/steves-stuff/1

to

/catalog.php?cat=steves-stuff&page=1
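If you want to experiment with the two patterns before touching a live server, they can be tried out in any regex-capable language. Below is a minimal Python sketch that applies them in the same order as the .htaccess file. It only mimics the pattern matching; the function name is made up, and the RewriteCond checks for existing files and directories are not simulated.

```python
import re

# (pattern, target template) pairs, in the same order as the .htaccess file.
RULES = [
    (re.compile(r"^([^./]+)?$"), "/catalog.php?cat={0}"),
    (re.compile(r"^([^./]+)/([^./]+)$"), "/catalog.php?cat={0}&page={1}"),
]


def rewrite(path):
    """Return the internal target for a request path, or None if no rule matches."""
    # In .htaccess context the pattern sees the path without its leading slash.
    rel = path[1:] if path.startswith("/") else path
    for pattern, template in RULES:
        match = pattern.match(rel)
        if match:
            # An unmatched optional group is None; treat it as an empty string.
            return template.format(*[group or "" for group in match.groups()])
    return None


print(rewrite("/steves-stuff"))
print(rewrite("/steves-stuff/1"))
print(rewrite("/style.css"))
```

Because both patterns exclude dots and slashes from each segment, a request such as /style.css matches neither rule and is left alone.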

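The rewrites to /catalog.php are internal: Apache serves the script without sending any redirect response, so visitors and search engines only ever see the friendly URL, and the real physical file catalog.php is never exposed to be indexed. (An external redirect made with a bare [R] flag would default to 302 “Found”; use R=301 if you want a permanent redirect.) You can achieve far more complicated arrangements, but hopefully this has given you something to go on. I’m by no means an Apache guru, so if anyone has a better solution, then I would be grateful to hear from you!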