Friday, October 9, 2009

Some helpful mod rewrite rules to improve SEO.

If you are new to creating websites, or just want to add some new tricks to your existing skill set, getting a grasp on mod rewrite is a pretty good way to improve your website, both in terms of usability (for people) and search engine rankings (for bots).

Getting started, the .htaccess file


.htaccess is a feature of the Apache web server. If set up correctly, it gives the website owner a way to pass configuration changes and tweaks to Apache, affecting how their website behaves.

Apache is typically installed with a number of modules. One of those modules is called mod rewrite (mod_rewrite), and with it you can essentially tell Apache what to do and how to respond to nearly any type of website request.

Check your website for any .htaccess files you might already have. Keep in mind that Apache directives inside a .htaccess file affect the directory it sits in and all directories beneath it, so if you have existing .htaccess files, you can usually just add to them.

So either edit or create a .htaccess file with any text editor, such as Notepad.


Handy .htaccess entries


Customized error pages
Let's face it, sometimes things just don't go as planned, but your website can recover gracefully with custom error pages.

Write a single error page in HTML that includes links to the site's sitemap.xml and other sections of the site, a search form if applicable, and a javascript redirect to the site's main page. That way, if a person encounters an error, they will be steered in the right direction. You can use a single .html file to respond to numerous error types.

To use an error page named 404.html to respond to 404 (not found) and 500 (internal server) errors, add this to your .htaccess file. Use a local path rather than a full http:// URL: with a full URL, Apache redirects the visitor to the error page, and the browser or search engine spider never sees the original 404 or 500 status code.

ErrorDocument 500 /404.html
ErrorDocument 404 /404.html


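Blocking website access by IP address

To refuse requests from a particular IP address, add a Deny line with that address. The address below is only a placeholder.

Deny from 127.0.0.0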


Firing up mod rewrite


Before Apache will act on any mod rewrite directives, you need to switch on the rewrite engine.

Adding a line that reads

RewriteEngine on
does exactly that.

Next, you need to tell mod rewrite which URL path to treat as its base. If this .htaccess file resides in your main web directory, you probably want

RewriteBase /


If the .htaccess file is in a subdirectory of your website, you probably want the URL path of that directory, starting and ending with a slash

RewriteBase /directoryname/


Handy mod rewrite rules


After enabling the rewrite engine and setting the base directory, you can start adding mod rewrite rules. One I always include is what I call the non-WWW to WWW rule, which forces the browser or search engine spider to use www in the URLs of the site. If they request any page without www, they are 301 redirected to the www version.


RewriteCond %{HTTP_HOST} ^superdupersite\.com(.*)
RewriteRule (.*) http://www.superdupersite.com/$1 [R=301,L]
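
If it helps to see the logic spelled out, here is a rough Python sketch of what the condition and rule above do. The function name redirect_target is made up for illustration, and Python's re module only approximates the regular expression engine Apache uses, so treat this as a way to reason about the rule rather than as what Apache runs.

```python
import re

# Same pattern as the RewriteCond above: host names that start
# with the bare domain (no www prefix).
HOST_PATTERN = re.compile(r"^superdupersite\.com(.*)")

def redirect_target(host, path):
    """Return the 301 target for a non-www request, or None if no redirect applies.

    Hypothetical helper, not part of Apache.
    """
    if HOST_PATTERN.search(host):
        # In a .htaccess file the leading slash is stripped before the rule
        # is matched, so $1 holds the path without it.
        return "http://www.superdupersite.com/" + path.lstrip("/")
    return None

print(redirect_target("superdupersite.com", "/about.html"))
# http://www.superdupersite.com/about.html
print(redirect_target("www.superdupersite.com", "/about.html"))
# None
```

Requests that already use www fall through untouched, which is what keeps the rule from redirecting in a loop.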


Prettier URLs


If you are using PHP, or really any other web development language, you can hide ugly query string parameters behind pretty URLs.

The example below will serve index.php?author=Smith when /author-Smith.html is requested. Note that there is no R=301 flag here: the rewrite happens inside Apache, so the visitor keeps seeing the pretty URL. Adding R=301 would redirect the browser to the ugly index.php URL instead.

RewriteRule ^author-(.*)\.html$ index.php?author=$1 [L]


Likewise this rule serves index.php?book=Diary when /book-Diary.html is requested.

RewriteRule ^book-(.*)\.html$ index.php?book=$1 [L]


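Passing multiple parameters



Finally, you could pass both author and book in one request, serving index.php?author=Smith&book=Diary when /author-Smith-book-Diary.html is requested.


RewriteRule ^author-(.*)-book-(.*)\.html$ index.php?author=$1&book=$2 [L]

Place this rule above the single author rule, otherwise ^author-(.*)\.html$ matches first and captures Smith-book-Diary as the author.

The $1 represents the characters matched by the first set of parentheses (.*), the $2 represents the second set, and so on.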
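
To get a feel for how the captured groups end up in the destination URL, here is a small Python sketch that tries three pretty URL rules in order, most specific first. It assumes every pattern ends in \.html$ so that the extension is not captured, the RULES list and rewrite function are invented for this illustration, and Python writes the back-references as \1 and \2 where mod rewrite uses $1 and $2.

```python
import re

# (pattern, destination) pairs, tried top to bottom like rules in .htaccess.
# The combined author-and-book rule comes first because it is the most specific.
RULES = [
    (r"^author-(.*)-book-(.*)\.html$", r"index.php?author=\1&book=\2"),
    (r"^author-(.*)\.html$", r"index.php?author=\1"),
    (r"^book-(.*)\.html$", r"index.php?book=\1"),
]

def rewrite(path):
    """Return the internal URL for a requested path (hypothetical helper)."""
    for pattern, destination in RULES:
        match = re.match(pattern, path)
        if match:
            # Stop at the first matching rule, like the [L] flag.
            return match.expand(destination)
    return path  # no rule matched, leave the request alone

print(rewrite("author-Smith.html"))             # index.php?author=Smith
print(rewrite("book-Diary.html"))               # index.php?book=Diary
print(rewrite("author-Smith-book-Diary.html"))  # index.php?author=Smith&book=Diary
```

If the first two entries in RULES were swapped, the last request would come out as index.php?author=Smith-book-Diary, which is why rule order matters.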

Regular expressions


If you have created something like the above mod rewrite rules, congratulations, you have already used regular expressions. Patterns such as ^author-(.*)\.html$ are regular expressions.

Regular expressions can get tricky, but the basics are:

^ [caret] Means the beginning of a line, so ^G would match any line with a capital G as its first character.

$ [dollar] Means the end of a line, so G$ would match any line ending with a capital G.

. [period] Means any single character, including a space. To match a literal period, escape it with a backslash, as in \.html in the rules above.

* [asterisk] Means match the preceding item zero or more times. More on this later.
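
These four symbols are easy to try out before putting them in a rule. The sketch below uses Python's re module, which handles these basics the same way as far as I know; the matches helper is just a convenience made up for this example.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"^G", "Google"))      # True: the line starts with a capital G
print(matches(r"^G", "a Good day"))  # False: the G is not the first character
print(matches(r"G$", "BIG"))         # True: the line ends with a capital G
print(matches(r"b.g", "b g"))        # True: the period also matches a space
print(matches(r"^ab*$", "a"))        # True: the asterisk allows zero b's
print(matches(r"^ab*$", "abbb"))     # True: or any number of b's
print(matches(r"^ab*$", "abc"))      # False: the c is not allowed before the end
```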

Ranges. You can specify a range of characters to match with square brackets [].
Match any single digit
[0-9]

Match any single lowercase letter
[a-z]

Match any single uppercase letter
[A-Z]

Match any letter, regardless of case
[A-Za-z]

Match any alphanumeric character
[A-Za-z0-9]
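
A quick way to check a range is to test it against single characters. This is a Python sketch with a made-up helper called is_single; the bracket syntax itself is the same one used in the rewrite patterns.

```python
import re

def is_single(pattern, character):
    """Return True if the whole string is one character matched by the range."""
    return re.fullmatch(pattern, character) is not None

print(is_single(r"[0-9]", "7"))        # True
print(is_single(r"[0-9]", "x"))        # False
print(is_single(r"[a-z]", "Q"))        # False: uppercase is outside a-z
print(is_single(r"[A-Za-z]", "Q"))     # True
print(is_single(r"[A-Za-z0-9]", "_"))  # False: underscore is not alphanumeric
print(is_single(r"[0-9]", "42"))       # False: a range matches one character only
```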

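Remember the period and asterisk from earlier? Combining them with the examples above, you could do:

A digit followed by any single character
[0-9].

Any 2 digit number
[0-9][0-9]

A run of digits of any length, including none at all
[0-9]*

And the same for letters

A letter followed by any single character
[A-Za-z].

Any 2 letters
[A-Za-z][A-Za-z]

Any letter combination, regardless of length (including none at all)
[A-Za-z]*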

Using just the period and asterisk together means any run of characters, so
.* would match an entire line.

A.*Z would match everything from a capital A through to the last capital Z on the line.
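
Combinations are where it pays to check what a pattern really matches. Here is a small Python sketch, assuming Python's re module behaves like the rewrite engine for these simple patterns; the whole helper is invented for the example.

```python
import re

def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"[0-9].", "7x"))       # True: a digit, then any character at all
print(whole(r"[0-9][0-9]", "7x"))   # False: the second character must be a digit
print(whole(r"[0-9][0-9]", "42"))   # True
print(whole(r"[0-9]*", ""))         # True: the asterisk also accepts nothing
print(whole(r"[0-9]*", "2009"))     # True
print(whole(r"[A-Za-z]*", "Smith")) # True

# .* is greedy: it grabs as much as it can, up to the last capital Z.
greedy = re.search(r"A.*Z", "xxAbcZyyZzz").group(0)
print(greedy)                       # AbcZyyZ
```

The greedy behaviour of .* is the same thing that lets ^author-(.*)\.html$ swallow a whole file name, so it is worth keeping in mind when two rules could match the same URL.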

Remember, when part of a regular expression is wrapped in parentheses, as in ^author-(.*)\.html$, whatever it matched becomes available to the destination URL as $1, $2, etc.

There are far more complex ways to use mod rewrite, as well as .htaccess, but hopefully this has gotten you off to a good start.