
A bit on regular expressions: my .htaccess files

I totally hate regular expressions. I never managed to find a decent manual or reference that explains them properly. By properly I mean a manual that gives me all the details I need (which isn’t that much, really) while being short enough to not make me fall asleep from boredom (which for regex should mean something like less than 2 printed pages… or just something with tons of examples). This sounds easy enough. Well, it doesn’t seem to exist though. So I’ll be posting some regexps that I eventually managed to find around, or even to create myself (woo!).
Let’s start with:

I needed this one for various anti-leech protections, including on this very site (notably for the img server), as used in this .htaccess file:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://[A-Za-z0-9]*\.patheticcockroach\.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://[A-Za-z0-9]*\.patheticcockroach\.com$ [NC]
RewriteCond %{HTTP_REFERER} !^$ [NC]
RewriteRule \.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$ - [F,NC]

  • RewriteEngine on: turns on the rewriting engine for this directory. NB: this doesn’t load mod_rewrite itself, so you still need to have mod_rewrite enabled in your Apache configuration.
  • RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com$ [NC]: RewriteCond is a directive which defines a condition under which rewriting will take place. The syntax is RewriteCond $StringToTest $Regex [$flags].
    %{HTTP_REFERER} refers to the Referer HTTP header, i.e. the referrer (server variables have to be written like: %{VARIABLE_NAME}). “!” means (as usual) “not”.
    ^http://patheticcockroach\.com$ is the regexp to which we compare the referrer. “^” means string start, “\” is used to escape the dots (which otherwise are part of regex syntax), “$” means string end.
    [NC] means we don’t care about the case.
    So this regex will match “http://patheticcockroach.com”, and with the additional mod_rewrite NC flag, this will also match, for instance, “HTTP://PatheticCockroach.COM”.
  • RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com/.*$ [NC]: almost the same, except for the “.*”. “.” (dot) means “any character”. “*” (asterisk) means “the preceding element can be there zero to an infinite number of times”.
    So this will match, for instance, “http://patheticcockroach.com/” followed by any path (or by nothing at all).
  • RewriteCond %{HTTP_REFERER} !^http://[A-Za-z0-9]*\.patheticcockroach\.com$ [NC]: [A-Za-z0-9] means “one character in the range A to Z, a to z or 0 to 9”. As we saw earlier, “*” means “the preceding element can be there zero to an infinite number of times”.
    So this will match things like “http://www.patheticcockroach.com” or “http://img.patheticcockroach.com”. Note that specifying A to Z AND a to z is redundant, since we specify “[NC]” at the end.
  • RewriteCond %{HTTP_REFERER} !^$ [NC]: ^$ is the empty string, so if the user uses a very old browser which doesn’t send referrers (or, more likely, a browser configured to not send referrers), we still accept it.
  • RewriteRule \.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$ - [F,NC]: RewriteRule is a directive which defines rules for the rewriting engine. The syntax is RewriteRule $Regex $Substitution [$flags].
    \.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$ matches “.jpg”, “.jpeg”, “.gif”, “.png”, “.bmp”, “.7z”, “.exe”, “.xpi”. We already saw the meanings of “\” (backslash), “.” and “$”. The parentheses are used the same way as most often, to group/sort operators. “|” (vertical bar) means “or”. So jpg|png means jpg or png. And \.(jpg|png) means .jpg or .png.
    “-” (dash) means we do no substitution. We already saw the NC flag. The F flag means the requested URL will be forbidden (the server answers 403 Forbidden) if the RewriteConds are all true, i.e. if the referrer isn’t empty and matches none of our regexes.
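
Since what I always miss is examples, here is a small Python sketch that mimics the rule set above, so the regexes can be tried against a few referrers without touching a server. It is only an approximation under a few assumptions: Python’s re module stands in for the regex engine Apache uses, re.IGNORECASE stands in for [NC], and the function name is_forbidden and the test URLs are made up for illustration.

```python
import re

# The referrer patterns from the RewriteCond lines.
# re.IGNORECASE plays the role of the [NC] flag.
ALLOWED_REFERRERS = [
    re.compile(r"^http://patheticcockroach\.com/.*$", re.IGNORECASE),
    re.compile(r"^http://patheticcockroach\.com$", re.IGNORECASE),
    re.compile(r"^http://[A-Za-z0-9]*\.patheticcockroach\.com/.*$", re.IGNORECASE),
    re.compile(r"^http://[A-Za-z0-9]*\.patheticcockroach\.com$", re.IGNORECASE),
    re.compile(r"^$"),  # empty referrer
]

# The RewriteRule pattern: the file extensions we protect.
PROTECTED = re.compile(r"\.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$", re.IGNORECASE)


def is_forbidden(path, referrer):
    """Return True when the rule set would answer 403 (hypothetical helper)."""
    if not PROTECTED.search(path):
        return False  # the RewriteRule pattern doesn't match: nothing happens
    # Every RewriteCond is negated ("!") and the conditions are ANDed,
    # so the request is forbidden only if NO allowed pattern matches.
    return not any(p.search(referrer) for p in ALLOWED_REFERRERS)


# A few made-up requests:
print(is_forbidden("/pics/cat.jpg", "http://example.org/page"))               # True
print(is_forbidden("/pics/cat.JPG", "HTTP://Img.PatheticCockroach.com/x"))    # False
print(is_forbidden("/pics/cat.jpg", ""))                                      # False
print(is_forbidden("/index.html", "http://example.org/page"))                 # False
print(is_forbidden("/pics/cat.jpg", "https://patheticcockroach.com/"))        # True
print(is_forbidden("/pics/cat.jpg", "http://my-blog.patheticcockroach.com/")) # True
```

The last two lines show two limits of the patterns as written: a referrer starting with https:// is rejected because the regexes spell out http://, and a subdomain containing a hyphen is rejected because [A-Za-z0-9] has no “-” in it.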

The funny thing is that, as I wrote this post, I got a bit more friendly to regexp and mod_rewrite… well, that was part of the objective I guess. The references I used today:

Posted in programming, regular expressions, web development.
