I totally hate regular expressions. I never managed to find a decent manual or reference that explains them properly. By properly I mean a manual that gives me all the details I need (which isn’t that much, really) while being short enough not to make me fall asleep from boredom (which for regex should mean something like less than 2 printed pages… or just something with tons of examples). This sounds easy enough. Well, it doesn’t seem to exist though. So I’ll be posting some regexps that I eventually managed to find around, or even to create myself (woo!).
Let’s start with:
^http://[A-Za-z0-9]*\.patheticcockroach\.com/.*$
I needed this one for various anti-leech protections, including on this very site (notably for the img server), as used in this .htaccess file:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://[A-Za-z0-9]*\.patheticcockroach\.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://[A-Za-z0-9]*\.patheticcockroach\.com$ [NC]
RewriteCond %{HTTP_REFERER} !^$ [NC]
RewriteRule \.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$ - [F,NC]
- RewriteEngine on: enables the rewriting engine (NB: this doesn’t load mod_rewrite itself, the module still has to be enabled in your Apache configuration)
- RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com$ [NC]: RewriteCond is a directive which defines a condition under which rewriting will take place. The syntax is RewriteCond $StringToTest $Regex [$flags].
%{HTTP_REFERER} refers to the referrer HTTP header (variables have to be written as %{variable name}). “!” means (as usual) “not”.
^http://patheticcockroach\.com$ is the regexp to which we compare the referrer. “^” means string start, “\” is used to escape the dots (which otherwise are part of regex syntax), “$” means string end.
[NC] means we don’t care about the case.
So this regex will match “http://patheticcockroach.com”, and with the additional mod_rewrite NC flag, this will also match, for instance, “http://PATHETICcockroach.com”.
- RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com/.*$ [NC]: almost the same, except for the “.*”. “.” (dot) means “any character”. “*” (asterisk) means “the preceding element can be there zero to an infinite number of times”.
So this will match, for instance, “http://patheticcockroach.com/mpam4/?p=4”.
- RewriteCond %{HTTP_REFERER} !^http://[A-Za-z0-9]*\.patheticcockroach\.com$ [NC]: [A-Za-z0-9] means “one character in the range A to Z, a to z or 0 to 9”. As we saw earlier, “*” means “the preceding element can be there zero to an infinite number of times”.
So this will match things like “http://notepad.patheticcockroach.com” or “http://ngfdg45FD.patheticcockroach.com”. Note that specifying A to Z AND a to z is redundant, since we specify “[NC]” at the end.
- RewriteCond %{HTTP_REFERER} !^$ [NC]: ^$ is the empty string, so if the user uses a very old browser which doesn’t send referrers (or, more likely, a browser configured not to send referrers), we still accept it.
- RewriteRule \.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$ - [F,NC]: RewriteRule is a directive which defines rules for the rewriting engine. The syntax is RewriteRule $Regex $Substitution [$flags].
\.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$ matches “.jpg”, “.jpeg”, “.gif”, “.png”, “.bmp”, “.7z”, “.exe”, “.xpi” at the end of the requested path. We already saw the meanings of “\” (backslash), “.” and “$”. The parentheses are used the way they usually are, to group things together (here, the list of alternatives). “|” (vertical bar) means “or”. So jpg|png means jpg or png. And \.(jpg|png) means .jpg or .png.
“-” (dash) means we do no substitution. We already saw the NC flag. The F flag means the requested URL will be forbidden (a 403 response) if the RewriteConds are all true, i.e. if the referrer matches none of our regexes.
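To check my understanding of the whole thing, here is a minimal sketch in Python of how these rules decide whether a request gets blocked. It is only an approximation under a few assumptions: Python’s re module stands in for Apache’s regex engine, re.IGNORECASE stands in for the [NC] flag, and is_forbidden as well as the sample referrers and paths are hypothetical names I made up for the test, not anything from Apache.

```python
import re

# The five RewriteCond patterns from the .htaccess file above.
# Each RewriteCond is negated with "!", so the request is only
# blocked when the referrer matches NONE of them.
ALLOWED_REFERRERS = [
    r"^http://patheticcockroach\.com/.*$",
    r"^http://patheticcockroach\.com$",
    r"^http://[A-Za-z0-9]*\.patheticcockroach\.com/.*$",
    r"^http://[A-Za-z0-9]*\.patheticcockroach\.com$",
    r"^$",  # empty referrer (browser sends none)
]

# The RewriteRule pattern: protected file extensions.
PROTECTED = re.compile(r"\.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$", re.IGNORECASE)


def is_forbidden(referrer: str, path: str) -> bool:
    """Hypothetical helper: True when the rules above would forbid the request."""
    referrer_allowed = any(
        re.search(pattern, referrer, re.IGNORECASE)  # IGNORECASE plays the role of [NC]
        for pattern in ALLOWED_REFERRERS
    )
    return not referrer_allowed and PROTECTED.search(path) is not None


if __name__ == "__main__":
    samples = [
        ("http://patheticcockroach.com/mpam4/?p=4", "img/cat.jpg"),
        ("http://notepad.patheticcockroach.com", "img/cat.PNG"),
        ("", "img/cat.jpg"),
        ("http://example.com/page", "img/cat.jpg"),
        ("http://example.com/page", "index.html"),
    ]
    for referrer, path in samples:
        print(repr(referrer), path, "->", "forbidden" if is_forbidden(referrer, path) else "allowed")
```

Only the fourth sample (a foreign referrer asking for a protected file) should come out as forbidden: the first three have an accepted referrer, and the last one asks for a file type the rule doesn’t cover.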
The funny thing is that, as I wrote this post, I got a bit more friendly to regexp and mod_rewrite… well, that was part of the objective I guess.
The references I used today:
- mod_rewrite in the Apache 2.2 documentation (talks about RewriteEngine, RewriteCond, RewriteRule and much more)
- Regular expressions on Wikipedia: not the “decent manual” I was talking about in the intro, but still decent indeed 🙂
- Some page on regex on Micro$oft. I didn’t really use it, but this is what made me decide to write this post, and there are some elements I’d like to check later… Okay, I’m just making a bookmark.
- A regex tester (making another bookmark indeed).