<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PCR&#039;s notepad &#187; regular expressions</title>
	<atom:link href="http://notepad.patheticcockroach.com/category/regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://notepad.patheticcockroach.com</link>
	<description>The area in patheticcockroach.com where the EEG isn&#039;t isoelectric</description>
	<lastBuildDate>Fri, 30 Jul 2010 11:13:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>A bit on regular expressions: my .htaccess files</title>
		<link>http://notepad.patheticcockroach.com/237/a-bit-on-regular-expressions-my-htaccess-files/</link>
		<comments>http://notepad.patheticcockroach.com/237/a-bit-on-regular-expressions-my-htaccess-files/#comments</comments>
		<pubDate>Sun, 11 Jan 2009 06:00:39 +0000</pubDate>
		<dc:creator>David Dernoncourt</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[regular expressions]]></category>
		<category><![CDATA[web development]]></category>

		<guid isPermaLink="false">http://notepad.patheticcockroach.com/?p=237</guid>
		<description><![CDATA[I totally hate regular expressions. I never managed to find a decent manual or reference that explains them properly. By properly I mean a manual that gives me all the details I need (which isn&#8217;t much, really) while being short enough not to make me fall asleep from boredom (which for regex should mean something [...]]]></description>
			<content:encoded><![CDATA[<p>I totally hate regular expressions. I never managed to find a decent manual or reference that explains them properly. By properly I mean a manual that gives me all the details I need (which isn&#8217;t much, really) while being short enough not to make me fall asleep from boredom (which for regex should mean something like less than 2 printed pages&#8230; or just something with tons of examples). This sounds easy enough. Well, it doesn&#8217;t seem to exist though. So I&#8217;ll be posting some regexps that I eventually managed to find around, or even to create myself (woo!).<br />
Let&#8217;s start with:<br />
<code>^http://[A-Za-z0-9]*\.patheticcockroach\.com/.*$</code></p>
<p>I needed this one for various anti-leech protections, including on this very site (notably for the img server), as used in this .htaccess file:</p>
<p><code>RewriteEngine on<br />
RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com/.*$      [NC]<br />
RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com$      [NC]<br />
RewriteCond %{HTTP_REFERER} !^http://[A-Za-z0-9]*\.patheticcockroach\.com/.*$      [NC]<br />
RewriteCond %{HTTP_REFERER} !^http://[A-Za-z0-9]*\.patheticcockroach\.com$      [NC]<br />
RewriteCond %{HTTP_REFERER} !^$      [NC]<br />
RewriteRule \.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$ - [F,NC]</code></p>
<ul>
<li>RewriteEngine on: enables the runtime rewriting engine (NB: this doesn&#8217;t load mod_rewrite itself, so you still need to have the module enabled in your Apache configuration)</li>
<li>RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com$      [NC]: RewriteCond is a directive which defines a condition under which rewriting will take place. The syntax is RewriteCond $StringToTest $Regex [$flags].<br />
%{HTTP_REFERER} refers to the referrer HTTP header (variables are written as %{VARIABLE_NAME}). &#8220;!&#8221; means (as usual) &#8220;not&#8221;, so the condition is true when the regex does not match.<br />
^http://patheticcockroach\.com$ is the regexp to which we compare the referrer. &#8220;^&#8221; means string start, &#8220;\&#8221; is used to escape the dots (which otherwise are part of regex syntax), &#8220;$&#8221; means string end.<br />
[NC] means we don&#8217;t care about the case.<br />
So this regex will match &#8220;http://patheticcockroach.com&#8221;, and with the additional mod_rewrite NC flag, this will also match, for instance &#8220;http://PATHETICcockroach.com&#8221;</li>
<li>RewriteCond %{HTTP_REFERER} !^http://patheticcockroach\.com/.*$      [NC]: almost the same, except for the &#8220;.*&#8221;. &#8220;.&#8221; (dot) means &#8220;any character&#8221;. &#8220;*&#8221; (asterisk) means &#8220;the preceding element can be repeated zero or more times&#8221;.<br />
So this will match, for instance, &#8220;http://patheticcockroach.com/mpam4/?p=4&#8221;.</li>
<li>RewriteCond %{HTTP_REFERER} !^http://[A-Za-z0-9]*\.patheticcockroach\.com$      [NC]: [A-Za-z0-9] means &#8220;one character in the range A to Z, a to z or 0 to 9&#8221;. As we saw earlier, &#8220;*&#8221; means &#8220;the preceding element can be repeated zero or more times&#8221;.<br />
So this will match things like &#8220;http://notepad.patheticcockroach.com&#8221; or &#8220;http://ngfdg45FD.patheticcockroach.com&#8221;. Note that specifying both A to Z and a to z is redundant, since we specify &#8220;[NC]&#8221; at the end.</li>
<li>RewriteCond %{HTTP_REFERER} !^$      [NC]: ^$ matches the empty string, so if the user uses a very old browser which doesn&#8217;t send referrers (or, more likely, a browser configured not to send them), we still accept the request.</li>
<li>RewriteRule \.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$ - [F,NC]: RewriteRule is a directive which defines rules for the rewriting engine. The syntax is RewriteRule $Regex $Substitution [$flags].<br />
\.(jpg|jpeg|gif|png|bmp|7z|exe|xpi)$ matches URLs ending in &#8220;.jpg&#8221;, &#8220;.jpeg&#8221;, &#8220;.gif&#8221;, &#8220;.png&#8221;, &#8220;.bmp&#8221;, &#8220;.7z&#8221;, &#8220;.exe&#8221; or &#8220;.xpi&#8221;. We already saw the meanings of &#8220;\&#8221; (backslash), &#8220;.&#8221; and &#8220;$&#8221;. The parentheses are used as usual, to group things. &#8220;|&#8221; (vertical bar) means &#8220;or&#8221;. So jpg|png means jpg or png. And \.(jpg|png) means .jpg or .png.<br />
&#8220;-&#8221; (dash) means we do no substitution. We already saw the NC flag. The F flag means the requested URL will be forbidden (the server answers 403 Forbidden). The rule only applies if all the RewriteConds are true, i.e. if the referrer is not empty and doesn&#8217;t match any of our regexes.</li>
</ul>
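<p>As an aside, the five conditions explained above can probably be condensed into two. The following is only a sketch, not the configuration actually used on this site, and it makes two assumptions that the original rules don&#8217;t: that https referrers should be accepted too, and that subdomain names may contain hyphens. It also requires at least one character before the dot of a subdomain.</p>
<p><code>RewriteEngine on<br />
# Leave requests with an empty referrer alone<br />
RewriteCond %{HTTP_REFERER} !^$<br />
# Leave requests coming from the bare domain or a single-level subdomain alone, with or without a path<br />
RewriteCond %{HTTP_REFERER} !^https?://([a-z0-9-]+\.)?patheticcockroach\.com(/.*)?$ [NC]<br />
# Anything else asking for these file types gets a 403 Forbidden<br />
RewriteRule \.(jpe?g|gif|png|bmp|7z|exe|xpi)$ - [F,NC]</code></p>
<p>Two new bits of syntax here: &#8220;?&#8221; means &#8220;the preceding element can be there zero or one time&#8221;, and &#8220;+&#8221; means &#8220;one or more times&#8221;. So https? matches http or https, ([a-z0-9-]+\.)? makes the whole subdomain part optional, (/.*)? makes the path optional, and jpe?g matches both jpg and jpeg.</p>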
<p>The funny thing is that, as I wrote this post, I grew a bit friendlier towards regexps and mod_rewrite&#8230; well, that was part of the objective I guess. The references I used today:</p>
<ul>
<li><a href="http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html">mod_rewrite in the Apache 2.2 documentation</a> (talks about RewriteEngine, RewriteCond, RewriteRule and much more)</li>
<li><a href="http://en.wikipedia.org/wiki/Regular_expression">Regular expressions on Wikipedia</a>: not the &#8220;decent manual&#8221; I was talking about in the intro, but still decent indeed :)</li>
<li><a href="http://msdn.microsoft.com/en-us/library/hs600312(VS.80).aspx">Some page on regex on Micro$oft</a>. I didn&#8217;t really use it, but this is what made me decide to write this post, and there are some elements I&#8217;d like to check later&#8230; Okay, I&#8217;m just making a bookmark.</li>
<li><a href="http://www.regular-expressions.info/javascriptexample.html">A regex tester</a> (making another bookmark indeed).</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://notepad.patheticcockroach.com/237/a-bit-on-regular-expressions-my-htaccess-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
