Digging Into Regular Expressions

Stop me if you’ve heard this sentiment before: “Regex? You mean that archaic text matching stuff that Unix freaks use?”

Regular expressions (often abbreviated as ‘regex’) are surprisingly unpopular in the developer community (at least at the places where I’ve worked), which is a shame. They are a powerful tool, and something any professional programmer should have in their toolbox.

You may not need them very often if you just develop CRUD applications in a nice cosy web framework. But some day you may need to crunch some text files, manually export some data, or scrape a web page… and then you’ll have a much easier time of it if you’re able to quickly bang out a regular expression.

I’ve known “just enough to be dangerous” about regular expressions for years, but never got over that initial hump; every time I needed to use them I had to find a cheat sheet on the web and basically rediscover the syntax. That didn’t exactly make them a tool I eagerly reached for. It also ties into my unfortunately weak knowledge of scripting languages.

So I needed to improve myself in this area. I started by forcing myself to do some nontrivial text crunching in Ruby: a web scraper, some text file processing, stuff like that. Unfortunately, language-specific regex documentation usually sucks because it

A) is often very short (see the slim treatment of regular expressions in The Pickaxe),

B) uses lame, simplistic examples,

and C) is often incomplete.

Incomplete in the sense that it leaves nuances out and covers only the regex flavor of that one programming language. If you switch to another language with a different subset of regex functionality, you have to spend time relearning things and discovering new quirks.
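
To make the “quirks” point a bit more concrete, here is a minimal sketch of one flavor difference, written in Ruby. In Ruby, ^ and $ anchor at every line of the input, and the /m flag makes the dot match newlines; in Perl, ^ and $ anchor only at the string boundaries unless you ask for /m, and the dot-matches-newline flag is /s. The SAMPLE string and the method names below are made up purely for illustration.

```ruby
# Hypothetical input: two lines, only the second one is all digits.
SAMPLE = "drop table\n12345"

# ^ and $ anchor at the start and end of ANY line in Ruby,
# so this is true as soon as one line consists of digits only.
def some_line_is_digits?(text)
  /^\d+$/.match?(text)
end

# \A and \z anchor at the start and end of the whole string,
# so this is true only if the entire input is digits.
def whole_string_is_digits?(text)
  /\A\d+\z/.match?(text)
end

# Without a flag, . does not match a newline, so .* stays on one line.
def dot_stays_on_line?(text)
  /drop.*\d/.match?(text)
end

# Ruby's /m flag lets . match newlines (Perl spells this flag /s).
def dot_crosses_lines?(text)
  /drop.*\d/m.match?(text)
end

puts some_line_is_digits?(SAMPLE)
puts whole_string_is_digits?(SAMPLE)
puts dot_stays_on_line?(SAMPLE)
puts dot_crosses_lines?(SAMPLE)
```

The practical consequence: a validation pattern written with ^ and $ out of habit from another language will accept multi-line input in Ruby as long as one line matches, which is exactly the kind of nuance a short language-specific tutorial tends to skip.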

Which is why I went and bought this sucker:

Mastering Regular Expressions is an excruciatingly complete reference on regex. It starts with a basic overview of syntax and “pattern matching thinking”, then goes on to cover the superset of regex syntax across all the major languages and tools. It delves into how regular expressions work under the hood, discusses performance issues, and provides in-depth discussion of how some of the major languages support them (Perl, Java, .NET, PHP). All in all, a very complete and well-written reference. I highly recommend it.

No, of course I didn’t read the book cover to cover :). But I worked through the core chapters, and the big takeaways for me were:

And finally: now I can use RegexBuddy with a clear conscience, yay! I have nothing against using GUI wrappers to speed up the work process. But I do strongly feel that one should, to a certain degree, understand and be able to use the underlying tools as well. It’s the difference between using a supporting tool as a pair of running shoes and using it as a pair of crutches; the former makes you go even faster, the latter just saves you from being a cripple.