Readable Regexes: The Regular Expressions DSL


Joshua Flanagan has a post showing a really nice solution to the problem of incomprehensible regular expressions. It is a great use of fluent interfaces.

The very simple example is validating a US Social Security number (SSN):

  Regex socialSecurityNumberCheck = new Regex(@"^\d{3}-?\d{2}-?\d{4}$");

With the fluent interface, this becomes:

    Regex socialSecurityNumberCheck = new Regex(Pattern.With.AtBeginning
        .Digit.Repeat.Exactly(3)
        .Literal("-").Repeat.Optional
        .Digit.Repeat.Exactly(2)
        .Literal("-").Repeat.Optional
        .Digit.Repeat.Exactly(4)
        .AtEnd);
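To make the mechanics less magical, here is a minimal sketch of how a fluent builder of this shape could be put together. It is not the code from Joshua Flanagan's post: the names SketchPattern, Repeater, and SketchDemo are hypothetical, and the sketch covers only the handful of members used in the SSN example. Each property appends a fragment of regex text and returns the builder, so the chain reads left to right.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch of a fluent regex builder; NOT the library from the post.
public sealed class SketchPattern
{
    private readonly StringBuilder _text = new StringBuilder();

    // Entry point: every chain starts from a fresh builder.
    public static SketchPattern With { get { return new SketchPattern(); } }

    // Each property appends one regex fragment and returns the same builder.
    public SketchPattern AtBeginning { get { _text.Append('^'); return this; } }
    public SketchPattern AtEnd { get { _text.Append('$'); return this; } }
    public SketchPattern Digit { get { _text.Append(@"\d"); return this; } }

    public SketchPattern Literal(string value)
    {
        string escaped = Regex.Escape(value);
        // Group multi-character literals so a following quantifier
        // applies to the whole literal, not just its last character.
        _text.Append(value.Length > 1 ? "(?:" + escaped + ")" : escaped);
        return this;
    }

    // Switches to the quantifier vocabulary for the preceding element.
    public Repeater Repeat { get { return new Repeater(this); } }

    public override string ToString() { return _text.ToString(); }

    public sealed class Repeater
    {
        private readonly SketchPattern _owner;

        internal Repeater(SketchPattern owner) { _owner = owner; }

        public SketchPattern Exactly(int count)
        {
            _owner._text.Append('{').Append(count).Append('}');
            return _owner;
        }

        public SketchPattern Optional
        {
            get { _owner._text.Append('?'); return _owner; }
        }
    }
}

public static class SketchDemo
{
    // Rebuilds the SSN pattern from the example using the sketch builder.
    public static string BuildSsnPattern()
    {
        return SketchPattern.With.AtBeginning
            .Digit.Repeat.Exactly(3)
            .Literal("-").Repeat.Optional
            .Digit.Repeat.Exactly(2)
            .Literal("-").Repeat.Optional
            .Digit.Repeat.Exactly(4)
            .AtEnd.ToString();
    }
}
```

Calling SketchDemo.BuildSsnPattern() yields the same text as the hand-written pattern, ^\d{3}-?\d{2}-?\d{4}$. Properties with side effects are unusual in C#, but they are what lets the chain read without parentheses.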

Here is the output from The Regulator about the above regex:

^ (anchor to start of string)
Any digit
Exactly 3 times
-
? (zero or one time)
Any digit
Exactly 2 times
-
? (zero or one time)
Any digit
Exactly 4 times
$ (anchor to end of string)

Comparing the two, they are nearly identical, which is impressive, since I use The Regulator all the time to deal with regexes.

What is more, for about the first time ever, this allows me to consider building a regular expression on the fly. I can't think of a good use case for doing this, of course, but now I could.
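Purely as an illustration of what "on the fly" might look like, here is a sketch that assembles a digit-group pattern from sizes known only at runtime. It uses plain System.Text.RegularExpressions rather than the fluent library, and the DigitGroupPatterns class and its Build method are hypothetical names invented for this example.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a pattern such as the SSN check from runtime data.
public static class DigitGroupPatterns
{
    public static Regex Build(int[] groupSizes, string separator)
    {
        if (groupSizes == null || groupSizes.Length == 0)
            throw new ArgumentException("At least one group size is required.", "groupSizes");
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("A separator is required.", "separator");

        StringBuilder text = new StringBuilder("^");
        for (int i = 0; i < groupSizes.Length; i++)
        {
            if (i > 0)
            {
                // Optional separator between groups, escaped and grouped
                // so the '?' covers the whole separator.
                text.Append("(?:").Append(Regex.Escape(separator)).Append(")?");
            }
            text.Append(@"\d{").Append(groupSizes[i]).Append('}');
        }
        text.Append('$');
        return new Regex(text.ToString());
    }
}
```

With group sizes 3, 2, 4 and "-" as the separator, this accepts the same inputs as the SSN check above, for example "123-45-6789" and "123456789".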

Very impressive.