Ayende @ Rahien

Unnatural acts on source code

Challenge: The regex that doesn’t match

Can you make this test pass?

var expected = @"Cached query: 
SELECT this_.Id             as Id5_0_,
       this_.Title          as Title5_0_,
       this_.Subtitle       as Subtitle5_0_,
       this_.AllowsComments as AllowsCo4_5_0_,
       this_.CreatedAt      as CreatedAt5_0_
FROM   Blogs this_
WHERE  this_.Title = 'The lazy blog' /* @p0 */
       and this_.Id = 1 /* @p1 */

";
Assert.True(Regex.IsMatch(expected, expected.Replace("5_", @"\d+_")));

I really don’t know what to think about this anymore…

Comments

Michael Morton, 05/11/2009 05:45 PM

Regex.IsMatch(expected, Regex.Escape(expected).Replace(@"5", @"\d+"))

The query contains regex metacharacters that need to be escaped (e.g. '.').

It's a bit messy. If this is going to be used a lot, it would probably be good to replace the whitespace with '\s+', etc.
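
Here is Michael's suggestion combined with the original test. This is a sketch that assumes C# 9 top-level statements. The second check, which swaps the alias numbers for `12_`, is an illustration added here and is not from the post. Escaping has to happen before the `Replace`. Otherwise `Regex.Escape` would also escape the backslash and `+` in `\d+`.

```csharp
using System;
using System.Text.RegularExpressions;

// The expected text from the post (a verbatim string, so its line endings
// are whatever the source file was saved with).
var expected = @"Cached query: 
SELECT this_.Id             as Id5_0_,
       this_.Title          as Title5_0_,
       this_.Subtitle       as Subtitle5_0_,
       this_.AllowsComments as AllowsCo4_5_0_,
       this_.CreatedAt      as CreatedAt5_0_
FROM   Blogs this_
WHERE  this_.Title = 'The lazy blog' /* @p0 */
       and this_.Id = 1 /* @p1 */

";

// Escape first so '*', '.', whitespace, etc. are matched literally,
// then turn each "5_" alias number into a "one or more digits" wildcard.
var pattern = Regex.Escape(expected).Replace("5_", @"\d+_");
Console.WriteLine(Regex.IsMatch(expected, pattern)); // True

// The same pattern also accepts a query whose aliases use other numbers.
var actual = expected.Replace("5_", "12_");
Console.WriteLine(Regex.IsMatch(actual, pattern));   // True
```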

Chris, 05/11/2009 05:49 PM

Michael: '.' will match any single character, so that will still work; it will just match more than expected.

I might guess that the newlines are confusing the regex engine.

bobo, 05/11/2009 06:03 PM

it's the /* */ comments...

bobo, 05/11/2009 06:08 PM

Maybe Regex.Escape before you do your Replace might help...

Joel, 05/11/2009 06:17 PM

It's a problem with the comment tags in the SQL statement. Remove the comments and the regular expression passes.

Fredy, 05/11/2009 06:40 PM

So, is Replace doing something funny to the asterisk or what?...
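
`Replace` does nothing to the asterisk. The regex engine, however, reads `*` as a quantifier. The sketch below isolates one of the SQL comments from the query to show this (C# 9 top-level statements assumed). Used as a pattern, `/*` means "zero or more '/'" and ` *` means "zero or more spaces". The pattern therefore fails to match its own text, yet it matches an unrelated string.

```csharp
using System;
using System.Text.RegularExpressions;

var comment = "/* @p0 */";

// Unescaped: "/*" = zero or more '/', " *" = zero or more spaces, then a literal '/'.
Console.WriteLine(Regex.IsMatch(comment, comment));               // False

// The same pattern happily matches a different string.
Console.WriteLine(Regex.IsMatch(" @p0/", comment));               // True

// Escaping turns '*' (and the spaces) into literals, so it matches itself.
Console.WriteLine(Regex.IsMatch(comment, Regex.Escape(comment))); // True
```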

Bertrand Le Roy, 05/11/2009 07:04 PM

Aren't tests supposed to be readable? I have no idea what this is supposed to do from staring at the code. Might be just me though.

James, 05/11/2009 07:05 PM

Regex.IsMatch(expected,Regex.Escape(expected).Replace("5", @"\d+"))

passes

Bryan, 05/11/2009 07:07 PM

There isn't anything wrong with the string, but that regex will never work, as it's going to change all the digits to 5. Should be this: @"\d+_(?=0)" That does a lookahead and only matches digits followed by an _ then a 0. The 0 is then not part of the grouping and so won't be changed.

It doesn't make sense. Why would you want to do that anyway?

Chris Martin, 05/11/2009 07:11 PM

const string EXPECTED = @"Cached query: 
SELECT this_.Id             as Id5_0_,
       this_.Title          as Title5_0_,
       this_.Subtitle       as Subtitle5_0_,
       this_.AllowsComments as AllowsCo4_5_0_,
       this_.CreatedAt      as CreatedAt5_0_
FROM   Blogs this_
WHERE  this_.Title = 'The lazy blog' /* @p0 */
       and this_.Id = 1 /* @p1 */";

string pattern = Regex.Escape(EXPECTED).Replace("5_", @"\d+_");

Assert.True(Regex.IsMatch(EXPECTED, pattern));

Michael Morton, 05/11/2009 07:19 PM

@Chris I was only providing one example of what would be escaped. As Sergey and bobo mentioned, the comment tags also cause an issue because '*' is a regex quantifier, and as such, is escaped by Regex.Escape as well.

Ayende Rahien, 05/11/2009 08:17 PM

Bertrand,

Well, when you only see a very small part, it is not surprising.

Bertrand Le Roy, 05/11/2009 08:22 PM

Cool. That part does look convoluted :)

Rik Hemsley, 05/11/2009 09:22 PM

Why would we care about the regex when the entire principle this is based on is a big hairy WTF?

Cory Foy, 05/12/2009 03:56 AM

FYI - I know the multiline syntax is handy from a readability perspective, but tests like that may not pass on Mono/Linux or any other OS that doesn't use \r\n as the line terminator:

www.cornetdesign.com/.../...nvironmentnewline.html

To get it to pass, you'd have to recompile on the platform. Well, unless you are developing this on Mono/Linux, but I don't think that's the case. :)
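
One way around the line-ending problem Cory describes is to normalize both strings before building the pattern. This is a minimal sketch. It assumes the only difference is `\r\n` versus `\n` and uses C# 9 top-level statements. `NormalizeNewlines` is a hypothetical helper, not an existing API, and the shortened query strings are made-up test data.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: map Windows line endings to '\n' so the comparison
// does not depend on how the test source file was saved.
static string NormalizeNewlines(string s) => s.Replace("\r\n", "\n");

var expected = "Cached query:\r\nSELECT this_.Id as Id5_0_\r\n"; // e.g. a file saved with CRLF
var actual   = "Cached query:\nSELECT this_.Id as Id7_0_\n";     // e.g. output using LF

// Without normalizing, the escaped "\r\n" in the pattern cannot match a bare "\n".
var rawPattern = Regex.Escape(expected).Replace("5_", @"\d+_");
Console.WriteLine(Regex.IsMatch(actual, rawPattern));                    // False

var pattern = Regex.Escape(NormalizeNewlines(expected)).Replace("5_", @"\d+_");
Console.WriteLine(Regex.IsMatch(NormalizeNewlines(actual), pattern));    // True
```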

Ayende Rahien, 05/12/2009 05:45 AM

Cory,

Yes, I know about that, but since this is a WPF app, and Mono doesn't have that, I don't worry too much about it.

Rob Kitson, 05/21/2009 10:40 PM

Don't know if you're still having issues here, but it looks like you need to escape the '*' and '.' characters in the pattern. You probably also need to compact the whitespace in the pattern as much as possible, then replace it with \s.

Rob Kitson, 05/21/2009 10:45 PM

Oops, didn't see James' comment above... and it looks like it works.

The Regex.Escape() method does pretty much what I described above. From the docs: "Escapes a minimal set of characters (\, *, +, ?, |, {, [, (,), ^, $,., #, and white space) by replacing them with their escape codes."

That's handy.
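
`Regex.Escape` escapes whitespace but does not collapse it. The aligned columns in the expected string therefore still have to match space for space. Rob's whitespace suggestion could be sketched as follows: escape each whitespace-separated token, then join the tokens with `\s+`. `BuildLoosePattern` is a hypothetical helper. The shortened query and the `12_` aliases are made-up test data, and C# 9 top-level statements are assumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: match each token literally, let any run of whitespace
// match any run of whitespace, and let alias numbers like "5_" vary.
static string BuildLoosePattern(string expected)
{
    var tokens = Regex.Split(expected.Trim(), @"\s+") // split on whitespace runs
                      .Select(Regex.Escape);          // escape '*', '.', etc. per token
    return string.Join(@"\s+", tokens)                // any whitespace run between tokens
                 .Replace("5_", @"\d+_");             // alias numbers become \d+
}

var expected = @"SELECT this_.Id             as Id5_0_,
       this_.Title          as Title5_0_
FROM   Blogs this_
WHERE  this_.Title = 'The lazy blog' /* @p0 */";

// Same query with different alias numbers and single-space formatting.
var actual = "SELECT this_.Id as Id12_0_, this_.Title as Title12_0_ FROM Blogs this_ WHERE this_.Title = 'The lazy blog' /* @p0 */";

var pattern = BuildLoosePattern(expected);
Console.WriteLine(Regex.IsMatch(actual, pattern)); // True
```

The trade-off is that formatting differences are no longer detected. That is probably what you want when comparing generated SQL.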

Comments have been closed on this topic.