﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien</title><link>http://ayende.com</link><description>Ayende @ Rahien</description><copyright>Copyright (C) Ayende Rahien  2004 - 2021 (c) 2026</copyright><ttl>60</ttl><item><title>Derick Bailey commented on Regex vs. string.IndexOf</title><description>looks like the IndexOf is faster in this case... just found a huge bug in the code i posted on my blog, and have corrected it...
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment20</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment20</guid><pubDate>Mon, 05 Nov 2007 23:32:23 GMT</pubDate></item><item><title>Derick Bailey commented on Regex vs. string.IndexOf</title><description>I've posted an optimized version of the regex so that it properly uses regex functionality instead of trying to mimic what IndexOf does. see my blog:
  
  
http://www.avocadosoftware.com/csblogs/dredge/archive/2007/11/05/776.aspx
  
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment19</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment19</guid><pubDate>Mon, 05 Nov 2007 23:26:00 GMT</pubDate></item><item><title>Derick Bailey commented on Regex vs. string.IndexOf</title><description>i think you accounted for the overhead of compiling the regex incorrectly.
  
  
you have "start = DateTime.Now" before the regex compile. A better test would be to compile the regex before setting the start time.
  
  
the reason i suggest this, is to show that you only have to compile the regex once, during the initializing of your application (during a static constructor or something on the object that needs it). after that, the regex will likely be much faster.
  
  
but as many previous comments said - it all depends on your scenario and how large the string is.
  
  
also - be careful about constantly regenerating compiled regex. i have heard that there is a memory leak caused by this (same problem as using codedom to compile code on the fly). not sure about this, though.
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment18</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment18</guid><pubDate>Mon, 05 Nov 2007 22:54:16 GMT</pubDate></item><item><title>Ramon commented on Regex vs. string.IndexOf</title><description>  
I want to add this one to be complete. I think that it is much more readable but it is slow :-)
  
  
		start = DateTime.Now;
  
		r = new Regex("([^,]+),", RegexOptions.Compiled);
  
		for (int i = 0; i &lt; count; i++){
  
			MatchCollection matches = r.Matches(testStr);
  
			for(int j=0;j&lt;matches.Count; ++j){
  
				string z = matches[j].Value;
  
			}
  
		}
  
		Console.WriteLine(DateTime.Now - start);
  
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment17</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment17</guid><pubDate>Mon, 05 Nov 2007 14:02:43 GMT</pubDate></item><item><title>Paul Cowan commented on Regex vs. string.IndexOf</title><description>I ran into a performance problem with Cuyahoga and Regexes.
  
  
You should always use static methods of the Regex like so:
  
  
Regex.IsMatch
  
  
The test and I seem to remember the RegexOptions.Compiled flag is bad for performance.....
  
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment16</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment16</guid><pubDate>Mon, 05 Nov 2007 12:56:14 GMT</pubDate></item><item><title>Eamon Nerbonne commented on Regex vs. string.IndexOf</title><description>On an Athlon 64 3000+, using the updated code and "Release" mode outside VS.NET:
  
  
IndexOf: around 6 sec. (5.6 to 6.2 spread)
  
Regex:   around 4 sec (3.8 to 4.5 spread)
  
  
same executable run under mono:
  
  
IndexOf: 13.641
  
Regex:   36.297
  
  
Running with a debugger attached can cause odd performance quirks, esp. if the code has a managed/unmanaged boundary (i.e. generally not with such simple code), as can running a debug build.
  
  
I would generally prefer regular expressions, were it not that they're not integrated into C# particularly intuitively.  Using explicit capture groups as so:
  
new Regex("^((?&lt;group&gt;[^,]*),)*([^,]*)$", RegexOptions.Compiled | RegexOptions.CultureInvariant|RegexOptions.ExplicitCapture);
  
is in my opinion the most maintainable usage pattern, which you'ld extract information from using Groups:
  
				foreach(Capture cap in r2.Match(testStr).Groups["group"].Captures) {
  
					string z = cap.Value;
  
				}
  
  
Of course, it's not exactly always obvious how to modify such regular expression, but modifying complex parsers written in hand-coded procedural style isn't ever easier either, irrespective of the language.
  
  
I'd use string.Split if that's suitable, which it is in this example, and otherwise a Regex for maintainability.
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment15</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment15</guid><pubDate>Sun, 04 Nov 2007 14:48:41 GMT</pubDate></item><item><title>Marcin Seredynski commented on Regex vs. string.IndexOf</title><description>Ayende: I have no idea what's causing the difference in the results we're getting... We're using pretty much the same architecture, if I'm not mistaken.
  
  
That regex performance observation is pretty interesting, though.
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment14</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment14</guid><pubDate>Sat, 03 Nov 2007 23:47:14 GMT</pubDate></item><item><title>Ayende Rahien commented on Regex vs. string.IndexOf</title><description>Niki,
  
The string to search is a SQL string, which can be up to 1Kb in size (very rarely).
  
The string to search is ","
  
In this context, efficient means that I can't see a way for a regex impl to do something faster than simple string search.
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment13</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment13</guid><pubDate>Sat, 03 Nov 2007 23:15:41 GMT</pubDate></item><item><title>Evan commented on Regex vs. string.IndexOf</title><description>It'll be nice to put a regex extension on the string class in c# 3.0...
  
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment12</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment12</guid><pubDate>Sat, 03 Nov 2007 23:10:08 GMT</pubDate></item><item><title>Niki commented on Regex vs. string.IndexOf</title><description>"But Justin said that compiled Regex is more efficient than IndexOf, so I decided to check it."
  
  
I think the problem is that the term "efficient" is ambiguous: In this context, it can either mean:
  
a) raw machine speed, as in "takes 7.538 ms on machine A, with input data B"
  
b) computational complexity, as in "O(n^2)"
  
  
System.Text.RegularExpressions (AFAIK) implements the Boyer-Moore-Algorithm, which is of course better in terms of computational complexity, but can be slower for short search patterns. So if speed is your only concern, you need to know how long your search patterns will probably be to make a decision. Everything else is voodoo.
  
  
@Bob: I'd argue that Regex's are better for more complex parsing jobs  because they're declarative: You don't tell the computer *how* to parse a string, but *what* you're looking for. Of course there's a learning curve, but once you understood how they work they're a blessing (IMO). If I had to choose between a 20-line parsing function written in C#, with nested loops and hundreds of places to hide programming errors, and a 1-line-regex, I'd take the regex (almost) every time. It might take you five seconds longer to understand it for the first time, but it might take hours to find that filthy little bug in the 20-line parsing code...
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment11</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment11</guid><pubDate>Sat, 03 Nov 2007 23:00:20 GMT</pubDate></item><item><title>Ayende Rahien commented on Regex vs. string.IndexOf</title><description>Perntium D Dual Core, 3.00 Ghz, Windows 2003, .Net 2.0.50727, debug build
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment10</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment10</guid><pubDate>Sat, 03 Nov 2007 21:55:27 GMT</pubDate></item><item><title>Bob Grommes commented on Regex vs. string.IndexOf</title><description>@Marcos,  you are correct, using the StringComparison.Ordinal or OrdinalIgnoreCase options for IndexOf(), StartsWith(), etc. can be quite a savings.  There are a LOT of strings in most programs that don't require culturally-sensitive handling.  Often, virtually all of the strings in the critical path will be straight ASCII.  It's a great optimization and much under-used.
  
  
@Ayende, I would maintain that for most real-world applications outside of large scale text searching or transformations, Regex tends to add complexity without giving back all that much.  Regex tends to add a "WTF?" factor into quickly reading and understanding code.  Avoiding that is often worth the modest real-world performance penalty of not using regex, even when it exists.
  
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment9</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment9</guid><pubDate>Sat, 03 Nov 2007 21:47:28 GMT</pubDate></item><item><title>Marcin Seredynski commented on Regex vs. string.IndexOf</title><description>Interesting... What CPU / architecture / OS / .NET runtime / build config? Mine is: Intel Core 2 CPU / T7400 @ 2.16GHz / Windows Vista / .NET 2.0.50727.312 / default VS 2005 Debug build config.
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment8</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment8</guid><pubDate>Sat, 03 Nov 2007 21:45:15 GMT</pubDate></item><item><title>Ayende Rahien commented on Regex vs. string.IndexOf</title><description>On my machine:
  
00:00:03.1250000 - string.IndexOf
  
00:00:03.8750000 - RegEx
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment7</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment7</guid><pubDate>Sat, 03 Nov 2007 21:35:42 GMT</pubDate></item><item><title>Marcin Seredynski commented on Regex vs. string.IndexOf</title><description>Ayende: sorry, I skimmed over your code without actually reading it :). I didn't notice the string joining part, and my "_TSET" pattern was matched only once, so the benchmark wasn't very fair. Updated version of the code:
  
  
		static void Main(string[] args)
  
		{
  
			string testStr = "THIS_IS_JUST_A_TEST_abcdefghijklmnopqrstuvwxyz0123456789!987654321zyxwvutsrqpomnlkjihgfedcba_TSET_A_TSUJ_SI_SIHT";
  
			for (int q = 0; q &lt; 3; q++)
  
				testStr += testStr;
  
  
			int count = 500000;
  
			DateTime start = DateTime.Now;
  
			for (int i = 0; i &lt; count; i++)
  
			{
  
				int last = 0, current = 0;
  
				while ((current = testStr.IndexOf("TSET_", current)) != -1)
  
				{
  
					string x = testStr.Substring(last, current - last);
  
					current = last = current + 1;
  
				}
  
			}
  
			Console.WriteLine(DateTime.Now - start);
  
			Regex r = new Regex("TSET_", RegexOptions.Compiled);
  
			start = DateTime.Now;
  
			for (int i = 0; i &lt; count; i++)
  
			{
  
				int last = 0, current = 0;
  
				Match match = r.Match(testStr, current);
  
				while (match.Success)
  
				{
  
					current = match.Index;
  
					string y = testStr.Substring(last, current - last);
  
					current = last = current + 1;
  
					match = r.Match(testStr, current);
  
				}
  
			}
  
			Console.WriteLine(DateTime.Now - start);
  
		}
  
  
  
---
  
  
00:00:02.9800000 (IndexOf) vs 00:00:02.0430000 (Regex)
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment6</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment6</guid><pubDate>Sat, 03 Nov 2007 21:25:41 GMT</pubDate></item><item><title>Ayende Rahien commented on Regex vs. string.IndexOf</title><description>Chad,
  
Because the code that I really have there is doing string parsing, not splitting, for instance, I need to handle ',' in a different way.
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment5</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment5</guid><pubDate>Sat, 03 Nov 2007 21:12:55 GMT</pubDate></item><item><title>Marcos commented on Regex vs. string.IndexOf</title><description>Using in the IndexOf:
  
  
 StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase
  
  
you can speed up even more the IndexOf operation, that way you avoid all the Culture specific code =)
  
  
Cheers
  
  
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment4</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment4</guid><pubDate>Sat, 03 Nov 2007 21:05:02 GMT</pubDate></item><item><title>Chad Myers commented on Regex vs. string.IndexOf</title><description>Yeah, Regex for a simple split is overkill. It only starts showing value when things get complex. I'm also thinking it might be better for really-large-strings too. I wonder if IndexOf starts breaking down if the string is large.
  
  
One question: Why didn't you use testStr.Split()?  I added that to your test and found that it was slightly slower (0.301 vs. 0.177) but it's much clearer in the code.
  
  
  
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment3</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment3</guid><pubDate>Sat, 03 Nov 2007 20:59:13 GMT</pubDate></item><item><title>Ayende Rahien commented on Regex vs. string.IndexOf</title><description>Can you share the full source code?
  
  
When I added this:
  
  
StringBuilder sb = new  StringBuilder(testStr);
  
for (int i = 0; i &lt; 50; i++)
  
{
  
    sb.Append(testStr);
  
}
  
testStr = sb.ToString();
  
  
The results where:
  
  
00:00:12.3750000
  
00:01:11.2500000
  
  
After changing the pattern to "from", with the same change as above:
  
  
00:00:11.4843750
  
00:00:15.9687500
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment2</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment2</guid><pubDate>Sat, 03 Nov 2007 20:53:41 GMT</pubDate></item><item><title>Marcin Seredynski commented on Regex vs. string.IndexOf</title><description>Try running it for longer input string and longer pattern to be found. Regular expressions *are* faster. Of course everything depends on the scenario for which you're optimizing.
  
  
Data I used:
  
string testStr = "THIS_IS_JUST_A_TEST_abcdefghijklmnopqrstuvwxyz0123456789!987654321zyxwvutsrqpomnlkjihgfedcba_TSET_A_TSUJ_SI_SIHT";
  
pattern: "TSET_"
  
  
Results:
  
String.IndexOf: 00:00:00.4200000 
  
Compiled Regex: 00:00:00.2970000
</description><link>http://ayende.com/2930/regex-vs-string-indexof#comment1</link><guid>http://ayende.com/2930/regex-vs-string-indexof#comment1</guid><pubDate>Sat, 03 Nov 2007 20:45:02 GMT</pubDate></item></channel></rss>