Ayende @ Rahien

It's a girl

Memorable code

public class Program
{
    static List<Thread> list = new List<Thread>();
    private static void Main(string[] args)
    {
        var lines = File.ReadAllLines(args[0]);

        foreach (var line in lines)
        {
            var t = new Thread(Upsert)
            {
                Priority = ThreadPriority.Highest,
                IsBackground = true
            };
            list.Add(t);
            t.Start(line);
        }

        foreach (var thread in list)
        {
            thread.Join();
        }

    }

    private static void Upsert(object o)
    {
        var args = o.ToString().Split(',');
        try
        {
            using(var con = new SqlConnection(Environment.CommandLine.Split(' ')[1]))
            {
                var cmd = new SqlCommand
                {
                    Connection = con, 
                    CommandText = "INSERT INTO Accounts VALUES(@p1, @p2, @p3, @p4,@p5)"
                };

                for (var index = 0; index < args.Length; index++)
                {
                    cmd.Parameters.AddWithValue(@"@p" + (index + 1), args[index]);
                }

                try
                {
                    cmd.ExecuteNonQuery();
                }
                catch (SqlException e)
                {
                    if(e.Number == 2627 )
                    {
                        cmd.CommandText = "UPDATE Accounts SET Name = @p2, Email = @p3, Active = @p4, Birthday = @p5 WHERE ID = @p1";
                        cmd.ExecuteNonQuery();
                    }
                }
            }
        }
        catch (SqlException e)
        {
            if(e.Number == 1205)
            {
                var t = new Thread(Upsert)
                {
                    Priority = ThreadPriority.Highest,
                    IsBackground = true
                };
                list.Add(t);
                t.Start(o);
            }
        }
    }
}

Comments

Duarte Nunes
10/27/2011 10:26 AM by
Duarte Nunes

Nice. The possibility of an InvalidOperationException being thrown while iterating over list is especially sublime.

wqw
10/27/2011 10:31 AM by
wqw

Handling 1205 is crucial as the db gets hit with all the rows in a "batch" :-))

Patrick Huizinga
10/27/2011 10:34 AM by
Patrick Huizinga

I guess they never had an exception number 1205? Or is the foreach not reached before all worker threads are done because of their priority?

I like how they reinvented while(true) by starting a new thread that does the retry on failure.

Also it's nice how the Join on background threads. If they made none of the threads a background thread, the whole Join business would be unnecessary.

Due to my limited experience, I will assume the code shows the ideal way to access a database. ;-)

Scooletz
10/27/2011 10:41 AM by
Scooletz

So here we have the 'gem' we were talking about;-) I'm sorry for anyone eyes that looked at it.

Paulo Abom Pinto
10/27/2011 11:06 AM by
Paulo Abom Pinto

Well ... exception over exception .. this is not good.

According with the flow when we try to do that INSERT and the fail is Unique Index/Constriant Violation (http://weblogs.asp.net/guys/archive/2005/05/20/408142.aspx) the system try to perform an update.

This is stupid .. should try to do a SELECT first to check the unique key and then do the INSERT or UPDATE.... If there is any other error, nothing happen, the system continues his live.

If the update get an error and is Deadlock (never got this) the system try to do retry .. the system should put the operation in a queue or somethinf like that, and try later .. not like this.

this code only process a few exceptions. What about others like wrong number of parameters or something like that?

Regards Paulo Aboim Pinto Odivelas - Portugal

Markus Zywitza
10/27/2011 11:44 AM by
Markus Zywitza

Posting this without any comment may people lead into thinking that this is code recommended by you...

Frank Quednau
10/27/2011 11:48 AM by
Frank Quednau

Woa...make sure you wear a radiation suit when this piece of code crosses your path. I am certain I see at most half of the issues that could arise in this lethal combination of File IO, threading and DB ops. This code scares me and if I'd have the option I would run away.

Igor Kalders
10/27/2011 11:54 AM by
Igor Kalders

You got to hand it to the author: he managed to approach CRUD without even reading anything that the DB might return :D (nitpick: yeah, SqlException does read results)

TomCollins
10/27/2011 12:04 PM by
TomCollins

Not too bad ;-) Two main things: - There is no error handling for those lines in the file that produce an error other than 2627 or 1205 (there might be a not null constraint on some column) - Assuming the db is kind of standard: It's not the optimal way to parallelize inserts on the client side while the database can do physical writes only one at a time. Better is first starting a transaction then preparing the statement and then executing it for every line. Especially NOT performing a prepare every time (just executing) helps a LOT. (Up to 100 times faster!).

JimT
10/27/2011 12:29 PM by
JimT

I predict this becoming an interview question for a dozen or more companies now :)

Justin
10/27/2011 01:00 PM by
Justin

There are very few legitimate reasons to use ThreadPriority.AboveNormal, there are essentially no legitimate reasons to use ThreadPriority.Highest. I can't even look at the rest of the code now.

Wesley
10/27/2011 01:01 PM by
Wesley

Just in general a no go. Load the data from the text file in bulk in a database and do the upsert via stored procedures. If you want to achieve more performance and less transaction log file growth. You can use partitioned tables and do partition swapping.

Halo
10/27/2011 01:23 PM by
Halo

What version of MSSQL is this? If it is 2008 I would recommend looking into the MERGE DML statement along with table-valued parameters, which would allow you to pass the data as a single structured table and perform the INSERT/UPDATE in a single atomic command. Otherwise, for performance purposes, I would probably recommend using SqlBulkCopy to push the data into a temporary table and then using bulk INSERT/UPDATE commands from SQL. That can be done in batches of any size if you want progress during the file load or to lessen the possibility of lock contention.

evereq
10/27/2011 01:59 PM by
evereq

Classical example when programmer use wrong approach to increase data access performance with Multithreading: b) Usually INSERT / UPDATE take less time than to create new Thread. Even if you use ThreadPool, time to switch context also usually greater than time to do simple SQL INSERT / UPDATE command. In given example, Threads used without any ThreadPool, and so every time new thread will be created (which is very SLOW), instead of reuse of existed Thread etc. So acceptable approach might be usage of some ThreadPool (or TPL) and do partitioning of initial data so that each thread have some real work to do, not just single SQL statement :D c) Usually different DB engines have different special API to load bulk data into some table - so why not to use that and do all required logic (if you should) after data already in DB. d) No connection string shown here, but it is very important to configure DB connection pool and set it to right size to not overload your DB (especially because all inserts go to the same table :D) e) code is NOT safe, not only because some exceptions are not handled (per comments above), but also because some operations access shared resources without concurrency issues in mind: in Upsert(object o) method, inside catch block, list.Add(t); operation do exactly that for example :) etc. I.e. basically I see many such bugs only in few lines of code, which is bad sign :( For programmer to be safe it was probably better to stay in single thread :D

P.S. some issues might be more easy to prevent with asynchronous / non-blocking programming in single thread, i.e. ala NodeJS :D Just run as many async, non-blocking inserts as you want and leave the rest to framework :D

Danyal
10/27/2011 02:29 PM by
Danyal

It's the code fragment that keeps on giving.

EDev
10/27/2011 02:45 PM by
EDev

@Paulo Abom Pinto - it's may really not be as stupid as you think. If the nature of data is such that it would be insert operations a majority of the time, the above (just referring to insert-catch exception-update part of it) is actually quite an efficient strategy. Using a select to check before every single insert results in extra work for database server, extra network traffic, slower performance, deadlocks and non-atomic operation (what if you use a select to check and then try to insert but someone already inserted data in the meantime).

TomCollins
10/27/2011 03:06 PM by
TomCollins

@EDev Agree with you. Annotation: If you use "SELECT .. WITH XLOCK.." (or "SELECT ... FOR UPDATE " oracle) you can be sure that no one intersects.

Rabo
10/27/2011 03:32 PM by
Rabo

Why did you do this to me? This is the worst thing ever!

Dan Bailiff
10/27/2011 03:49 PM by
Dan Bailiff

It's easy to say "this is terrible". If you aren't going to point out why, even if you think it should be obvious, then you're not helping. You have a great opportunity to show people the better way. Why not do that instead?

Some people don't know any better because they've never been shown the "right" way to do things.

Dennis
10/27/2011 03:51 PM by
Dennis

missing con.Open()?

Jimmy Bogard
10/27/2011 03:59 PM by
Jimmy Bogard

http://www.youtube.com/watch?v=juFZh92MUOY

Dmitry
10/27/2011 08:33 PM by
Dmitry

Threads in a loop is often an indicator of a scary code.

Bob
10/27/2011 09:41 PM by
Bob

Typical post, leave some code, don't show what is wrong and how to fix it, thanks for helping NOT!

Bob
10/27/2011 09:43 PM by
Bob

And yes before one of your followers tells me it is obvious what is wrong here, I can clearly see, but not EVERYONE is at a level that can see this.

These are the people who would benefit from being shown some crap code and how it should have been done. OTherwise these posts are just a stroke of your ego and do not achieve anything.

Jason Pettys
10/27/2011 10:09 PM by
Jason Pettys

Bob: come on, man! He never promised to be didactic in every post! Ayende: thanks - that was awesome.

Harry Steinhilber
10/27/2011 10:23 PM by
Harry Steinhilber

@Dan Bailiff & Bob, Keep in mind, this is Oren's blog. He can help or not as he sees fit. Personally, just sifting through the comments has helped me pick out several issues I missed on my first read. That lead me to thinking about how I would re-architect to fix those problems in addition to the ones I spotted instantly. That alone is useful.

Nadav Sofy
10/27/2011 11:20 PM by
Nadav Sofy

Bob - if he had tried to explain everything that is wrong with this code it would have been a very long post.

and may be i'm way off, but i think that even the newest programmer should know the basics of exception handling, magic numbers transactions, threading and the rest. those who don't notice the problems with this code should read some more basic programming articles.

oh my
10/28/2011 03:56 AM by
oh my

No more posts in queue!

Chris Tavares
10/28/2011 04:15 AM by
Chris Tavares

Heh... all that effort to get parallelism and he shot it in the foot. Using ThreadPriority.Highest means that the worker thread will take higher priority than any other thread in the process... including the process that's running Main and scheduling the other threads. Once the thread quantum expires, no more iterations on the loop until the scheduled (and probably small number of) worker threads all finish.

Ori
10/28/2011 07:50 AM by
Ori

nice :)

This is obviously production code for bank accounts and may be the reason for the worlds economic woes..

When all you have is a hammer, everything looks like a nail.

Max
10/28/2011 10:09 AM by
Max

Not so memorable ;) For example, if you start the thread in the second loop you can have a really memorable code !!! ;) no limit to the worst!

Bob2
10/28/2011 02:33 PM by
Bob2

@Dan Bailiff and @Bob

Eventually you'll figure out only the k00l kidz hang out here and they all try to out do each other to impress the guy who runs the blog.

You'll need to pass your Ayende blog readers certification test before you'll be accepted as a k00l kid.

Fear not, most of them don't understand either - they're just trying to impress each other.

Yawn, how boring. It's like living high school over and over. Occasionally there is a nice post, but each time you see code dumps, skip the post, the only responses are from kidz trying to impress each other.

Jimmy
10/28/2011 02:52 PM by
Jimmy

A very nice new API - upsert.

Paulo Abom Pinto
10/28/2011 03:04 PM by
Paulo Abom Pinto

@EDev of course the SELECT before it's not he perfect way, but he could ensure that on SELECT the record is locked or try a dabase trasaction and try to ensure the flow.

In my opinion some of this code should be in a Store Procedure and the thread flow in the C#.

Nadav Sofy
10/28/2011 04:12 PM by
Nadav Sofy

to those who suggested select and then insert/update, it's more efficient to do update, check the return value to see if any row was updated, if it's 0 then do an insert.

@Paulo Abom Pinto - in 99% of the times, there's no justification for using a stored procedure to do anything.

Darren Cauthon
10/28/2011 07:09 PM by
Darren Cauthon

I think the biggest problem with this code is that it expresses no intent. I read the code yet I don't have any idea what it does.

Yeah, there might be issues with the implementation, but even without ToString() or the threading or whatever this would still be bad code. Good code can be understood.

TJ
10/28/2011 07:30 PM by
TJ

This code is basically doing nothing but spin a thread for each line within the text file that it is reading. It never open the connection to sql in the first place therefore none of the catches will be executed after it fails on executeNonQuery.

Fishy
10/29/2011 11:54 AM by
Fishy

A piece of code and everyone starts laughing. But what is the context of this code? If it is a test for an interview without any VS at hand and just 30 minutes time for it? In this case it isn't that bad. And it would excuse that there is no con.Open...btw most of the commenters above did not realise this.

Nick
10/29/2011 07:56 PM by
Nick

If the insert throws an exception then it must be an update haha. Love it

firefly
10/30/2011 10:01 AM by
firefly

If somebody can code this in 30 minutes then either he is really good or really bad; generally it is the later because it include premature optimization code that is obscure.

Nadav Sofy
10/30/2011 11:45 PM by
Nadav Sofy

Not opening the connection is a legitimate bug that you find out about after the first run. Not using a transaction is a huge bug that is not as easy to find. And no way someone wrote this code during a 30 minutes interview and knew the codes of specific sql exceptions

Joe Marquardt
10/31/2011 08:48 PM by
Joe Marquardt

At least an invalid server name in the connection string will timeout in parallel! meh.

Dathan
11/01/2011 07:58 AM by
Dathan

@Harry Agreed. And frequently even when Ayende does provide feedback on the code samples he presents, the comments include lots of additional feedback that he doesn't mention. In the case of code whose shortcomings actually require significant expertise to really appreciate, then it makes sense for Ayende to chime in. In the case of a code sample like this that's just egregiously broken, it makes perfect sense to just let the commenters evaluate it. It's like crowdsourcing source code review.

Dathan
11/01/2011 08:16 AM by
Dathan

@Chris Taveres Not necessarily. Any blocking calls should make the thread sleep, at which point the other threads will have an opportunity to run. Which, of course, makes setting ThreadPriority.Highest kind of pointless, but it shouldn't starve out the other worker threads, since most of the time spent here will either be in establishing the connection or executing the command - both of which I would expect to cause the thread to sleep. Unless I'm missing something - I have to admit I'm a bit of a multithreading neophyte.

ThomasX
11/01/2011 09:59 AM by
ThomasX

A "great" example of cloud computing. A cloud of threads...

Clay
11/03/2011 07:25 PM by
Clay

A cloud of sycophants.

Christopher Wright
11/05/2011 01:55 PM by
Christopher Wright

@Clay: you failed to honor Ayende properly in that post. Please enter additional grovel, else you will be reported to the authorities.

Clay
11/06/2011 03:40 AM by
Clay

@Christopher

Thanks for bringing that to my attention. I apologize as I should have been more clear. A slobbering cloud of sycophants.

dead-detector
11/06/2011 10:38 AM by
dead-detector

What happend to Oren? Is hope he is alive.

Ryan
11/07/2011 05:26 AM by
Ryan

@dead-detector He has been teaching classes in NYC. See the top of the sidebar.

duplicated
11/10/2011 07:46 AM by
duplicated

It's your RavenDB database corrupted? Post and comments seems like duplicated (or triplicated... or more) :)

Ayende Rahien
11/10/2011 08:47 AM by
Ayende Rahien

Duplicated, Yes, fix now, a post with an explanation coming up shortly

Tom Janssens
11/10/2011 08:02 PM by
Tom Janssens

I think that for db connections async dev might be beneficial, although I have not tried it. However, I would just opt for a backgroundworker instead. IMO the biggest problem with this code is in the exception handling: it lacks an "else throw;" in case the exception isn't the one expected...

Tom Janssens
11/22/2011 12:18 PM by
Tom Janssens

I was checking whether someone provided an answer on this one yet... Looking back, please forget my previous incredibly stupid comment; I was talking rubbish;allowing an app to set up a multitude of concurrent connections would hog your DB connection pool, leaving no connections for your app, in other words this would be a recipe for disaster. Rather odd that I missed it the first time, as did everyone else... So, next to the missing "else throw" in the first exception handler, just ditch the paralleled approach all together, as it would probably hog up all connections on your database !

Comments have been closed on this topic.