Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 18 | Comments: 65

filter by tags archive

Thou shall not do threading unless you know what you are doing

time to read 6 min | 1104 words

I had a really bad couple of days. I am pissed, annoyed and angry, for totally not technical reasons.

And then I run into this issue, and I just want to throw something really hard at someone, repeatedly.

The issue started from this bug report:

   1: NetTopologySuite.Geometries.TopologyException was unhandled
   2:   HResult=-2146232832
   3:   Message= ... trimmed ...
   4:   Source=NetTopologySuite
   5:   StackTrace:
   6:        at NetTopologySuite.Operation.Overlay.Snap.SnapIfNeededOverlayOp.GetResultGeometry(SpatialFunction opCode)
   7:        at NetTopologySuite.Operation.Union.CascadedPolygonUnion.UnionActual(IGeometry g0, IGeometry g1)
   8:        at NetTopologySuite.Operation.Union.CascadedPolygonUnion.Worker.Execute()
   9:        at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
  10:        at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
  11:        at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
  12:        at System.Threading.ThreadHelper.ThreadStart()

At first, I didn’t really realized why it was my problem. I mean, it is NTS problem, isn’t it?

Except that this particular issue actually crashed ravendb (don’t worry, it is unstable builds only). The reason it crashed RavenDB? An unhandled thread exception.

What I can’t figure out is what on earth is going on. So I took a look at the code, have a look:

image

I checked, and this isn’t code that has been ported from the Java code. You can see the commented code there? That is from the Java version.

And let us look at what the execute method does?

image

So let me see if I understand. We have a list of stuff to do, so we spin out threads, reclusively, then we wait on them. I think that the point was to optimize things in some manner by parallelizing the work between the two halves.

Do you know what the real killer is? If we assume that we have a geometry with just 20 items on it, this will generate twenty two threads.

Leaving aside the issue of not handling errors properly (and killing the entire process because of this), the sheer cost of creating the threads is going to kill this program.

Libraries should be made to be thread safe (I already had to fix a thread safety bug there),  but they should not be creating their own threads unless it is quite clear that they need to do so.

I believe that this is a case of a local optimization for a specific scenario, it also carry all of the issues associated with local optimizations. It solves one problem and opens up seven other ones.


Comments

Diego Guidi

Please report issues to http://code.google.com/p/nettopologysuite/issues/entry. We appreciate you (trying to) using NTS for RavenDB, but I'm afraid no-one has used NTS the way you are doing now, so, as you dig, you'll probably run into more threading issues.

Andrew Rimmer

How cool would it be if Resharper could highlight your code as above along with the WTF stamp.

Igor Kalders

lol@Andrew Rimmer. Now that would be a killer extension :D

Richard Dingwall

Always a favourite - an innocent looking sort method that contains a nasty threading surprise!

anonimous

@oren why not simply report the issue to the people that develop the project? I'm not sure if the persons that follows your blog are capable to find the problem like you did... but I'm sure that, by what I see, they know make fun of a quite serious work that certainly they would not have done better

Bryan

Looks like they need to rethink their logic for how they limit the depth of recursion.

Anon

http://israel.thebeehive.org/en/content/27/136

jonny

Sems to be the wrong tool to use in your case.

Considering they have no idea what threading is all about and from comments above - its better to skip it all together.

Jonny idiot

Jonny U are an idiot - my commet makes as much sense as yours...

Michael J. Ryan

I once worked at a place that prematurely optimized through threading in static classes (experiencing odd behavior and race conditions)... then to find that the "configuration" methods referenced an in-memory table, doing SQL-style queries, as opposed to a key/value store (hash table) which is indexed... the 168 calls that one page used to various setting values caused about 3 seconds in delay time total. Lots of these kinds of things in the wild.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. RavenDB 3.0 New Stable Release - 9 hours from now
  2. Production postmortem: The case of the lying configuration file - about one day from now
  3. Production postmortem: The industry at large - 2 days from now
  4. The insidious cost of allocations - 3 days from now
  5. Buffer allocation strategies: A possible solution - 6 days from now

And 4 more posts are pending...

There are posts all the way to Sep 11, 2015

RECENT SERIES

  1. Find the bug (5):
    20 Apr 2011 - Why do I get a Null Reference Exception?
  2. Production postmortem (10):
    31 Aug 2015 - The case of the memory eater and high load
  3. What is new in RavenDB 3.5 (7):
    12 Aug 2015 - Monitoring support
  4. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats