Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

@

Posts: 5,947 | Comments: 44,540

filter by tags archive

Thou shall not do threading unless you know what you are doing


I had a really bad couple of days. I am pissed, annoyed and angry, for totally not technical reasons.

And then I run into this issue, and I just want to throw something really hard at someone, repeatedly.

The issue started from this bug report:

   1: NetTopologySuite.Geometries.TopologyException was unhandled
   2:   HResult=-2146232832
   3:   Message= ... trimmed ...
   4:   Source=NetTopologySuite
   5:   StackTrace:
   6:        at NetTopologySuite.Operation.Overlay.Snap.SnapIfNeededOverlayOp.GetResultGeometry(SpatialFunction opCode)
   7:        at NetTopologySuite.Operation.Union.CascadedPolygonUnion.UnionActual(IGeometry g0, IGeometry g1)
   8:        at NetTopologySuite.Operation.Union.CascadedPolygonUnion.Worker.Execute()
   9:        at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
  10:        at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
  11:        at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
  12:        at System.Threading.ThreadHelper.ThreadStart()

At first, I didn’t really realized why it was my problem. I mean, it is NTS problem, isn’t it?

Except that this particular issue actually crashed ravendb (don’t worry, it is unstable builds only). The reason it crashed RavenDB? An unhandled thread exception.

What I can’t figure out is what on earth is going on. So I took a look at the code, have a look:

image

I checked, and this isn’t code that has been ported from the Java code. You can see the commented code there? That is from the Java version.

And let us look at what the execute method does?

image

So let me see if I understand. We have a list of stuff to do, so we spin out threads, reclusively, then we wait on them. I think that the point was to optimize things in some manner by parallelizing the work between the two halves.

Do you know what the real killer is? If we assume that we have a geometry with just 20 items on it, this will generate twenty two threads.

Leaving aside the issue of not handling errors properly (and killing the entire process because of this), the sheer cost of creating the threads is going to kill this program.

Libraries should be made to be thread safe (I already had to fix a thread safety bug there),  but they should not be creating their own threads unless it is quite clear that they need to do so.

I believe that this is a case of a local optimization for a specific scenario, it also carry all of the issues associated with local optimizations. It solves one problem and opens up seven other ones.


Comments

Diego Guidi

Please report issues to http://code.google.com/p/nettopologysuite/issues/entry. We appreciate you (trying to) using NTS for RavenDB, but I'm afraid no-one has used NTS the way you are doing now, so, as you dig, you'll probably run into more threading issues.

Andrew Rimmer

How cool would it be if Resharper could highlight your code as above along with the WTF stamp.

Igor Kalders

lol@Andrew Rimmer. Now that would be a killer extension :D

Richard Dingwall

Always a favourite - an innocent looking sort method that contains a nasty threading surprise!

anonimous

@oren why not simply report the issue to the people that develop the project? I'm not sure if the persons that follows your blog are capable to find the problem like you did... but I'm sure that, by what I see, they know make fun of a quite serious work that certainly they would not have done better

Bryan

Looks like they need to rethink their logic for how they limit the depth of recursion.

Anon

http://israel.thebeehive.org/en/content/27/136

jonny

Sems to be the wrong tool to use in your case.

Considering they have no idea what threading is all about and from comments above - its better to skip it all together.

Jonny idiot

Jonny U are an idiot - my commet makes as much sense as yours...

Michael J. Ryan

I once worked at a place that prematurely optimized through threading in static classes (experiencing odd behavior and race conditions)... then to find that the "configuration" methods referenced an in-memory table, doing SQL-style queries, as opposed to a key/value store (hash table) which is indexed... the 168 calls that one page used to various setting values caused about 3 seconds in delay time total. Lots of these kinds of things in the wild.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. RavenDB Sharding (2):
    21 May 2015 - Adding a new shard to an existing cluster, the easy way
  2. The RavenDB Comic Strip (2):
    20 May 2015 - Part II – a team in trouble!
  3. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  4. Interview question (2):
    30 Mar 2015 - fix the index
  5. Excerpts from the RavenDB Performance team report (20):
    20 Feb 2015 - Optimizing Compare – The circle of life (a post-mortem)
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats