Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

@

Posts: 5,949 | Comments: 44,546

filter by tags archive

Rhino ETLFirst Code Drop


First, let me make it clear, it is not ready yet.

What we have:

  • 99% complete on the syntax
  • Overall architecture should be stable
  • The engine works - but I think of it as a spike, it is likely to change significantly.

What remains to be done:

  • Parallelising the work inside a pipeline
  • Better error messages
  • More logging
  • More tests
  • Transforms over sets of rows

Here are a few works about how it works. The DSL is compromised of connection, source, destination and transform, which has one to one mapping with the respective Connection, DataSource, DataDestination and Transform class. In some cases, we just fill the data in (Connection), in some cases we pass a generator (think of it as a delegate) to the instance that we create (DataSource, DataDestination), and sometimes we subclass the class to add the new behavior (transform).

A pipeline is a central concept, and is compromised of a set of pipeline associations, which connect the input/output of components.

Places to start looking at:

  • EtlContextBuilder - Compile the DSL and spits out an instance of:
  • EtlConfigurationContext - the result of the DSL, which can be run using:
  • ExecutionPackage - the result of building the EtlConfigurationContext, this one manages the running of all the pipelines.

There is an extensive set of tests (mostly for the syntax), and a couple of integration tests. As I said, anything that happens as a result of a call to ExecutionPackage.Execute() is suspect and will likely change. I may have been somewhat delegate happy in the execution, it is anonymous delegate that calls anonymous delegate, etc, which is probably too complex for what we need here.

I am putting the source out for review, while it can probably handle most simple things, it very bare bone and subject to change.

You can get it here: https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino-ETL

But it needs references from the root, so it would be easiest to just do:

svn checkout https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino.ETL

More posts in "Rhino ETL" series:

  1. (16 Oct 2007) Importing Data into MS CRM
  2. (13 Aug 2007) Writing to files
  3. (05 Aug 2007) Web Services Source
  4. (05 Aug 2007) Transactions
  5. (04 Aug 2007) Targets
  6. (04 Aug 2007) Aggregates
  7. (26 Jul 2007) Thinking about Joins & Merges
  8. (24 Jul 2007) First Code Drop

Comments

Tobin Harris

Cool beans! I'm gonna downlaod and have a play once I'm back on my ETL project. I look foward to it :-)

Dave Arkley

https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino.ETL

URL does not exist

When eill it be up?

Regards

Dave

Al Gonzalez

The link doesn't work for me but the following one does:

https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/rhino-etl/

Ayende Rahien

Dave,

I have updated the link, please try again.

Thibaut Barrère

Great to hear - I've been using SSIS a lot and have been looking for open-source alternatives, more easily configurable, customizable etc. I'll definitely have a look. A dotnet open-source alternative is definitely interesting.

BTW, have you heard of ActiveWarehouse-ETL ? (http://activewarehouse.rubyforge.org/etl/). It's another open-source ETL package (in Ruby) which I'm using in production.

regards,

Thibaut

Ayende Rahien

I have heard of it, looks interesting, but it follows a fairly different path than what I have in mind.

Thibaut Barrère

Ok - just read the older posts, so please ignore my AW-ETL comment :)

Dave Arkley

Confirm the link is working fine now. Thanks Ayende.

Regards

Dave

Tobin Harris

Hi Ayende

Am I being a muppet? I couldn't get it to build!

I get missing PipelineStage and BlockExpression files. Also, the AssemblyInfo.cs files were missing.

Looking forward to trying this out!

Ayende Rahien

No muppet, the code is broken right now, and need to be fixed (working on it).

the assemblyinfo.cs files are generated when building from the command line

Tobin Harris

Cool.

I have just shed a small tear as I had to close the Rhino.ETL solution and instead start a new SSIS project.

I've got 30 data files to import into a table, all are different formats and need various transformations. Do let me know if you get anything sorted today, otherwise I'll look forward to checking out the fixes another time :-)

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. The RavenDB Comic Strip: Part III – High availability & sleeping soundly - 11 hours from now

There are posts all the way to May 28, 2015

RECENT SERIES

  1. The RavenDB Comic Strip (3):
    20 May 2015 - Part II – a team in trouble!
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats