Ayende @ Rahien

Refunds available at head office

Rhino ETL: First Code Drop

First, let me make it clear: it is not ready yet.

What we have:

  • 99% complete on the syntax
  • Overall architecture should be stable
  • The engine works, but I think of it as a spike; it is likely to change significantly.

What remains to be done:

  • Parallelising the work inside a pipeline
  • Better error messages
  • More logging
  • More tests
  • Transforms over sets of rows

Here are a few words about how it works. The DSL is composed of connection, source, destination and transform, which map one-to-one to the Connection, DataSource, DataDestination and Transform classes, respectively. In some cases we just fill in the data (Connection), in others we pass a generator (think of it as a delegate) to the instance that we create (DataSource, DataDestination), and sometimes we subclass the class to add the new behavior (Transform).
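To make the three styles concrete, here is a minimal Python sketch. Rhino ETL itself is C#, and this is not its code. Only the class names Connection, DataSource, DataDestination and Transform come from the post; every member (`connection_string`, `read`, `write`, `apply`), the dict-based row type and the `UpperCaseName` transform are hypothetical stand-ins, assuming a row can be modeled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]  # assumption: a row is a plain dict


@dataclass
class Connection:
    # Style 1: the DSL just fills in the data.
    name: str
    connection_string: str


class DataSource:
    # Style 2: the DSL passes a generator (a delegate) that yields rows.
    def __init__(self, name: str, generator: Callable[[], Iterable[Row]]):
        self.name = name
        self._generator = generator

    def read(self) -> Iterable[Row]:
        return self._generator()


class DataDestination:
    # Style 2 again: the delegate is called once per row to write it.
    def __init__(self, name: str, writer: Callable[[Row], None]):
        self.name = name
        self._writer = writer

    def write(self, row: Row) -> None:
        self._writer(row)


class Transform:
    # Style 3: the DSL subclasses Transform and overrides apply().
    def apply(self, row: Row) -> Row:
        return row


class UpperCaseName(Transform):
    # Hypothetical user-defined transform, as the DSL might generate it.
    def apply(self, row: Row) -> Row:
        return {**row, "name": str(row["name"]).upper()}


conn = Connection("local", "Data Source=.;Initial Catalog=Test")
source = DataSource("users", lambda: iter([{"id": 1, "name": "ayende"}]))
written: List[Row] = []
dest = DataDestination("users_out", written.append)
transform = UpperCaseName()

for row in source.read():
    dest.write(transform.apply(row))

print(written)  # [{'id': 1, 'name': 'AYENDE'}]
```

The point of the split is that only Transform carries user logic in a class body; the source and destination stay generic and receive their behavior as a callable.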

A pipeline is a central concept; it is composed of a set of pipeline associations, which connect the inputs and outputs of components.
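A pipeline built from associations can be sketched the same way. This is an illustrative Python sketch under stated assumptions, not Rhino ETL's implementation: `PipelineAssociation` with `from_name`/`to_name` and `Pipeline.run` are hypothetical names, each component is assumed to map a stream of rows to a stream of rows, and the associations are assumed to form a simple linear chain (at most one outgoing link per component).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
# Hypothetical component shape: a stream of rows in, a stream of rows out.
Component = Callable[[Iterable[Row]], Iterable[Row]]


@dataclass
class PipelineAssociation:
    # Connects the output of one named component to the input of another.
    from_name: str
    to_name: str


class Pipeline:
    def __init__(self, components: Dict[str, Component],
                 associations: List[PipelineAssociation]):
        self.components = components
        self.associations = associations

    def run(self, start: str) -> List[Row]:
        # Assumes a linear chain: each component has at most one outgoing link.
        links = {a.from_name: a.to_name for a in self.associations}
        rows: Iterable[Row] = self.components[start]([])  # source ignores input
        current, visited = start, {start}
        while current in links:
            current = links[current]
            if current in visited:
                raise ValueError(f"cycle detected at {current!r}")
            visited.add(current)
            rows = self.components[current](rows)
        return list(rows)


components: Dict[str, Component] = {
    "source": lambda _: ({"id": i} for i in range(3)),
    "double": lambda rows: ({"id": r["id"] * 2} for r in rows),
    "keep_positive": lambda rows: (r for r in rows if r["id"] > 0),
}
pipeline = Pipeline(components, [
    PipelineAssociation("source", "double"),
    PipelineAssociation("double", "keep_positive"),
])
print(pipeline.run("source"))  # [{'id': 2}, {'id': 4}]
```

Because the components here are generators, rows flow through the chain lazily, one at a time; fan-out or fan-in would need a richer association model than a single dictionary of links.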

Places to start looking at:

  • EtlContextBuilder - compiles the DSL and spits out an instance of:
  • EtlConfigurationContext - the result of the DSL, which can be run using:
  • ExecutionPackage - the result of building the EtlConfigurationContext, this one manages the running of all the pipelines.
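The three stages can be pictured with a toy Python sketch. Only the three class names and the idea of an `Execute()` call come from the post; the line-based DSL (`pipeline NAME: a -> b -> c`), the `compile` and `build_package` methods, the step registry and the use of integers as rows are hypothetical simplifications, since the real builder compiles the actual Rhino ETL DSL.

```python
from typing import Callable, Dict, List

# Hypothetical step shape: rows are plain ints to keep the sketch short.
Step = Callable[[List[int]], List[int]]


class EtlConfigurationContext:
    # Result of compiling the DSL: pipeline names mapped to step names.
    def __init__(self, pipelines: Dict[str, List[str]]):
        self.pipelines = pipelines

    def build_package(self, registry: Dict[str, Step]) -> "ExecutionPackage":
        # Building stage: resolve every step name to a callable.
        resolved = {name: [registry[s] for s in steps]
                    for name, steps in self.pipelines.items()}
        return ExecutionPackage(resolved)


class EtlContextBuilder:
    @staticmethod
    def compile(dsl: str) -> EtlConfigurationContext:
        # Toy DSL (hypothetical): one "pipeline NAME: a -> b -> c" per line.
        pipelines: Dict[str, List[str]] = {}
        for line in dsl.strip().splitlines():
            header, body = line.split(":", 1)
            name = header.split()[1]
            pipelines[name] = [s.strip() for s in body.split("->")]
        return EtlConfigurationContext(pipelines)


class ExecutionPackage:
    # Manages the running of all the pipelines.
    def __init__(self, pipelines: Dict[str, List[Step]]):
        self.pipelines = pipelines

    def execute(self) -> Dict[str, List[int]]:
        results: Dict[str, List[int]] = {}
        for name, steps in self.pipelines.items():
            rows: List[int] = []
            for step in steps:
                rows = step(rows)
            results[name] = rows
        return results


registry: Dict[str, Step] = {
    "read": lambda _: [1, 2, 3],
    "square": lambda rows: [r * r for r in rows],
    "drop_small": lambda rows: [r for r in rows if r > 1],
}
dsl = """
pipeline numbers: read -> square -> drop_small
pipeline raw: read
"""
context = EtlContextBuilder.compile(dsl)
package = context.build_package(registry)
print(package.execute())  # {'numbers': [4, 9], 'raw': [1, 2, 3]}
```

Keeping compile, build and execute as separate objects means the configuration can be inspected or tested (as most of the syntax tests do) without running anything.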

There is an extensive set of tests (mostly for the syntax) and a couple of integration tests. As I said, anything that happens as a result of a call to ExecutionPackage.Execute() is suspect and will likely change. I may have been somewhat delegate-happy in the execution: it is anonymous delegates calling anonymous delegates, and so on, which is probably too complex for what we need here.
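To show what "delegate calling delegate" looks like, and why a flatter shape may be simpler, here is a small generic Python illustration (not Rhino ETL's execution code). The hypothetical `chain_delegates` wraps each step in a closure that calls the previously built closure, while `run_flat` applies the same steps in a plain loop. Both compute the same result; the flat version arguably has fewer moving parts to read and debug.

```python
from typing import Callable, List

Step = Callable[[int], int]


def chain_delegates(steps: List[Step]) -> Callable[[int], int]:
    # Nested style: each closure calls the closure built before it,
    # so one value passes through a chain of delegates calling delegates.
    def identity(value: int) -> int:
        return value

    composed: Callable[[int], int] = identity
    for step in steps:
        # Default arguments bind the current closure and step at each iteration.
        def wrapper(value: int, inner=composed, step=step) -> int:
            return step(inner(value))
        composed = wrapper
    return composed


def run_flat(steps: List[Step], value: int) -> int:
    # Flat style: a plain loop applies each step in order.
    for step in steps:
        value = step(value)
    return value


steps: List[Step] = [lambda x: x + 1, lambda x: x * 10, lambda x: x - 3]
nested = chain_delegates(steps)
print(nested(2), run_flat(steps, 2))  # 27 27
```

Without the `inner=composed, step=step` defaults, every closure would look up the final values of those names when called, which here would make the outermost wrapper call itself forever.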

I am putting the source out for review. While it can probably handle most simple things, it is very bare-bones and subject to change.

You can get it here: https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino-ETL

But it needs references from the root, so it would be easiest to just do:

svn checkout https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino.ETL

Comments

Tobin Harris
07/24/2007 04:13 PM

Cool beans! I'm gonna download and have a play once I'm back on my ETL project. I look forward to it :-)

Dave Arkley
07/24/2007 08:04 PM

https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino.ETL

URL does not exist

When will it be up?

Regards

Dave

Al Gonzalez
07/24/2007 08:07 PM

The link doesn't work for me but the following one does:

https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/rhino-etl/

Ayende Rahien
07/24/2007 10:43 PM

Dave,

I have updated the link, please try again.

Thibaut Barrère
07/25/2007 09:38 AM

Great to hear - I've been using SSIS a lot and have been looking for open-source alternatives that are more easily configurable, customizable, etc. I'll definitely have a look. A .NET open-source alternative is really interesting.

BTW, have you heard of ActiveWarehouse-ETL (http://activewarehouse.rubyforge.org/etl/)? It's another open-source ETL package (in Ruby) which I'm using in production.

regards,

Thibaut

Ayende Rahien
07/25/2007 09:45 AM

I have heard of it, looks interesting, but it follows a fairly different path than what I have in mind.

Thibaut Barrère
07/25/2007 09:46 AM

Ok - just read the older posts, so please ignore my AW-ETL comment :)

Dave Arkley
07/26/2007 10:57 PM

I can confirm the link is working fine now. Thanks Ayende.

Regards

Dave

Tobin Harris
08/01/2007 10:05 AM

Hi Ayende

Am I being a muppet? I couldn't get it to build!

I get missing PipelineStage and BlockExpression files. Also, the AssemblyInfo.cs files were missing.

Looking forward to trying this out!

Ayende Rahien
08/01/2007 11:49 AM

Not a muppet; the code is broken right now and needs to be fixed (working on it).

The AssemblyInfo.cs files are generated when building from the command line.

Tobin Harris
08/01/2007 12:26 PM

Cool.

I have just shed a small tear as I had to close the Rhino.ETL solution and instead start a new SSIS project.

I've got 30 data files to import into a table, all are different formats and need various transformations. Do let me know if you get anything sorted today, otherwise I'll look forward to checking out the fixes another time :-)
