Rhino ETL: First Code Drop
First, let me make it clear: it is not ready yet.
What we have:
- 99% complete on the syntax
- Overall architecture should be stable
- The engine works, but I think of it as a spike; it is likely to change significantly.
What remains to be done:
- Parallelising the work inside a pipeline
- Better error messages
- More logging
- More tests
- Transforms over sets of rows
Here are a few words about how it works. The DSL is composed of connection, source, destination and transform, which map one-to-one to the respective Connection, DataSource, DataDestination and Transform classes. In some cases we just fill the data in (Connection), in some cases we pass a generator (think of it as a delegate) to the instance that we create (DataSource, DataDestination), and sometimes we subclass the class to add the new behavior (Transform).
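To make the three mapping styles concrete, here is a minimal sketch in Python. It is not the Rhino ETL code: only the class names Connection, DataSource and Transform come from the description above, while every field, method and value (connection_string, rows, apply, UpperCaseName, the sample data) is a hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


@dataclass
class Connection:
    # Style 1: plain data. The DSL would only fill these fields in.
    # Both field names are assumptions made for this sketch.
    name: str
    connection_string: str


class DataSource:
    # Style 2: the instance receives a generator (a delegate, roughly)
    # instead of being subclassed.
    def __init__(self, name: str, generate: Callable[[], Iterable[Row]]):
        self.name = name
        self._generate = generate

    def rows(self) -> Iterator[Row]:
        return iter(self._generate())


class Transform:
    # Style 3: a base class; each transform in the DSL would become a subclass.
    def apply(self, row: Row) -> Row:
        raise NotImplementedError


class UpperCaseName(Transform):
    # Hypothetical transform that upper-cases the "name" column.
    def apply(self, row: Row) -> Row:
        return {**row, "name": str(row["name"]).upper()}


# Placeholder values, not a real server or schema.
conn = Connection("source_db", "Server=example;Database=demo")
source = DataSource("users", lambda: [{"id": 1, "name": "ayende"}])
result = [UpperCaseName().apply(row) for row in source.rows()]
```

The point of the sketch is only the difference between the three styles: data that is filled in, behavior that is passed in, and behavior that is added by subclassing.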
A pipeline is a central concept, and is composed of a set of pipeline associations, which connect the input/output of components.
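As a rough illustration of a pipeline built from associations, the Python sketch below wires named components together and pushes rows through them. It assumes a simple linear chain whose associations are added in order, which the real engine may not require; PipelineAssociation, connect, run and the component names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Component = Callable[[List[Row]], List[Row]]


@dataclass
class PipelineAssociation:
    # Connects the output of one named component to the input of another.
    source: str
    target: str


@dataclass
class Pipeline:
    name: str
    associations: List[PipelineAssociation] = field(default_factory=list)

    def connect(self, source: str, target: str) -> "Pipeline":
        self.associations.append(PipelineAssociation(source, target))
        return self

    def run(self, components: Dict[str, Component]) -> List[Row]:
        # Assumption: the associations form a linear chain, in order.
        if not self.associations:
            return []
        rows = components[self.associations[0].source]([])
        for association in self.associations:
            rows = components[association.target](rows)
        return rows


collected: List[Row] = []


def sink(rows: List[Row]) -> List[Row]:
    # Hypothetical destination that just remembers what it received.
    collected.extend(rows)
    return rows


components: Dict[str, Component] = {
    "users": lambda _rows: [{"id": 1}, {"id": 2}],
    "double_id": lambda rows: [{"id": row["id"] * 2} for row in rows],
    "sink": sink,
}

pipeline = Pipeline("demo").connect("users", "double_id").connect("double_id", "sink")
output = pipeline.run(components)
```

Keeping the associations as data, separate from the components themselves, is what lets the same source or transform be reused in more than one pipeline.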
Places to start looking at:
- EtlContextBuilder - compiles the DSL and spits out an instance of:
- EtlConfigurationContext - the result of the DSL, which can be run using:
- ExecutionPackage - the result of building the EtlConfigurationContext, this one manages the running of all the pipelines.
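The hand-off between these three classes can be sketched as follows. This is a toy model in Python, not the actual API: the class names come from the list above, but from_script, build_package, execute and the one-line "pipeline <name>" script format are assumptions invented for the example.

```python
from typing import Callable, Dict, List


class ExecutionPackage:
    # Manages the running of all the pipelines it was built with.
    def __init__(self, pipelines: Dict[str, Callable[[], str]]):
        self._pipelines = pipelines

    def execute(self) -> List[str]:
        return [run() for run in self._pipelines.values()]


class EtlConfigurationContext:
    # The result of compiling the DSL: a bag of configured pipelines.
    def __init__(self) -> None:
        self.pipelines: Dict[str, Callable[[], str]] = {}

    def build_package(self) -> ExecutionPackage:
        return ExecutionPackage(dict(self.pipelines))


class EtlContextBuilder:
    @staticmethod
    def from_script(script: str) -> EtlConfigurationContext:
        # Toy "compiler": each line of the form "pipeline <name>"
        # registers a pipeline. The real DSL is far richer than this.
        context = EtlConfigurationContext()
        for line in script.splitlines():
            parts = line.split()
            if len(parts) == 2 and parts[0] == "pipeline":
                name = parts[1]
                context.pipelines[name] = lambda name=name: f"ran {name}"
        return context


context = EtlContextBuilder.from_script("pipeline copy_users\npipeline copy_orders")
package = context.build_package()
results = package.execute()
```

The three stages stay separate on purpose: compiling the script, holding the resulting configuration, and running it can each change without touching the other two.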
There is an extensive set of tests (mostly for the syntax), and a couple of integration tests. As I said, anything that happens as a result of a call to ExecutionPackage.Execute() is suspect and will likely change. I may have been somewhat delegate happy in the execution: it is an anonymous delegate that calls an anonymous delegate, and so on, which is probably too complex for what we need here.
I am putting the source out for review. While it can probably handle most simple things, it is very bare bones and subject to change.
You can get it here: https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino-ETL
But it needs references from the root, so it would be easiest to just do:
svn checkout https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino.ETL
More posts in "Rhino ETL" series:
- (16 Oct 2007) Importing Data into MS CRM
- (13 Aug 2007) Writing to files
- (05 Aug 2007) Web Services Source
- (05 Aug 2007) Transactions
- (04 Aug 2007) Targets
- (04 Aug 2007) Aggregates
- (26 Jul 2007) Thinking about Joins & Merges
- (24 Jul 2007) First Code Drop
Comments
Cool beans! I'm gonna download and have a play once I'm back on my ETL project. I look forward to it :-)
https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino.ETL
URL does not exist
When will it be up?
Regards
Dave
The link doesn't work for me but the following one does:
https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/rhino-etl/
Dave,
I have updated the link, please try again.
Great to hear - I've been using SSIS a lot and have been looking for open-source alternatives, more easily configurable, customizable etc. I'll definitely have a look. A dotnet open-source alternative is definitely interesting.
BTW, have you heard of ActiveWarehouse-ETL ? (http://activewarehouse.rubyforge.org/etl/). It's another open-source ETL package (in Ruby) which I'm using in production.
regards,
Thibaut
I have heard of it, looks interesting, but it follows a fairly different path than what I have in mind.
Ok - just read the older posts, so please ignore my AW-ETL comment :)
Confirmed, the link is working fine now. Thanks Ayende.
Regards
Dave
Hi Ayende
Am I being a muppet? I couldn't get it to build!
I get missing PipelineStage and BlockExpression files. Also, the AssemblyInfo.cs files were missing.
Looking forward to trying this out!
No muppet, the code is broken right now, and needs to be fixed (working on it).
The AssemblyInfo.cs files are generated when building from the command line.
Cool.
I have just shed a small tear as I had to close the Rhino.ETL solution and instead start a new SSIS project.
I've got 30 data files to import into a table, all are different formats and need various transformations. Do let me know if you get anything sorted today, otherwise I'll look forward to checking out the fixes another time :-)