Ayende @ Rahien

Refunds available at head office

Test driving Rhino.ETL

Here is the first test:

[Test] 
public void EvaluatingScript_WithConnection_WillAddDataSourceToContext() 
{ 
	EtlConfigurationContext configurationContext = EtlContextBuilder.FromFile(@"Connections\connection_only.retl");
	Assert.AreEqual(3, configurationContext.Connections.Count, "should have three connections");
}

There is quite a bit of information just in this test, we introduced the EtlConfigurationContext class, decided that we will create it from a factory, and that we have something that is called a connection. Another decision made was the “retl” extension (Rhino ETL), but that is a side benefit.

The source for this is:

Connection( 
	"Northwind",
	ConnectionType: SqlConnection,
	ConnectionString: "Data Source=localhost;Initial Catalog=Northwind; Integrated Security=SSPI;",
	ConcurrentConnections: 5
	)
	
Connection( 
	"SouthSand",
	ConnectionType: OracleConnection,
	ConnectionStringName: "SouthSand"
	)

Connection( 
	"StrangeOne",
	ConnectionType: OracleConnection,
	ConnectionStringGenerator: { System.Environment.GetEnvironmentVariable("MyEnvVar") }
	)

You may have wondered about the last one, what does this do? Well, it allows you to do runtime evaluation of something, in this case, it get the value from an env-var, but that has a lot of potential. Here it a test that demonstrate the capabilities:

[Test]
public void DataSources_ConnectionStringGenerator_CanUseEvnrionmentVariables()
{
	Environment.SetEnvironmentVariable("MyEnvVar","MyExpectedValue");

	Assert.AreEqual(
			"MyExpectedValue",
			configurationContext.Connections["StrangeOne"].ConnectionString
	);

	Environment.SetEnvironmentVariable("MyEnvVar", "2");

	Assert.AreEqual(
			"2",
			configurationContext.Connections["StrangeOne"].ConnectionString
	);

}

Comments

Adam Tybor
07/20/2007 01:24 AM by
Adam Tybor

Can't wait... whens it going in the repo?

Tobin Harris
07/20/2007 07:13 AM by
Tobin Harris

Looks sweet, is that so you can redirect sources & targets at run time?

I haven't really dug into it, but I was scanning the ActiveWarehouse Ruby project the other day which has it's own way of dong things. Might be good for inspiration!

http://activewarehouse.rubyforge.org/etl/. 

It uses control files to orchestrate sources, transforms etc much like your own does. See example:

http://viewvc.rubyforge.mmmultiworks.com/cgi/viewvc.cgi/etl/trunk/test/delimited.ctl?root=activewarehouse&view=markup

I'm working on an ETL project right now, which was started by someone else as a bespoke VB.NET solution. I asked him if he's considered using any ETL tools such as SSIS or Kettle etc, but he didn't feel that they gave him enough control. I think a lot of developers feel this way. I only used SSIS briefly, but found that it encouraged far too much mouse clicking, and that annoys me because it's not easily automated. Well, it probably is, but I didn't hang around long enough to find out how!

Anyway, I won't rant on. I'I look forward to giving the Ayende flavour of ETL a spin!

Tobin Harris
07/20/2007 08:10 AM by
Tobin Harris

Oh, forgot to ask. Will you be including any "common" data cleaning features in your ETL solution? In fact, do you even think that they have a place in such a library?

Common ones I've written recently are things like...

  • Remove commas from numbers

  • Trim and convert empty string to null

  • Reformat UK postcodes

  • Make title case

  • Remove blank rows

  • Remove repeated column headers in data

  • Derive title from name and drop into column 'n'

  • Unpivot repeated groups onto new rows, Unpivot( startCol, colsPerGroup, numberOfGroups)

  • Format dates

  • Remove illegal dates

You get the idea :-)

C-J Berg
07/20/2007 08:16 PM by
C-J Berg

There are many other open source ETL tools that you might want to have a look at first, for instance:

http://kettle.pentaho.org/

http://www.cloveretl.org/clover-etl/

http://www.ketl.org/

http://www.enhydra.org/tech/octopus/index.html

http://www.glassfishwiki.org/jbiwiki/Wiki.jsp?page=ETLSE

http://scriptella.javaforge.com/

Some of these are quite powerful. Not as fun as writing your own tool/DSL of course, but great if you want to abandon SSIS quickly. :-)

Comments have been closed on this topic.