Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 6,124 | Comments: 45,475

filter by tags archive

Test driving Rhino.ETL

time to read 1 min | 177 words

Here is the first test:

[Test] 
public void EvaluatingScript_WithConnection_WillAddDataSourceToContext() 
{ 
	EtlConfigurationContext configurationContext = EtlContextBuilder.FromFile(@"Connections\connection_only.retl");
	Assert.AreEqual(3, configurationContext.Connections.Count, "should have three connections");
}

There is quite a bit of information just in this test, we introduced the EtlConfigurationContext class, decided that we will create it from a factory, and that we have something that is called a connection. Another decision made was the “retl” extension (Rhino ETL), but that is a side benefit.

The source for this is:

Connection( 
	"Northwind",
	ConnectionType: SqlConnection,
	ConnectionString: "Data Source=localhost;Initial Catalog=Northwind; Integrated Security=SSPI;",
	ConcurrentConnections: 5
	)
	
Connection( 
	"SouthSand",
	ConnectionType: OracleConnection,
	ConnectionStringName: "SouthSand"
	)

Connection( 
	"StrangeOne",
	ConnectionType: OracleConnection,
	ConnectionStringGenerator: { System.Environment.GetEnvironmentVariable("MyEnvVar") }
	)

You may have wondered about the last one, what does this do? Well, it allows you to do runtime evaluation of something, in this case, it get the value from an env-var, but that has a lot of potential. Here it a test that demonstrate the capabilities:

[Test]
public void DataSources_ConnectionStringGenerator_CanUseEvnrionmentVariables()
{
	Environment.SetEnvironmentVariable("MyEnvVar","MyExpectedValue");

	Assert.AreEqual(
			"MyExpectedValue",
			configurationContext.Connections["StrangeOne"].ConnectionString
	);

	Environment.SetEnvironmentVariable("MyEnvVar", "2");

	Assert.AreEqual(
			"2",
			configurationContext.Connections["StrangeOne"].ConnectionString
	);

}

Comments

Adam Tybor

Can't wait... whens it going in the repo?

Tobin Harris

Looks sweet, is that so you can redirect sources & targets at run time?

I haven't really dug into it, but I was scanning the ActiveWarehouse Ruby project the other day which has it's own way of dong things. Might be good for inspiration!

http://activewarehouse.rubyforge.org/etl/. 

It uses control files to orchestrate sources, transforms etc much like your own does. See example:

http://viewvc.rubyforge.mmmultiworks.com/cgi/viewvc.cgi/etl/trunk/test/delimited.ctl?root=activewarehouse&view=markup

I'm working on an ETL project right now, which was started by someone else as a bespoke VB.NET solution. I asked him if he's considered using any ETL tools such as SSIS or Kettle etc, but he didn't feel that they gave him enough control. I think a lot of developers feel this way. I only used SSIS briefly, but found that it encouraged far too much mouse clicking, and that annoys me because it's not easily automated. Well, it probably is, but I didn't hang around long enough to find out how!

Anyway, I won't rant on. I'I look forward to giving the Ayende flavour of ETL a spin!

Tobin Harris

Oh, forgot to ask. Will you be including any "common" data cleaning features in your ETL solution? In fact, do you even think that they have a place in such a library?

Common ones I've written recently are things like...

  • Remove commas from numbers

  • Trim and convert empty string to null

  • Reformat UK postcodes

  • Make title case

  • Remove blank rows

  • Remove repeated column headers in data

  • Derive title from name and drop into column 'n'

  • Unpivot repeated groups onto new rows, Unpivot( startCol, colsPerGroup, numberOfGroups)

  • Format dates

  • Remove illegal dates

You get the idea :-)

C-J Berg

There are many other open source ETL tools that you might want to have a look at first, for instance:

http://kettle.pentaho.org/

http://www.cloveretl.org/clover-etl/

http://www.ketl.org/

http://www.enhydra.org/tech/octopus/index.html

http://www.glassfishwiki.org/jbiwiki/Wiki.jsp?page=ETLSE

http://scriptella.javaforge.com/

Some of these are quite powerful. Not as fun as writing your own tool/DSL of course, but great if you want to abandon SSIS quickly. :-)

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. RavenDB 3.5 whirl wind tour: You want all the data, you can’t handle all the data - 10 hours from now
  2. The design of RavenDB 4.0: Making Lucene reliable - about one day from now
  3. RavenDB 3.5 whirl wind tour: I’ll find who is taking my I/O bandwidth and they SHALL pay - 2 days from now
  4. The design of RavenDB 4.0: Physically segregating collections - 3 days from now
  5. RavenDB 3.5 Whirlwind tour: I need to be free to explore my data - 4 days from now

And 14 more posts are pending...

There are posts all the way to May 30, 2016

RECENT SERIES

  1. RavenDB 3.5 whirl wind tour (14):
    29 Apr 2016 - A large cluster goes into a bar and order N^2 drinks
  2. The design of RavenDB 4.0 (13):
    28 Apr 2016 - The implications of the blittable format
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats