Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 18 | Comments: 73

filter by tags archive

Rhino.ETLFull Package Syntax

time to read 1 min | 133 words

Okay, here is the full package syntax that I have now, which is enough to express quite a bit, I am now getting started on working on the engine itself, I am going to try the message passing architecture for now, since it is much more flexible.

connection( 
	"NorthwindConnection",
	ConnectionType: SqlConnection,
	ConnectionString: "Data Source=localhost;Initial Catalog=Northwind; Integrated Security=SSPI;"
	)

source Northwind, Connection="NorthwindConnection":
	Command: "SELECT * FROM Orders WHERE RequiredDate BETWEEN @LastUpdate AND @CurrentDate"
	
	Parameters:
		@LastUpdate = date.Today.AddDays(-1)
		@CurrentTime = ExecuteScalar("NorthwindConnection", "SELECT MAX(RequiredDate) FROM Orders")

transform ToLowerCase:
	for column in Parameters.Columns:
		Row[column] = Row[column].ToLower() if Row[column] isa string

destination Northwind, Connection = "NorthwindConnection":
	Command: """
INSERT INTO [Orders_Copy]
(
	[CustomerID], [EmployeeID], [OrderDate], [RequiredDate], [ShippedDate],[ShipVia],
	[Freight],[ShipName],[ShipAddress],[ShipCity],[ShipRegion],[ShipPostalCode],
	[ShipCountry]
)
VALUES
(
	@CustomerID,@EmployeeID,@OrderDate,@RequiredDate,@ShippedDate,@ShipVia,@Freight,
	@ShipName,@ShipAddress,@ShipCity,@ShipRegion,@ShipPostalCode,@ShipCountry
)
"""

pipeline CopyOrders:
	Sources.Northwind >> ToLowerCase(Columns: ['ShipCity','ShipRegion'])
	ToLowerCase >> Destinations.Northwind 

More posts in "Rhino.ETL" series:

  1. (04 Aug 2007) Status Report - Joins, Distinct & Engine work
  2. (21 Jul 2007) Full Package Syntax
  3. (21 Jul 2007) Turning Transformations to FizzBuzz tests
  4. (21 Jul 2007) Providing Answers

Comments

Tobin Harris

All looks good to me :)

I realise you're doing this to solve your own problems, not mine, but some other thoughts I've had on this are:

1) Mappings

It's common to set up a mapping for each source, which indicates how source columns map to target columns. Had you considered making this an explicit concept in your DSL?

map SkandiaFileFormatToDestinationsTable:

'Customer ID' => 'customer_id'

'Order ID' => 'order_id'

'First Name' => 'forename'

pipeline CopyOrders:

Sources.SkandiaFile >> Destinations.ValuationsTable using map SkandiaFileFormatToDestinationsTable

2) Back to the idea of having a stock library of standard cleaning functions. You could just allow file include?

pipeline ImportOrdersFromPegasus:

include "StandardTransforms.boo"

Sources.Orders >> ToTitleCase(Colums: ['Title','Forename','Surname']) >> Destinations.Orders with map PegasusOrdersToSunOrders

Food for thought, anyway!

Ayende Rahien

I realise you're doing this to solve your own problems, not mine,

I am looking forward to see what you can do with this :-)

=> is not a valid Boo operator, so I used >>, but other than that, isn't your map basically a transform of column names?

transform SkandiaFileFormatToDestinationsTable:

Row.customer_id = Row["Customer ID"]

I will probably not allow a file include, but rather a project system, which will allow better reuse (including shared dlls).

Jay R. Wren

Um. WOW is all I can say.

That is very nice short and concise language.

I have to admit that I've used C# instead of SSIS in cases where I can get away with it, but for my next case I'll be looking at these bits.

Avish

Looking good. I'm coming from a non-ORM background, so I have issues with the literal sql strings. Can't you do anything about those?

Ayende Rahien

Well, what would you have instead?

This is a ETL project, I don't see any better way to handle this, frankly.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. Production postmortem: The industry at large - one day from now
  2. The insidious cost of allocations - about one day from now
  3. Buffer allocation strategies: A possible solution - 5 days from now
  4. Buffer allocation strategies: Explaining the solution - 6 days from now
  5. Buffer allocation strategies: Bad usage patterns - 7 days from now

And 2 more posts are pending...

There are posts all the way to Sep 11, 2015

RECENT SERIES

  1. Find the bug (5):
    20 Apr 2011 - Why do I get a Null Reference Exception?
  2. Production postmortem (10):
    01 Sep 2015 - The case of the lying configuration file
  3. What is new in RavenDB 3.5 (7):
    12 Aug 2015 - Monitoring support
  4. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats