Ayende @ Rahien

It's a girl

Rhino ETL: Thinking about Joins & Merges

Well, I think that I have a solid foundation with the engine and syntax right now, I still have error conditions to verify, but that is something that I can handle as I go along. Now it is time to consider handling joins and merges. My initial thinking was something like:

joinTransform UsersAndOrganizations:
	on: 
		Left.Id.ToString().Equals(Right.UserId)
	transform:
		Row.Copy(Left)
		Row.OrgId = Right["Organization Id"]

The problem is that while this gives me equality operation, I can't handle sets very well, I have to compare each row vs. each row, and I would like to do it better. It would also mean having to do everything in memory, and I am not really crazy about that (nor particularly worried, will solved that when I need it).

Another option is:

joinTransform UsersAndOrganizations:
	left:  [Row.Id, Row.UserName]
	right: [Row.UserId, Row.FullName]
	transform:
		Row.Copy(Left)
		Row.OrgId = Right["Organization Id"]

This lets me handle it in a better way, since I now have two sets of keys, and I can do comparisons a lot more easily.That is a lot harder to read, though.

Any suggestions?

Both on the syntax and implementation strategies...

Comments

Derick Bailey
07/26/2007 08:44 PM by
Derick Bailey

i'm not sure i like either of those syntax options, actually. Neither of them allows for joins across multiple tables where the join happens down the chain of tables.

for example, how would I join TableA.Field1 to TableB.Field2 to TableC.Field3? if you modify the joinTransform line like this, you can accomplish what I am saying with multiple joinTransforms, though:

joinTransform Source to UsersAndOrganizations:

I also think it would be hard to have complex joins, like Or clauses, with those two syntax specs. you would need something closer to real SQL, while still mainaining an object model

so the final syntax would look something for multiple join chain example:

joinTransform Source to UsersAndOrganizations:

Left.Row.Id = Right.Row.UserId

Left.Row.UserName = Right.Row.FullName

transform:

    Row.Copy(Left)

    Row.OrgId = Right["Organization Id"]

joinTransform UsersAndOrganizations to Departments

Left.Row.UserId = Right.Row.ManagerId

and syntax for complex join example:

joinTransform Source to UsersAndOrganizations:

Left.Row.Id = Right.Row.UserId

( Left.Row.UserName = Right.Row.FullName || Right.Row.FullName = null)

transform:

    Row.Copy(Left)

    Row.OrgId = Right["Organization Id"]

I left the ".Row." in the Left and Right syntax so that you can add something like

Left.Row.NonMatching

or other specialized syntax options, as needed.

Ayende Rahien
07/26/2007 09:21 PM by
Ayende Rahien

Derick, I am not limited to the alternative I have brought up, I am certainly opened to new stuff

Mark Monster
07/27/2007 11:16 AM by
Mark Monster

Hmm good start I think. What about a multi-key (I'm not sure about the English term) join?

Ayende Rahien
07/27/2007 11:19 AM by
Ayende Rahien

on:

    Left.Id.ToString().Equals(Right.UserId) && Left.UserName == Right.Name
Comments have been closed on this topic.