Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

@

Posts: 5,949 | Comments: 44,544

filter by tags archive

Rhino ETLThinking about Joins & Merges


Well, I think that I have a solid foundation with the engine and syntax right now, I still have error conditions to verify, but that is something that I can handle as I go along. Now it is time to consider handling joins and merges. My initial thinking was something like:

joinTransform UsersAndOrganizations:
	on: 
		Left.Id.ToString().Equals(Right.UserId)
	transform:
		Row.Copy(Left)
		Row.OrgId = Right["Organization Id"]

The problem is that while this gives me equality operation, I can't handle sets very well, I have to compare each row vs. each row, and I would like to do it better. It would also mean having to do everything in memory, and I am not really crazy about that (nor particularly worried, will solved that when I need it).

Another option is:

joinTransform UsersAndOrganizations:
	left:  [Row.Id, Row.UserName]
	right: [Row.UserId, Row.FullName]
	transform:
		Row.Copy(Left)
		Row.OrgId = Right["Organization Id"]

This lets me handle it in a better way, since I now have two sets of keys, and I can do comparisons a lot more easily.That is a lot harder to read, though.

Any suggestions?

Both on the syntax and implementation strategies...

More posts in "Rhino ETL" series:

  1. (16 Oct 2007) Importing Data into MS CRM
  2. (13 Aug 2007) Writing to files
  3. (05 Aug 2007) Web Services Source
  4. (05 Aug 2007) Transactions
  5. (04 Aug 2007) Targets
  6. (04 Aug 2007) Aggregates
  7. (26 Jul 2007) Thinking about Joins & Merges
  8. (24 Jul 2007) First Code Drop

Comments

Derick Bailey

i'm not sure i like either of those syntax options, actually. Neither of them allows for joins across multiple tables where the join happens down the chain of tables.

for example, how would I join TableA.Field1 to TableB.Field2 to TableC.Field3? if you modify the joinTransform line like this, you can accomplish what I am saying with multiple joinTransforms, though:

joinTransform Source to UsersAndOrganizations:

I also think it would be hard to have complex joins, like Or clauses, with those two syntax specs. you would need something closer to real SQL, while still mainaining an object model

so the final syntax would look something for multiple join chain example:

joinTransform Source to UsersAndOrganizations:

Left.Row.Id = Right.Row.UserId

Left.Row.UserName = Right.Row.FullName

transform:

    Row.Copy(Left)

    Row.OrgId = Right["Organization Id"]

joinTransform UsersAndOrganizations to Departments

Left.Row.UserId = Right.Row.ManagerId

and syntax for complex join example:

joinTransform Source to UsersAndOrganizations:

Left.Row.Id = Right.Row.UserId

( Left.Row.UserName = Right.Row.FullName || Right.Row.FullName = null)

transform:

    Row.Copy(Left)

    Row.OrgId = Right["Organization Id"]

I left the ".Row." in the Left and Right syntax so that you can add something like

Left.Row.NonMatching

or other specialized syntax options, as needed.

Ayende Rahien

Derick, I am not limited to the alternative I have brought up, I am certainly opened to new stuff

Mark Monster

Hmm good start I think. What about a multi-key (I'm not sure about the English term) join?

Ayende Rahien

on:

    Left.Id.ToString().Equals(Right.UserId) && Left.UserName == Right.Name

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. The RavenDB Comic Strip: Part III – High availability & sleeping soundly - one day from now

There are posts all the way to May 28, 2015

RECENT SERIES

  1. The RavenDB Comic Strip (3):
    20 May 2015 - Part II – a team in trouble!
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats