Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 18 | Comments: 72

filter by tags archive

Rhino ETLThinking about Joins & Merges

time to read 1 min | 174 words

Well, I think that I have a solid foundation with the engine and syntax right now, I still have error conditions to verify, but that is something that I can handle as I go along. Now it is time to consider handling joins and merges. My initial thinking was something like:

joinTransform UsersAndOrganizations:
	on: 
		Left.Id.ToString().Equals(Right.UserId)
	transform:
		Row.Copy(Left)
		Row.OrgId = Right["Organization Id"]

The problem is that while this gives me equality operation, I can't handle sets very well, I have to compare each row vs. each row, and I would like to do it better. It would also mean having to do everything in memory, and I am not really crazy about that (nor particularly worried, will solved that when I need it).

Another option is:

joinTransform UsersAndOrganizations:
	left:  [Row.Id, Row.UserName]
	right: [Row.UserId, Row.FullName]
	transform:
		Row.Copy(Left)
		Row.OrgId = Right["Organization Id"]

This lets me handle it in a better way, since I now have two sets of keys, and I can do comparisons a lot more easily.That is a lot harder to read, though.

Any suggestions?

Both on the syntax and implementation strategies...

More posts in "Rhino ETL" series:

  1. (16 Oct 2007) Importing Data into MS CRM
  2. (13 Aug 2007) Writing to files
  3. (05 Aug 2007) Web Services Source
  4. (05 Aug 2007) Transactions
  5. (04 Aug 2007) Targets
  6. (04 Aug 2007) Aggregates
  7. (26 Jul 2007) Thinking about Joins & Merges
  8. (24 Jul 2007) First Code Drop

Comments

Derick Bailey

i'm not sure i like either of those syntax options, actually. Neither of them allows for joins across multiple tables where the join happens down the chain of tables.

for example, how would I join TableA.Field1 to TableB.Field2 to TableC.Field3? if you modify the joinTransform line like this, you can accomplish what I am saying with multiple joinTransforms, though:

joinTransform Source to UsersAndOrganizations:

I also think it would be hard to have complex joins, like Or clauses, with those two syntax specs. you would need something closer to real SQL, while still mainaining an object model

so the final syntax would look something for multiple join chain example:

joinTransform Source to UsersAndOrganizations:

Left.Row.Id = Right.Row.UserId

Left.Row.UserName = Right.Row.FullName

transform:

    Row.Copy(Left)

    Row.OrgId = Right["Organization Id"]

joinTransform UsersAndOrganizations to Departments

Left.Row.UserId = Right.Row.ManagerId

and syntax for complex join example:

joinTransform Source to UsersAndOrganizations:

Left.Row.Id = Right.Row.UserId

( Left.Row.UserName = Right.Row.FullName || Right.Row.FullName = null)

transform:

    Row.Copy(Left)

    Row.OrgId = Right["Organization Id"]

I left the ".Row." in the Left and Right syntax so that you can add something like

Left.Row.NonMatching

or other specialized syntax options, as needed.

Ayende Rahien

Derick, I am not limited to the alternative I have brought up, I am certainly opened to new stuff

Mark Monster

Hmm good start I think. What about a multi-key (I'm not sure about the English term) join?

Ayende Rahien

on:

    Left.Id.ToString().Equals(Right.UserId) && Left.UserName == Right.Name

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. RavenDB 3.0 New Stable Release - 3 hours from now
  2. Production postmortem: The industry at large - about one day from now
  3. The insidious cost of allocations - 2 days from now
  4. Buffer allocation strategies: A possible solution - 5 days from now
  5. Buffer allocation strategies: Explaining the solution - 6 days from now

And 3 more posts are pending...

There are posts all the way to Sep 11, 2015

RECENT SERIES

  1. Find the bug (5):
    20 Apr 2011 - Why do I get a Null Reference Exception?
  2. Production postmortem (10):
    01 Sep 2015 - The case of the lying configuration file
  3. What is new in RavenDB 3.5 (7):
    12 Aug 2015 - Monitoring support
  4. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats