Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 6,124 | Comments: 45,470

filter by tags archive

Rhino ETLThinking about Joins & Merges

time to read 1 min | 174 words

Well, I think that I have a solid foundation with the engine and syntax right now, I still have error conditions to verify, but that is something that I can handle as I go along. Now it is time to consider handling joins and merges. My initial thinking was something like:

joinTransform UsersAndOrganizations:
	on: 
		Left.Id.ToString().Equals(Right.UserId)
	transform:
		Row.Copy(Left)
		Row.OrgId = Right["Organization Id"]

The problem is that while this gives me equality operation, I can't handle sets very well, I have to compare each row vs. each row, and I would like to do it better. It would also mean having to do everything in memory, and I am not really crazy about that (nor particularly worried, will solved that when I need it).

Another option is:

joinTransform UsersAndOrganizations:
	left:  [Row.Id, Row.UserName]
	right: [Row.UserId, Row.FullName]
	transform:
		Row.Copy(Left)
		Row.OrgId = Right["Organization Id"]

This lets me handle it in a better way, since I now have two sets of keys, and I can do comparisons a lot more easily.That is a lot harder to read, though.

Any suggestions?

Both on the syntax and implementation strategies...

More posts in "Rhino ETL" series:

  1. (16 Oct 2007) Importing Data into MS CRM
  2. (13 Aug 2007) Writing to files
  3. (05 Aug 2007) Web Services Source
  4. (05 Aug 2007) Transactions
  5. (04 Aug 2007) Targets
  6. (04 Aug 2007) Aggregates
  7. (26 Jul 2007) Thinking about Joins & Merges
  8. (24 Jul 2007) First Code Drop

Comments

Derick Bailey

i'm not sure i like either of those syntax options, actually. Neither of them allows for joins across multiple tables where the join happens down the chain of tables.

for example, how would I join TableA.Field1 to TableB.Field2 to TableC.Field3? if you modify the joinTransform line like this, you can accomplish what I am saying with multiple joinTransforms, though:

joinTransform Source to UsersAndOrganizations:

I also think it would be hard to have complex joins, like Or clauses, with those two syntax specs. you would need something closer to real SQL, while still mainaining an object model

so the final syntax would look something for multiple join chain example:

joinTransform Source to UsersAndOrganizations:

Left.Row.Id = Right.Row.UserId

Left.Row.UserName = Right.Row.FullName

transform:

    Row.Copy(Left)

    Row.OrgId = Right["Organization Id"]

joinTransform UsersAndOrganizations to Departments

Left.Row.UserId = Right.Row.ManagerId

and syntax for complex join example:

joinTransform Source to UsersAndOrganizations:

Left.Row.Id = Right.Row.UserId

( Left.Row.UserName = Right.Row.FullName || Right.Row.FullName = null)

transform:

    Row.Copy(Left)

    Row.OrgId = Right["Organization Id"]

I left the ".Row." in the Left and Right syntax so that you can add something like

Left.Row.NonMatching

or other specialized syntax options, as needed.

Ayende Rahien

Derick, I am not limited to the alternative I have brought up, I am certainly opened to new stuff

Mark Monster

Hmm good start I think. What about a multi-key (I'm not sure about the English term) join?

Ayende Rahien

on:

    Left.Id.ToString().Equals(Right.UserId) && Left.UserName == Right.Name

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. RavenDB 3.5 whirl wind tour: You want all the data, you can’t handle all the data - about one day from now
  2. The design of RavenDB 4.0: Making Lucene reliable - 3 days from now
  3. RavenDB 3.5 whirl wind tour: I’ll find who is taking my I/O bandwidth and they SHALL pay - 4 days from now
  4. The design of RavenDB 4.0: Physically segregating collections - 5 days from now
  5. RavenDB 3.5 Whirlwind tour: I need to be free to explore my data - 6 days from now

And 14 more posts are pending...

There are posts all the way to May 30, 2016

RECENT SERIES

  1. RavenDB 3.5 whirl wind tour (14):
    29 Apr 2016 - A large cluster goes into a bar and order N^2 drinks
  2. The design of RavenDB 4.0 (13):
    28 Apr 2016 - The implications of the blittable format
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats