Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

time to read 2 min | 232 words

I wanted to comment on this post from Scott McMaster, where he responds to my SoC post. What caught my eye was this:

Below the surface, a lot of the linked-in discussion seems to hinge on whether the banding logic qualifies as "business logic" or "presentation logic".  For the purpose here today, I don't much care what kind of "logic" it is, but it IS sufficiently non-trivial to require unit testing.  And if you bury it inside the page markup, you will have an extremely difficult time doing that.

I don't agree; it is extremely easy to test a view in MonoRail. In this case, I would do it with something like this:

[Test]
public void ShowOrdersView_WithMoreThanTenRows_WillShowRunningTotal()
{
	List<Order> orders = new List<Order>();
	for (int i = 0; i < 15; i++)
	{
		orders.Add( TestGenerator.CreateOrderWithCost(500) );
	}
	XmlDocument viewDOM = EvaluateViewAndReturnDOM( "ShowOrdersView", new Parameters("orders", orders));
	int index = 1;
	int totalSoFar = 0;
	// SelectNodes (not SelectSingleNode), so we iterate over all the rows
	foreach(XmlNode tr in viewDOM.SelectNodes("//table[@id='orderSummary']/tr"))
	{
		if(index % 10 != 0) // a regular order row
		{
			Assert.IsNotNull(tr.SelectSingleNode("td[normalize-space(.) = '500']"));
			totalSoFar += 500;
		}
		else // every tenth row is the running total row
		{
			StringAssert.Contains("Running Total", tr.ChildNodes[0].InnerText);
			StringAssert.Contains(totalSoFar.ToString(), tr.ChildNodes[1].InnerText);
		}
		index += 1;
	}
	// 15 order rows + 1 running total row, and index ends one past the last row
	Assert.AreEqual(17, index, "Not enough rows were found");
}

As you have probably figured out, this is a semi-integration test; it tests the output of the view without involving anything else. The EvaluateViewAndReturnDOM method evaluates the view and uses SgmlReader to return an XmlDocument that can easily be checked.
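To make the mechanics concrete, here is a minimal sketch of the same idea in Python rather than the post's C#. All names here are illustrative, and the `render` callable stands in for the real view engine:

```python
# A hedged sketch (Python, not the post's C#) of the EvaluateViewAndReturnDOM
# idea: render the view to markup, parse it into a DOM, and assert against it.
from xml.etree import ElementTree

def evaluate_view_and_return_dom(render):
    # `render` stands in for the real view engine; it returns the view's markup.
    return ElementTree.fromstring(render())

dom = evaluate_view_and_return_dom(
    lambda: "<table id='orderSummary'><tr><td>500</td></tr></table>")
first_cell = dom.find(".//tr/td")
```

The point is that once the output is a DOM, assertions about banding rows are ordinary node queries, not string matching.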

 
time to read 2 min | 244 words

A business platform, as far as I care, is an application that I develop on top of. SAP, Oracle Applications, CRM, ERP, etc.

Those big applications are usually sold with a hefty price tag, and a promise that they can be modified to the specific organization's needs as required. That is often true, actually, but the question is how. This usually requires development, and that is where this post comes in. I am a developer, and I evaluate such things with an eye to the best practices I use for "normal" development. In a word, I care about Maintainability.

Breaking it down, I care for (no particular order):

  • Source Control - should be easy, simple and painless.
  • Ease of deployment
  • Debuggable - easily
  • Testable - easily
  • Automation of deployment
  • Separation of Concerns
  • Don't Repeat Yourself
  • Doesn't shoot me in the foot
  • Make sense - that is hard to explain, but it should be obvious what is going on there
  • Possible to extend - hacks are not something that I enjoy doing

A certain ERP system is extended by writing SQL code that concatenates strings in order to produce HTML. That fails on all counts. I am due to start working directly with a Platform in the near future (so far I have always interfaced with Platforms, never worked with them directly), and I intend to watch closely for those issues. If it pains me, it is time for the old "wrap and abstract" trick...

time to read 1 min | 186 words

  1. At first, there was the Utility, it was written quickly, for doing just this one small thing, and no one cared much about it.
  2. Then came the Project, which took a few weeks, and saved some work for people to do.
  3. And on the third day the Application, which had users and did useful work. It was both more complex and more valuable.
  4. From the trenches, the Batch Process appeared, to make order in the chaos.
  5. Over the horizon the Framework came into place, and all was orderly and there was order in the DAL and the BAL.
  6. Beyond the framework, a Business Framework appeared, it was sharp and focused, and it knew what a customer is, and what to do with a purchase order.
  7. To rule them all, the System was brought forth, and it tied together all the applications in the organization, and it had a nice dashboard.
  8. To the greedy, the Platform was sold, which controlled everything, and made fun of the other things, and was extensible (with XML, of course).
  9. To make the little things easy, a utility was created...
time to read 1 min | 174 words

Well, I think that I have a solid foundation with the engine and syntax right now. I still have error conditions to verify, but that is something I can handle as I go along. Now it is time to consider handling joins and merges. My initial thinking was something like:

joinTransform UsersAndOrganizations:
	on: 
		Left.Id.ToString().Equals(Right.UserId)
	transform:
		Row.Copy(Left)
		Row.OrgId = Right["Organization Id"]

The problem is that while this gives me an equality operation, I can't handle sets very well; I have to compare each row vs. each row, and I would like to do better. It would also mean having to do everything in memory, and I am not really crazy about that (nor particularly worried, I will solve that when I need it).

Another option is:

joinTransform UsersAndOrganizations:
	left:  [Row.Id, Row.UserName]
	right: [Row.UserId, Row.FullName]
	transform:
		Row.Copy(Left)
		Row.OrgId = Right["Organization Id"]

This lets me handle it in a better way, since I now have two sets of keys, and I can do comparisons a lot more easily. That is a lot harder to read, though.
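To illustrate what the declared key selectors buy, here is a rough Python sketch (the real DSL is Boo, and all names here are invented): instead of comparing each row against each row, the right side can be indexed by its key, turning the join into a single pass over each set.

```python
# Hedged sketch: with explicit key selectors, the join can be a hash join,
# O(n + m) instead of the O(n * m) row-vs-row comparison.
def hash_join(left_rows, right_rows, left_key, right_key, transform):
    index = {}
    for right in right_rows:
        index.setdefault(right_key(right), []).append(right)
    for left in left_rows:
        for right in index.get(left_key(left), []):
            yield transform(left, right)

users = [{"Id": 1, "UserName": "ayende"}]
orgs = [{"UserId": "1", "Organization Id": 42}]
joined = list(hash_join(
    users, orgs,
    left_key=lambda r: str(r["Id"]),
    right_key=lambda r: r["UserId"],
    transform=lambda l, r: {**l, "OrgId": r["Organization Id"]},
))
```

Note that this still builds the index in memory; spilling to disk for large sets is a separate problem.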

Any suggestions?

Both on the syntax and implementation strategies...
time to read 1 min | 87 words

Today I managed to capture a screen shot of an SSIS error that had driven me crazy, and I sent it to my boss; it looked something like this one. I had the pleasure of hearing him repeat "But that is not possible" five or six times. It sounded familiar; that is what I had said when we started to run into this.

As an aside, I have created the I Hate SSIS page on my wiki; there is an impressive number of issues up there.

Production

time to read 1 min | 95 words

We just went live with our project; it wasn't really real until I saw the customer check out the site from his phone. The recent weeks have been very busy, but they were filled with either (a) SSIS curses or (b) browser compatibility issues. We are ahead of schedule, and managed to push two updates from what was declared to be "ready-to-ship".

Oh, another thing I feel like mentioning: I left work early today, and yesterday. (We had a single crunch day in the entire project.)

We still have stuff to do, but it is shipping!

time to read 2 min | 336 words

First, let me make it clear, it is not ready yet.

What we have:

  • 99% complete on the syntax
  • Overall architecture should be stable
  • The engine works - but I think of it as a spike, it is likely to change significantly.

What remains to be done:

  • Parallelising the work inside a pipeline
  • Better error messages
  • More logging
  • More tests
  • Transforms over sets of rows

Here are a few words about how it works. The DSL is composed of connection, source, destination and transform, which have a one-to-one mapping with the respective Connection, DataSource, DataDestination and Transform classes. In some cases, we just fill the data in (Connection); in some cases we pass a generator (think of it as a delegate) to the instance that we create (DataSource, DataDestination); and sometimes we subclass the class to add the new behavior (Transform).

A pipeline is a central concept, and is composed of a set of pipeline associations, which connect the input/output of components.
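As a rough illustration (in Python, with invented names; the real implementation is in the repository), a pipeline can be thought of as an ordered set of associations, each one feeding a component's output into the next component's input:

```python
# Hedged sketch: a pipeline as an ordered set of associations, each one
# connecting a component's output to the next component's input.
class Pipeline:
    def __init__(self):
        self.associations = []

    def connect(self, component):
        self.associations.append(component)
        return self

    def execute(self, rows):
        for component in self.associations:
            rows = component(rows)
        return list(rows)

def to_lower(rows):
    # a row transform component: lowercase every string value in the row
    for row in rows:
        yield {k: v.lower() if isinstance(v, str) else v for k, v in row.items()}

result = Pipeline().connect(to_lower).execute([{"ShipCity": "LONDON"}])
```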

Places to start looking at:

  • EtlContextBuilder - compiles the DSL and spits out an instance of:
  • EtlConfigurationContext - the result of the DSL, which can be run using:
  • ExecutionPackage - the result of building the EtlConfigurationContext; this one manages the running of all the pipelines.

There is an extensive set of tests (mostly for the syntax), and a couple of integration tests. As I said, anything that happens as a result of a call to ExecutionPackage.Execute() is suspect and will likely change. I may have been somewhat delegate-happy in the execution; it is an anonymous delegate calling an anonymous delegate, etc., which is probably too complex for what we need here.

I am putting the source out for review; while it can probably handle most simple things, it is very bare-bones and subject to change.

You can get it here: https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino-ETL

But it needs references from the root, so it would be easiest to just do:

svn checkout https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino.ETL

time to read 1 min | 196 words

I have just read this post from Hammett, talking about the difference between separating business logic and presentation logic vs. separating presentation and presentation logic. One comment caught my eye; Nicholas Piasecki says:

To me, this discussion all boils down to one thing: the foreach loop. Let’s say you want to display a table of sales reports, but after every tenth row, you want to print out an extra row that displays a running total of sales to that point. And you want negative numbers to appear in red, positive numbers to appear in green, and zeros to appear in black. In MonoRail, this is easy; with WebForm’s declarative syntax, just shoot yourself in the face right now. Most solutions I’ve seen end up doing lots of manipulation in the code-behind and then slamming it into a Literal or something, which to me defeats the purpose of the code separation.

And that, to me, is the essence of why I dislike WebForms: something like this is possible, but very hard to do. In my current project, we have used GridViews only in the admin module, and we have regretted that as well.
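The banding logic from the quote really is just a loop. A hedged Python sketch (names invented) of "a running total after every tenth row":

```python
# Hedged sketch of the banding logic from the quote: emit a running-total
# row after every tenth sales row. In MonoRail this lives in the view's
# foreach loop; in WebForms it has no natural home.
def rows_with_running_totals(sales):
    total = 0
    for i, amount in enumerate(sales, start=1):
        total += amount
        yield ("sale", amount)
        if i % 10 == 0:
            yield ("running-total", total)

out = list(rows_with_running_totals([100] * 12))
```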

time to read 1 min | 133 words

Okay, here is the full package syntax that I have now, which is enough to express quite a bit. I am now getting started on the engine itself; I am going to try the message-passing architecture for now, since it is much more flexible.

connection( 
	"NorthwindConnection",
	ConnectionType: SqlConnection,
	ConnectionString: "Data Source=localhost;Initial Catalog=Northwind; Integrated Security=SSPI;"
	)

source Northwind, Connection="NorthwindConnection":
	Command: "SELECT * FROM Orders WHERE RequiredDate BETWEEN @LastUpdate AND @CurrentDate"
	
	Parameters:
		@LastUpdate = date.Today.AddDays(-1)
		@CurrentDate = ExecuteScalar("NorthwindConnection", "SELECT MAX(RequiredDate) FROM Orders")

transform ToLowerCase:
	for column in Parameters.Columns:
		Row[column] = Row[column].ToLower() if Row[column] isa string

destination Northwind, Connection = "NorthwindConnection":
	Command: """
INSERT INTO [Orders_Copy]
(
	[CustomerID], [EmployeeID], [OrderDate], [RequiredDate], [ShippedDate],[ShipVia],
	[Freight],[ShipName],[ShipAddress],[ShipCity],[ShipRegion],[ShipPostalCode],
	[ShipCountry]
)
VALUES
(
	@CustomerID,@EmployeeID,@OrderDate,@RequiredDate,@ShippedDate,@ShipVia,@Freight,
	@ShipName,@ShipAddress,@ShipCity,@ShipRegion,@ShipPostalCode,@ShipCountry
)
"""

pipeline CopyOrders:
	Sources.Northwind >> ToLowerCase(Columns: ['ShipCity','ShipRegion'])
	ToLowerCase >> Destinations.Northwind 
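As for the message-passing architecture mentioned above, here is a hedged Python sketch (invented names, not the actual engine) of the idea: each component runs concurrently and forwards rows to the next stage through a queue, with a sentinel marking the end of the stream.

```python
# Hedged sketch of a message-passing pipeline: each stage runs on its own
# thread and forwards rows to the next stage through a queue.
import queue
import threading

DONE = object()  # sentinel marking the end of the row stream

def stage(transform, inbox, outbox):
    while True:
        row = inbox.get()
        if row is DONE:
            outbox.put(DONE)  # propagate the end-of-stream marker
            return
        outbox.put(transform(row))

def run_pipeline(rows, transforms):
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    threads = [
        threading.Thread(target=stage, args=(t, queues[i], queues[i + 1]))
        for i, t in enumerate(transforms)
    ]
    for t in threads:
        t.start()
    for row in rows:
        queues[0].put(row)
    queues[0].put(DONE)
    results = []
    while True:
        row = queues[-1].get()
        if row is DONE:
            break
        results.append(row)
    for t in threads:
        t.join()
    return results

lowered = run_pipeline(
    [{"ShipCity": "LONDON"}],
    [lambda r: {k: v.lower() for k, v in r.items()}],
)
```

The flexibility comes from the stages being decoupled: adding a component is just another queue hop, and stages can run in parallel.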
time to read 2 min | 372 words

Tobin Harris has asked some questions about how Rhino.ETL will handle transformations. As you can see, I consider these about as trivial as a FizzBuzz test, which is a Good Thing, since it really should be that simple. Tobin's questions really show the current pain points in ETL processes.

  • Remove commas from numbers:

    transform RemoveCommas:
    	for column in row.Columns:
    		if row[column] isa string:
    			row[column] = row[column].Replace(",","")

  • Trim, and convert empty strings to null:

    transform TrimEmptyStringToNull:
    	for column in row.Columns:
    		val = row[column]
    		if val isa string:
    			row[column] = null if val.Trim().Length == 0

  • Reformat UK postcodes - no idea from what format and to what format, but let us say that I have "SW1A0AA" and I want "SW1A 0AA":

    transform IntroduceSpace:
    	row.PostalCode = row.PostalCode.Substring(0,4) + ' ' + row.PostalCode.Substring(4)

  • Make title case, and derive title from name and drop it into column 'n':

    transform MakeTitleCase:
    	row.Title = row.Name.Substring(0,1).ToUpper() + row.Name.Substring(1)

  • Remove blank rows - right now, you would need to check all the columns manually (here is a sample for one column, which should suffice in most cases); if this is important, it is easy to add the check in the row class itself, so you can ask for it directly:

    transform RemoveRowsWithoutId:
    	RemoveRow() if not row.Id

  • Format dates - I think you already got the idea, but nevertheless, let us take "Mar 04, 2007" and translate it to "2007-03-04". As an aside, it is probably easier to keep the date object directly:

    transform TranslateDate:
    	row.Date = date.Parse(row.Date).ToString("yyyy-MM-dd")

  • Remove illegal dates:

    transform RemoveBadDate:
    	tmp as date
    	row.Date = null if not date.TryParse(row.Date, tmp)

Things that I don't have an implementation of are:

  • Remove repeated column headers in data - I don't understand the requirement.
  • Unpivot repeated groups onto new rows, Unpivot(startCol, colsPerGroup, numberOfGroups) - I have two problems here. I never fully grokked pivot/unpivot, so this requires more research; but I have a more serious issue, and that is that this is a transformation over a set of rows, and I can't think of a good syntax for that, or the semantics it should have.
    I am open to ideas...
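For what it's worth, here is one possible reading of Unpivot(startCol, colsPerGroup, numberOfGroups), sketched in Python with invented names: each repeated column group on a single input row becomes its own output row, so it is a one-to-many transform rather than a per-row tweak.

```python
# Hedged sketch of one possible semantics for
# Unpivot(startCol, colsPerGroup, numberOfGroups): the columns before
# startCol are fixed, and each group of colsPerGroup columns after it
# yields a separate output row (empty groups are skipped).
def unpivot(row, start_col, cols_per_group, number_of_groups):
    columns = list(row.keys())
    fixed = {c: row[c] for c in columns[:start_col]}
    for g in range(number_of_groups):
        begin = start_col + g * cols_per_group
        group_cols = columns[begin:begin + cols_per_group]
        if any(row[c] is not None for c in group_cols):
            out = dict(fixed)
            for i, c in enumerate(group_cols):
                out["Value%d" % i] = row[c]
            yield out

row = {"Id": 1, "Q1": 10, "Q2": 20, "Q3": None}
rows = list(unpivot(row, start_col=1, cols_per_group=1, number_of_groups=3))
```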
