Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database


DDD in Hebrew

time to read 2 min | 228 words

After reading Eric Evans' Domain-Driven Design book, I tried to develop software using the concepts described in the book.

The problem is that the language that the client / spec talks and the language that I have to use to write the product are radically different. It is not just technical vs. business jargon. It is a different language with concepts that often cannot be directly translated.

Transliterating is a Bad Thing to do, and I speak from a lot of very painful experience.

It gets more interesting when the business guys use terms that have a direct technical meaning in English but not in Hebrew. One of the entities in the domain would be directly translated to EmployeeBase, yet it is not the base class for the Employee class; the concept is actually very different.

I managed to find reasonable alternatives for most of the names (in the case above, EmployeeTemplate), but any question / talk about the code / model means that we are using both terms (in both languages, actually) in order to talk about the same thing. Talk about confusing.

Hebrew (and Arabic) have the unusual characteristic of being written right to left, so trying to write code in Hebrew is not practical (although I would like to see anyone who can come up with a coherent system in Hebrew beyond the Hello World samples).

time to read 1 min | 150 words

It seems that every time I turn around I hear programming being compared to construction, engineering, precise sciences, etc. My view on the subject is quite different. I think that most of the industry is fooled into thinking that there are some hard rules about computers, that the field is something that belongs to engineering. (And it probably sounds better to say "I'm a software engineer" than "I'm a software artist".)

Nevertheless, art is just as precise and methodical as programming. It is not for nothing that a high percentage of programmers are musicians, and just ask your common painter how much science goes into making a good painting. I see a lot more human factors affecting software projects than technical ones. And the final result isn't some cold and metallic thing. It's something that you put quite a bit of creativity into.

time to read 2 min | 210 words

Today I spent quite a while chasing caching bugs. The reason turned out to be something like this:

public override int GetHashCode()
{
  // Buggy: bitwise-and can only clear bits, so the combined value collapses
  return date.GetHashCode() & name.GetHashCode() & another.GetHashCode();
}

The issue turned out to be that in most cases GetHashCode() returned the same value, which completely broke caching support. (For example, assume that date is MinValue; in this case, the result of this method will always be 0.) Using bitwise-and actually caused us to lose uniqueness. (The same would happen for bitwise-or, by the way.)

My usual method of operation is simply to add them together; this way I at least don't lose uniqueness. It is possible to get duplicates using addition, but it is usually much rarer (in the previous case, we saw about one in three duplicates).
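To make the difference concrete, here is a minimal sketch in Python (not the C# of the original code). The component hash codes are made-up integers, with the date component fixed at 0 as it would be for MinValue. The multiply-by-31 variant is not mentioned in the post; it is included only as a commonly used alternative for comparison.

```python
from itertools import product

MASK = 0xFFFFFFFF  # keep results within 32 bits, like a C# int hash code


def combine_and(codes):
    """Bitwise-and of the component hash codes (the buggy approach)."""
    result = MASK
    for code in codes:
        result &= code
    return result


def combine_add(codes):
    """Sum of the component hash codes, wrapped to 32 bits."""
    return sum(codes) & MASK


def combine_mul31(codes):
    """Assumed alternative: multiply-and-add, which also depends on order."""
    result = 17
    for code in codes:
        result = (result * 31 + code) & MASK
    return result


# Hypothetical keys: date hashes to 0, the other two components vary.
keys = [(0, name, other) for name, other in product(range(1, 50), range(1, 50))]


def distinct(combine):
    """Number of different combined hash codes produced over all keys."""
    return len({combine(key) for key in keys})


print(len(keys))                # 2401 keys in total
print(distinct(combine_and))    # 1: every key collapses to 0
print(distinct(combine_add))    # 97: collisions remain, but far fewer
print(distinct(combine_mul31))  # more distinct values than plain addition
```

With bitwise-and, a single zero component wipes out the contribution of every other field, which is exactly the failure described above.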

time to read 3 min | 567 words

When I start using a new database, I usually run the following queries to find how good or bad the project is going to be:

select count(*) as [Number Of Tables] from information_schema.tables

Anything bigger than a couple of hundred, and I start to feel really nervous. Ideally, it is around 20 - 50 main ones, and maybe an additional dozen tables for constants. I have worked on databases where the numbers are in the thousands (and yes, I did touch every table).

select table_name, count(*) as [Number Of Columns In Table] from information_schema.columns
group by table_name
order by count(*) desc

If you see a table with more than 25 - 30 columns, this is a sign of a big problem. Either the database is not normalized, the data model is completely weird, or you are looking at a "the database in a single table" design.

The following is my number one criterion for the quality of the database:

select column_name, count(*) as [Number Of Columns Named This Way] from information_schema.columns
where data_type = 'DATETIME'
group by column_name
order by count(*) desc

The important things to note here are:

  • The number of rows returned: the more there are, the worse off you are.
  • The commonly named columns are very important. I often see things (in Hebrew) that look like this: Taarich / Taarih / Tarich / Taaizh / Taar - different ways to transliterate Date from Hebrew to English. When I see those proliferate, I know that I am in trouble.

A variant of the above is to run it over all columns regardless of datatype, and check for common names, but I found that focusing on the dates is fairly accurate.

time to read 2 min | 358 words

I have been talking vaguely about time dependent stuff for quite a while, and I figure that it is about time to talk about some of the problems in a more concrete way. One of the things that I work on is time and hierarchy dependent rules.

For instance, take the following rules structure:

[Image: rules structure]

All employees are allowed 3 paid lunch breaks, but managers are allowed 4. For the week of 10/09 - 16/09, Bob (a manager) got a benefit of as many paid lunches as he wants. (Yes, a silly example, but it shows the point.)

Now, let us say that we run validation checks at the end of the month and find out that Bob reported:

  • 0 lunches on the week of 01/09 - 02/09
  • 3 lunches on the week of 03/09 - 09/09
  • 6 lunches on the week of 10/09 - 16/09
  • 5 lunches on the week of 17/09 - 23/09
  • 4 lunches on the week of 24/09 - 30/09

The results for this validation should be:

  • 01/09 - 02/09 - OK
  • 03/09 - 09/09 - OK
  • 10/09 - 16/09 - OK
  • 17/09 - 23/09 - Error, allowed 4 but had 5
  • 24/09 - 30/09 - OK

How would you go about implementing such a thing?

I will discuss my implementation tomorrow.
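In the meantime, here is one possible sketch in Python. It is not the author's implementation. It assumes that each rule has a scope and a validity period, that the most specific scope with a rule in force on the first day of the week wins, and that the year is 2006 (the post gives only day and month). The names Rule, effective_rule and validate are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

YEAR = 2006  # assumption: the post only gives day/month


@dataclass(frozen=True)
class Rule:
    scope: str                 # "employee", "manager", or a person's name
    limit: Optional[int]       # None means unlimited
    start: date = date.min     # rule is valid from this day...
    end: date = date.max       # ...through this day, inclusive


RULES = [
    Rule("employee", 3),
    Rule("manager", 4),
    Rule("Bob", None, date(YEAR, 9, 10), date(YEAR, 9, 16)),
]


def effective_rule(rules, scopes, day):
    """Return the rule of the most specific scope that is valid on `day`.

    `scopes` is ordered from most specific to least specific.
    """
    for scope in scopes:
        for rule in rules:
            if rule.scope == scope and rule.start <= day <= rule.end:
                return rule
    raise LookupError(f"no rule applies on {day}")


def validate(rules, scopes, reports):
    """Check each (week_start, lunches) report against the rule in force."""
    results = []
    for week_start, lunches in reports:
        rule = effective_rule(rules, scopes, week_start)
        if rule.limit is None or lunches <= rule.limit:
            results.append("OK")
        else:
            results.append(f"Error, allowed {rule.limit} but had {lunches}")
    return results


BOB_SCOPES = ["Bob", "manager", "employee"]
BOB_REPORTS = [
    (date(YEAR, 9, 1), 0),
    (date(YEAR, 9, 3), 3),
    (date(YEAR, 9, 10), 6),
    (date(YEAR, 9, 17), 5),
    (date(YEAR, 9, 24), 4),
]

for line in validate(RULES, BOB_SCOPES, BOB_REPORTS):
    print(line)
```

Looking up the rule by the first day of the week is a simplification; it only works because the rule periods in this example line up with the reporting weeks.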

time to read 2 min | 294 words

I just read a comment that really annoyed me, talking about the lack of need for caching.

These days DBMS are not just a “persistent storage”; a DBMS is designed to handle many concurrent hits and transaction processes using multi CPU support and terabytes of memory pages. I don’t understand why we developers are so obsessed with the number of hits.

Just to note, I have caused a 32GB, 64-bit, 4 CPU server to weep tears of shame when I ran 900 concurrent queries traversing just under a billion rows. It worked, but it wasn't pretty. That, however, is not a core scenario in most cases.

What is a core scenario is the responsiveness of the application: how long it takes to serve a single request. Here, it doesn't matter how smart the database is. The deciding factor is the number of queries performed, each of which, to remind you, is a remote call, which is easily a hundred times more expensive than anything that you might do locally.

The problem with the comment above is that it also ignores several other factors, including the query complexity, the number of concurrent connections, how much IO and computing each query takes, etc. I think that it is safe to say that it is at least an order of magnitude harder to scale the database (up and out) than to scale the application, so anything that reduces the amount of work the database needs to do is a Good Thing.

time to read 4 min | 772 words

I recently talked quite a bit about caches in NHibernate, and I am a great believer in careful use of them in order to give an application much better performance. Frans Bouma, however, does not agree. Just to note, Frans is the author of LLBLGen Pro.

First, let me point to an issue that I have with the terminology that he uses. When Frans is talking about cache and uniquing, he refers to a term generally (at least by N/Hibernate & Fowler) called Identity Map.

Frans:

 A cache is an object store which manages objects so you don't have to re-instantiate objects over and over again, you can just re-use the instance you need from the cache.

Fowler:

Ensures that each object gets loaded only once by keeping every loaded object in a map.

When speaking about the advantages of an Identity Map, performance is almost never the first reason to use it. It is a side benefit, which can have a certain effect, but it is not the main reason for using one. If we consider Frans' arguments as they apply to Identity Map, I agree. If nothing else, an Identity Map tends to be fairly short lived and limited in scope in most cases, so it doesn't have the chance to be of great effectiveness.
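For readers who have not met the pattern, this is a minimal Identity Map sketch in Python. It is an illustration of Fowler's definition, not NHibernate's or LLBLGen Pro's code; the loader function and the customer dictionary are hypothetical stand-ins for a database load.

```python
class IdentityMap:
    """Keeps every loaded object in a map so each key is loaded only once."""

    def __init__(self, loader):
        self._loader = loader   # hypothetical function: key -> object
        self._loaded = {}

    def get(self, key):
        # Load on first request, then always hand back the same instance.
        if key not in self._loaded:
            self._loaded[key] = self._loader(key)
        return self._loaded[key]


load_calls = []


def load_customer(customer_id):
    # Stand-in for a database round trip.
    load_calls.append(customer_id)
    return {"id": customer_id, "name": f"Customer {customer_id}"}


session = IdentityMap(load_customer)
first = session.get(42)
second = session.get(42)

print(first is second)  # True: both names refer to one object
print(load_calls)       # [42]: the loader ran only once
```

The point is the `first is second` guarantee: within one session there is exactly one object per row. Saving a load is only a side effect.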

But an OR/M has an opportunity to cache much more than just at the session / context level. A word of warning, though: as was mentioned in the post, caching by its very nature means that you are not seeing the very latest data. You can use cache invalidation policies (including the new data driven cache invalidation policies in .NET 2.0) to help, but you should be aware of this issue.

However, when we consider the common scenarios, it is not often that we need to have real time information. The case that Frans is presenting is a CRM application with a query on all the customers that have more than 5 orders in the last month.

Do we really need this data in real time? Or can we be satisfied with data from several minutes ago? This question is dependent on the business scenario, but fairly often the answer is that we can be reasonably satisfied with data that is a few minutes or hours behind the real events.

Even if we would like to get real time data, the data can be changed between the time that we queried it and the time that we displayed it, so we would need to query again as soon as we finished displaying (or maybe at the same time as), ad infinitum.

Given that we assume that the business requirements allow us to use caching, this has tremendous benefit performance wise. Let us assume that we have cached the query and its results (again, I'm using NHibernate as the model here, and its caches are not caching live entities, but rather their values); we can then satisfy the query entirely from the cache (which usually means in-proc memory).

The only real cost of the query is several hash table lookups, which are (by their nature) very fast, and constructing the objects, which I have already shown to be highly efficient. The end result is that we can serve the results immediately. In many cases, even a cache that is valid for a few minutes can significantly reduce the number of queries that the DB has to process.
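The idea of a query cache that stays valid for a few minutes can be sketched as follows, in Python. This is not how NHibernate implements its query cache; QueryCache, run_query and the injectable clock are all hypothetical names chosen for the illustration. Like the caches described above, it stores plain values rather than live objects.

```python
import time


class QueryCache:
    """Caches query results (as plain tuples) for a fixed number of seconds."""

    def __init__(self, run_query, ttl_seconds, clock=time.monotonic):
        self._run_query = run_query   # hypothetical: (query, params) -> rows
        self._ttl = ttl_seconds
        self._clock = clock           # injectable so expiry can be tested
        self._entries = {}            # (query, params) -> (stored_at, rows)

    def get(self, query, params=()):
        key = (query, tuple(params))
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]           # served from memory, no remote call
        rows = tuple(tuple(row) for row in self._run_query(query, params))
        self._entries[key] = (now, rows)
        return rows


db_hits = []


def fake_db(query, params):
    # Stand-in for the remote database call.
    db_hits.append(query)
    return [("Acme", 7), ("Initech", 6)]


fake_now = [0.0]
cache = QueryCache(fake_db, ttl_seconds=300, clock=lambda: fake_now[0])

query = "customers with more than 5 orders in the last month"
cache.get(query, (5,))   # miss: goes to the database
cache.get(query, (5,))   # hit: answered from the cache
fake_now[0] = 301.0
cache.get(query, (5,))   # expired: goes to the database again

print(len(db_hits))      # 2
```

The trade-off is the one discussed above: for up to five minutes the caller may see data that is slightly behind the database.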

The concerns that Frans is raising are valid in the context* that he is talking about, but I disagree that caches are not extremely important to performance. That said, they should not be overused, and the DB is still the one and only authoritative source for the data. I have seen some places where the requirement is to run the application entirely from cache, without touching the database at all.

This is taking this way too far...

* Do you get the joke here?

time to read 4 min | 637 words


His Majesty's Dragon (Temeraire, Book 1)

Throne of Jade (Temeraire, Book 2)

Black Powder War (Temeraire, Book 3)

As a testament to the quality of these books, I would like to mention that I read all three between Friday 18:00 and Sunday 05:30. I truly enjoy fantasy / science fiction / alternative history books, and this trilogy hits all the right spots.

The story takes place at the beginning of the 19th century, and tells of a world different from ours only by the existence of dragons. No magic, no mumbo jumbo, just dragons. Of course, you can't say "just" dragons, and Naomi Novik does an excellent job of creating more than a story: she creates a full, living, breathing world.

I have read quite a bit about that time in history (the Napoleonic Wars are the background for the books, by the way), and the way they are portrayed in the books is very close to what really happened, and likely to stick with me more than the real history.

The dragons are fitted into the story with marvelous care, seamlessly woven into the fabric of the world. I liked the first book the most, since it had the most action, but the other two are very good as well.

The only halfway annoying thing about the books is that the third one ends with a lot more that I want to know about this world. How will the war end? Will the heroes be able to escape the clutches of the bureaucrats? And so on.

I surely hope that there are going to be more books in this world.

Highly recommended.
