time to read 4 min | 748 words

I said that I would post about it, so here is the high level design for a generic implementation of natural-language-looking parsing. Let us explore the problem scenario first. We want to be able to build this language without having to build a full-blown language from scratch:

open http://www.ayende.com/
click on link to Blog
click on link to first post
enter comment with name Ayende Rahien and email foo@example.org and url http://www.ayende.com/Blog/
enter comment text This is an awesome post.
click on submit
comment with This is an awesome post should appear on page

And to prove that we are not focusing on a single language, let us try this one as well:

when account balance is 500$ and withdrawal is made of 400$ we should get a low funds alert
when account balance is 500$ and withdrawal is made of 501$ we should deny the transaction
when weekly international charge is at 3,500$ and max weekly international charge is of 5,000$ and new charge arrives for amount 2,230$ we should deny the transaction

I think that those are divergent enough to show that the solution is a generic one.

And now, to the solution. Each type of language is going to have its own DSL engine, which knows how to deal with the particular dialect that we are using. The default parsing is a three-step process. First, split the text into sentences; then split each sentence into tokens by whitespace. Now, for each statement, search for the appropriate statement resolver, which is a class that knows how to deal with it. The statement resolver's methods are then called to process the statement.
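To make that concrete, here is a minimal sketch of what such an engine might look like. This is not the actual code from the experiment (that is linked below); the names are invented, and the explicit interface stands in for the convention-based method resolution that the real engine does via reflection:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch only, not the experiment's actual code.
public interface IStatementResolver
{
    // Returns true if this resolver recognized and executed the statement.
    bool TryResolve(string[] tokens);
}

public class NaturalLanguageEngine
{
    private readonly List<IStatementResolver> resolvers = new List<IStatementResolver>();

    public void Register(IStatementResolver resolver)
    {
        resolvers.Add(resolver);
    }

    public void Execute(string script)
    {
        // Step 1: split the text into statements, one per line.
        var statements = script.Split(new[] { '\r', '\n' },
                                      StringSplitOptions.RemoveEmptyEntries);
        foreach (var statement in statements)
        {
            // Step 2: split each statement into tokens by whitespace.
            var tokens = statement.Split(new[] { ' ', '\t' },
                                         StringSplitOptions.RemoveEmptyEntries);

            // Step 3: find a statement resolver that accepts this statement.
            if (resolvers.Any(r => r.TryResolve(tokens)) == false)
                throw new InvalidOperationException("No statement resolver for: " + statement);
        }
    }
}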

There are two key principles in the design: turning something like 'click on link' into an invocation of the ClickOnLink statement resolver, and lazy parameter evaluation.

This is going to be interesting. The time right now is 19:38, and I am going to start implementing this.

It is now 22:04, and I finished the first language.

Working on the second now. It is 22:10 and I am done with the second one.

What did I do?

I took the text we had and turned it into executable commands. Now, this isn't truly flexible. If you modify the way a statement is structured, it will fail, which comes back to why natural language is a bad choice here. That said, it does tolerate quite a bit of variation.

You can get the code for this, including tests, here: https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/experiments/natrual-language

But let us talk for a bit about how this is implemented. I'll show the bank example, because it is easier.

We start by defining the BankParser, which looks like this:

[image: the BankParser class definition]

The bank parser merely defines what the statement resolvers are, along with any special parsers that are needed (in this case, we need to handle dollar values).
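The screenshot is gone from this archive, so here is a rough reconstruction (the names are guesses on my part, not the original code; the actual source is in the repository linked above):

// Hypothetical reconstruction; only the shape matters, not the names.
public class BankParser : NaturalLanguageEngine
{
    public BankParser()
    {
        // The statement resolvers that make up this dialect. A real
        // implementation would also register a resolver for the weekly
        // international charge statements.
        Register(new WhenAccountBalanceResolver()); // sketched below
    }

    // The special parser for dollar values such as "3,500$".
    public static decimal ParseDollars(string token)
    {
        return decimal.Parse(token.TrimEnd('$'),
            System.Globalization.NumberStyles.AllowThousands);
    }
}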

A statement resolver is trivial:

[images: the statement resolver classes]

And yes, those are pure POCO classes.
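In the real code these are plain classes that the engine picks up by convention; reusing the simplified interface from the sketch above, one of them might look roughly like this (the thresholds are inferred from the example statements, everything else is invented):

using System;
using System.Linq;

// Hypothetical reconstruction of a statement resolver for
//   "when account balance is 500$ and withdrawal is made of 400$ ..."
public class WhenAccountBalanceResolver : IStatementResolver
{
    public bool TryResolve(string[] tokens)
    {
        if (tokens.Length < 3 || tokens[0] != "when" ||
            tokens[1] != "account" || tokens[2] != "balance")
            return false; // not our statement; let another resolver try

        // Lazy parameter evaluation: parse the dollar values only once we
        // know this resolver owns the statement.
        var amounts = tokens.Where(t => t.EndsWith("$"))
                            .Select(BankParser.ParseDollars)
                            .ToArray();
        decimal balance = amounts[0], withdrawal = amounts[1];

        if (withdrawal > balance)
            Console.WriteLine("deny the transaction");
        else if (balance - withdrawal <= 100)
            Console.WriteLine("low funds alert");
        return true;
    }
}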

The whole idea here was that I can implement some smarts in the default engine about how it recognizes methods and resolves parameters. I will admit that overloading caused some issues, but I think that this is a pretty simple implementation.

It also does a good job of demonstrating the problems with such a language. Go ahead and try to build operator precedence into it. Or implement an if statement. You really can't, not without introducing a lot more structure into it. And that would turn it into yet another programming language.

What about the tooling? Intellisense and syntax highlighting?

Well, since we have the structure of the code, and we know the conventions, you shouldn't have a problem taking my previous posts about this and translating them directly into supporting this.

And yes, I can create a language in this in a few minutes, as the BankParser has proven.

time to read 2 min | 396 words

Jeremy Miller is looking for a DSL that reads like natural language. My immediate response was that it is not practical, because I assumed he wanted truly natural language, which is still not possible without an extremely high budget. Limiting the problem to something that merely reads like natural language reduces the problem space significantly.

I am going to have a separate post about how to actually solve such a problem, but for now, I want to talk about the solution that was actually requested. I think it is 100% solvable with a low cost approach. That is, you can get a DSL that reads like English in under an hour. But I don't think it is valuable.

English is a terrible language to express instructions in. Any natural language is terrible at expressing instructions; just ask the nearest army sergeant, they will tell you that.

Let us take a look at a language that actually took this approach:

tell application "Finder"
    set the percent_free to ¬
        (((the free space of the startup disk) / (the capacity of the startup disk)) * 100) div 1
end tell

if the percent_free is less than 10 then
    tell application (path to frontmost application as text)
        display dialog "The startup disk has only " & the percent_free & ¬
            " percent of its capacity available." & return & return & ¬
            "Should this script continue?" with icon 1
    end tell
end if

This is AppleScript. From my point of view, it is horrible. It is unreadable in the extreme. More than that, trying to explain how this language works, or how it handles errors, is a non-trivial task.

This has been my experience every time I have actually tried to create a natural-language-like syntax. It is too complex, and users get annoyed when they can't use real natural language.

From my perspective, getting an expressive DSL does not mean that it has to read like an English statement. In fact, it probably shouldn't. There is too much noise involved. A structured approach isn't there just to help the compiler, but to help the reader.

time to read 8 min | 1445 words

This is strongly related to my posts about Tools Matter. During the last six months I have had several conversations with people about their Xyz processes. In several cases, my response was a polite version of: "This Is Broken, Badly" and "You should automate this part".

I am a great believer in automating just about everything that moves (and if it doesn't move, kick it until it does!). In those conversations, the response was usually, "Yeah, we thought about doing that, but Abc does things in a way I don't like and Efg isn't compatible with our Foo requirements". And that was that.

Let us take deployment as a good example. I was talking with someone about the need for automatic deployment, and he mentioned that he was waiting for a tool to come along that would also handle workflows.

I was a bit stunned by that, and inquired deeper, at which point it became clear that the guy was working in a highly regulated environment, and doing a deployment involved multiple people authorizing it in different environments before it could go live. Because of all the manual work that already exists there, which cannot be changed for regulatory reasons, they have no automated deployment.

Note, it sounds much worse than it actually is, now that I re-read this.

I was critical of this approach, for several reasons. First, even if you can't go all the way, just having a build script that you run manually is a huge improvement. Next, I asked several additional questions about the scenario, and it turned out that the process was something like this:

  • We need to push something to production
  • We deploy to a test server and ask QA to test it
  • Once QA signs off on the release, we deploy to a staging server
  • At least three business experts smoke test the system
  • Once we have three signatures authorizing the release, we can ask Joe (the friendly IT admin) to deploy to production
  • Joe schedules a time for the deployment and gets sign-off for it from someone with the authority to do so
  • At the specified time, Joe deploys to production, with the dev team on call for issues

This isn't a bad going-to-production scenario, with multiple checkpoints to ensure that we are safe and sound. The real reason for it is not to actually ensure quality, of course; it is to satisfy some dry regulation and produce an audit trail that you can point to. But that is beside the point, and merely shows my utter annoyance with all forms of bureaucracy.

Okay, so we have this process that we must go through in order to get something to production. There is no tool out there that will do it for us and give us the required audit trail. Therefore, we can't use automatic deployments.

My response for that was rude and unprintable.

Here is the deal, let us estimate the cost of building such a system:

  • A page into which I can enter a request to go to production. This consists of two text boxes and a submit button. On submit:
    • Automatically deploy to the test server
    • Send an email to me and the QA department that we have something that they need to test
    • Record that I have started a deployment process
  • A page into which the QA department can say whether they authorize the build or not. On submit:
    • If not:
      • Record this fact
      • Email the dev team
    • If yes:
      • Record this fact
      • Automatically deploy to the staging server
      • Email all the people that can approve a build and ask them to evaluate it
  • A page into which the business experts can authorize or block a build. On submit:
    • If not:
      • Record the reason
      • Email the dev team and QA
    • If yes:
      • Record this fact
      • If this is the third person to authorize this build (and there are no blocks):
        • Record this fact
        • Send an email to Joe, asking him to set up a time for deployment
  • A page for Joe to enter the time for going to production. On submit:
    • Record this fact
    • Send an email to whoever can authorize production downtime
    • Generate the deployment package (which Joe will run in production)
  • A page for authorizing the scheduled downtime. On submit:
    • Record this fact
    • Email Joe that the time is approved
    • Email whoever is interested that there will be scheduled downtime at that time

Five pages, more or less. And yes, I am glossing over things, I know. That is not the point.
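The only part of this that contains any real logic is the third-signature rule. Here is a minimal sketch of that piece; all the names and the email plumbing are illustrative, not a real system:

using System;
using System.Collections.Generic;

// A sketch of the approval tracking behind the business experts' page.
public class BuildApprovals
{
    private readonly List<string> authorizedBy = new List<string>();
    private readonly List<string> blockedBy = new List<string>();

    public void Authorize(string expert)
    {
        authorizedBy.Add(expert); // record this fact (the audit trail)
        if (blockedBy.Count == 0 && authorizedBy.Count >= 3)
        {
            // Third signature and no blocks: hand the build over to Joe.
            SendEmail("joe@example.org", "Build approved, please schedule the deployment");
        }
    }

    public void Block(string expert, string reason)
    {
        blockedBy.Add(expert); // record the reason
        SendEmail("dev-team@example.org; qa@example.org", "Build blocked: " + reason);
    }

    private static void SendEmail(string to, string subject)
    {
        Console.WriteLine("Email to {0}: {1}", to, subject); // stand-in for real mail
    }
}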

If it took more than a week to build this, I would be very surprised. The benefit is that we have a more streamlined process, we no longer have to babysit multiple manual deployments, and Joe doesn't get some Word document with instructions on how to deploy. He gets a deployment package that he can copy to production and double-click in order to deploy.

Let us take another scenario. Deploying to production often fails because of one problem or another; usually the IT admins who performed the install gave bad values to the build script (such as specifying the wrong connection string, or a typo in some URL, or something of that nature). The second time this happens, it should be caught by the build script itself. The response I got when I expressed this opinion was that they had no control over the build process; it was entirely the realm of the IT administrators.

Let us take the most difficult scenario that I can think of. We are required to hand the IT admins the compiled binaries, along with a document that specifies what new values we should put in the configuration.

My approach here would be to put an if statement in the application startup, which will perform a full environment check (the idea was stolen from Jeremy Miller and Release It!, by the way) and give a detailed error message. Since this is likely to be a long process, it will disable itself the first time it passes successfully (I leave the how as an exercise for the reader; consider that it should be reset the next time we deploy).
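One way to do the reader's exercise, as a rough sketch; the marker file trick and every name here are my assumptions, not a prescription:

using System;
using System.Collections.Generic;
using System.Configuration;
using System.Data.SqlClient;
using System.IO;

// Startup environment check that disables itself once it passes. The
// deployment deletes the marker file, which re-enables the check.
public static class EnvironmentValidator
{
    private const string MarkerFile = "environment-check.passed";

    public static void AssertValidEnvironment()
    {
        if (File.Exists(MarkerFile))
            return; // already passed since the last deployment

        var errors = new List<string>();

        // Can we actually open the configured database connection?
        try
        {
            using (var connection = new SqlConnection(
                ConfigurationManager.ConnectionStrings["Main"].ConnectionString))
            {
                connection.Open();
            }
        }
        catch (Exception e)
        {
            errors.Add("Cannot open the 'Main' database connection: " + e.Message);
        }

        // Is the configured service URL even well formed?
        var serviceUrl = ConfigurationManager.AppSettings["ServiceUrl"];
        if (Uri.IsWellFormedUriString(serviceUrl, UriKind.Absolute) == false)
            errors.Add("'ServiceUrl' is not a valid absolute URL: " + serviceUrl);

        if (errors.Count > 0)
            throw new InvalidOperationException("Environment check failed:" +
                Environment.NewLine + string.Join(Environment.NewLine, errors.ToArray()));

        // Disable the check until the next deployment resets the marker.
        File.WriteAllText(MarkerFile, DateTime.UtcNow.ToString("o"));
    }
}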

The tool to do that is code, your code, which you built in order to provide the foundation for your project.

In short (and I have another half dozen examples that are just as applicable): remember, you are a developer. If your tools don't provide what you need, you can build it. And since you are not going to try to build a generic tool, the cost of doing it is extremely low.

It doesn't even have to be a tool; just create a console application that does something, where you hard code everything. Let the compiler be your tool, and "configure" it with code.
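For instance, such a hard-coded deployment console application might be as crude as this; every server name and path is made up, and that is exactly the point:

using System;
using System.IO;
using System.ServiceProcess;

// Deliberately hard coded: this "tool" deploys one application to one
// server, and recompiling it is how you reconfigure it.
class DeployToProduction
{
    static void Main()
    {
        var service = new ServiceController("MyAppService", "prod-server-01");
        service.Stop();
        service.WaitForStatus(ServiceControllerStatus.Stopped);

        foreach (var file in Directory.GetFiles(@"\\build-server\drops\latest"))
        {
            var destination = Path.Combine(@"\\prod-server-01\d$\MyApp",
                                           Path.GetFileName(file));
            File.Copy(file, destination, true); // overwrite the old binaries
        }

        service.Start();
        Console.WriteLine("Deployed at " + DateTime.Now);
    }
}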

Don't wait, act.

time to read 2 min | 321 words

If you don't have automated deployment, it generally means that you are in a bad position. By automated, I mean that you should be able to push a new version out by double-clicking something. If you can't get an automated deployment script going in under an hour, you most certainly have a problem.

Sometimes the problem is with the process: you don't have the facilities to do an automated deployment because parts of the deployment are sitting in people's heads (oh, you need to configure IIS to use Xyz with the new version). In other cases, it isn't there simply because people haven't tried.

Yet automated deployment is one of those things that you can create in isolation, without getting commitment or support from the rest of the team. It is usually the first thing that I do in any project with an existing codebase that I come to.

It is also a good way of taking care of problems in the process. If you have a hard time deploying because your database change management process is broken, you need to fix that before you can get automatic deployment ready.

Also, notice that I am explicitly talking about automated deployment, not about having a build script. One of the requirements for automated deployment is a build script, but that is just one of them.

I don't care whether you can or can't build the software; I care that you can deploy it successfully. And yes, this includes doing things like deploying to several machines, stopping and starting services, updating the database schema, applying any data migration processes, and even doing rolling updates, if that is a requirement.

Remember, automated.

And I'll leave you with just one final thought: Prayer should not be part of the steps in the deployment process.

time to read 2 min | 268 words

Today I implemented refactoring support for a DSL. Basically, it is Extract Business Condition, and it was explicitly modeled after the way R# handles Extract Variable. It even shares the same shortcut, Ctrl+Alt+V.

I also took a stab at implementing automatic pattern recognition, so when the system recognizes a common usage pattern, it will automatically refactor it to a higher level abstraction. It works, although I think that I can make it even more flexible than it is now.

Now, if someone from the ReSharper team is actually reading this, they would know that I am lying. There is no way of doing something of this magnitude in just one day, not even with an extremely helpful compiler. And they would be right. I didn't try to tackle a feature of this magnitude. What I did was find the most common scenario for this feature and nail that.

I am taking this approach explicitly and deliberately, with the end result that I get to show value very rapidly. And yes, the customer is made aware of the limitations of this approach. I also tell them that they can get the feature by tomorrow, with an error message if they try to do something that is not supported.

Trying to support 100% is hard; trying to support just 20% turns out to be (not quite) easy. And now you get to nitpick this to death. I won't respond for about a day, since I am just about to board a flight.


time to read 1 min | 147 words

I am going to give a workshop or two at ALT.Net Austin at the end of October. They will be free (as in beer) and will be recorded and available on the net afterward. Right now I want to do one on writing DSLs, but I have another slot which is basically blank at the moment. I have too many subjects that I can talk about, and too many levels at which I can talk about them.

So, this is your chance to help me. If you are going to be there, what would you like to have a workshop about?

And no, a suggestion like NHibernate is not acceptable; it is too broad. Are we talking about NHibernate best practices, high scalability, tips and tricks, or advanced usages? I can do a three-hour workshop on any of them.

Suggestions?

time to read 5 min | 801 words

The problem as it was stated was of rules that looked like this:

upon bounced_check or refused_credit:
	if customer.TotalPurchases > 10000: # preferred
		ask_authorization_for_more_credit
	else:
		call_the cops

upon new_order:
	if customer.TotalPurchases > 10000: # preferred
		apply_discount 5.percent

upon order_shipped:
	send_marketing_stuff unless customer.RequestedNoSpam

I don't like it, and the reason isn't just that we can introduce IsPreferred.

I don't like it because the abstraction facilities here are poor. We have basically introduced events and business rules, maybe with a sprinkling of a domain model, but nothing really meaningful. Such systems will die under their own weight in any situation of significant complexity (in other words, in all real world situations).

Let us consider the problem in reverse, shall we? We have various conditions and actions upon which we can act. But the logic is scattered all over the place, making it hard to read, modify, understand, and work with. When such a system forms the lifeblood of the business, the business usually adapts, and starts to talk in the terms of the system. However, the people then tend to lose the ability to think about things in ways that would be more meaningful.

I listened today to a business person trying to explain some concept that he wanted to express. It took him several tries to explain the business problem, because he was focused on the technical one. The system has a corrupting effect on the business. I call this the Babel Syndrome, the reverse of DDD's ubiquitous language.

Let us see if we can get a higher level of meaning out of the above DSL, shall we? First, we restate our problem: instead of dealing with events and the conditions for responding to those events, we deal with business responses for scenarios. It doesn't sound like much of a difference, but in actuality there is a big difference between the two.

The most important of those differences is the change from handling the events to handling a business scenario in a given context. In other words, instead of asking what we should do when a check bounces, we need to ask a totally different question: "When the customer is preferred, what should the response be to a bounced check?"

This is anything but a minor change in the way we think about the language and how we operate on it. Let us see the DSL script, after which we can discuss how it affects us. These are the contents of the default.boo file:

upon order_shipped:
	send_marketing_stuff unless customer.RequestedNoSpam

upon bounced_check or refused_credit:
	call_the cops

This will be executed for all orders, as before. Now, let us look at preferred_customer.boo, and see what concepts it expresses.

when customer.TotalPurchases > 10000 # preferred

upon new_order:
	apply_discount 5.percent

upon bounced_check or refused_credit:
	ask_authorization_for_more_credit

And now we are getting to see some of the more interesting parts of the difference. We are now talking in terms of a business scenario. When we have a preferred customer, and something happens, how should we respond?

This change is a well known refactoring: conditional to polymorphism. In other words, we have just created the strategy pattern with a DSL. The difference here is that the script has an active role in deciding whether it can deal with the scenario or not (in other words, chain of responsibility).

When we need to handle some business scenario, we are going to execute all the scripts, with default.boo being the last one to run. Any script that accepts the scenario as valid and has a specific action to take has the option to do so.
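A minimal sketch of that execution model, assuming each compiled script exposes its 'when' guard and its 'upon' handlers; the RuleScript shape here is my illustration, not the actual implementation:

using System.Collections.Generic;
using System.Linq;

// Illustrative shapes only; the real scripts are classes compiled from Boo.
public abstract class RuleScript
{
    public abstract string Name { get; }
    public abstract bool Matches(Customer customer);  // the 'when' clause
    public abstract void Handle(string businessEvent, Customer customer); // the 'upon' blocks
}

public class Customer
{
    public decimal TotalPurchases;
    public bool RequestedNoSpam;
}

public class BusinessScenarioRunner
{
    private readonly List<RuleScript> scripts;

    public BusinessScenarioRunner(IEnumerable<RuleScript> scripts)
    {
        // default.boo runs last, so the specific scenarios get their say first.
        this.scripts = scripts.OrderBy(s => s.Name == "default.boo" ? 1 : 0).ToList();
    }

    public void Handle(string businessEvent, Customer customer)
    {
        foreach (var script in scripts)
        {
            // Each script decides for itself whether the scenario applies to it.
            if (script.Matches(customer))
                script.Handle(businessEvent, customer);
        }
    }
}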

Enough about the implementation; let us go back to the concepts. We can now talk to the business people in a way that is far more concise and natural. Instead of having to cover all the permutations of a possible event, we can talk about a specific scenario and how we handle the business event in that context. Not only is this more readable, it is far easier to actually define things such as the meaning of a preferred customer. I can open the DSL and actually read it.

Similar approaches are very useful when you recognize that the code is asking to be given a more explicit shape than just generic rules. Don't let your DSL stay whatever you started with. Find and actively extract higher level meanings whenever possible.

A deeper examination of this DSL, and how to build and use it, is likely to compose most of chapter 13, as a real world example of a complex DSL. What do you think?

Given this approach, how would you design an offer management DSL?

time to read 1 min | 174 words

I was having a discussion today about the way business rules are implemented, and a large part of the discussion was focused on trying to get a specific behavior in a specific circumstance. As usual, I am going to use a totally different example, which might not be as brutal in its focus as the real one.

We have a set of business rules that relate to what is going to happen to a customer in certain situations. For example, we might have the following:

upon bounced_check or refused_credit:
	if customer.TotalPurchases > 10000: # preferred
		ask_authorization_for_more_credit
	else:
		call_the cops

upon new_order:
	if customer.TotalPurchases > 10000: # preferred
		apply_discount 5.percent

upon order_shipped:
	send_marketing_stuff unless customer.RequestedNoSpam

What is this code crying out for? Here is a hint: it is not the introduction of IsPreferred, although that would be welcome.

I am interested in hearing what you have to say on the matter.

And as a total non sequitur, cockroaches at Starbucks, yuck.

time to read 2 min | 317 words

This is a note to myself, because I don't have the time for a proper post. When you are dealing with a DSL that contains more than just a few scripts, you really begin to care about compilation times. Even with caching, this can be a problem.

The solution is the same one we have been using for the last three or four decades: don't compile if the source hasn't changed.

The code to make this happen using Rhino DSL is here:

public override CompilerContext Compile(string[] urls)
{
    var outputAssemblyName = OutputAssemblyName(urls);
    // If the cached assembly is still current, skip compilation entirely and
    // just load the previously generated assembly.
    if (CanUseCachedVersion(outputAssemblyName, urls))
        return new CompilerContext { GeneratedAssembly = Assembly.Load(File.ReadAllBytes(outputAssemblyName)) };
    return base.Compile(urls);
}

private bool CanUseCachedVersion(string outputAssemblyName, string[] urls)
{
    var asm = new FileInfo(outputAssemblyName);
    if (asm.Exists == false)
        return false;
    foreach (var url in urls)
    {
        // Any script newer than the cached assembly invalidates the cache.
        if (File.GetLastWriteTime(url) > asm.LastWriteTime)
            return false;
    }
    return true;
}

And in the CustomizeCompiler method:

protected override void CustomizeCompiler(BooCompiler compiler, CompilerPipeline pipeline, string[] urls)
{
    // Tell the Boo compiler where to write the assembly, and make sure the
    // pipeline actually saves it to disk, so the next run can reuse it.
    compiler.Parameters.OutputAssembly = OutputAssemblyName(urls);
    // add implicit base class here...
    if (pipeline.Find(typeof(SaveAssembly)) == -1)
        pipeline.Add(new SaveAssembly());
}

It is impossible to overstate how big a difference this can make.
