Soliciting feedback about RavenDB 4.0 and TODOs for 4.1
With RavenDB 4.0 out and about for a few months already, we have been mostly focused on finishing up the release. That meant working on documentation (the book is already past the 500-page mark!), additional clients, helping clients go to production with 4.0, and gathering feedback.
In fact, that is the point of this post. I would really like to know your thoughts about RavenDB 4.0, and what you think should go into the next version.
Comments
First on the list would have to be easier migration from 3.5 to 4.X. The breaking compatibility, while completely understandable, forces us to slow down our adoption of Raven 4.0, which we desperately need. But we have multiple major and extremely large mission-critical apps that use 3.5 databases. Either client compatibility by way of magic, or maybe some kind of replication that can keep 3.5 and 4.X synced.
Probably pie in the sky, but felt the need to bring it up.
HM, I agree with that point, although I also agree it's not really something that's probably going to get fixed. The removal of the Authorization Bundle has halted RavenDB 4.0 migration for us until we have time to invent our own approach. Not easy when you don't have SecureFor(). It's manageable by injecting complex query logic and copying the RavenDB Authorization Bundle classes like AuthorizationUser and DocumentPermission, but it has made moving to RavenDB 4.0 (which we also desperately need) so much harder.
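To make that kind of workaround concrete, here is a minimal TypeScript sketch. The shapes loosely mirror the 3.5 bundle's AuthorizationUser and DocumentPermission types, and the check itself is illustrative application logic, not an official 4.0 API.

```typescript
// Illustrative shapes, loosely modeled on the 3.5 Authorization Bundle types.
interface DocumentPermission {
  operation: string;   // e.g. "Document/View"
  user?: string;       // id of an AuthorizationUser document
  role?: string;
  allow: boolean;
}

interface AuthorizationUser {
  id: string;
  roles: string[];
  permissions: DocumentPermission[];
}

// Without SecureFor(), the permission check becomes explicit application logic:
// load the user's AuthorizationUser document, then filter results against the
// permissions stored alongside each secured document.
function isAllowed(
  user: AuthorizationUser,
  docPermissions: DocumentPermission[],
  operation: string
): boolean {
  return docPermissions.some(p =>
    p.operation === operation &&
    p.allow &&
    (p.user === user.id || (p.role !== undefined && user.roles.includes(p.role)))
  );
}
```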
I'm very excited about v4, but like HM I'm currently on 3.5 and struggling to get my head around migrating to 4.0. I'd be interested to speak to anyone who has successfully migrated a large system.
HM & Andrew, That is actually something that we spent a lot of time on but probably didn't make public enough. See this post: https://ayende.com/blog/182561-A/migrating-data-from-ravendb-3-5-to-4-0
HM & Andrew, With regards to the client API compatability, we had made the decision to break compatability between previous versions to allow us to make some really important architecture decisions. Our experience is that migration typically takes a few days for a single developer to move most projects. The migration import process can help a lot in this regards
Ian, The major problem with SecureFor is that we can't really make it perform properly with the kind of requirements that we have for 4.0. I'll be happy to discuss options for handling this properly in more depth, but there is a lot of complexity around a proper implementation, and we would rather not have a feature at all than ship a problematic one.
Raven 4 is looking really great so far ... I haven't dug very deep into it yet, but ...
If there is one thing I would like to see improved, it's the Raven Studio menu navigation. The collapsing and sliding menu is very unintuitive (at least to me); I just can't seem to get used to it. Personally, I would prefer a simpler menu that does not have unexpected interactions (sometimes hiding the panel, sometimes not) and which allows me to choose when to hide or collapse a panel.
Pop Catalin, Thanks, I added an issue for that here: http://issues.hibernatingrhinos.com/issue/RavenDB-10926
I get it Oren, and I'm sure you have your reasons. It's just that now that complexity is passed on to your loyal clients who have been paying for 4-5 years and building their platforms on top of the existing capabilities. We will manage, we just need to find the time (more than a few days for one developer) to do it. You no doubt carefully considered it at the time and knew it would wind some people up. Perhaps many people just didn't use this or that bundle, so the moaning from a few people is worth it to keep the system the way you want it to be.
Ian, You are correct in all aspects, but there might be an important wrinkle here that is overlooked. Any implementation that we build is going to be generic. That puts a lot of complexity on the solution, since we can't make any assumptions about the situation. A client's implementation can typically make assumptions based on what is going on, which can drastically simplify the implementation.
We'll be happy to assist you in getting a good solution as part of the migration.
The subscriptions feature is awesome in allowing RavenDB to operate in an event sourcing architecture. My suggestion would be to allow partitioning of readers. As it stands, if I have a subscription on an index where EventName = blah, there can only be a single consumer. If I'm processing a lot of events, that's quite the bottleneck (assuming that the consumer wants to do some work). Currently I plan to create a partition value on the document (someUserConstValue.hashCode % someConstNumber) and create multiple subscriptions where EventName = blah && Partition = 0, EventName = blah && Partition = 1, and so on. This will allow me to balance out the load by a tuneable value; however, the deployment of my consumers becomes more cumbersome, as I want to try and distribute the load so that a single server doesn't have 90% of the subscribers. Azure's DocumentDb PartitionReader library has a very simple (from a developer standpoint) way of doing it, where each consumer will attempt to acquire a lease for a partition and renew it on a set frequency. Having this out of the box in RavenDb would enable me to focus much more on my code.
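To make that manual partitioning concrete, here is a minimal sketch of the write side; the hash function and field names below are illustrative, not anything built into RavenDB.

```typescript
const PARTITIONS = 8; // the tuneable value mentioned above

// A simple (deliberately cheap) string hash - any stable hash works here.
function stableHash(value: string): number {
  let h = 0;
  for (let i = 0; i < value.length; i++) {
    h = (h * 31 + value.charCodeAt(i)) | 0;
  }
  return Math.abs(h);
}

// Stored on the document when it is written, so that each subscription can
// filter on it, e.g. "EventName = 'blah' && Partition = 3".
function partitionFor(userId: string): number {
  return stableHash(userId) % PARTITIONS;
}
```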
Yep, I accept that about generic vs specific. Thanks for the offer to help if needed - we will pull together a high-level design and run it past you once we get to that for your comments / feedback. Cheers, Ian
First up, an acknowledgement that I don't know if this is possible in 4.0 or not. If it is, then a recipe or blog post would be welcome.
I’d love to be able to use Raven as a local data store when doing mobile development.
I see a lot of need in the mobile development space to have applications that work offline, or store data locally and then sync in the background to a remote server. This would require being able to run a RavenDb server locally in a native Android or iOS app.
Dylan, The problem here is the coordination. With multiple concurrent parties, this gets really complex. For example, imagine that we allowed multiple clients to connect to the same subscription.
One of them requests a batch (so we send it and skip that range for the other clients). It fails, so we need to go back and send it to someone else. That means that we need to maintain a LOT more state.
In addition, there is also the complexity of concurrent processing. Imagine that you have a document users/1 that was sent to a subscription client for processing. During that time, it has changed, and another client wants a batch.
What do we do?
One option is to remember that this document id is in an outstanding batch and skip it (but then you have to send it again later). This leads to even more state. Another is to send it to the second client, but then you might have an action taken by the second client based on updated information, while the first client comes in and, based on out-of-date info, does something wrong.
And we haven't talked yet about how you expect to fail over in a cluster if you have multiple subscription clients that might be talking to different nodes entirely.
In short, this is quite complex. It is actually much easier to have a single subscription and then have that feed into multiple threads / services.
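A minimal sketch of that single-subscription, many-workers shape; the onBatch callback name here is illustrative, not a specific client API.

```typescript
type EventDoc = { id: string; eventName: string };

const CONCURRENCY = 4;

async function handle(doc: EventDoc): Promise<void> {
  // ... per-document processing goes here ...
}

// Called with each batch received by the single subscription connection.
async function onBatch(docs: EventDoc[]): Promise<void> {
  // Split the batch into CONCURRENCY slices and process the slices in parallel.
  const slices: EventDoc[][] = Array.from({ length: CONCURRENCY }, () => []);
  docs.forEach((doc, i) => slices[i % CONCURRENCY].push(doc));
  await Promise.all(
    slices.map(async slice => {
      for (const doc of slice) await handle(doc);
    })
  );
  // Only after this resolves does the single connection acknowledge the batch.
}
```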
Padgett, We cannot run inside iOS or Android at this time. However, what we can do is have integration with PouchDB, so you use Pouch locally and then send it to Raven. See: https://github.com/arielgamrian/PouchToRavenDriver
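To make the Pouch-locally, Raven-remotely idea concrete, a minimal sketch; the remote URL below is a placeholder for wherever the PouchToRavenDriver endpoint ends up being hosted, not a documented address.

```typescript
import PouchDB from "pouchdb";

// On-device store: works fully offline.
const local = new PouchDB("orders");

async function main() {
  await local.put({ _id: "orders/1", total: 42 });

  // When connectivity is available, replicate in the background.
  local.sync("http://my-sync-endpoint/orders", {
    live: true,   // keep the replication running
    retry: true,  // back off and reconnect after network failures
  });
}

main();
```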
Oren, A possible solution I was thinking of was that when creating a subscription you choose a partitioning field, and how many partitions you would like. This maps 1:1 with how DocumentDB does it, as they HAVE to know about partitions, and this is a limiting feature which thankfully RavenDB doesn't have. Now let's call the subscription an aggregate subscription, which is composed of 1..n subscriptions, where n is the number of partitions chosen previously. When a document is inserted, a partition value could be generated based on the partitioning field selected and the value stored in the document metadata.
Along comes a client who wants to subscribe to the aggregate. RavenDB knows how many different partition keys there are for this subscription and so informs the client that it should set up a subscriber for each partition. Instead of the client now having a single subscriber, it has n. Each batch is a batch for the partition only. This is actually important, because if I choose a field to partition on such as UserID, then I can expect that most of the time changes for documents relating to that user will be processed by the same client node. When it connects to the aggregate, the client supplies how long a lease should last, and some unique identifier (process, not machine, specific). The client will attempt to renew the subscription for each partition at a specified time interval, which should be less than the lease time.
Client number two gets started and wants to subscribe to the aggregate. RavenDB sees that all partitions on the aggregate have active subscribers, but they're all the same client. Raven rejects the subscription request, and client number two will retry again in the next renew period. After some time, client one tries to renew its leases; Raven is aware that client two is waiting, and so rejects the renewal for 50% of the subscriptions. Some time later, client two attempts to get leases again, and this time Raven is able to assign some partitions to it. Alternatively, depending on how quickly Raven is able to terminate a client subscription, it could notify client one immediately upon client two requesting a lease that it's going to terminate partition x. At that point client one sends its last batch, so the stream checkpoint is up to date, and then the connection is switched over to client two.
Of course, clients can go up and down all the time in a production environment, so by using this strategy the user has to accept that there will be some delay initially when load balancing, and whenever clients come up or go down.
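A purely hypothetical sketch of the lease flow being proposed here; none of these calls exist in RavenDB, they only illustrate the suggested protocol.

```typescript
const LEASE_MS = 60_000;
const RENEW_EVERY_MS = 20_000; // renew well before the lease expires

// Imaginary server operations, stubbed purely for illustration.
async function acquireLease(aggregate: string, partition: number, clientId: string): Promise<boolean> {
  return true;
}
async function renewLease(aggregate: string, partition: number, clientId: string): Promise<boolean> {
  return true;
}

async function holdPartition(aggregate: string, partition: number, clientId: string): Promise<void> {
  // Someone else owns the partition: back off and retry on the next cycle.
  if (!(await acquireLease(aggregate, partition, clientId))) return;

  const timer = setInterval(async () => {
    const stillOurs = await renewLease(aggregate, partition, clientId);
    if (!stillOurs) clearInterval(timer); // server reassigned the partition to another client
  }, RENEW_EVERY_MS);

  // ... consume this partition's subscription while the lease is held ...
}
```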
I'd like to see a few more features in the CSV importer; it's so useful if you want to move data between environments. Features such as …
Dylan, I'm not sure why the lease idea is a good one. We already have a way to detect this; the underlying TCP connection is the "lease" in this case. And the method you are describing is something that would require centralizing all the subscription handling on a single node (or distributed consensus for each such call). That is _complex_.
This is also pretty much the same as having N subscriptions in the first place. You can do something like this:
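(A sketch of the idea, assuming the hash of the document id is computed at write time and stored on the document as a Partition field; the query strings are illustrative RQL.)

```typescript
// Five independent subscriptions - each one can be consumed by a different
// process or server, and no coordination is needed between them.
const PARTITION_COUNT = 5;

const subscriptionQueries = Array.from({ length: PARTITION_COUNT }, (_, i) =>
  `from Events where Partition = ${i}`
);

// Each query is then used to create its own subscription with whichever
// RavenDB client you use, and each worker registers against exactly one of them.
```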
The idea is that you hash (poorly, but still) the id of the document, then you have 5 subscriptions and you can register to one of them. Each one of them is handled independently, can run on a different server and doesn't require coordination.
Bob Lamb, 1) That is already possible right now. You can export the results of a query to CSV; see the query page, there is an "Export to CSV" option there. 2) Also possible, you can define a transformation script (see the Advanced section in the import page), which can also filter out the undesirable docs. 3) Can you expand on that?
Oren, Exactly, I'm suggesting that Raven does that for me. As a consumer of data I want to be able to distribute it across n consumers, because I'm concerned that a single one will not be able to keep up with the rate of data on the feed. My suggestion is that you allow me to say I want it distributed across x consumers on field z, and RavenDB under the hood manages breaking that down into a number of subscriptions. When I say I want to read a subscription, the Raven client takes care of spinning up n threads, one per subscription, which can be load balanced across multiple nodes that may come online later. Please take a look here, as it'll do a better job explaining it than I'm doing: https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed#using-the-change-feed-processor-library
Dylan, That is something quite different. Distributing the load between clients is not the same as having multiple subscriptions. You need to have a separate coordination step between the clients to decide how many subscriptions each will consume.
Dylan, Also note that we have tested a subscription sending over 100K / second, so it is actually not likely to be the limiting factor from your point of view.
Oren, I think wires have crossed. So, to go back to the title of this blog post: Problem: My subscription reader cannot keep pace with the volume of events coming in.
Requirement: Create multiple readers to distribute processing of data. Note that in a stable topology I expect that messages with the same keying factor will go to the same processor. This allows me to expect some order guarantee.
Clarifications:
* I don't 'care' about the underlying implementation. I was suggesting that having multiple subscriptions that appear as one to the client would be less work on your side.
* The speed of RavenDb at sending events is not the problem. Imagine data ingress at 70K / second (below your 100K output), but the processing I'm doing per subscription batch can only reach 25K / second. If I could have three subscription readers I could in theory achieve 75K / second and be able to keep up with ingress.
Dylan, http://issues.hibernatingrhinos.com/issue/RavenDB-10945
PouchDB looks interesting, especially for hybrid apps.
If you're taking feature requests for vNext, NATIVE support for a local Raven database on iOS and Android would get a +1 from me. (No reply required)
Hi HR,
Played with it in small doses:
Things I've liked so far:
Things I've found frustrating:
What I'm hoping (but not yet tested):
Pure Krome,
Regarding installation, it should be even simpler now. Regarding certs, I agree that the "install this globally on your machine" step isn't ideal, but once this is set up (and in 96% of cases it just works), this is a pretty painless way to do strong authentication. Renewal via Let's Encrypt should just work (it wasn't implemented in the RC, though).
I wish we could do something different about the document id, but adding the creating node id (which isn't actually what it is doing; it is the id of the node that generated the id, which isn't necessarily the same as the node the document was created on) is the simplest and most obvious way to do this. Anything else has non-obvious edge cases that we wanted to avoid. There are options to remove this if you really want to, but we explicitly made it take a bit of work, because we ran into problems with users not using this feature in 3.5 (where it existed as an optional Raven/ServerPrefixForHilo) and then running into complex problems down the road.
For server stats, we expose a LOT of information, from the server dashboard that gives you the 30,000-foot view, down to individual writes to disk and ongoing monitoring of exactly what is going on around the system. Please take a look; I would really appreciate any feedback you have on this matter.
Can you explain the issue with bulk insert?
I can tell you that in terms of the cloud, there are good things coming soon for 4.0
It's been a while, but it usually had weird issues with failing to insert all the items (crashed during it), which I think was IIS timeouts or something. Then there were also indexing issues, but the indexing has had heaps of love since I last did some bulk inserts. So I'm assuming it's waaay better now.
For me, it was a huge learning curve. A lot of guessing too. I'm not hating the system. I get it. It's just I think maybe there needs to be more discussion/examples/docs on this? maybe?
Graph database features would be a welcome addition, especially for role-based permission lookups where you want to avoid pre-caching or pre-calculating all permissions based on a specific inheritance chain.
Pure Krome,
There shouldn't be any issues with bulk insert. We explicitly don't run on IIS, and we are much better in this regard. We have tested this by literally inserting 5 billion (with a B) documents; it works. And indexing is much nicer in 4.0.
M. Schopman, Yes, that is on the horizon, but will probably take 6 months or so
A server-side sharding mechanism with a zone-based strategy. Any plans for this feature?
Kus, Yes, that is planned, see here for details: http://issues.hibernatingrhinos.com/issue/RavenDB-8115
What do you mean by zones?
The key problem from our perspective is that we want to still allow you to be ACID and use sharding; that is hard, and we haven't found a design that we are happy with.
Actually what I would like to achieve in the end is to be able to provide custom logic for strategies.
In this case, zones would be, for example, Europe or America. The scenario is that some users are from one zone and some from another, and they rarely use data from the other one.
Kus, Yes, that is something that we want to build, but it all comes down to how we still keep transactions with sharding.