The roadmap for 2018
The year 2018 has just rolled in, and now is the time to talk about what we want to do this year. The release of the 4.0 version is going to be just the start, to be honest.
In no particular order, I want the following things to happen in the near future:
- Finishing the book (github.com/ravendb/book). I currently have more than 300 pages in it, and I’m afraid that I’m only 2/3 of the way, if that. RavenDB has gotten big, and doing justice to everything it does takes a lot of time. My wish list here is that I’ll finish writing all the content by the first quarter and have it out (as in, you can have it on your desk) by the second quarter. Note that you can read it right now, and feedback would be very welcome.
- All the client APIs RTM’ed. We currently have clients for .NET, Python, JVM, Go, Ruby and Node.JS. Some of them are already ready for production, some are at RC level and some are still beta quality. We’ll dedicate some effort to this and release all of them in the first quarter as well. I think that alongside being able to run on multiple operating systems, we want to give people the choice of using RavenDB from multiple platforms, and having a client for a particular platform is the first step on that road.
- Getting (and incorporating) users’ feedback. We have worked closely with several of our customers on the release of 4.0, and we have people chomping at the bit to just get it out (who wants to say no to being 10–50 times faster?). But RavenDB 4.0 is a huge undertaking, and there are going to be things that we missed. The feedback from the RC releases has been invaluable in finding scenarios and conditions that we didn’t consider. I’ve explicitly put aside time to handle that sort of feedback as people are rolling out RavenDB and need to smooth out any rough corners that still remain.
These are all the near-term plans, for the next few months. They mostly involve dealing with the aftermath of a big release, with nothing major planned for the near future, because I expect all of us to be busy with all the other things that come with a big release.
The last year has seen us grow by over 40% in terms of manpower, and the flexibility of having so many great people working here who can push the product in so many directions at once is intoxicating. I have been doing a lot of retrospectives recently as we have been completing RavenDB 4.0, and it amazed me just how much was accomplished and how many irons we still have in the fire. So let’s talk about the big plans for 2018, shall we?
Additional storage types
In 4.0, we have JSON documents and binary attachments that you can add to a document. One of our goals in 2018 is to add two or three additional options, turning RavenDB from a document database into a true multi-paradigm database. In particular, we want to add:
- Distributed counters
- Time series
- Graph operations
These are all going to live together with documents, so you can have a user’s document with a heartrate time series on it (fed from a FitBit, say) that updates every 5 seconds. Or you can have a post document in a blog and use a counter to track how many likes it has gotten. And I don’t believe that I need to explain graph operations; we want to allow you to define connections between documents and query them directly.
The idea here is that we have documents, but they aren’t always the best tool for the job, so we want to offer you options that are optimized, fast, and convenient to use.
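To make this concrete, here is a hypothetical sketch of what working with a counter and a time series alongside documents might look like from the .NET client. None of this API existed at the time of writing; `CountersFor` and `TimeSeriesFor` are assumed names for illustration only, while `DocumentStore` and `OpenSession` are the existing 4.0 client API.

```csharp
using System;
using Raven.Client.Documents;

class MultiModelSketch
{
    static void Main()
    {
        using (var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" },
            Database = "Blog"
        }.Initialize())
        using (var session = store.OpenSession())
        {
            // Hypothetical: a counter living on a blog post document,
            // tracking likes without rewriting the whole document.
            session.CountersFor("posts/1-A").Increment("Likes");

            // Hypothetical: a heartrate time series on a user document,
            // appending one FitBit sample every 5 seconds.
            session.TimeSeriesFor("users/1-A", "Heartrate")
                   .Append(DateTime.UtcNow, 72d, "fitbit");

            session.SaveChanges();
        }
    }
}
```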
Better integration
We have already done a lot of work around integrating with additional services and environments; we just need to polish and expose it. This means things like being able to get a PouchDB instance running in your browser and have it sync automatically and securely to your RavenDB cluster. Or being able to point RavenDB at an instance of a relational database and have it suck in all the data and build the document model, saving you a lot of work when migrating to a document database.
Ever faster
Performance is addictive, and it has caught us. We are now orders of magnitude faster than ever before. We have actually been scaling down our production servers intentionally, to see if we can find more bottlenecks in the real world; so far we have cut CPU by half and memory to a quarter, and we are still seeing faster response times and better latencies. That said, we can do better, and we are planning to.
Things I’m looking forward to are Span<T> and Memory<T>, which would really reduce overheads in some key scenarios. We are also eagerly awaiting the arrival of SIMD intrinsics in the CoreCLR and already have some code paths that are going to be heavily optimized as a result. Early results show something like a 20%–40% improvement, but we’ll probably be able to get more over time. One of the reasons I’m so excited about the release is that people get to actually use the software and see how much it has improved, but also give us feedback on the things that can be made even faster.
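To illustrate the kind of overhead Span<T> removes (a minimal sketch of the general technique, not RavenDB code): slicing a span just adjusts an offset and a length over existing memory, where the equivalent `Substring` calls would each allocate a new string.

```csharp
using System;

class SpanDemo
{
    // Count comma-separated fields without allocating any substrings:
    // each Slice is just a new view over the same underlying buffer.
    static int CountFields(ReadOnlySpan<char> line)
    {
        int count = 1;
        int comma;
        while ((comma = line.IndexOf(',')) >= 0)
        {
            line = line.Slice(comma + 1); // re-slice, no allocation
            count++;
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountFields("users/1,72,2018-01-01".AsSpan())); // 3
    }
}
```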
Until we have actual people using us in production over a long period of time, it is hard to avoid optimizing in the dark, and that never gives a good ROI.
Community
Everything so far has been mostly technical: upcoming features and new things that you get to do with RavenDB. We are also going to invest heavily in getting the word out, showing up at conferences and user groups. We are also scheduling a lot of workshops around the globe to teach RavenDB 4.0. The first round is already available here.
There is also the new community license for RavenDB, allowing you to go to production without needing to purchase a commercial license. This should reduce the barrier to adoption, and we hope to see a lot of new users coming to RavenDB. We are now free to use, running on multiple operating systems, and available on the most commonly used platforms. And we are easy to get right, although it was anything but easy to get there. A good example of that is the setup video.
All in all, 2017 has been a major year for us, both in terms of growth on every parameter we track and as the culmination of years of effort that led us to the release of RavenDB 4.0 and to seeing the new version take its first steps in the first days of 2018.
Happy new year, everyone.
Comments
Happy new year!
I guess this is what you have hinted at in this answer:
This will be really useful to quickly set up a demo for simple querying with data at production scale, without the need to do proper modelling upfront and without having to handcraft the import for hundreds of tables/entities. Being able to show something makes all the difference. I'm eagerly awaiting this feature.
Felix, Yes, that is the general idea; we want to make things very easy.
Amazing, tnx for the update. Just 2 things:
- PERF: what about `in` params? I'm curious if that would boost perf too, mainly by reducing allocations (I'm thinking mostly strings)?
- LUCENE: do you plan on starting work on a Lucene replacement in 2018, too? If I'm not wrong, you mentioned a couple of times that it is something you were looking at in the past (like you did switching from Esent to Voron, for example)?
Njy, What do you mean, "in params"?
Lucene is absolutely something that we want to look at and see if we can switch from, yes. That is speculative, though.
njy meant C# 7.2 `in` parameters.
I'm looking forward to seeing other things supported. I'm doing a trial run of migrating a fairly clean relational database. One table really makes use of being able to have different kinds of documents, but we also have plenty of small "vote/counter" tables, and they look more circumspect to support as cheaply in Raven as in a relational database. Having them as time series events and having the counter functionality separately will probably help greatly. And I'm sure graph operations will help with tag membership.
Do you have any modeling advice for the inclusion of a post in separate series, where the inclusion also has attributes? Think of a pile of documents or recordings forming a book. The date of ingest into the system will not necessarily correspond with the order, and "prologue" and "chapter 7, part 2" are valid descriptors of each item, so they're not even necessarily the same granularity. Right now, I have each series as essentially a tag, and a list of a struct containing a tag ID and the order. There are many fair acknowledgements in the documentation that you shouldn't just try to translate your relational models, but it would be really helpful to have some sort of repository somewhere with representative examples of models.
@Oren: exactly what Jesper pointed at
Jesper, I don't see what `in` params give me that `ref` params don't, in terms of performance.
Is each individual value you have there truly independent? A good way to think about it might be family. There is the family unit, and then there are the family members. For some, inclusion in the family is when they are born (children); for some, it is much after (marriage). But being in a family doesn't prevent you from being an individual, and you might be in multiple families (divorce kids). For the different granularity, imagine things like pets, step grandparents, etc.
I would model the concept of an individual and a family separately here.
@Oren: in the case of RavenDB's internal codebase you're probably right. More generally, I've seen very few people actually using `ref` params for values/strings, probably because that seems weird or too "strange" (think lib/fw API surface). But now, with a dedicated keyword to clearly express the intent, I think we may see a lot of use for it.
In fact, looking at the source code for RavenDB - even just the client - I can see that, for example, almost every method taking string params, even if it does not mutate those params internally, is not marked with `ref`, whereas now they could be marked with `in` and automatically reduce the number of allocations.
Does this make sense?
In terms of performance, apparently knowing something will never be changed (which is always a possibility with `ref`, but which `in` rules out, saying this is only passed by reference for performance reasons) means the JIT and the optimizer can do stuff they wouldn't dare do otherwise, and using `in` instead of `ref` means it's a compile-time error to mistakenly change them.
The series and the posts are both individual entities - each series is a collection, and will also have some information about itself. So both of them will have pages in a web site, and it will be necessary to fetch all the info for both of them. That's why, to me, it seems like having the membership information on either end (in the post, or in one collection in the series) just shortchanges the loading for the other end. Which is fine, it's not a relational database, but I wish there was a better way to decide which to do. I think it may come down to doing it both ways and measuring which is more efficient, and hoping the answer to that doesn't shift as Raven evolves.
njy: string is a reference type and not a value type, so I don't think `in` params work for string parameters.
njy, In terms of performance, `ref` or `in` are only relevant for value types. For reference types, such as `string`, this only matters if the `ref` will modify the reference, not the value. `ref` for value types is mostly useful when you have a large structure, and that can save copying it.
Jesper, I would probably model things so each of the entries has a reference to the series that it is a part of, then either include or query for them all. Otherwise, you can create a relation document explicitly, but that is reserved for when the relationship is meaningful. For example, in a family setting, you have each individual, the family document, and the marriage certificate.
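As a rough illustration of that advice (the document shapes below are my own sketch, not from the thread), the membership attributes can live on the entry side, pointing at the series document by id:

```csharp
// Hypothetical document shapes for the series/posts discussion above.
public class Series
{
    public string Id { get; set; }         // e.g. "series/1-A"
    public string Title { get; set; }      // the series has data of its own too
}

public class Post
{
    public string Id { get; set; }         // e.g. "posts/1-A"
    public string Title { get; set; }
    public SeriesMembership[] Memberships { get; set; }
}

// The inclusion itself carries attributes (order, descriptor),
// so it gets its own type on the post, referencing the series by id.
public class SeriesMembership
{
    public string SeriesId { get; set; }   // reference to a Series document
    public int Order { get; set; }
    public string Descriptor { get; set; } // "prologue", "chapter 7, part 2", ...
}
```

Loading a post and including the referenced series documents in the same round trip would then serve the post-side page, while a query on `SeriesId` serves the series-side page.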
Oren and Jesper: yes, they are reference types; what I meant was mutating the reference, not the pointed-to content itself. I've expressed myself badly. But that wouldn't reduce allocations, you're right.
njy, Strings in particular are immutable by nature, so you generally aren't mutating them anyway. For most other things, you will mutate the object behind the reference (such as adding an item to a list), not the reference itself, so that is a non-issue.
I think this is pretty much only for structs.
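For readers following the exchange above, here is a minimal sketch of the point being made (my own example, not from the thread): C# 7.2 `in` passes a struct by reference while making mutation a compile-time error, which mainly pays off for large structs.

```csharp
using System;

struct BigStruct
{
    // A deliberately large struct: passing it by value copies all 64 bytes.
    public long A, B, C, D, E, F, G, H;
}

class InDemo
{
    // Passed by value: the whole struct is copied on every call.
    static long SumByValue(BigStruct s) => s.A + s.H;

    // Passed by readonly reference (C# 7.2 'in'): no copy is made,
    // and assigning to 's' here would be a compile-time error.
    static long SumByIn(in BigStruct s) => s.A + s.H;

    static void Main()
    {
        var big = new BigStruct { A = 1, H = 2 };
        Console.WriteLine(SumByValue(big)); // 3, copies the struct
        Console.WriteLine(SumByIn(in big)); // 3, no copy
    }
}
```

For a `string` parameter this buys nothing, since a string is already passed as a reference; that is exactly the point Oren and the other commenters settle on.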
Any thoughts on offering an Entity Framework Core provider as an alternate client option for those used to the EF Core API? Any thoughts on offering the client under something not related to xGPL? People get anxious when having to pull xGPL dependencies into proprietary systems. MIT, for example, is straightforward.
John, There is no real point in having an EF Core client, I think. Our client gives a comparable or better experience, and is well suited for the task. And the clients are already MIT :-)
Oren, thanks for the MIT, kinda missed that news!
For EF Core, if I may "insist", a reason to consider it would be interoperability. EF promises db-agnostic coding, which is on par with what Hibernate is offering for the Java folks and what the industry prefers nowadays.
Also, with an EF Core provider for DocumentDB in the works for 2.1, it is also a matter of what the competition is offering.
I would love to hear your thoughts on this, even better if it qualifies for a blog post!
John, I have been working with various kinds of OR/M for over 15 years now. I have literally never seen the promise of "db agnostic" work once in anything but the most trivial of applications. The problem is that you have to cater to the lowest common denominator, and that kills any useful features.
This (and similar posts) is a good starting place: https://ayende.com/blog/3955/repository-is-the-new-singleton