Ayende @ Rahien

http://ayende.com
Copyright (C) Ayende Rahien 2004 - 2021 (c) 2026

Rafal commented on Index Prioritization in RavenDB: Problems
Yeah, it's quite unfortunate that Json objects need to be represented in two distinct forms - a serialized block of bytes in the storage and parsed in-memory document representation with all mem overhead, fragmentation and caching/sharing problems.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment19
Tue, 26 Feb 2013 10:05:10 GMT

Ayende Rahien commented on Index Prioritization in RavenDB: Problems
Rafal, Sure, Windows has the idea of shared memory. But it is non trivial to handle that properly. Especially if you want to share that to multiple threads. And you can't really share _.NET_ objects properly.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment18
Tue, 26 Feb 2013 08:35:02 GMT

Gregory Long commented on Index Prioritization in RavenDB: Problems
I guess I explained it poorly because it looks to me that Jason and I made the same suggestion. (great minds thinking alike and such) You liked his suggestion but had issues with mine . . . need to work on my language skills.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment17
Mon, 25 Feb 2013 20:27:59 GMT

Rafal commented on Index Prioritization in RavenDB: Problems
...
but after giving it a second thought, I don't think this would work well enough, there is too much concurrent writing going on and it needs to be managed quite carefully.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment16
Mon, 25 Feb 2013 20:03:08 GMT

Rafal commented on Index Prioritization in RavenDB: Problems
@Ayende I'm talking about letting the OS manage memory and you are afraid that you would have to manage all that memory yourself - this is quite the opposite of what I was trying to say. Unix systems are quite good at sharing memory pages between processes, so maybe Windows can do the same - then you would avoid the penalty of duplicated cache and I/O. And besides, it's much easier to terminate a worker process and return all resources to the OS than to try to garbage-collect unused CLR objects from a running application.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment15
Mon, 25 Feb 2013 19:17:20 GMT

Ayende Rahien commented on Index Prioritization in RavenDB: Problems
Rohland, We actually came up with something similar to that idea, keep an eye on the queue, it will show up in a few days.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment14
Mon, 25 Feb 2013 18:54:23 GMT

Ayende Rahien commented on Index Prioritization in RavenDB: Problems
Gregory, Sure, that is what you would want to do, but then you get into questions about who is the master, what about conflicts, data aliasing issues, etc.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment13
Mon, 25 Feb 2013 18:53:35 GMT

Ayende Rahien commented on Index Prioritization in RavenDB: Problems
Jason, Pretty much, yes.
That is the current recommendation.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment12
Mon, 25 Feb 2013 18:52:58 GMT

Ayende Rahien commented on Index Prioritization in RavenDB: Problems
Rafal, How would using multiple processes help any? Sure, I could let the OS handle scheduling, but it doesn't really change anything in terms of the other constraints that I've got. It _really_ complicates memory management, because now I have to deal with multiple processes and manage all of them (remember, if we start paging, perf drops like a rock), so we can't let the OS do that. It doesn't help for I/O, since I would either need to duplicate the I/O or share it among multiple processes, which is a decidedly non trivial task.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment11
Mon, 25 Feb 2013 18:51:37 GMT

Rohland commented on Index Prioritization in RavenDB: Problems
What about injecting some logic at write time that defers indexing of a low priority index when a new document is added? I'm not familiar with the internals, but this would be similar to pretending the document didn't exist (for specific indexes) until a later point when either the system has capacity or a dynamically determined period expires (it couldn't be static as you'd run into the same problem, just a bit later). At this point you simulate the write with metadata indicating which indexes are already aware of the changes. For this "write", the indexing subsystem executes, skipping the high priority, already updated indexes. As I said, I'm not sure this approach makes sense given the optimizations internal to RavenDB.
It just allows one to throttle the indexing events for writes without (I'm guessing) fiddling too much with internal optimisation.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment10
Mon, 25 Feb 2013 18:29:41 GMT

Gregory Long commented on Index Prioritization in RavenDB: Problems
Forgive me for being simple about this but if the question is how to support priority indexing for a small number of very important indexes without impacting other behavior in the index then the answer seems simple to me - move the high priority index/queries/data to their own instances of Raven. If there is that much activity and the consistency of the index is that important (i.e. worth spending $$$) then why not give features that are so important their own db? If that is too expensive then maybe the consistency of the index isn't as important as thought, or maybe the data doesn't need to be written as often, or a bulk operation done less frequently would suffice? Granted, it isn't as sexy as solving the race condition but . . .
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment9
Mon, 25 Feb 2013 13:33:00 GMT

Jason Meckley commented on Index Prioritization in RavenDB: Problems
Isn't this similar to the idea of priority queues on a service bus? My understanding was that it is generally not a good idea. Instead of prioritizing you could set up a dedicated service bus for the high priority workflows. Could the same approach be used for indexing? Replicate data to another instance and have that instance manage specific index(es).
Priority is then implicitly defined because the RavenDB instance is only responsible for that index (or set of indexes).
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment8
Mon, 25 Feb 2013 13:28:37 GMT

Rafal commented on Index Prioritization in RavenDB: Problems
Interesting issue. Looks like you are quite often trying to do the operating system's job in scheduling, caching, I/O and memory management. Maybe it would be a good idea to let the OS handle these issues? For example, you could make indexing an external process (one process per index) that would run at its own pace. The db files are probably already cached by the OS so this wouldn't hurt the IO performance very much.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment7
Mon, 25 Feb 2013 13:06:30 GMT

Khalid Abuhakmeh commented on Index Prioritization in RavenDB: Problems
Those are tough questions.
- Different speed / same priority: Is this much different than how indexes work today? All the indexes now are the same priority, but more often than not, one index finishes before the other (unless the studio is lying to me).
- Parallel: Yes, you would run them in parallel, because each priority would have a separate set of documents. So they could run in parallel without influencing each of the other sets. IO and memory are another issue.
- Low memory conditions: This is probably an implementation decision. If you wanted to drop all the documents from the low index queue and reload them into memory later, you could do that. You could also have low priority indexes limited by IO. Meaning, you read them from disk sequentially and put the pressure on CPU and IO (I don't like that as I type it).
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment5
Fri, 08 Feb 2013 17:00:10 GMT

Ayende Rahien commented on Index Prioritization in RavenDB: Problems
Khalid, What happens when you have 3 low indexes, running at different speeds? Could you / should you run high & low in parallel? What happens under low mem conditions?
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment4
Fri, 08 Feb 2013 16:47:38 GMT

Khalid Abuhakmeh commented on Index Prioritization in RavenDB: Problems
So what if you had three lanes of documents in memory: High, Normal, Low. You would duplicate the document into each lane, and let the indexes consume those documents in that particular lane. Low might remain in memory for a bit, but you could clear out Normal and High as you process them, assuming they process as quickly as they do now or even more quickly. At most you would peak at 3 times as much memory as you use now. What do you think?
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment3
Fri, 08 Feb 2013 16:45:05 GMT

Ayende Rahien commented on Index Prioritization in RavenDB: Problems
Khalid, LOT more memory means that we don't know when to release memory. If you have three low prio indexes, all of them indexing at different rates, how do you tell when to release memory held for them? Let us assume that they are 20K, 40K, 60K documents behind the std indexes. That means that we have to keep 60K documents in memory (and remember that we still get new writes) to allow for in memory indexing without additional IO. It also means that those indexes are going to get bigger and bigger batches, which impacts CPU & IO as well.
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment2
Fri, 08 Feb 2013 16:37:46 GMT

Khalid Abuhakmeh commented on Index Prioritization in RavenDB: Problems
The issue of starvation makes a lot of sense to me. I don't want a low priority index to get neglected forever. When you say a LOT more memory, what are we talking about here? I think people would take the trade-off if they understood the risks and were willing to pay for the memory. I've seen people say they have 20 gigs of memory or higher on some of their RavenDB servers. It definitely is a hard problem, but if it was easy then everyone would be doing it :).
http://ayende.com/161123/index-prioritization-in-ravendb-problems#comment1
Fri, 08 Feb 2013 16:32:52 GMT
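
Editor's sketch: the memory argument in the thread (low priority indexes running 20K, 40K and 60K documents behind means 60K documents stay in memory) can be illustrated with a toy buffer that releases a document only once every index has consumed it. This is not RavenDB code; the class `SharedIndexingBuffer`, its methods and the index names are hypothetical, and the numbers are the thread's example scaled down by 1000.

```python
from collections import deque


class SharedIndexingBuffer:
    """Toy model (hypothetical, not RavenDB's implementation): documents are
    kept in memory until the slowest index has consumed them."""

    def __init__(self, index_names):
        if not index_names:
            raise ValueError("at least one index is required")
        self._docs = deque()
        self._first_seq = 0  # sequence number of the oldest retained document
        self._positions = {name: 0 for name in index_names}

    def write(self, doc):
        """Accept a new document; it is retained until all indexes pass it."""
        self._docs.append(doc)

    def advance(self, index_name, count):
        """Record that an index consumed `count` more documents, then release
        every document that all indexes have already consumed."""
        next_seq = self._first_seq + len(self._docs)
        new_pos = min(self._positions[index_name] + count, next_seq)
        self._positions[index_name] = new_pos
        slowest = min(self._positions.values())
        while self._first_seq < slowest:
            self._docs.popleft()
            self._first_seq += 1

    def retained(self):
        """Number of documents currently held in memory."""
        return len(self._docs)


# 100 writes; the standard index is caught up, the low priority
# indexes are 20, 40 and 60 documents behind it.
buf = SharedIndexingBuffer(["std", "low_a", "low_b", "low_c"])
for i in range(100):
    buf.write({"id": i})
buf.advance("std", 100)
buf.advance("low_a", 80)
buf.advance("low_b", 60)
buf.advance("low_c", 40)
print(buf.retained())  # 60
```

The retained count follows the slowest index, not the average, which is why a single lagging low priority index pins memory for everyone while new writes keep arriving.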