Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 270 words

I found myself today needing to upload a file to S3, the upload size is a few hundred GBs in size. I expected the appropriate command, like so:

aws s3api put-object --bucket twitter-2020-rvn-dump --key mydb.backup --body ./mydb.backup

But then I realized that this is uploading a few hundred GB file to S3, which may take a while. The command doesn’t have any progress information, so I had no way to figure out where it is at.

I decided to see what I can poke around to find, first, I ran this command:

ps -aux | grep s3api

This gave me the PID of the upload process in question.

Then I checked the file descriptors for this process, like so:

$ ls -alh /proc/84957/fd


total 0
dr-x------ 2 ubuntu ubuntu  0 Mar 30 08:10 .
dr-xr-xr-x 9 ubuntu ubuntu  0 Mar 30 08:00 ..
lrwx------ 1 ubuntu ubuntu 64 Mar 30 08:10 0 -> /dev/pts/8
lrwx------ 1 ubuntu ubuntu 64 Mar 30 08:10 1 -> /dev/pts/8
lrwx------ 1 ubuntu ubuntu 64 Mar 30 08:10 2 -> /dev/pts/8
lr-x------ 1 ubuntu ubuntu 64 Mar 30 08:10 3 -> /backups/mydb.backup

As you can see, we can tell that file descriptor#3 is the one that we care about, then we can ask for more details:

$ cat /proc/84957/fdinfo/3
pos: 140551127040 flags: 02400000 mnt_id: 96 ino: 57409538

In other words, the process is currently at ~130GB of the file or there about.

It’s not ideal, but it does give me some idea about where we are at. It is a nice demonstration of the ability to poke into the insides of a running system to figure out what is going on.

time to read 1 min | 179 words

RavenDB has the public live test instance, and we have recently upgraded that to version 6.0.  That means that you can start playing around with RavenDB 6.0 directly, including giving us feedback on any issues that you find.

Of particular interest, of course, is the sharding feature, it is right here:

image

And once enabled, you can see things in more details:

image

If we did things properly, the only thing you’ll notice that indicates that you are running in sharded mode is:

image

Take a look, and let us know what you think.

As a reminder, at the top right of the page, there is the submit feedback option:

image

Use it, we are waiting for your insights.

time to read 2 min | 259 words

Let’s say that you have the following scenario, you have an object in your hands that is similar to this one:

It holds some unmanaged resources, so you have to dispose it.

However, this is used in the following manner:

What is the problem? This object may be used concurrently. In the past, the frame was never updated, so it was safe to read from it from multiple threads. Now there is a need to update the frame, but that is a problem. Even though only a single thread can update the frame, there may be other threads that hold a reference to it. That is a huge risk, since they’ll access freed memory. At best, we’ll have a crash, more likely it will be a security issue. At this point in time, we cannot modify all the calling sites without incurring a huge cost. The Frame class is coming from a third party and cannot be changed, so what can we do? Not disposing the frame would lead to a memory leak, after all.

Here is a nice trick to add a finalizer to a third party class. Here is how the code works:

The ConditionalWeakTable associates the lifetime of the disposer with the frame, so only where there are no more outstanding references to the frame (guaranteed by the GC), the finalizer will be called and the memory will be freed.

It’s not the best option, but it is a great one if you want to make minimal modifications to the code and get the right behavior out of it.

time to read 2 min | 262 words

Trevor Hunter from Kobo Rakuten is going to be speaking about Kobo’s usage of RavenDB in a webinar next Wednesday.

When I started at Kobo, we needed to look beyond the relational and into document databases. Our initial technology choice didn't work out for us in terms of reliability, performance, or flexibility, so we looked for something new and set on a journey of discovery, exploratory testing, and having fun pushing contender technologies to their limits (and breaking them!). In this talk, you'll hear about our challenges, how we evaluated the options, and our experience since widely adopting RavenDB. You'll learn about how we use it, areas we're still a bit wary of, and features we're eager to make more use of. I'll also dive into the key aspects of development - from how it affects our unit testing to the need for a "modern DBA" role on a development team.

About the speaker: Trevor Hunter: "I am a leader and coach with a knack for technology. I’m a Chief Technology Officer, a mountain biker, a husband, and a Dad. My curiosity to understand how things work and my desire to use that understanding to help others are the things I hope my kids inherit from me. I am currently the Chief Technology Officer of Rakuten Kobo. Here I lead the Research & Development organization where our mission is to deliver the best devices and the best services for our readers. We innovate, create partnerships, and deliver software, hardware, and services to millions of users worldwide."

You can register to the webinar here.

time to read 2 min | 246 words

RavenDB Sharding is now running as a production replication in our backend systems and we are stepping up our testing in a real-world environment.

We are now also publishing nightly builds of RavenDB 6.0, including Sharding support.

image

There are some known (minor) issues in the Studio, which we are busy fixing, but it is already possible to create and manage sharded clusters. As usual, we would love your feedback.

Here are some of the new features that I’m excited about, you can see that we have an obese bucket here, far larger than all other items:

image

And we can drill down and find why:

image

We have quite a few revisions of a pretty big document, and they are all obviously under a single bucket.

You can now also get query timings information across the entire sharded cluster, like so:

image

And as you can see, this will give you a detailed view of exactly where the costs are.

You can get the nightly build and start testing it right now. We would love to hear from you, especially as you test the newest features.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}