Ayende @ Rahien

Mar 31 2023

Tricks of the trade: Figuring out progress of a large upload

time to read 2 min | 270 words

Tags:

I found myself today needing to upload a file to S3, the upload size is a few hundred GBs in size. I expected the appropriate command, like so:

aws s3api put-object --bucket twitter-2020-rvn-dump --key mydb.backup --body ./mydb.backup

But then I realized that this is uploading a few hundred GB file to S3, which may take a while. The command doesn’t have any progress information, so I had no way to figure out where it is at.

I decided to see what I can poke around to find, first, I ran this command:

ps -aux | grep s3api

This gave me the PID of the upload process in question.

Then I checked the file descriptors for this process, like so:

$ ls -alh /proc/84957/fd
total 0
dr-x------ 2 ubuntu ubuntu 0 Mar 30 08:10 .
dr-xr-xr-x 9 ubuntu ubuntu 0 Mar 30 08:00 ..
lrwx------ 1 ubuntu ubuntu 64 Mar 30 08:10 0 -> /dev/pts/8
lrwx------ 1 ubuntu ubuntu 64 Mar 30 08:10 1 -> /dev/pts/8
lrwx------ 1 ubuntu ubuntu 64 Mar 30 08:10 2 -> /dev/pts/8
lr-x------ 1 ubuntu ubuntu 64 Mar 30 08:10 3 -> /backups/mydb.backup

As you can see, we can tell that file descriptor#3 is the one that we care about, then we can ask for more details:

$ cat /proc/84957/fdinfo/3

pos:    140551127040
flags:  02400000
mnt_id: 96
ino:    57409538

In other words, the process is currently at ~130GB of the file or there about.

It’s not ideal, but it does give me some idea about where we are at. It is a nice demonstration of the ability to poke into the insides of a running system to figure out what is going on.

Mar 21 2023

RavenDB 6.0 live instance is now up & running: Come test it out!

time to read 1 min | 175 words

6 comments

Tags:

RavenDB has the public live test instance, and we have recently upgraded that to version 6.0. That means that you can start playing around with RavenDB 6.0 directly, including giving us feedback on any issues that you find.

Of particular interest, of course, is the sharding feature, it is right here:

And once enabled, you can see things in more details:

If we did things properly, the only thing you’ll notice that indicates that you are running in sharded mode is:

Take a look, and let us know what you think.

As a reminder, at the top right of the page, there is the submit feedback option:

Use it, we are waiting for your insights.

Mar 13 2023

ExternalFinalizer: Adding a finalizer to 3rd party objects

time to read 2 min | 259 words

5 comments

Tags:

Let’s say that you have the following scenario, you have an object in your hands that is similar to this one:

It holds some unmanaged resources, so you have to dispose it.

However, this is used in the following manner:

What is the problem? This object may be used concurrently. In the past, the frame was never updated, so it was safe to read from it from multiple threads. Now there is a need to update the frame, but that is a problem. Even though only a single thread can update the frame, there may be other threads that hold a reference to it. That is a huge risk, since they’ll access freed memory. At best, we’ll have a crash, more likely it will be a security issue. At this point in time, we cannot modify all the calling sites without incurring a huge cost. The Frame class is coming from a third party and cannot be changed, so what can we do? Not disposing the frame would lead to a memory leak, after all.

Here is a nice trick to add a finalizer to a third party class. Here is how the code works:

The ConditionalWeakTable associates the lifetime of the disposer with the frame, so only where there are no more outstanding references to the frame (guaranteed by the GC), the finalizer will be called and the memory will be freed.

It’s not the best option, but it is a great one if you want to make minimal modifications to the code and get the right behavior out of it.

Mar 10 2023

Next week: Kobo's Journey Into High Performance and Reliable Document Databases at InfoQ

time to read 2 min | 262 words

0 comments

Tags:

Trevor Hunter from Kobo Rakuten is going to be speaking about Kobo’s usage of RavenDB in a webinar next Wednesday.

When I started at Kobo, we needed to look beyond the relational and into document databases. Our initial technology choice didn't work out for us in terms of reliability, performance, or flexibility, so we looked for something new and set on a journey of discovery, exploratory testing, and having fun pushing contender technologies to their limits (and breaking them!). In this talk, you'll hear about our challenges, how we evaluated the options, and our experience since widely adopting RavenDB. You'll learn about how we use it, areas we're still a bit wary of, and features we're eager to make more use of. I'll also dive into the key aspects of development - from how it affects our unit testing to the need for a "modern DBA" role on a development team.

About the speaker: Trevor Hunter: "I am a leader and coach with a knack for technology. I’m a Chief Technology Officer, a mountain biker, a husband, and a Dad. My curiosity to understand how things work and my desire to use that understanding to help others are the things I hope my kids inherit from me. I am currently the Chief Technology Officer of Rakuten Kobo. Here I lead the Research & Development organization where our mission is to deliver the best devices and the best services for our readers. We innovate, create partnerships, and deliver software, hardware, and services to millions of users worldwide."

You can register to the webinar here.

Mar 02 2023

RavenDB Sharding Progress

time to read 2 min | 242 words

0 comments

Tags:

RavenDB Sharding is now running as a production replication in our backend systems and we are stepping up our testing in a real-world environment.

We are now also publishing nightly builds of RavenDB 6.0, including Sharding support.

There are some known (minor) issues in the Studio, which we are busy fixing, but it is already possible to create and manage sharded clusters. As usual, we would love your feedback.

Here are some of the new features that I’m excited about, you can see that we have an obese bucket here, far larger than all other items:

And we can drill down and find why:

We have quite a few revisions of a pretty big document, and they are all obviously under a single bucket.

You can now also get query timings information across the entire sharded cluster, like so:

And as you can see, this will give you a detailed view of exactly where the costs are.

You can get the nightly build and start testing it right now. We would love to hear from you, especially as you test the newest features.

Oren Eini

Oren Eini

CEO of RavenDB

Tricks of the trade: Figuring out progress of a large upload

RavenDB 6.0 live instance is now up & running: Come test it out!

ExternalFinalizer: Adding a finalizer to 3rd party objects

Next week: Kobo's Journey Into High Performance and Reliable Document Databases at InfoQ

RavenDB Sharding Progress

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication

Main feed
Comments feed