Tricks of the trade: Figuring out progress of a large upload

I found myself today needing to upload a file of a few hundred GB to S3. I executed the appropriate command, like so:

aws s3api put-object --bucket twitter-2020-rvn-dump --key mydb.backup --body ./mydb.backup

But then I realized that uploading a file of a few hundred GB to S3 may take a while. The command doesn’t report any progress, so I had no way to tell how far along it was.

I decided to poke around and see what I could find. First, I ran this command:

ps aux | grep s3api

This gave me the PID of the upload process in question.
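As an alternative to the ps/grep pipeline (which also lists the grep process itself), pgrep from procps can do the matching directly. This is a small sketch assuming a Linux box with procps installed; `find_pid` is a hypothetical wrapper name, and the pattern should be whatever uniquely identifies your upload's command line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PID(s) whose full command line matches $1.
# pgrep -f matches the extended regex against the whole command line
# (not just the process name), and pgrep never reports itself as a match.
# It may print several PIDs if more than one process fits the pattern.
find_pid() {
  pgrep -f -- "$1"
}
```

For the upload above, something like `find_pid 's3api put-object'` should print the PID of the running aws process.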

Then I checked the file descriptors for this process, like so:

$ ls -alh /proc/84957/fd


total 0
dr-x------ 2 ubuntu ubuntu  0 Mar 30 08:10 .
dr-xr-xr-x 9 ubuntu ubuntu  0 Mar 30 08:00 ..
lrwx------ 1 ubuntu ubuntu 64 Mar 30 08:10 0 -> /dev/pts/8
lrwx------ 1 ubuntu ubuntu 64 Mar 30 08:10 1 -> /dev/pts/8
lrwx------ 1 ubuntu ubuntu 64 Mar 30 08:10 2 -> /dev/pts/8
lr-x------ 1 ubuntu ubuntu 64 Mar 30 08:10 3 -> /backups/mydb.backup
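Eyeballing the listing is fine for a handful of descriptors, but it gets tedious for a process with many open files. A minimal sketch, assuming bash and GNU coreutils `readlink`, that resolves each /proc/PID/fd entry and prints the descriptor numbers pointing at a given file (`find_fd` is a hypothetical name, not an existing tool):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each descriptor number under which process $1
# has the file $2 open, by resolving the /proc/$1/fd/N symlinks.
find_fd() {
  local pid="$1" target link
  # Canonicalize the path so it compares equal to what /proc reports
  target=$(readlink -f -- "$2") || return 1
  for link in /proc/"$pid"/fd/*; do
    # Each /proc/PID/fd/N is a symlink whose target is the open file's path
    if [ "$(readlink -- "$link")" = "$target" ]; then
      echo "${link##*/}"   # strip the directory part, leaving just N
    fi
  done
}
```

Against the listing above, `find_fd 84957 /backups/mydb.backup` should print `3`. You need permission to read the target process's /proc entries, which is the case here since the upload runs as the same user.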

As you can see, file descriptor 3 is the one we care about, so we can ask for more details about it:

$ cat /proc/84957/fdinfo/3
pos: 140551127040
flags: 02400000
mnt_id: 96
ino: 57409538

In other words, the process has read about 131 GiB (140,551,127,040 bytes) of the file so far, or thereabouts.
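The arithmetic is easy to automate. Here is a sketch, assuming bash and GNU coreutils `stat`, that reads the `pos:` field from fdinfo, compares it against the file's size, and prints a percentage; `read_progress` is a hypothetical helper, not part of the AWS CLI or any other tool. Keep in mind that `pos` is how far the process has *read*, which may run somewhat ahead of what has actually been sent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how far process $1 has read through the file
# it holds open on descriptor $2, using the "pos:" line of its fdinfo entry.
read_progress() {
  local pid="$1" fd="$2" pos size permille
  # fdinfo prints one "field: value" pair per line; pick out the file offset
  pos=$(awk '$1 == "pos:" { print $2 }' "/proc/$pid/fdinfo/$fd") || return 1
  # stat -L follows the /proc/PID/fd/FD symlink to the underlying file
  size=$(stat -L -c %s -- "/proc/$pid/fd/$fd") || return 1
  if [ "$size" -eq 0 ]; then
    echo "$pos / 0 bytes"
    return 0
  fi
  # bash arithmetic is 64-bit integer only, so compute tenths of a percent
  permille=$(( pos * 1000 / size ))
  printf '%s / %s bytes (%d.%d%%)\n' "$pos" "$size" \
    $(( permille / 10 )) $(( permille % 10 ))
}
```

To poll it, a loop such as `while sleep 60; do read_progress 84957 3 || break; done` prints an update once a minute and stops when the process exits and its /proc entries disappear (awk will complain on stderr at that point).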

It’s not ideal, but it does give me some idea of where things stand. It is also a nice demonstration of how you can poke into the internals of a running system to figure out what is going on.