Introduction
About 6-8 months ago, I decided to make the switch to ZFS to store all of my data, in the hope of one day being able to back up remotely with ZFS encryption.
That day has finally arrived; I have an off-site backup that I am dumping data to.
Rationale for ZFS
People may ask why I did not use BorgBackup or similar alternatives. There are several parts to the answer:
- ZFS is an upgrade to my filesystem and will protect my data better. (I also recently upgraded my machine to ECC RAM, although, contrary to popular belief, ECC RAM is not required for ZFS.)
- I compared the encryption protocols used by both Borg and ZFS, and in my very uninformed opinion, ZFS’s was slightly stronger.
- I wanted to be able to access my data on the remote, and though that’s possible with Borg, it’s easier with ZFS (in my opinion). This is especially important for verifying that backups are good: if I can access the data on the remote the same way I do on my local machine, I can have confidence that the backup is good.
- It’s easier to set up mirrors with ZFS.
That last reason was the real differentiator between the two. I have mirrors on both my home machine and the remote, and the ZFS setup was easy.
My Current Setup
I have a local backup as well, in the form of a (now-old) Synology Diskstation. It has served me well thus far, having saved my data on several occasions, including several times when I made mistakes while building the ZFS infrastructure in this post.
My current setup is to rsync
my data to the Diskstation. Over the years, I
made an elaborate script (353 lines of shell). This includes doing dry runs to
ensure that if something is deleted, it’s because I want it to be deleted.
Before I added the dry runs, I almost lost data to an accidental deletion. Fortunately for me, that data also happened to be stored under another user on the Diskstation at the time.
There is much more to my Diskstation backup than that, such as optionally deleting software builds, but that is the gist of what I did before I managed to get a remote ZFS server.
First Attempts
My first attempts at getting a remote backup with ZFS failed spectacularly.
rsync.net
When I set up ZFS for the first time on my machine, I looked at rsync.net. I even set up an account with them.
Unfortunately, they don’t provision a full VM for customers with less than 1 TiB of data, a category I fall under. This meant that I was unable to update FreeBSD’s version of ZFS in my bhyve, which was a dealbreaker because at the time, that version of FreeBSD was running a version of ZFS without encryption.
I could have upgraded to an account where I paid for a full 1 TiB, but my wife and I eventually decided not to.
zfs.rent
Next, when it was announced on Hacker News, I looked into zfs.rent.
It was cheaper, which was good. However, it was just getting started, and I was not entirely impressed with the founder, so I decided not to pursue that avenue.
Actually, the success of zfs.rent, and especially the fact that the founder turned away potential customers, makes me wonder whether there is a bigger market for off-site ZFS backups than I thought. Maybe I can capture some of the customers he turned away?
Renting a Server
Eventually, I struck a deal with a friend of mine who rents servers. I ended up paying more, but I have a full machine with all of the niceties I could ever want. That also means I can move my websites off their current hosting and onto the server I am renting, which will make deploying them much easier.
The move will happen soon; I just need to learn how to set up BIND.
I also sent him some drives, which he put in the machine, so that he could ship them to me directly should I ever need him to.
Another reason I am glad to pay more is that I know who this money is going to, and it’s supporting a small business in a time when small businesses are being choked.
Creating the Infrastructure
After I had the server and had installed my choice of OS and ZFS, I got to work.
Creating the Mirror
The first thing I had to do was create a mirror. That was easy; I had done it before.
zpool create -o ashift=12 -O compression=off -O atime=off -m none home mirror \
/dev/disk/by-id/<first_disk_id> /dev/disk/by-id/<second_disk_id>
I turn off atime
by default for performance.
I turn off compression by default because the author of ZFS encryption makes no guarantees that a CRIME-like attack isn’t possible against ZFS encryption with compression turned on, and since the vast majority (space-wise) of my data is compressed video, I don’t think it would help me much while also opening a potential hole.
I also set ashift to 12 because my drives have 4 KiB physical sectors (2^12 = 4096 bytes).
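For reference, one way to check a drive’s sector sizes before picking ashift (this assumes lsblk from util-linux is available; the disk ID is the same placeholder as above):
lsblk -o NAME,PHY-SEC,LOG-SEC /dev/disk/by-id/<first_disk_id>
A drive that reports 4096-byte physical sectors wants ashift=12.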
Finding the Right Commands
This part was the toughest.
I knew I wanted to do a zfs send
/zfs recv
with snapshots. From
my attempt with rsync.net, I had a script to handle creating the snapshots,
but I hadn’t yet managed to send a stream successfully.
With a lot of work, and a lot of fruitless attempts, I finally reached out to the zfs-discuss mailing list, and they were immensely helpful.
I eventually learned that it was going to be best to not do a raw send, but
send the data encrypted by ssh
instead.
The basic command I wrote is the following:
zfs send -vL $token | ssh $SSHARGS "zfs recv $mountpoint -u -sd $pool"
$token is the name of the snapshot I want to send, unless that snapshot has a receive_resume_token, in which case $token is -t <receive_resume_token>. If the backup is an incremental backup, $token also includes -I <previous_snapshot>.
$mountpoint is blank if the mountpoint of the dataset is inherited, but if it’s custom, $mountpoint is -o 'mountpoint=<mountpoint>' to make sure that mountpoints are the same on both machines.
$pool is obviously the pool the dataset is in. And $SSHARGS is redacted for obvious reasons.
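To make that concrete, here are two hypothetical invocations; the dataset and snapshot names are made up, and $SSHARGS stays redacted. The first is a full send of a dataset with an inherited mountpoint, the second an incremental send of a dataset with a custom mountpoint:
zfs send -vL home/photos@2021-03-01 | ssh $SSHARGS "zfs recv -u -sd home"

zfs send -vL -I home/videos@2021-02-01 home/videos@2021-03-01 | \
    ssh $SSHARGS "zfs recv -o 'mountpoint=/data/videos' -u -sd home"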
Performance
My ISP’s upload speed is pretty bad, and in my experience, it has been even worse than what they advertise.
So I was not surprised at all when my first implementation was slower than the advertised upload speed. Still, with encouragement from the mailing list, I decided to dig a little.
By using bzip2
on both ends, I was able to recover some speed, giving me this
command:
zfs send -vL $token | bzip2 -c | ssh $SSHARGS \
"bzip2 -dc | zfs recv $mountpoint -u -sd $pool"
However, even with that, at the rate it was sending, it would take me more than 30 days to send it all.
I wondered if I could find another bottleneck.
So I tried using mbuffer to make sure that ZFS itself was not the bottleneck. It wasn’t; it easily filled the 1 GiB buffer in seconds.
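That check can be done with something like this (a sketch; mbuffer’s -m flag sets the size of its in-memory buffer):
zfs send -vL $token | mbuffer -m 1G > /dev/null
If the buffer fills almost instantly, the send side is not the problem.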
I tried WireGuard with netcat to see if ssh was the bottleneck. Nope; I still had terrible upload speed (though it was slightly better).
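A test like that can be set up roughly as follows (a sketch; netcat flags vary between implementations, and the WireGuard address and port are placeholders):
# on the remote, listening on its WireGuard address
nc -l 9000 | zfs recv -u -sd $pool
# on my machine
zfs send -vL $token | nc <wireguard_ip> 9000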
Finally, I gave up.
However, I noticed that most CPU time went to bzip2
, so I wondered if I could
send two snapshots in parallel and have one send while the other was being
compressed.
The result blew me away: I got twice the upload speed.
I quickly checked whether bzip2 had been the bottleneck by inserting it between ZFS and mbuffer; nope, it also filled the 1 GiB buffer easily.
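That check is the same idea with bzip2 in the middle (again, just a sketch):
zfs send -vL $token | bzip2 -c | mbuffer -m 1G > /dev/null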
The True Bottleneck
And that led me to my most surprising discovery of all: Linux (the kernel) was the bottleneck.
I don’t know enough about kernel development to investigate, but at this point, I am sure that Linux was the reason I could not saturate my connection. Once I started sending enough datasets in parallel, I easily saturated it.
Parallelizing the Upload
Since parallelizing could cut my send time from 30 days to 6, it was easily worth my time to figure out how to do it.
I did this in a few steps. First, I broke up my largest datasets into more manageable sizes by making subdatasets.
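Creating a subdataset is just a zfs create (the names here are made up; the existing data still has to be moved into the new children by hand):
zfs create home/videos/raw
zfs create home/videos/edited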
Then I had a conundrum: I wanted to send subdatasets before their parents, so I
needed a way to express dependencies between datasets while being able to send
datasets in parallel. After banging my head against the wall for several hours,
I realized the answer was obvious: make
with the -j
flag.
So I wrote a Makefile expressing those dependencies, biasing towards sending larger datasets first. (In a Makefile, the order of prerequisites matters.)
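Here is a minimal sketch of the shape of such a Makefile; the dataset names and the per-dataset send script are made up, and the real Makefile is generated, as described next:
.PHONY: all home/photos home/videos home/videos/raw home/videos/edited

all: home/videos home/photos

# Children are prerequisites of their parent, so they are sent first;
# larger datasets are listed earlier so make -j starts them sooner.
home/videos: home/videos/raw home/videos/edited

home/photos home/videos home/videos/raw home/videos/edited:
	./send_one.sh '$@'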
Then, after trying out the Makefile and finding that it worked splendidly, I wrote a script to generate the Makefile, using zfs get and zfs list.
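The generation itself can be as simple as looping over zfs list output. This is only a sketch of the idea: it emits one target per dataset, largest first, each calling the per-dataset send script (zfs_send.sh, described below), and it leaves out the dependency and umount handling that the real script needs:
zfs list -H -o name -S used -r home | while read -r dataset; do
    printf '%s:\n\t./zfs_send.sh %s\n\n' "$dataset" "$dataset"
done >> Makefile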
End Result
If you use the scripts in this section, you do so AT YOUR OWN RISK.
The end result is three scripts:
- zfs_backup.sh, which is called from my existing backup script.
- zfs_send_gen.sh, which generates the Makefile.
- zfs_send.sh, which is called by the targets in the Makefile and does the actual sending.
A fourth script, labelled YESNO
in the above scripts, is also needed. It
just makes it easier for scripts to ask users yes or no questions.
zfs_backup.sh does the following (a rough sketch follows the list):
- Creates snapshots of every dataset in every pool, if requested.
- Deletes snapshots (if requested), leaving $NUMSNAPS snapshots untouched.
- Generates the Makefile by calling zfs_send_gen.sh.
- Runs make, doing the parallel send.
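A rough sketch of those steps (this is not the actual script: the snapshot naming, the prompting, and the error handling are all simplified away, and head -n -K assumes GNU head):
today=$(date +%Y-%m-%d)

# 1. Snapshot every dataset in every pool.
zfs list -H -o name | while read -r ds; do
    zfs snapshot "$ds@$today"
done

# 2. Prune old snapshots, keeping the newest $NUMSNAPS for each dataset.
zfs list -H -o name | while read -r ds; do
    zfs list -H -t snapshot -o name -s creation -d 1 "$ds" |
        head -n -"$NUMSNAPS" |
        while read -r snap; do
            zfs destroy "$snap"
        done
done

# 3. Generate the Makefile, then 4. run the parallel send.
./zfs_send_gen.sh
make -j"$JOBS"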
Each make target calls zfs_send.sh. If the special argument umount is provided, zfs_send.sh unmounts all pools on the remote (which is done in a make target that all others depend on). Otherwise, it does the following (a sketch of the core logic follows the list):
- Determines if the mountpoint of the dataset is inherited or not. If it is not, the $mountpoint argument is set.
- Gets the last snapshot (which zfs_backup.sh should have just created).
- Gets the last snapshot on the remote.
- If the snapshots are the same, it exits successfully, since the snapshot has already been sent.
- If the snapshot has not been sent, it continues by detecting whether there is a receive_resume_token on the remote for that dataset.
- If there is a token, it sets $token to -t <token_id>.
- Otherwise, it sets $token to the snapshot name.
- It then checks whether there is any snapshot on the remote at all.
- If there is not, it sends a full stream.
- If there is, it sends an incremental stream.
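The core of that logic looks roughly like this ($remote_dataset, $remote_snap, $prev_snap, and $local_snap are stand-ins for values the script looks up; the early exit when the remote already has the latest snapshot and the rest of the error handling are omitted):
resume_token=$(ssh $SSHARGS "zfs get -H -o value receive_resume_token '$remote_dataset'")

if [ "$resume_token" != "-" ]; then
    token="-t $resume_token"            # resume an interrupted send
elif [ -z "$remote_snap" ]; then
    token="$local_snap"                 # nothing on the remote yet: full stream
else
    token="-I $prev_snap $local_snap"   # incremental from the snapshot the remote already has
fi

zfs send -vL $token | ssh $SSHARGS "zfs recv $mountpoint -u -sd $pool"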
There are some cool advantages of this system.
First, if sending gets interrupted, as happened often while sending the data for the first time, I just have to run make -j$JOBS in the directory where the Makefile and scripts are, and they will pick up where they left off.
Second, I can run a ZFS backup without running my Diskstation backup.
Third, since the snapshots are exactly the same on both ends, I can be confident that ZFS is making sure the data is also exactly the same on both ends.
Conclusion
I don’t know that there’s much to the conclusion of this post, but here are some things I learned:
- ZFS struggles with raw sends/receives.
- The ZFS community is helpful.
- Linux’s TCP stack is awful, at least for this use case.
- You can work around the above if you can upload stuff in parallel.
- make is a great way of expressing dependencies between jobs, even if you are not using it to build software.
- Makefiles are surprisingly easy to generate.
- ZFS has a lot of user tools included that make querying the system easy.
- However, the ZFS man pages, while good, could be better.
The biggest lesson: use ZFS. It’s great!