You just finished setting up all your services and it works fine - how do you now prepare for eventual drive failure? (kbin.social)

submitted 1 year ago by Kaldo@kbin.social to c/selfhosted@lemmy.world

I know that for data storage the best bet is a NAS and RAID1 or something in that vein, but what about all the docker containers you are running, carefully configured services on your rpi, installed *arr services on your PC, etc.?

Do you have a simple way to automate backups and re-installs of these as well or are you just resigned to having to eventually reconfigure them all when the SD card fails, your OS needs a reinstall or the disk dies?

[-] rentar42@kbin.social 27 points 1 year ago

There are lots of very good approaches in the comments.

But I'd like to play the devil's advocate: how many of you have actually recovered from a disaster that way? Ideally as a test, of course.

A backup system that has never performed a restore must be assumed to be broken. Similar logic should be applied to disaster recovery.

And no: I use a combined Ansible/Docker approach that I'm reasonably sure could quite easily recover most stuff, but I've not yet fully rebuilt from just that.
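
One way to make a restore test more than a gut check is to compare checksums of the original tree against the restored copy. The following is a minimal sketch under assumptions: the function names are made up for illustration, and it presumes both trees are reachable as local directories (for example the live data and a scratch restore on a test box).

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file (path relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(original: Path, restored: Path) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or different after a restore."""
    src, dst = tree_digests(original), tree_digests(restored)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

An empty report on all three keys means the restored files match byte for byte. It says nothing about whether the services actually start from them, so it complements a full rebuild test rather than replacing it.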

[-] dandroid@dandroid.app 4 points 1 year ago

I restored from a backup when I swapped to a bigger SSD. Worked perfectly first try. I use rsnapshot for backups.

[-] humancrayon@sh.itjust.works 4 points 1 year ago

I have (more than I’d like to admit) recovered entirely from backups.

I run proxmox, everything else in a VM. All VMs get backed up to three different places once a week, backups are tested monthly on a rando proxmox box to make sure they still work. I do like the backup system built into it, serves my needs well.

Proxmox could die and it wouldn’t make much of a difference. I reinstall proxmox, restore the VMs and I’m good to go again.

[-] Kaldo@kbin.social 2 points 1 year ago

I'm not yet sure what Ansible does that a simple Docker Compose doesn't, but I will look into it more!

My real backup test run will be soon I think - for now I'm moving from windows to docker, but eventually I want to get an older laptop, put linux on it and just move everything to the docker on it instead and pretend it's a server. The less "critical" stuff I have on my main PC, the less I'm going to cry when I inevitably have to reinstall the OS or replace the drives.

[-] rentar42@kbin.social 2 points 1 year ago* (last edited 1 year ago)

I just use Ansible to prepare the OS, set up a dedicated user, install/setup Rootless Docker and then Sync all the docker compose files from the same repo to the appropriate server and launch/update as necessary. I also use it to centrally administer any cron jobs like for backup.

Basically if I didn't forget anything (which is always possible) I should be able to pick a brand new RPi with an SSD and replace one of mine with a single command.

It also allows me to keep my entire setup "documented" and configured in a single git repository.

[-] deepdive@lemmy.world 2 points 1 year ago

While rsync is great, I only partially recovered from an outage... Containers with databases need special care: dumping their databases...

Lesson learned!
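
To illustrate why a database wants a proper dump rather than a file copy, here is a small sketch that assumes the service keeps its state in SQLite (many self-hosted apps do, but check yours). It uses the online backup API from Python's standard `sqlite3` module to take a consistent snapshot while the database may still be in use. The function names are hypothetical. For PostgreSQL or MariaDB the equivalent would be their own dump tools, such as `pg_dump` or `mysqldump`.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(live_db: Path, dest: Path) -> None:
    """Copy live_db to dest with SQLite's online backup API (consistent snapshot)."""
    src = sqlite3.connect(live_db)
    dst = sqlite3.connect(dest)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def check_sqlite(db: Path) -> bool:
    """Return True if SQLite's own integrity check passes on the given file."""
    conn = sqlite3.connect(db)
    try:
        return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    finally:
        conn.close()
```

The snapshot file can then be picked up by rsync or any other file-level backup, since nothing is writing to it.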

[-] friend_of_satan@lemmy.world 15 points 1 year ago

I've had a complete drive failure twice within the last year (really old hardware) and my ansible + docker + backup made it really easy to recover from. I got new hardware and was back up and running within a few hours.

All of your services setup should be automated (through docker-compose or ansible or whatever) and all your configuration data should be backed up. This should make it easy to migrate services from one machine to another, and also to recover from a disaster.

[-] DLSantini@lemmy.ml 13 points 1 year ago

Pre.....pare....? What's that? Some sorta fruit?

[-] dr_robot@kbin.social 9 points 1 year ago

My configuration and deployment is managed entirely via an Ansible playbook repository. In case of absolute disaster, I just have to redeploy the playbook. I do run all my stuff on top of mirrored drives so a single failure isn't disastrous if I replace the drive quickly enough.

For when that's not enough, the data itself is backed up hourly (via ZFS snapshots) to a spare pair of drives and nightly to S3 buckets in the cloud (via restic). Everything automated with systemd timers and some scripts. The configuration for these backups is part of the playbooks of course. I test the backups every 6 months by trying to reproduce all the services in a test VM. This has identified issues with my restoration procedure (mostly due to potential UID mismatches).

And yes, I have once been forced to reinstall from scratch and I managed to do that rather quickly through a combination of playbooks and well tested backups.
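
The UID mismatches mentioned above are something a restore test can check mechanically. This is only a sketch with made-up function names, not the setup described in this comment: it records owner, group and permission bits for every path before the backup, so the same manifest can be regenerated after a restore and compared.

```python
import stat
from pathlib import Path


def ownership_manifest(root: Path) -> dict[str, tuple[int, int, int]]:
    """Map each path under root to (uid, gid, permission bits)."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        info = path.lstat()  # lstat: describe symlinks themselves, not their targets
        manifest[path.relative_to(root).as_posix()] = (
            info.st_uid,
            info.st_gid,
            stat.S_IMODE(info.st_mode),
        )
    return manifest


def ownership_diff(before: dict, after: dict) -> list[str]:
    """Return paths present in both manifests whose owner, group or mode differ."""
    return sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
```

Numeric IDs are compared on purpose: a user name can map to a different UID on a freshly installed system, which is exactly the mismatch that breaks container volumes.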

[-] subtext@lemmy.world 2 points 1 year ago

Dang I really like your idea of testing the backup in a VM… I was worried about how I’d test mine since I only have the one machine, but a VM on my desktop or something should do just fine.

[-] CameronDev@programming.dev 9 points 1 year ago

I rsync my root and everything under it to a NAS, which will hopefully save my data. I wrote some scripts manually to do that.

I think the next best thing to do is to document your setup as much as possible. Whether it's typed-up notes or ansible/packer/whatever, any documentation is better than nothing if you have to rebuild.

[-] foggy@lemmy.world 4 points 1 year ago

I have a 16tb USB HDD that syncs to my NAS whenever my workstation is idle for 20 minutes.

[-] darvocet@infosec.pub 2 points 1 year ago

I run history and then clean it up so I have a guide to follow on the next setup. It’s not even so much for drive failure but to move to the newer OS versions when available.

The ‘data’ is backed up by scripts that tar folders up and scp them off to another server.

[-] RegalPotoo@lemmy.world 8 points 1 year ago

Infrastructure as code/config as code.

The configuration of all the actual machines is managed by Puppet, with all its configs in a git repo. All the actual applications are deployed on top of Kubernetes, with all the configurations managed by helmfile and also tracked in git. I don't set anything up - I describe how I want things configured, and the tools do the actual work.

There is a "cold start" issue in my scheme: Puppet requires a server component that runs on Kubernetes, but I can't deploy onto Kubernetes until the host machines have had their Puppet manifests applied. At that point, though, I can just read the code and do enough of the config by hand to bootstrap everything from scratch if I have to.

[-] tetris11@lemmy.ml 8 points 1 year ago

Radical suggestion:

1. Once a year you buy a hard drive that can handle all of your data.
2. rsync everything to it
3. unplug it, put it back in cold storage

[-] atzanteol@sh.itjust.works 9 points 1 year ago

Once a... year? There's a lot that can change in a year. Cloud storage can be pretty cheap these days. Backup to something like backblaze, S3 or Glacier nightly instead.

[-] Appoxo@lemmy.dbzer0.com 2 points 1 year ago

You can save periodically to it like once a month but keep one as a yearly backup.

[-] outcide@lemmy.world 7 points 1 year ago* (last edited 1 year ago)

1. Back everything up
2. rm -rf /
3. Now rebuild.

Congratulations, you now know what’s required. :-P

[-] GregoryTheGreat@programming.dev 1 points 1 year ago

Rebuild to different disks than the ones you backed up though. Don’t restore over your working data.

[-] atzanteol@sh.itjust.works 6 points 1 year ago* (last edited 1 year ago)

1. Most systems are provisioned in proxmox with terraform.
2. Configuration and setup is handled via ansible playbooks after the server is available.
   2.a) Do NOT make changes on the server without updating your ansible scripts - except during troubleshooting.
   2.b) Once troubleshooting is done delete and re-create the VM from scratch using only scripts to ensure it works.
3. VM storage is considered to be ephemeral. All long-term data/config that can't be re-created with ansible is either stored on an NFS server with a RAID5 drive configuration or backed up to that same file-server using rsnapshot.
4. NFS server is backed-up nightly to backblaze using duplicacy.
5. Any other non-VM systems like personal laptops and the like are backed up nightly to the file-server using rsnapshot. Those snapshots are then backed up to backblaze using duplicacy.

[-] clavismil@lemmy.world 2 points 1 year ago

Great summary. How does provisioning with terraform work? Do you have a guide? Is it possible to provision LXC/VMs on proxmox with ansible instead?

[-] atzanteol@sh.itjust.works 2 points 1 year ago

I use the bpg/proxmox module to manage proxmox with terraform.

LXC was pretty straightforward. Use the proxmox_virtual_environment_container module and set parameters.

Basically I have an image that is based on a cloudinit image for Ubuntu (which I create and upload to proxmox with Ansible - but it wouldn't be hard to do manually in case of a disaster recovery). I then clone that image to create new VMs using the proxmox_virtual_environment_vm module.

[-] namelivia@lemmy.world 6 points 1 year ago

I have all my configuration as Ansible and Terraform code, so everything can be destroyed and recreated with no effort.

When it comes to the data, I wrote a bash script to copy, compress, encrypt and upload it. Not sure if this is the best but it is how I'm dealing with it right now.

[-] rentar42@kbin.social 4 points 1 year ago

I've got a similar setup, but use Kopia for backup which does all that you describe but also handles deduplication of data very well.

For example I've added older less structured backups to my "good" backup now and since there is a lot of duplication between a 4 year old backup and a 5 year old backup it barely increased the storage space usage.

[-] mhzawadi@lemmy.horwood.cloud 2 points 1 year ago

That sounds a lot like how I keep my stuff safe, I use backblaze for my off-site backup

[-] lemmyvore@feddit.nl 6 points 1 year ago

1. Install Debian stable with the ssh server included.
2. Keep a list of the packages that were installed after (there aren't many but still).
3. All docker containers have their compose files and persistent user data on a RAID1 array.
4. Have a backup running that once a day rsyncs /etc, /home/user and /mnt/array1/docker to daily/ on another RAID1; once a week rsyncs daily/ to weekly/; and once a month writes a timestamped tarball of weekly/ to monthly/. Once a month I also bring out a HDD from the drawer and do a backup of monthly/ with Borg.

For recovery:

1. Reinstall Debian + extra packages.
2. Restore the docker compose and persistent files.
3. Run docker compose on containers.

Note that some data may need additional handling, for example databases should be dumped, not rsynced.
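
A tiered daily/weekly/monthly scheme like the one above eventually raises the question of which old copies to throw away. The sketch below is not the rsync cascade described in this comment. It is a related, hypothetical retention rule expressed as a pure function: keep the newest few snapshots, plus the newest snapshot from each recent ISO week and calendar month. The function name and the default counts are assumptions.

```python
from datetime import date


def snapshots_to_keep(
    snapshots: list[date], daily: int = 7, weekly: int = 4, monthly: int = 12
) -> set[date]:
    """Keep the newest `daily` snapshots, plus the newest snapshot from each of
    the most recent `weekly` ISO weeks and `monthly` calendar months."""
    ordered = sorted(set(snapshots), reverse=True)  # newest first
    keep = set(ordered[:daily])

    def keep_newest_per_bucket(bucket_of, limit):
        seen = []
        for snap in ordered:
            bucket = bucket_of(snap)
            if bucket in seen:
                continue  # a newer snapshot already represents this bucket
            if len(seen) == limit:
                break  # enough buckets covered
            seen.append(bucket)
            keep.add(snap)

    keep_newest_per_bucket(lambda d: tuple(d.isocalendar()[:2]), weekly)  # (ISO year, week)
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep
```

Anything not in the returned set is a candidate for deletion. Tools such as Borg or restic ship their own prune policies, so a function like this is mostly useful for home-grown rsync or tarball setups.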

[-] ftbd@feddit.de 5 points 1 year ago

By using NixOS and tracking the config files with git

[-] Haystack@lemmy.world 2 points 1 year ago

For real, saves so much space that would be used for VM backups.

Aside from that, I have anything important backed up to my NAS, and Duplicati backs up from there to Backblaze B2.

[-] eskuero@lemmy.fromshado.ws 5 points 1 year ago

My docker containers are all configured via docker compose so I just tar the .yml files and the outside data volumes and backup that to an external drive.

For configs living in /etc you can also back up all of them, but I guess it's harder to remember what you modified and where, so this is why you document your setup step by step.

Something nice and easy I use for personal documentation is mdBook.
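
Tarring the compose files and data volumes, as described above, could look roughly like this. It is a sketch that assumes each service lives in its own directory holding both the compose file and its bind-mounted data. The function name and directory layout are hypothetical.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def archive_service(service_dir: Path, dest_dir: Path) -> Path:
    """Pack service_dir into a timestamped .tar.gz under dest_dir; return its path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{service_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths relative so the archive can be unpacked anywhere
        tar.add(service_dir, arcname=service_dir.name)
    return archive
```

The destination should sit outside the service directory (an external drive mount, for instance), and containers that hold a database should be stopped or dumped first so the archive is consistent.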

[-] Kaldo@kbin.social 2 points 1 year ago* (last edited 1 year ago)

Ahh, so the best docker practice is to always just use outside data volumes and backup those separately, seems kinda obvious in retrospect. What about mounting them directly to the NAS (or even running docker from NAS?), for local networks the performance is probably good enough? That way I wouldn't have to schedule regular syncs and transfers between "local" device storage and NAS? Dunno if it would have a negative effect on drive longevity compared to just running a daily backup.

[-] adam@doomscroll.n8e.dev 1 points 1 year ago

If you've got a good network path NFS mounts work great. Don't forget to also back up your compose files. Then bringing a machine back up is just a case of running them.

[-] CarbonatedPastaSauce@lemmy.world 4 points 1 year ago

I actually run everything in VMs and have two hypervisors that sync everything to each other constantly, so I have hot failover capability. They also back up their live VMs to each other every day or week depending on the criticality of the VM. That way I also have some protection against OS issues or a wonky update.

Probably overkill for a self hosted setup but I’d rather spend money than time fixing shit because I’m lazy.

[-] surewhynotlem@lemmy.world 8 points 1 year ago

HA is not a backup. It may protect from a drive failure but it completely ignores data corruption issues.

I learned this the hard way when my cryptomator decided to corrupt some of my files, and I noticed but didn't have backups.

[-] CarbonatedPastaSauce@lemmy.world 3 points 1 year ago

That’s why I also do backups, as I mentioned.

[-] rentar42@kbin.social 2 points 1 year ago

yeah, there's a bunch of lessons that tend to only be learned the hard way, despite most guides mentioning them.

similarly to how RAID should not be treated as a backup.

[-] guitarsarereal@sh.itjust.works 4 points 1 year ago* (last edited 1 year ago)

The most useful philosophy I've come across is "make the OS instance disposable." That means an almost backups-first approach. Everything of importance to me is thoroughly backed up, so once the main box goes kaput, I just have to pull the most recent copy of the dataset and provision it on a new OS, maybe new hardware if needed. These days, it's not that difficult. Docker makes scripting backups easy as pie. You write your docker-compose so all config and program state lives in a single directory. Back up the directory, and all you need to get up and running again with your services is access to Docker Hub to fetch the application code.

Some downsides with this approach (Docker's security model sorta assumes you can secure/segment your home network better than most people are actually able to), but honestly, for throwing up a small local service quickly it's kind of fantastic. Also, if you decide to move away from Docker the experience will give you insight into what amounts to program state for the applications you use which will make doing the same thing without Docker that much easier.

[-] simpleslipeagle@lemmynsfw.com 3 points 1 year ago

My server has a raid1 mdadm boot drive and an 8 drive raid6 with zfs. It's been running for 14 years now. The only thing that I haven't replaced over its lifetime is the chassis. In fact the proc let out the magic smoke a few weeks ago; after some new parts it's still going strong.

[-] Appoxo@lemmy.dbzer0.com 2 points 1 year ago* (last edited 1 year ago)

My whole environment is in docker-compose which is "backed" to github.
My config/system drive is backed with veeam to one drive.
The backup is backed with rsync to another drive every week.

But: I only have a 1-drive NAS because I don't have the space for a proper PC with drive caddies, and commercial NAS boxes (synology, qnap) are not my jam because I'd need a transcoding-capable gpu and those models are overpriced for what I need.
And with plain debian I get unlimited system updates (per distro release) and learn linux along the way.

[-] ad_on_is@lemmy.world 2 points 1 year ago

Most of the docker services use mounted folders/files, which I usually store in the users home folder /home/username/Docker/servicename.

Now, my personal habit of choice is to have user folders on a separate drive and mount them into /home/username. Additionally, one can also mount /var/lib/docker this way. I also spin up all of these services with portainer. The benefit is, if the system breaks, I don't care that much, since everything is on a separate drive. In case of needing to re-setup everything again, I just spin up portainer again which does the rest.

However, this is not a backup, which should be done separately in one way or the other. But it's for sure safer than putting all the trust into one drive/sdcard etc.

[-] desentizised@lemm.ee 2 points 1 year ago* (last edited 1 year ago)

I used to (over a span of about 4 years now) just rely on a RaidZ2 (ZFS) pool (faulted drive replacements never gave any issues) but I recently did an expansion of the array plus OS reinstall and only now am I starting to incorporate Docker containers into my workflows. The live data is in ~ and nightly rsynced onto the new larger RaidZ2 pool but there is also data on that pool which I've thus far never stored anywhere else.

So my answer to the question would be an off-site unraid install which is still in the works. This really will only be that: catastrophe insurance. I probably won't even rely on parity drives there in order to maximize space since I already have double parity on ZFS.

As far as reinstallation goes, I don't feel like restoring ~ and running docker compose for all the services again would be too much of a hassle.

[-] vividspecter@lemm.ee 2 points 1 year ago* (last edited 1 year ago)

I put all docker data in one directory (or rather, a btrfs subvolume) and both snapshot and back it up daily to multiple machines. docker-compose files are also kept in the same subvolume.

My latest server is NixOS, so I don't even bother backing up the root subvolume, since the actual config is tracked on git and replicated on multiple machines. If I want to reinstall, I can just install NixOS and deploy the config, then just copy over the docker subvolume, and rebuild the containers. Some of this could be automated further (nixos-anywhere and disko look promising for the actual OS install) but my systems don't typically break often enough for that to be a significant issue.

You can go even further and either just use nix for the services, or use nix to build containers themselves, but I have a working setup already and it's good enough, and I can easily switch to another distribution if issues start occurring in NixOS.

[-] ikidd@lemmy.world 2 points 1 year ago* (last edited 1 year ago)

I run everything on a 2 node proxmox cluster with ZFS mirror volumes and replication of the VMs and CTs between them, run PBS with hourly snapshots, and sync that to multiple USB drives I swap off site.

The docker VM can be ZFS snapshotted before major updates so I can rollback.

[-] ssdfsdf3488sd@lemmy.world 1 points 1 year ago

virtualize the machine with proxmox, use proxmox backup server, load vm on new system if you get catastrophic failure on the machine running the vm currently.

[-] Decronym@lemmy.decronym.xyz 1 points 1 year ago* (last edited 1 year ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters	More Letters
Git	Popular version control system, primarily for code
HA	Home Assistant automation software
~	High Availability
LXC	Linux Containers
NAS	Network-Attached Storage
Plex	Brand of media server package
RAID	Redundant Array of Independent Disks for mass storage
RPi	Raspberry Pi brand of SBC
SBC	Single-Board Computer
SSD	Solid State Drive mass storage

8 acronyms in this thread; the most compressed thread commented on today has 6 acronyms.

[Thread #287 for this sub, first seen 18th Nov 2023, 10:35]

[-] emax_gomax@lemmy.world 1 points 1 year ago

I use docker so I don't really have to worry about reproducibility of the services or configurations. Docker will fetch the right services and versions. I've documented the core configurations so I can set them back up relatively easily. Anything custom I haven't documented I'll just have to remember, or discover that I need to set up again.

this post was submitted on 18 Nov 2023

99 points (98.1% liked)
