datahoarder

9021 readers
1 users here now

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 5 years ago
MODERATORS
1

@ray@lemmy.ml Got it done. I'm the first of the mods here and will be learning a little Lemmy over the next few weeks.

While everything is up in the air with the reddit changes, I'll be very busy working on replacing the historical pushshift API without reddit's bastardizations, should a PS version come back.

In the meantime you should all mirror this data to ensure its survival. Do what you do best and HOARD!!

https://the-eye.eu/redarcs/

2

cross-posted from: https://aussie.zone/post/27191517

I spun up nextcloud to replace onedrive about a year ago. Everything was going well, so I chose not to renew my onedrive subscription; that was exactly 6 months ago, I'd assume.

I got an email a few days ago reminding me that they would delete my data. I ignored it because obviously I had moved my data to nextcloud. Not gonna trick me, Mi¢ro$oft.

But yesterday I decided to have a quick look through, and it turns out I didn't copy over everything, and certainly not my 5 years of camera roll backups.

I started a sync of everything last night and woke up in the morning to find that it had stopped at about 10GB out of 80GB. And now onedrive won't connect, and if I try to log in to onedrive with that account via the web, it just kicks me back to the microsoft portal.

I'm 99.5% sure there is nothing to be done and I'm not an overly sentimental person so if they are lost it won't break me. I have many important photos backed up in immich but just not everything.

But I just needed to ask in case someone knows where to find the M spot I can touch for magic file recovery.

Edit: turns out you can just pay them more money and they still had my stuff. thank you for joining me on the shortest support ticket of all time

3

I currently have a single Seagate Ironwolf Pro hard drive which I've been running in my NAS for about two years. I kind of want to buy two more drives of the same make and capacity and make it a software RAID 5 array. Is that a good idea? Do RAID arrays need to have drives of the same age?
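For background on the failure-tolerance part of the question: RAID 5 keeps one parity block per stripe, computed as the XOR of the other blocks in the stripe, which is exactly why the array survives one drive failure and no more. A minimal sketch of that math (plain Python, nothing mdadm-specific):

```python
from functools import reduce

def parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together to produce the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A 3-drive stripe: two data blocks plus their parity.
d0 = b"hello world_____"
d1 = b"datahoarding 101"
p = parity([d0, d1])

# The drive holding d1 dies: XOR the survivors to rebuild it.
rebuilt = parity([d0, p])
assert rebuilt == d1
```

With mdadm, creating the array would look something like `mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sd[bcd]` (device names are examples). Mixing drive ages is generally fine; the more commonly cited worry is the opposite one, that drives from the same batch tend to fail around the same time.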

4

Hello.

I have been attempting to find a way to automate the generation of m3u8 URLs from streaming sites which require you to click on the video player to initiate loading the media.

I've found some information relating to Selenium, but haven't used that before and haven't had any success so I'm not sure if there are other solutions.

I'd considered generating URLs for successive videos based on apparent naming conventions, iterating over them to access one at a time, [figure out how to automatically initiate the video so the m3u8 requests get made], capture the m3u8 URL, initiate download with that URL and name each appropriately with something like yt-dlp's autonumber.

I've figured out and tested options for most of these steps, but I haven't had luck with the automated loading/initiation of the video stream in order to load the m3u8 requests. I'm still doing that step manually.
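For the automated click-and-capture step, one approach that avoids parsing the page at all is to drive headless Chrome with Selenium and read the m3u8 request out of Chrome's DevTools performance log. A sketch under assumptions: the CSS selector for the player and the five-second wait are placeholders you'd tune per site.

```python
import json
import time

def m3u8_urls_from_perf_log(entries):
    """Pull .m3u8 request URLs out of Chrome performance-log entries."""
    urls = []
    for entry in entries:
        msg = json.loads(entry["message"])["message"]
        if msg.get("method") == "Network.requestWillBeSent":
            url = msg["params"]["request"]["url"]
            if ".m3u8" in url:
                urls.append(url)
    return urls

def capture_m3u8(page_url, player_selector="video"):
    # Imported here so the log-filtering helper above works without a browser.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    opts = Options()
    opts.add_argument("--headless=new")  # nothing rendered; easy on old laptops
    opts.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(page_url)
        # Simulate the click that makes the player request its manifest.
        driver.execute_script(
            "document.querySelector(arguments[0]).click()", player_selector)
        time.sleep(5)  # crude wait for the manifest request to fire
        return m3u8_urls_from_perf_log(driver.get_log("performance"))
    finally:
        driver.quit()
```

Captured URLs can then go straight to yt-dlp with %(autonumber)s in the output template, as you planned; since nothing is rendered, this should also sidestep the memory blowups you're seeing during in-browser playback.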

My laptop is crazy old and struggles to play video in a browser; it seems to fill up its memory, and it has crashed before. So I grab the m3u8 URLs to either load into a local media player for streaming or download for later, the latter especially if my internet connection is struggling, as it often does.

Any advice or direction is greatly appreciated.

Thank you very much!

5
6

cross-posted from: https://swg-empire.de/post/4845931

I've had multiple reads fail on a fairly new drive.

I did a smartctl -t long /dev/sdb, but when I checked back a few minutes later, smartctl -a /dev/sdb showed that no tests were running and that the previous test had "the read element of the test failed".

I did smartctl -t offline /dev/sdb next and after that was done smartctl -x /dev/sdb showed about 1500 errors but it also reported SMART as PASSED.

Here is the output of smartctl -x /dev/sdb: https://pastebin.com/09rNZZfD

How should I interpret these results? Was I wrong to assume the long test had finished? Should I replace the drive? Or might something else be wrong, like the SATA connection?
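One thing worth knowing when reading the output: "SMART overall-health: PASSED" only means no vendor threshold has been crossed yet, not that the drive is healthy. The attributes that conventionally predict failure are 5 (reallocated sectors), 187 (reported uncorrectable), 188 (command timeouts), 197 (pending sectors) and 198 (offline uncorrectable), while a climbing 199 (UDMA CRC errors) points at cabling instead of the disk. A small sketch that flags nonzero raw values in smartctl's attribute table (field positions assume the standard ATA layout):

```python
# Usual "replace soon" attribute IDs per the ATA SMART convention;
# 199 is included because it usually indicates a cable/connector problem.
CRITICAL_IDS = {5, 187, 188, 197, 198, 199}

def worrying_attributes(smartctl_output: str) -> dict[str, int]:
    """Flag critical SMART attributes with a nonzero raw value."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED
        #                 WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id, name, raw = int(fields[0]), fields[1], fields[9]
            if attr_id in CRITICAL_IDS and raw.isdigit() and int(raw) > 0:
                flagged[f"{attr_id} {name}"] = int(raw)
    return flagged
```

A failed read element on the long test plus nonzero pending/uncorrectable counts is generally treated as replace-the-drive territory; if only 199 is climbing, reseat or replace the SATA cable first.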

7

cross-posted from: https://lemmy.world/post/37159807

Have fun digging, and please share interesting findings below.

8
submitted 2 months ago* (last edited 2 months ago) by rpollost@lemmy.ml to c/datahoarder@lemmy.ml

If you're archiving a scriptbin.works script URL (or a user profile URL) to the Wayback Machine or elsewhere, append ?__termsofaccessagree=y to it. This skips directly to the actual script, so the actual script is what gets captured.
Important: The creator of scriptbin also told me NOT to use that suffix when sharing script URLs normally, as that would be problematic for scriptbin.
In other words, ONLY use the ?__termsofaccessagree=y suffix for archiving purposes.
Now that you know this, if someone else asks you about it, DON'T just comment "Append ?__termsofaccessagree=y" and walk away.
Be a good steward of the internet and include the warning above along with your comment.
To reiterate the warning: DO NOT use that suffix for regular sharing of scriptbin URLs. That suffix is only for archiving purposes.
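Since a script URL may already carry its own query string, blindly appending ?__termsofaccessagree=y can produce a malformed double-? URL. A small helper for building the archive-only URL (a sketch based on the behaviour described above, not on any scriptbin documentation):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def archive_url(script_url: str) -> str:
    """Append __termsofaccessagree=y for ARCHIVING ONLY -- never for sharing."""
    parts = urlparse(script_url)
    query = dict(parse_qsl(parts.query))  # preserve any existing parameters
    query["__termsofaccessagree"] = "y"
    return urlunparse(parts._replace(query=urlencode(query)))
```

Feed the result to the WBM (or your archiver of choice) and share the original, suffix-free URL everywhere else.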

Have fun archiving.
Cheers!

9

MESA, AZ—Gleefully describing the inevitable day when society would collapse and digital files would become unusable, local physical media collector David Campbell confirmed Wednesday he was “absolutely pumped” for the downfall of humanity. “When it all goes down, there’s only going to be one place to watch the Tomb Raider movies in their entirety with all the deleted scenes, and that’s going to be my bunker,” said Campbell, his eyes reportedly shining as he described how the end of organized society and the dissolution of government would make his cherished stockpile of Blu-rays even more valuable.

“No one will be mocking the CDs I’m still holding onto when the internet goes dark forever and the only way to listen to music is through boom boxes we trade canned goods for. And I’m definitely one of the only people who has a region-free DVD player and all three seasons of Father Ted plus the Christmas special, so I’ll essentially be a king. I can’t wait.” At press time, Campbell was grinning as he purchased the 50th anniversary edition of Jaws in 4k, which he anticipated would give him full control over the drinking water supply in the event of a nuclear winter situation.

10
11
12

cross-posted from: https://lemmy.sdf.org/post/40623875

Hello. A few friends and I are attempting to back up every file from AndroidFileHost, and we need some help doing so.

For those who haven't heard of it, AndroidFileHost is a website that hosts various Android-related files. It's one of the last surviving large Android file-hosting sites, and holds a LOT of rare files, especially for older Android devices (RIP d-h.st). Despite being such a valuable site, it hasn't been well maintained for the past few years. Their Xitter account's last update is from around 2022, and the owner isn't replying to any e-mails. The site has been extremely unstable, with various issues; most recently, no file could be downloaded from it for about a month. Luckily, that has been (kind of) solved for now, and most files (not all; about 20% are still gone) are back online. However, it's clear this site needs a backup.

I have scraped their website, which gives us the unique ID and MD5 hash for every file available on the site. Using this ID, we can automate the process of requesting mirror links, downloading the files, and checking their integrity. (Check an example file to understand how their system works: https://androidfilehost.com/?fid=745425885120701975 )

The total size of every known file is roughly 180TB. It's impossible to download this on a single machine, so I've developed a "tracker" system to download multiple files concurrently across different machines. The tracker server keeps a list of every known file ID (256,640 files, by the way, which is a bit less than the 277,467 displayed on their main page; I believe that count includes deleted files, but I'm not sure at the moment), assigns IDs to each client that requests them, and marks files as downloaded when appropriate. The system is pretty robust now, so our plan is working great, except that our internet is pretty slow and we can't pull down 180TB quickly.
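For anyone curious what a client in this scheme looks like, here is a rough sketch. The /assign and /done endpoints and the job format are hypothetical stand-ins (the real tracker protocol isn't described here); the streamed MD5 check is the part that matters for integrity:

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through MD5 so large files never sit fully in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def client_loop(tracker: str, download_mirror) -> None:
    """Fetch assignments until the tracker runs dry.

    download_mirror(fid) -> local path is whatever requests an AFH mirror
    link and fetches the file; that part is site-specific and not shown.
    """
    import json
    import urllib.request

    while True:
        job = json.load(urllib.request.urlopen(f"{tracker}/assign"))
        if not job:  # empty response: nothing left to hand out
            return
        path = download_mirror(job["fid"])
        ok = md5_of_file(path) == job["md5"].lower()
        urllib.request.urlopen(f"{tracker}/done?fid={job['fid']}&ok={int(ok)}")
```

Verifying against the scraped MD5s before reporting success means a flaky mirror or truncated download gets retried instead of silently poisoning the archive.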

By talking to friends and their friends, we've got quite a few people willing to help out. Unfortunately, many of them lack storage space, so they have to keep downloading from AFH and uploading to my server. This works for a few clients, but not for many: the server every client uploads to has a 500Mbps connection, and it gets terribly slow pretty quickly. Plus, 180TB of storage isn't exactly cheap or easy to come by.

Ideally, we need people with faster internet (I'm in Asia, so not the best place to fetch from AFH servers, which are mostly in Europe and America) and more storage space. If you have some bandwidth or storage to share, it would help us greatly.

I'm sorry if a post like this isn't welcomed here, if so please feel free to remove it. Thanks for reading this post.

P.S. Also worth checking out - related XDA thread https://xdaforums.com/t/did-anyone-else-notice-signs-of-androidfilehost-com-being-abandoned.4578561/ (I'm LegendOcta)

13

I am upgrading the HDDs on my QNAP TS-432X-eU rack mount NAS. The NAS is connected to a UPS via a USB cable and is set to turn off after 5 minutes if it senses a power loss. What would happen if I were to lose power while resilvering the array? Would it suspend the resilvering, turn off, then resume when power is restored? Or would the array be corrupted?

14

I'm looking to spec out a new NAS. I have a relatively small media collection, that I hope to grow as I digitize more family VHS tapes etc. Right now I have around 4 TB of data, shared across an external drive and my internal ssd.

What's the best path forward on drives in this new NAS? I've heard advice for buying one big 20TB drive over multiple smaller drives. What's best for mitigating drive failure? Is that even a concern? If I do multiple drives, should I use RAID?

I'm a little new to this. If you have resources for learning some best practices I'm all ears.

15

Anyone used this successfully in their setup?

Garage is an S3-compatible distributed object storage service designed for self-hosting at a small-to-medium scale.

Garage is designed for storage clusters composed of nodes running at different physical locations, in order to easily provide a storage service that replicates data at these different locations and stays available even when some servers are unreachable. Garage also focuses on being lightweight, easy to operate, and highly resilient to machine failures.

Garage is built by Deuxfleurs, an experimental small-scale self hosted service provider, which has been using it in production since its first release in 2020.
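For anyone evaluating it, a single-node config is short. A sketch from memory of the Garage quick-start (field names and defaults may have shifted between releases, so verify against the current docs before deploying):

```toml
# /etc/garage.toml -- minimal single-node layout
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"

replication_mode = "1"       # use "3" across real multi-site nodes

rpc_bind_addr = "[::]:3901"
rpc_secret = "<output of: openssl rand -hex 32>"

[s3_api]
s3_region = "garage"
api_bind_addr = "[::]:3900"
root_domain = ".s3.garage.localhost"
```

After that it's roughly `garage layout assign`/`garage layout apply` to give the node a zone and capacity, then bucket and key creation via the garage CLI.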

16

Looking to build a collection that I just outright own, so any streaming platform that doesn't allow me to download the raw files is a no go. Other than the big players (Amazon, Walmart, etc.) what are some good sources for buying?

17

Looking to upgrade my NAS hard drives. Currently have two 4TB WD Red Plus hard drives but I wanted to get some large capacity drives. Was looking into getting 16 or 18TB drives. My current drives are basically whisper quiet and have been running great since 2019 but I feel like it's time to upgrade the capacity.

The NAS is currently on a desk beside my computer. I don't have any cabinets to place it in and would prefer not to connect to it through Wi-Fi. Hence why I'd like for the drives to be as quiet as possible.

I was considering getting a Seagate Exos or Ironwolf (and buying used for the great price) but I've read users online saying they regret buying those models because of their noise. I was also looking at the WD Red Pro but WD's own website only rates them at 3.6/5 with most of the negative complaints about dead on arrival drives. Additionally 25% of all reviews are 1 star; both of which don't fill me with much confidence.

TLDR: What's a quiet and reliable hard drive recommendation for a NAS?

Would it be better just to go with the WD Red Plus at a lower capacity?

18

Okay, so now that my little experiment with a bunch of scam NVMe drives from Amazon is done and over with, and I got my money back, where do I look for some cheap and semi-decent 4TB NVMes? Adata used to be my go-to budget flash memory; I never had any problems with any of their drives. But they're not so inexpensive anymore... Team Group seems to have good prices, but how reliable are they?

Is prime day or boxing day even a good time to buy drives?

Is there any 4tb nvme under $300(CAD) even worth looking at?

Again, I'm just farting around and experimenting, but any suggestion will be greatly appreciated and win you imaginary internet points from a stranger sitting on a porcelain throne as he writes this.

19

I'm trying to digitize some VHS tapes (presumably recorded as NTSC), but I have some questions that I've yet to find answers for. My current process/setup is as follows:

  • VHS tapes are played in a PV-D4745S-K VCR
  • The VCR's composite output is captured using a generic EasyCap capture card.
  • The captured output is fed into OBS Studio with the following settings:
    • A source with its device set to the capture card; the video format is set to YUYV 4:2:2, the resolution is set to 720x480, and the frame rate is currently set to Leave Unchanged (more on this later).
    • Under Settings>Video I have set Common FPS Values with 29.97.
    • I also have set my encoding options under Settings>Output, as well as audio settings under Settings>Audio, but the details of that aren't relevant in this context.
    • I also have deinterlacing disabled by right clicking on the scene and selecting Deinterlacing>Disable.

With this, I seem to be able to capture VHS tapes with decent quality, but I have some nagging questions:

  1. How do I verify if OBS has indeed captured interlaced? I'm trying to capture both fields, but I'm unsure if that's actually happening, and I'm not sure how to go about verifying it.
  2. Should I capture at 29.97 FPS or 59.94 FPS? My thinking is that, given that I'm capturing interlaced, I would think I would multiply the number of captured frames by 2 as, if I understand correctly, each captured frame contains 2 fields, and each would be captured sequentially, so if I want to capture at 29.97 FPS interlaced, I would need to capture at 59.94 FPS. I'm not sure if I'm right about that though.
  3. I mentioned above that the framerate under the source properties is set to Leave Unchanged. I chose that option because the only other framerates it offers are 30.00, 20.00, 10.00, and 5.00 (i.e. there is no option for 29.97 or 59.94), so I'm using Leave Unchanged in the hope that it autodetects the proper frame rate, though that's mostly an assumption on my part. The closest to NTSC's 29.97 would be 30.00, but I'm not sure if that's an issue. What confuses me more is that I have 29.97 FPS set under Settings>Video with Common FPS Values. If the source runs at 30 while OBS runs at 29.97, will that lead to syncing issues? Is there a way to force the source to use 29.97 to match OBS? Confusing me further is the output I get if I list the formats for the capture device with
    v4l2-ctl --device=/dev/video2 --list-formats-ext
    
    I get the following output (I have truncated it to only list what's relevant, as the full output is long and contains unnecessary information):
    […]
    [0]: 'YUYV' (YUYV 4:2:2)
        size: Discrete 720x480
    […]
         Interval: Discrete 0.033s (30.000 fps)
         Interval: Discrete 0.050s (20.000 fps)
         Interval: Discrete 0.100s (10.000 fps)
         Interval: Discrete 0.200s (5.000 fps)
    […]
    
    There is no option for 29.97 FPS, and, as can be seen by the output, it matches what OBS sees. Is this an issue? It seems, to me, that the capture card isn't capable of proper NTSC framerates, and can only capture at 30 FPS as the closest value.
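On questions 2 and 3, it helps to pin down the exact numbers: NTSC is 30000/1001 frames per second (60000/1001 fields), and the gap between a flat 30.000 capture and a 29.97 timeline accumulates steadily:

```python
from fractions import Fraction

ntsc_frame = Fraction(30000, 1001)   # ~29.97003 fps (frame rate)
ntsc_field = Fraction(60000, 1001)   # ~59.94006 fps (field rate)

# One hour of material captured at a flat 30.000 fps:
frames_per_hour = 30 * 3600

# Played back on a 30000/1001 timeline, those frames run long:
playback_seconds = frames_per_hour / ntsc_frame
drift_per_hour = playback_seconds - 3600
print(float(drift_per_hour))  # 3.6 -- seconds of A/V drift per hour
```

So if the EasyCap truly delivers 30.000 fps you'd see about 3.6 seconds per hour of audio/video drift; many cheap capture chips actually output 29.97 and merely report 30 via V4L2, which a long timed capture will reveal. For question 1, ffmpeg's idet filter (ffmpeg -i capture.mkv -vf idet -an -f null -) prints detected field-order statistics, which is a quick way to confirm whether both fields survived into the file.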
20

So I'm testing one of the drives I got on Amazon using an old computer. It started off promising: no errors when formatting the drive, and a write speed of 25MB/s through a USB adapter... Then it dropped off to 7MB/s and the expected time shot up from 40 hours to 156h🤣

I'm going to let it run overnight and see what happens in the morning. If I can get it to show 4TB without errors I might still keep them to test out my geekworm pi NAS; otherwise, back to Amazon you go!
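That write-speed collapse is the classic fake-capacity signature: the controller reports 4TB, the real flash is far smaller, and writes past the true capacity wrap around or vanish. The standard tool for catching this is f3 (f3write/f3read), but the core idea fits in a few lines. This sketch targets an ordinary file path; pointing it at the mounted drive and sizing n_blocks to the claimed capacity is up to you:

```python
import random

BLOCK = 1 << 20  # 1 MiB per block

def write_pattern(path: str, n_blocks: int, seed: int = 42) -> None:
    """Write n_blocks of reproducible pseudorandom data."""
    rng = random.Random(seed)
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(rng.randbytes(BLOCK))

def verify_pattern(path: str, n_blocks: int, seed: int = 42) -> int:
    """Re-derive the same stream and count blocks that came back wrong."""
    rng = random.Random(seed)
    bad = 0
    with open(path, "rb") as f:
        for _ in range(n_blocks):
            if f.read(BLOCK) != rng.randbytes(BLOCK):
                bad += 1
    return bad
```

On the real drive, f3write /mnt/drive followed by f3read /mnt/drive does the same job with progress output, and will tell you outright how much of the 4TB actually exists.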

21

I've never transferred Pokemon between gens and I've never used Pokemon Home, but it seems wild to me to be so invested in such a fickle storage system. Thoughts and prayers for the guy affected.

22

I stumbled upon that new use of mp4 format. Interesting.

23

I'm sure some of you already use it like this, but if not, this could be useful for you.

It creates a directory with the channel's name, creates sub-directories named after each playlist, numbers the files so they stay in order, and can resume the download if you have to cancel it midway.

You can modify it to your needs.

Add this to your ~/.bashrc or your favourite shell config.

alias yt='yt-dlp --yes-playlist --no-overwrites --download-archive ~/Downloads/yt-dlp/archive.txt -f "bestvideo[height<=1080]+bestaudio/best[height<=1080]" -o "~/Downloads/yt-dlp/%(uploader)s/%(playlist_title|single_playlist)s/%(playlist_index|00)s - %(title)s - [%(id)s].%(ext)s"'

You can even limit the download speed by adding the parameter --limit-rate 640K, which caps downloads at 640 KB/s (roughly 5 Mbit/s).

24

Harvard made some politics courses available entirely for free on edX, which made me curious enough to check out other courses on the platform, and I'm finding plenty of very interesting ones. Is there a way to bulk download the courses?

25

Scouring through Amazon for random stuff yesterday, I saw these "generic brand" NVMes for $65 a pop. Figured I'd give it a shot for my little geekworm pi NAS: 4 for the RAID and 1 for backup if something goes boogers up. 20TB for $325 was too good to pass up; worst case scenario they're either 1TB each or they fail after a few months. We'll see what's up when they get here in 2 weeks.
