datahoarder

26

21

submitted 5 months ago by Cat@ponder.cat to c/datahoarder@lemmy.ml

Lexipol, also known as PoliceOne, is a private company based in Frisco, Texas that provides policy manuals, training bulletins, and consulting services to approximately 8,500 law enforcement agencies, fire departments, and other public safety departments across the United States. This leak contains the policy manuals produced by Lexipol, and some subscriber information.

Founded by two former cops that became lawyers, Lexipol retains copyright over all manuals which it creates despite the public nature of its work. There is little transparency on how decisions are made to draft their policies, which have an oversized influence on policing in the United States. The company localizes their materials to address differences in legal frameworks, depending on the city or state where the client is based.

Lexipol's manuals become public policy in thousands of jurisdictions. Lexipol's policies have been challenged in court for their role in racial profiling, harassment of immigrants, and unlawful detention. For example, Lexipol policies were used to justify body cameras being turned off when a police officer shot and killed Eric Logan in South Bend, Indiana in June 2019.

27

15

How to Hoard (feddit.org)

submitted 5 months ago by far_university190@feddit.org to c/datahoarder@lemmy.ml

1 comments fedilink

cross-posted from: https://lemmy.dbzer0.com/post/37424352

I have been lurking on this community for a while now and have really enjoyed the informational and instructional posts but a topic I don't see come up very often is scaling and hoarding. Currently, I have a 20TB server which I am rapidly filling and most posts talking about expanding recommend simply buying larger drives and slotting them in to a single machine. This definitely is the easiest way to expand, but seems like it would get you to about 100TB before you cant reasonably do that anymore. So how do you set up 100TB+ networks with multiple servers?

My main concern is that currently all my services are dockerized on a single machine running Ubuntu, which works extremely well. It is space efficient with hardlinking and I can still seed back everything. From different posts I've read, it seems like as people scale they either give up on hardlinks and then eat up a lot of their storage with copying files or they eventually delete their seeds and just keep the content. Does the Arr suite and Qbit allow dynamically selecting servers based on available space? Or are there other ways to solve these issues with additional tools? How do you guys set up large systems and what recommendations would you make? Any advice is appreciated from hardware to software!

Also, huge shout out to Saik0 from this thread: https://lemmy.dbzer0.com/post/24219297 I learned a ton from his post, but it seemed like the tip of the iceberg!

28

35

Warner Bros. uploaded full length movies to YouTube, you all know what to do (www.youtube.com)

submitted 5 months ago by cantankerous_cashew@lemmy.world to c/datahoarder@lemmy.ml

9 comments fedilink

29

43

Lots of backups of NCBI / NLM data going on at the moment ? (lemmy.sdf.org)

submitted 5 months ago by pansapiens to c/datahoarder@lemmy.ml

7 comments fedilink

Just noticed this today - seems all the archiving activity has been noticed by NCBI / NLM staff. Thankfully most of SRA (the Sequence Read Archive) and other genomic data is also mirrored in Europe.

30

63

Use some bandwidth to archive US government sites as part of the Archive Team (social.luca.run)

submitted 5 months ago by ocean@lemmy.selfhostcat.com to c/datahoarder@lemmy.ml

0 comments fedilink

cross-posted from: https://beehaw.org/post/18335989

I set up an instance of the ArchiveTeam Warrior on my home server with Docker in under 10 minutes. Feels like I'm doing my part to combat removal of information from the internet.

31

67

Please help keep climate and transportation data available (sh.itjust.works)

submitted 5 months ago by akilou@sh.itjust.works to c/datahoarder@lemmy.ml

12 comments fedilink

In light of some of the recent dystopian executive orders, a lot of data is being proactively taken down. I am relying on this data for a report I'm writing at work, and I suspect a lot of others may be relying on it for more important reasons. As such, I created two torrents, one for the data behind the ETC Explorer tool and another for the data behind the Climate and Economic Justice Screening Tool. Here's an article about taking down the latter. My team at work suspects the former will follow soon.

Here are the .torrent files. Please help seed. They're not very large at all, <300 MB.

Of course this is worthless without access to these torrents so please distribute them to any groups you think would be interested or otherwise help make them available.

32

139

Seagate smashes largest HDD world record with 36TB hard drive and reveals a 60TB model is coming (www.techradar.com)

submitted 5 months ago by cantankerous_cashew@lemmy.world to c/datahoarder@lemmy.ml

38 comments fedilink

33

23

Draft data protection rules include deletion of social media accounts upon death, unless relatives are nominated (www.thehindu.com)

submitted 5 months ago by wikipediasuckscoop@lemmy.world to c/datahoarder@lemmy.ml

0 comments fedilink

This is bad, like very bad. The proposed draft law in India, in its current form only prescribes deletions and purges of inactive accounts when the users die. There should be a clause where archiving or lock/suspension (like Facebook's memorialization feature) are described as alternative methods to account deletion.

If the law as it is is pushed through and passed by the legislature the understanding of the past will be destroyed in the long term, just like how the fires in LA have already did to the archives of the notable composer Arnold Schoenberg.

If you're an Indian citizen you can go to this page to post your feedback and concerns.

34

11

Split ZFS in two at two locations? (lemmy.nowsci.com)

submitted 6 months ago by fmstrat@lemmy.nowsci.com to c/datahoarder@lemmy.ml

5 comments fedilink

Trying to figure out if there is a way to do this without zfs sending a ton of data. I have:

s/test1, inside it are folders:
- folder1
- folder2

I have this pool backed up remotely by sending snapshots.

I'd like to split this up into:

s/test1, inside is folder:
- folder1
s/test2, inside is folder:
- folder2

I'm trying to figure out if there is some combination of clone and promote that would limit the amount of data needed to be sent over the network.

Or maybe there is some record/replay method I could do on snapshots that I'm not aware of.

Thoughts?

35

127

Only 1679 bookmarks? What is this, the 1900s? (i.ibb.co)

submitted 6 months ago by lars to c/datahoarder@lemmy.ml

12 comments fedilink

cross-posted from: https://slrpnk.net/post/17044297

You don't understand, I might need that hilarious Cracked listicle from fifteen years ago!

36

37

The Unauthorized Effort to Archive Netflix’s Disappeared Interactive Shows (404media.co)

submitted 7 months ago by lars to c/datahoarder@lemmy.ml

0 comments fedilink

37

52

GOG’s Game Preservation Program Gets Tested Early By Blizzard (www.techdirt.com)

submitted 7 months ago by hyperreal@lemmy.hyperreal.coffee to c/datahoarder@lemmy.ml

0 comments fedilink

38

52

End of Hachette v. Internet Archive (blog.archive.org)

submitted 7 months ago by antonim@lemmy.dbzer0.com to c/datahoarder@lemmy.ml

10 comments fedilink

While we are deeply disappointed with the Second Circuit’s opinion in Hachette v. Internet Archive, the Internet Archive has decided not to pursue Supreme Court review. We will continue to honor the Association of American Publishers (AAP) agreement to remove books from lending at their member publishers’ requests.

We thank the many readers, authors and publishers who have stood with us throughout this fight. Together, we will continue to advocate for a future where libraries can purchase, own, lend and preserve digital books.

39

10

I Downloaded Aniwaves(9animes) and Anitakus(Gogoanimes) Disqus Comments (sh.itjust.works)

submitted 7 months ago* (last edited 7 months ago) by xXPoisonFoxXx@sh.itjust.works to c/datahoarder@lemmy.ml

2 comments fedilink

Most commented pages on each site sorted from most(Aniwave) to least(Anitaku) amount of comments:

Aniwave(9anime): Attack on Titan The Final Season Part 3 Episode 1

Gogoanime Old comments: Yuri on Ice Category page

Anitaku(Gogoanime): Kimetsu no Yaiba Yuukaku Hen Episode 10

Folders were compressed into tarballs with zstd level 9 compression:

Aniwave(9anime): TOTAL GB UNCOMPRESSED: 23.7 GiB TOTAL GB COMPRESSED:1.4 GiB

Gogoanime: TOTAL GB UNCOMPRESSED: 16.4 GiB TOTAL GB COMPRESSED: 769.5 MiB

Anitaku(Gogoanime): TOTAL GB UNCOMPRESSED: 7.2 GiB TOTAL GB COMPRESSED: 326.7 MiB

DOWNLOADS:

Aniwave(9anime) Comments: https://archive.org/details/aniwave-comments.tar

Anitaku(Gogoanime) March 2024: https://archive.org/details/anitaku-feb-2024-comments.tar

Gogoanime Comments Before 2021: https://archive.org/details/gogoanimes-comments-archive-prior-2021.tar

EDIT: I replaced all the mega links with archive.org links and removed all images to reduce file size

40

Update: Downloading all archive.org metadata (lemmy.dbzer0.com)

submitted 7 months ago by BermudaHighball@lemmy.dbzer0.com to c/datahoarder@lemmy.ml

3 comments fedilink

Following up from my previous post.

I used the API at https://archive.org/developers/changes.html to enumerate all the item names in the archive. Currently there are over 256 million item names. However I went through a sample of them and noted the following:

Many do not have the .torrent available because some of the files are locked due to copyright concerns, like their music collection. Ex: https://archive.org/details/lp_le-sonate-per-pianoforte-vol-1_carl-maria-von-weber-dino-ciani_0
A lot of items have been removed from public access completely, and possibly deleted even on their storage backend. Ex: https://archive.org/details/0-5-1-0-hernan-hernandez

There are many, many items from the archive which have been removed. Much higher than I expected. If you have critical data, of course Internet Archive should never be your only backup.

I don't know the distribution of metadata and .torrent file sizes since i have not tried downloading them yet. It looks like it would require a lot of storage if there are many files or the content is huge (if only 50% of the items remain and the average .torrent + metadata is 20KB it would be over 2.5 TB to store). But on the other hand, the archive has a lot of random one off uploads that are not very big, so some metadata is 800 bytes and the torrent 3KB in those cases (only 640 GB to store if combined is 5 KB).

41

28

Downloading all archive.org metadata (lemmy.dbzer0.com)

submitted 8 months ago* (last edited 8 months ago) by BermudaHighball@lemmy.dbzer0.com to c/datahoarder@lemmy.ml

7 comments fedilink

I'd love to know if anyone's aware of a bulk metadata export feature or repository. I would like to have a copy of the metadata and .torrent files of all items.

I guess one way is to use the CLI but this relies on knowing which item you want and I don't know if there's a way to get a list of all items.

I believe downloading via BitTorrent and seeding back is a win-win: it bolsters the Archive's resilience while easing server strain. I'll be seeding the items I download.

Edit: If you want to enumerate all item names in the entire archive.org repository, take a look at https://archive.org/developers/changes.html. This will do that for you!

42

10

I'm currently archiving these URLs in my ArchiveBox instance (digital2.library.unt.edu)

submitted 8 months ago by hyperreal@lemmy.hyperreal.coffee to c/datahoarder@lemmy.ml

0 comments fedilink

Following this post in the Fediverse: https://neuromatch.social/@jonny/113444325077647843

43

81

Archival in a fascist regime (lemmy.world)

submitted 8 months ago by Gigasser@lemmy.world to c/datahoarder@lemmy.ml

21 comments fedilink

Reposted from lemmy.world c/politics since it violated it's rule #1 about links.

Now that the fascists have taken over, what books, academic studies, and pieces of knowledge should take priority in personal/private archival? I'm thinking about what happened in Nazi Germany, especially with the burning of the Institute for Sexual Science(Institut für Sexualwissenschaft) and what was lost completely in the burnings.

Some of us should consider saving stuff digitally or physically.

44

4

Sonic Adventure Cover - High Resolution Image (lemmy.world)

submitted 8 months ago by TCB13@lemmy.world to c/datahoarder@lemmy.ml

4 comments fedilink

cross-posted from: https://lemmy.world/post/21563379

Hello,

I'm looking for a high resolution image of the PAL cover from the Dreamcast (I believe).

There was this website covergalaxy that used it have in 2382x2382 but all the content seems to be gone. Here's the cache https://ibb.co/nRMhjgw . Internet archive doesn't have it.

Much appreciated!

45

10

Archiveteam Veoh grab (tracker.archiveteam.org)

submitted 8 months ago* (last edited 8 months ago) by kabi@lemm.ee to c/datahoarder@lemmy.ml

7 comments fedilink

Just looking at the numbers, it doesn't seem to me like archival will complete before the shutdown date (nov. 11). There are 2million+ elements left, likely 100TB+ of videos.

If you care to help them out, see instructions at the top of the page. Be sure you have a "clean connection", though.

edit: They're saying that the current rate seems to be plenty enough to finish by the deadline. Workers are often left idling at the moment.

46

21

youtube mishaps in the Wayback Machine (lemm.ee)

submitted 8 months ago by kabi@lemm.ee to c/datahoarder@lemmy.ml

4 comments fedilink

The September 17th archive of the oldest public video on ashens' channel is saved with the comments section of a completely different video.

(only loads on desktop for me)

Not sure how this happened, usually it's no comments section at all.

If I were trying to make a point here: an archive doesn't even have to be malicious to contain misleading information presented as fact.

47

18

Where do you suggest newbies begin? (programming.dev)

submitted 8 months ago* (last edited 8 months ago) by andioop@programming.dev to c/datahoarder@lemmy.ml

8 comments fedilink

I did try to read the sidebar resources on https://www.reddit.com/r/DataHoarder/. They're pretty overwhelming, and seem aimed at people who come in knowing all the terminology already. Is there somewhere you suggest newbies start to learn all this stuff in the first place other than those sidebar resources, or should I just suck it up and truck through the sidebar?

EDIT: At the very least, my goal is to have a 3-2-1 backup of important family photos/videos and documents, as well as my own personal documents that I deem important. I will be adding files to this system at least every 3 months that I would like incorporated into the backup. I would like to validate that everything copied over and that the files are the same when I do that, and that nothing has gotten corrupted. I want to back things up from both a Mac and a Windows (which will become a Linux soon, but I want to back up my files on the Windows machine before I try to switch to Linux in case I bungle it), if that has any impact. I do have a plan for this already, so I suppose what I really want is learning resources that don't expect me to be a computer expert with 100TB of stuff already hoarded.

48

17

Solution for multi drive archiving (lemmy.world)

submitted 8 months ago by Dotz0cat@lemmy.world to c/datahoarder@lemmy.ml

5 comments fedilink

I download lots of media files. So far I have been storing these files after I am done with them on a 2TB hard disk. I have been copying over the files with rsync. This has so far worked fairly well. However the hard disk I am using is starting to get close to full. Now I am needing to find a solution so I can span my files over multiple disks. If I were to continue to do it as I do now, I would end up copying over files that would already be on the other disk. Does the datahoading community have any solutions to this?

For more information, my system is using Linux. The 2TB drive is formatted with ext4. When I make the backup to the drive I use ’rsync -rutp’. I don’t use multiple disks at the same time due to having only one usb sata enclosure for 3 1/2 inch disks. I don’t keep the drive connected all the time due to not needing it all the time. I keep local copies until I am done with the files (and they are backed up).

49

100

If you hoard video games and aren’t selfhosting GameVault yet, you’re missing out! (lemmy.world)

submitted 8 months ago by alfagun74@lemmy.world to c/datahoarder@lemmy.ml

44 comments fedilink

Hey everyone,

it’s me again, one of the two developers behind GameVault, a self-hosted gaming platform similar to how Plex/Jellyfin is for your movies and series, but for your game collection. If you've hoarded a bunch of games over the years, this app is going to be your best friend. Think of it as your own personal Steam, hosted on your own server.

If you haven’t heard of GameVault yet, you can check it out here and get started within 5 minutes—seriously, it’s a game changer.

For those who already know GameVault, or its old name He-Who-Must-Not-Be-Named, we are excited to tell you we just launched a major update. I’m talking a massive overhaul—so much so, that we could’ve rebuilt the whole thing from scratch. Here’s the big news: We’re no longer relying on RAWG or Google Images for game metadata. Instead, we’ve officially partnered with IGDB/Twitch for a more reliable and extended metadata experience!

But it doesn’t stop there. We’ve also rolled out a new plugin system and a metadata framework that allows you to connect to multiple metadata providers at once. It’s never been this cool to run your own Steam-like platform right from your good ol' 19" incher below your desk!

What’s new in this update?

IGDB/Twitch Integration: Say goodbye to unreliable metadata scrapers. Now you can enjoy game info sourced directly from IGDB.
Customizable Metadata: Edit and fine-tune game metadata with ease. Your changes are saved separately, so the original data stays intact.
Plugin System: Build your own plugins for metadata or connect to as many sources as you want—unlimited flexibility!
Parental Controls: Manage age-appropriate access for the family and children.
Built-in Media Player: Watch game trailers and gameplay videos directly in GameVault.
UI Overhaul: A fresh, streamlined look for the app, community, game and admin interface.
Halloween Theme: For GameVault+ users, we’ve added a spooky Halloween skin just in time for the season!

Things to keep in mind when updating:

GameVault Client v1.12 is now required for servers running v13 or above.
Older clients won’t work on servers that have been updated to v13.

For a smooth update and a guide on how to use all these new features, check out the detailed migration instructions in the server changelogs. As always, if you hit any snags, feel free to reach out to us on Discord.

If you run into any issues or need help with the migration, feel free to join and open a ticket in our Discord community—we’re always happy to help!

If you want to support our pet-project and keep most upcoming features of GameVault free for everyone, consider subscribing to GameVault+ or making a one-time donation. Every little bit fuels our passion to keep building and improving!

Thanks for everything! We're more than 800 Members on our discord now and I can’t wait to hear what you think of the latest version.

50

33

How can I reduce file sizes of all my photos and videos for storage? (kbin.earth)

submitted 8 months ago by strawberry@kbin.earth to c/datahoarder@lemmy.ml

13 comments fedilink

I know for photos i could throw them through something like Converseen to take them from .jpg to .jxl, preserving the quality (identical visially, even when pixel peeping), but reducing file size by around 30%. What about video? My videos are in .h265, but can i reencode them more efficiently? im assuming that if my phone has to do live encoding, its not really making it as efficient as it could. could file sizes be reduced without losing quality by throwing some processing time at it? thank you all