cross-posted from: https://lemmy.dbzer0.com/post/21328454

PGSub - A Giant Archive of Subtitles For Everyone

I've been working on this subtitle archive project for some time. It is a Postgres database, along with CLI and API applications that let you easily extract the subs you want. It is primarily intended for encoders or people with large libraries, but anyone can use it!

PGSub is composed of three dumps:

  • opensubtitles.org.Actually.Open.Edition.2022.07.25
  • Subscene V2 (prior to shutdown)
  • Gnome's Hut of Subs (as of 2024-04)

As such, it is a good resource for films and series up to around 2022.

Some stats (copied from README):

  • Out of 9,503,730 files originally obtained from dumps, 9,500,355 (99.96%) were inserted into the database.
  • Out of the 9,500,355 inserted, 8,389,369 (88.31%) are matched with a film or series.
  • There are 154,737 unique films or series represented, though note the lines get a bit hazy when considering TV movies, specials, and so forth. 133,780 are films, 20,957 are series.
  • 93 languages are represented, with a special '00' language indicating a .mks file with multiple languages present.
  • 55% of matched items have an FPS value present.
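As a quick sanity check, the README figures quoted above are internally consistent. This short script recomputes the two percentages and the film/series total from the raw counts copied from the list; nothing here comes from the database itself.

```python
# Raw counts copied from the README stats quoted above.
total_files = 9_503_730    # files obtained from the dumps
inserted = 9_500_355       # files inserted into the database
matched = 8_389_369        # inserted files matched with a film or series
films, series = 133_780, 20_957
unique_titles = 154_737


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(pct(inserted, total_files))       # 99.96
print(pct(matched, inserted))           # 88.31
print(films + series == unique_titles)  # True
print(total_files - inserted)           # 3375 files not inserted
```

Note that the 88.31% match rate is relative to the inserted files, not to everything originally obtained from the dumps.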

Once imported, the recommended way to access it is via the CLI application. The CLI and API can be compiled on Windows and Linux (and maybe Mac), and there are also pre-built binaries available.
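The post doesn't describe the schema, so the following is only a rough sketch of the kind of lookup such a CLI might perform. The table `subtitle`, its columns, and the helper `find_subs` are hypothetical, the rows are made-up toy data, and Python's built-in `sqlite3` stands in for Postgres so the example runs without a server. It does follow two conventions from the stats above: unmatched files have no film/series link, and the '00' language code marks a multi-language .mks file, so such a file is returned for any language request.

```python
import sqlite3

# Hypothetical schema; NOT the real PGSub layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subtitle (
    id       INTEGER PRIMARY KEY,
    title_id INTEGER,        -- NULL when the file isn't matched to a film/series
    lang     TEXT NOT NULL,  -- two-letter code; '00' = multi-language .mks
    fps      REAL,           -- NULL when no FPS value is known
    filename TEXT NOT NULL
);
""")

# Made-up toy rows for illustration only.
rows = [
    (1, 101, "en", 23.976, "film.en.srt"),
    (2, 101, "de", None, "film.de.srt"),
    (3, 101, "00", 25.0, "film.multi.mks"),
    (4, None, "en", None, "unmatched.srt"),
]
conn.executemany("INSERT INTO subtitle VALUES (?, ?, ?, ?, ?)", rows)


def find_subs(conn: sqlite3.Connection, title_id: int, lang: str) -> list[str]:
    """Return filenames for a title in the requested language, plus any
    multi-language ('00') files, which may contain that language."""
    cur = conn.execute(
        "SELECT filename FROM subtitle "
        "WHERE title_id = ? AND lang IN (?, '00') ORDER BY id",
        (title_id, lang),
    )
    return [row[0] for row in cur]


print(find_subs(conn, 101, "en"))  # ['film.en.srt', 'film.multi.mks']
```

Unmatched rows (NULL `title_id`) never satisfy the equality test, so they only show up through some other search path, such as by filename or hash.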

The database dump is distributed via a torrent, which you can find in the repo (if it doesn't work for you, let me know). It is ~243 GiB compressed, and uses a little under 300 GiB of table space once imported.

For a limited time, I will devote some resources to bug-fixing the applications, or perhaps adding some small QoL improvements. But, of course, you can always fork them or make your own if they don't suit you.

top 12 comments
matey@lemmy.dbzer0.com 14 points 1 month ago

Does this work with Bazarr?

Omgboom@lemmy.zip 3 points 1 month ago

Asking the real question

abbadon420@lemm.ee -1 points 1 month ago

Does this work with Plex? (I guess not)

ancoraunamoka@lemmy.dbzer0.com 1 points 1 month ago

Why is it taking so much space in compressed form? I think text compresses very well, so you should be able to save tons of space compared to the DB tables.

Teknikal@lemm.ee 0 points 1 month ago

A Stremio plugin using this would be nice. I keep needing to change mine almost on a monthly basis for some reason.

onlinepersona@programming.dev -1 points 1 month ago
ExcessShiv@lemmy.dbzer0.com 4 points 1 month ago

Why would a subtitle repo be taken down? AFAIK creating and distributing subtitles for media is legal everywhere?

onlinepersona@programming.dev -4 points 1 month ago

🤔 Is it? IIRC, song lyrics are copyrighted and only certain websites are legally allowed to host them. I'd expect subtitles to have the same problem.

Maybe I'm wrong, but it wouldn't surprise me if this were taken down.

Anti Commercial-AI license

Summzashi@lemmy.one 2 points 1 month ago

Hahahahhaha that license hahahhahahahahahahhaha

liliumstar@lemmy.dbzer0.com 2 points 1 month ago

If it gets taken down, I will rehost elsewhere.

onlinepersona@programming.dev 0 points 1 month ago

You could consider Radicle, a distributed source forge. But we'll see if it ever gets that far.

this post was submitted on 31 May 2024
116 points (100.0% liked)

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ


⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.

Rules

1. Posts must be related to the discussion of digital piracy

2. Don't request invites, trade, sell, or self-promote

3. Don't request or link to specific pirated titles, including DMs

4. Don't submit low-quality posts, be entitled, or harass others
