this post was submitted on 26 May 2026
Language Learning
A community all about learning languages!
Ask / talk about a specific language or language learning in general.
Sopuli's instance rules apply
- Remember the human! (no harassment, threats, etc.)
- No racism or other discrimination
- No Nazis, QAnon or similar whackos and no endorsement of them
- No porn
- No ads or spam
- No content against Finnish law
Other active Lemmy language communities:
- !duolingo@lemmy.world
- !japaneselanguage@sopuli.xyz
- !chinese@lemmy.world
- !learn_finnish@sopuli.xyz
- !german@lemmy.world
- !latin@piefed.social
- !estonian@sopuli.xyz
- !spanish@sopuli.xyz
- !translator@sopuli.xyz (translation studies)
- !esperanto@sopuli.xyz
Community banner & icon credits:
Icon: The book cover of Babel (2022 novel by R. F. Kuang)
Banner: Epic of Gilgamesh tablet (© The Trustees of the British Museum)
founded 3 years ago
I've only done enough programming to know this is very possible. A word count is probably all I'd need to do it manually. I'm just wondering if this is one of those things I'd do instead of actually learning, so the less time I spend on it, the better I'll feel.
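The word-count approach can be sketched in a few lines of Python. This is a minimal sketch, not a finished tool. The function names are hypothetical, and the sketch makes several simplifying assumptions. It treats any run of letters (accents included, hyphenated compounds like peut-être kept whole) as a word. It splits at apostrophes, so elided l' and d' drop out as one-letter tokens. It does no lemmatization, so mange and mangeait are counted as separate words.

```python
import re
from collections import Counter

# Any run of Unicode letters (accented ones included), optionally joined by
# hyphens. Apostrophes and digits are not matched, so they split tokens.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def word_frequencies(text: str, stopwords: frozenset = frozenset(), min_len: int = 2) -> Counter:
    """Count lowercase word forms in text.

    Elided articles (l', d', j') become one-letter tokens and are dropped
    by min_len. No lemmatization: inflected forms are counted separately.
    """
    tokens = TOKEN_RE.findall(text.lower())
    return Counter(t for t in tokens if len(t) >= min_len and t not in stopwords)


def top_words(text: str, n: int = 20, **kwargs) -> list:
    """Return the n most frequent words as (word, count) pairs."""
    return word_frequencies(text, **kwargs).most_common(n)
```

Usage, assuming a UTF-8 plain-text export of the book called livre.txt: `top_words(open("livre.txt", encoding="utf-8").read(), n=50)`. Passing a stopword set of the most frequent function words (le, la, de, et, ...) keeps the top of the list from being all grammar.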
Was messing around with Jiten.moe (a spiritual successor to jpdb that, like jpdb, advertises ingesting a book or subtitle file and creating Anki cards from it), and it made me think of this question. (Jiten is actually open-source, so the repo's there showing how they do it... but I'm pretty sure it mostly just wraps a bunch of Japanese-specific tools.)
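The "creating Anki cards" half is mostly formatting, because Anki can import plain-text files with one note per line and fields separated by tabs. The sketch below is minimal and makes some assumptions. It assumes you already have a list of target words and the book as plain text. The function names are hypothetical. The sentence splitter is deliberately naive and will break on abbreviations like "M. Dupont".

```python
import re

# Naive sentence splitter: break after ., !, ? or … followed by whitespace.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?…])\s+")


def example_sentence(word: str, sentences: list) -> str:
    """Return the first sentence containing word as a whole token, or ''."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for sentence in sentences:
        if pattern.search(sentence):
            return sentence
    return ""


def build_anki_rows(text: str, words: list) -> list:
    """Pair each word (front) with an example sentence from the text (back)."""
    # Collapse internal whitespace (newlines, tabs) so no field contains a tab.
    sentences = [" ".join(s.split()) for s in SENTENCE_SPLIT_RE.split(text) if s.strip()]
    return [(w, example_sentence(w, sentences)) for w in words]


def write_anki_tsv(rows: list, path: str) -> None:
    """Write one note per line, fields separated by a tab, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for front, back in rows:
            f.write(f"{front}\t{back}\n")
```

The two columns of the resulting file can then be mapped to the Front and Back fields in Anki's import dialog.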
Did a little looking. I checked https://github.com/keon/awesome-nlp and didn't see anything French-specific, but did come across https://github.com/french-ai/french-nlp, which might have useful stuff. It sounds like a library called spaCy could be useful (it has French models that can lemmatize, so conjugated forms get grouped under one dictionary word).
But then I ran across this tool, which might be pretty close to what you'd need? https://github.com/FreeLanguageTools/vocabsieve
I haven't looked into exactly how the 'automatically from books' stuff would work or anything, but seems promising.
And I guess, the elephant in the room: NLP is the kind of task LLMs are actually pretty good at, so there's always the lazy-ish route of converting the book to text, feeding it through an LLM, and asking it to identify the important vocabulary.
Thanks! VocabSieve looks perfect (though experimental), and it works with KOReader, too. Fuck me, I'm running out of excuses.