this post was submitted on 26 May 2026

Language Learning

I just jailbroke my Kindle and have a few EPUBs, so I thought this might be a good time to change my approach to vocabulary.

What I'd like to do is learn the vocabulary for a book before I read it, instead of after or while I'm reading it.

My dream piece of software would do the following:

  1. Resolve all words down to their base form (i.e., singular for nouns, infinitive for verbs, etc.). My language is French.

  2. Count occurrences of each word.

  3. Filter out words I already know.

  4. Define the words with a bilingual French-English dictionary, including the original context sentence.

  5. Make Anki cards for me to study.

  6. (God-tier programming) Also include idiomatic expressions as vocabulary.
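Steps 1 to 3 boil down to a frequency list of lemmas minus a list of known words. A minimal Python sketch, assuming the book has already been converted to plain text; `TOY_LEMMAS` is a hypothetical hand-written table standing in for a real French lemmatizer, and `known` would be your own list of words you've already learned:

```python
import re
from collections import Counter

# Hypothetical stand-in for a real French lemmatizer (step 1):
# maps inflected forms to base forms; unknown words pass through unchanged.
TOY_LEMMAS = {"chats": "chat", "mange": "manger", "mangent": "manger"}


def lemmatize(word: str) -> str:
    return TOY_LEMMAS.get(word, word)


def vocab_frequencies(text: str, known: set[str]) -> list[tuple[str, int]]:
    """Lemmatize every word, count lemmas, and drop the ones already known."""
    # Runs of letters only; elisions like "l'homme" split into "l" + "homme".
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(lemmatize(w) for w in words)  # step 2
    unknown = [(lemma, n) for lemma, n in counts.items() if lemma not in known]  # step 3
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(unknown, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    text = "Les chats mangent. Le chat mange la souris."
    print(vocab_frequencies(text, known={"le", "la", "les"}))
    # [('chat', 2), ('manger', 2), ('souris', 1)]
```

Sorting by frequency means the words worth learning first end up at the top of the list.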

Does this exist?

Edit: Or help me assemble a pipeline of separate tools that handle these tasks one at a time.
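Steps 4 and 5 can be their own stage in such a pipeline: Anki's File > Import accepts plain tab-separated text, so one card per line (word, English gloss, context sentence) should be enough. A minimal sketch, assuming earlier stages already produced (lemma, example sentence) pairs; `GLOSSARY` is a hypothetical two-entry dict standing in for whatever bilingual dictionary lookup you end up using:

```python
import csv
import io

# Hypothetical stand-in for a French->English dictionary lookup (step 4).
GLOSSARY = {"chat": "cat", "manger": "to eat"}


def write_anki_tsv(entries, glossary, out):
    """Write one tab-separated card per (lemma, context sentence) pair (step 5).

    Columns: lemma, English gloss (empty if not found), context sentence.
    `out` is any writable text stream, e.g. an open file or io.StringIO.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for lemma, sentence in entries:
        writer.writerow([lemma, glossary.get(lemma, ""), sentence])


if __name__ == "__main__":
    buf = io.StringIO()
    write_anki_tsv(
        [("chat", "Le chat mange la souris."), ("souris", "Le chat mange la souris.")],
        GLOSSARY,
        buf,
    )
    print(buf.getvalue(), end="")
```

For a real import, write to a UTF-8 file instead (`open("cards.txt", "w", encoding="utf-8", newline="")`) and map the three columns to your note type's fields in the import dialog. Words missing from the glossary get an empty gloss that you can fill in by hand.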

emb@lemmy.world, 21 hours ago (edited)

Was messing around with Jiten.moe (a spiritual successor to jpdb, which also boasts being able to ingest a book or subtitle file and create Anki cards), and it made me think of this question. (Jiten is actually open source, so the repo is there showing how they do it... but I'm pretty sure it mostly just wraps a bunch of Japanese-specific tools.)

Did a little looking. I checked https://github.com/keon/awesome-nlp and didn't see anything French-specific, but did come across https://github.com/french-ai/french-nlp, which might have useful stuff. It sounds like a library called spaCy could be useful (it has French models that can reduce words to their lemmas, which covers step 1).
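For step 1 (plus the context sentence needed for step 4), spaCy's French pipelines provide lemmas and sentence boundaries. A minimal sketch, assuming the small French model has been installed with `python -m spacy download fr_core_news_sm`; `lemma_contexts` and `analyse` are hypothetical helper names. spaCy raises an error for texts longer than `nlp.max_length` (1,000,000 characters by default), so a whole novel may need to be fed in chapter by chapter.

```python
from collections import Counter


def lemma_contexts(doc):
    """Return {lemma: (count, first sentence containing it)} for a parsed Doc.

    Only Token.is_alpha, Token.lemma_ and Token.sent are used, so any
    iterable of objects exposing those attributes works (handy for testing).
    """
    counts = Counter()
    first_sentence = {}
    for token in doc:
        if not token.is_alpha:  # skip punctuation, digits, etc.
            continue
        lemma = token.lemma_.lower()
        counts[lemma] += 1
        first_sentence.setdefault(lemma, token.sent.text)
    return {lemma: (n, first_sentence[lemma]) for lemma, n in counts.items()}


def analyse(text, model="fr_core_news_sm"):
    """Parse French text with spaCy and collect lemma counts plus context."""
    import spacy  # imported lazily so lemma_contexts works without spaCy

    nlp = spacy.load(model)
    return lemma_contexts(nlp(text))
```

Calling `analyse("Les chats mangent. Le chat dort.")` should map "chat" to a count of 2 with "Les chats mangent." as its context, though the exact lemmas depend on the model version. The (count, sentence) pairs can then be filtered against known words and turned into cards.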

But then I ran across this tool, which might be pretty close to what you'd need? https://github.com/FreeLanguageTools/vocabsieve

VocabSieve is a companion program for language learning with Anki. Its primary function is sentence mining, in which sentences with vocabulary words are collected and added into Anki for long term retention. It aims to help intermediate learners gain vocabulary efficiently by allowing card creation with minimal friction. Possible use cases include sentence mining from videos, texts, asynchronously from ereader highlights, and even completely automatically from books or subtitles.

I haven't looked into exactly how the 'automatically from books' stuff would work or anything, but seems promising.

And I guess, the elephant in the room: NLP is the kind of task LLMs are actually pretty good at, so there's always the lazy-ish route of converting the book to text, feeding it through an LLM, and asking it to identify important vocabulary words.
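The "convert the book to text" step doesn't strictly need extra tools: an EPUB is a ZIP archive whose chapters are XHTML files, so Python's standard library can pull the text out. A minimal sketch, assuming DRM-free files with UTF-8 content (the usual case); it reads files in archive-name order rather than the spine order declared in the OPF, which doesn't matter for counting words but can shuffle chapters. `epub_to_text` is a hypothetical helper name.

```python
import zipfile
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "title"}  # text inside these isn't book prose
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "tr"}


class _TextExtractor(HTMLParser):
    """Collect character data from an (X)HTML document."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; etc.
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")  # keep adjacent paragraphs from merging

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def epub_to_text(path):
    """Return the text of every HTML/XHTML file in the EPUB, one file per block."""
    chunks = []
    with zipfile.ZipFile(path) as zf:
        for name in sorted(zf.namelist()):
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            parser = _TextExtractor()
            parser.feed(zf.read(name).decode("utf-8", errors="replace"))
            parser.close()
            # Collapse all whitespace runs (including line breaks) to single spaces.
            chunks.append(" ".join("".join(parser.parts).split()))
    return "\n\n".join(chunks)
```

The returned string can go straight into an LLM prompt (in pieces, for a full novel) or into any lemmatizer.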

schipelblorp@sh.itjust.works, 20 hours ago

Thanks! VocabSieve looks perfect (though experimental), and it works with KOReader, too. Fuck me, I'm running out of excuses.