this post was submitted on 05 Jun 2024
820 points (98.8% liked)
linuxmemes
21282 readers
1128 users here now
Hint: :q!
Sister communities:
Community rules (click to expand)
1. Follow the site-wide rules
- Instance-wide TOS: https://legal.lemmy.world/tos/
- Lemmy code of conduct: https://join-lemmy.org/docs/code_of_conduct.html
2. Be civil
- Understand the difference between a joke and an insult.
- Do not harrass or attack members of the community for any reason.
- Leave remarks of "peasantry" to the PCMR community. If you dislike an OS/service/application, attack the thing you dislike, not the individuals who use it. Some people may not have a choice.
- Bigotry will not be tolerated.
- These rules are somewhat loosened when the subject is a public figure. Still, do not attack their person or incite harrassment.
3. Post Linux-related content
- Including Unix and BSD.
- Non-Linux content is acceptable as long as it makes a reference to Linux. For example, the poorly made mockery of
sudo
in Windows.
- No porn. Even if you watch it on a Linux machine.
4. No recent reposts
- Everybody uses Arch btw, can't quit Vim, and wants to interject for a moment. You can stop now.
Please report posts and comments that break these rules!
Important: never execute code or follow advice that you don't understand or can't verify, especially here. The word of the day is credibility. This is a meme community -- even the most helpful comments might just be shitposts that can damage your system. Be aware, be smart, don't fork-bomb your computer.
founded 2 years ago
MODERATORS
I can't imagine it'd be that hard to write some code that does that using an existing AI model.
I found a small command to run KDE Spectacle (screenshot software) with Tesseract so I can OCR a screenshot if I want to, I only had to install Tesseract and a main language, you could easily do the same with an API and/or a local AI.
You're probably right.
Llava and Bakllava are two Ollama models than can not only extract text but also describe what's happening on screen.
Using
tesseract-ocr
, as the other guy suggested, is probably simpler and less resource intensive though.