Ask Lemmy
A Fediverse community for open-ended, thought provoking questions
that's not for TrOCR, it's just for OCR, which may not work for handwriting
I did try some of the GPT steps:
getting some errors:
this is what GPT said to run, but it makes no sense because I don't have TrOCR even downloaded or running at all.
That's the script to save and run.
Ok so from the error, you have a version of pillow that is incompatible.
You have to downgrade pillow to version 11.
That’s the first step.
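Before downgrading, it may help to confirm which Pillow version is actually installed. This is a minimal sketch using only the standard library; the helper names `major_version` and `pillow_major` are made up for illustration, and whether version 11 really fixes the error is taken from the reply above, not verified here.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '11.3.0'."""
    return int(version_string.split(".")[0])


def pillow_major():
    """Return Pillow's installed major version, or None if it is not installed."""
    try:
        return major_version(version("pillow"))
    except PackageNotFoundError:
        return None


# Prints the major version seen by the Python interpreter running this script.
print("Pillow major version:", pillow_major())
```

Run it with the same Python interpreter that runs the OCR script, otherwise it may report a different installation.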
EDIT: Sorry just saw the rest of your comment. Do you really have to use that tech?
You have other alternatives. Amazon AWS has a service for handwriting OCR, can’t remember the name though.
You can also have a look at this, but it’s paid: https://www.handwritingocr.com/
More OCR alternatives: https://github.com/michaelben/OCR-handwriting-recognition-libraries
+1 for tesseract. I knew about this one a while ago. It may not recognise all handwriting, but you can train it to get better at it.
I don't trust big tech to not extract data and metadata and save it. Many companies get served with government requests to save data and keep it secret. Even if handwritingocr.com doesn't have such an agreement, it could run on AWS and that has an agreement. I would much rather do this locally. Some of the writings are confidential. Handwritingocr.com says data is encrypted in transit and at rest, but it's not open source and even if it were I can't verify the server code.
also Tesseract is CPU only, right? It will be so slow.
Fair point.
So what about TensorFlow and some local LLM to do the job?
You just need to find a reliable LLM in HuggingFace, for example.
That's exactly what I am trying to do, I'm just not that sure how to do it. I have the hardware needed, I just need to set up a docker with PyTorch and then find a way to set up Gradio inside that and then add TrOCR from hugging face, and then I'm good. I just am not totally sure how to do that and it seems hard, and when I ask AI for advice, it often is like "just run the following" and it's wrong, and I'm not skilled enough to know why.
Excellent. So let's try to do that instead.
From what I can see from the docs, Gradio is used to build a web interface and have a nice UI to visualise things.
Let's put Gradio aside for now and sort out PyTorch in Docker.
Select your LLM and make sure PyTorch works well with that.
If you run into trouble or get stuck, let me know.
I'll grab my laptop and try it myself.
nice starting point: https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/install/installrad/wsl/install-pytorch.html
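Once PyTorch is installed inside the container, a quick check like the one below can tell you whether it sees the GPU before you add anything else on top. This is only a sketch; the function name `describe_torch` is made up for illustration. ROCm builds of PyTorch report AMD GPUs through the same `torch.cuda` calls, so the check should apply there too.

```python
def describe_torch():
    """Return a one-line summary of the PyTorch install and GPU visibility."""
    try:
        import torch  # imported here so the script still runs without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__} sees GPU: {name}"
    return f"PyTorch {torch.__version__} is installed but sees no GPU (CPU only)"


print(describe_torch())
```

If this reports CPU only, the problem is in the Docker or driver setup, not in TrOCR or Gradio.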
Terminal error after running GPT code:
LLMs are so bad at code sometimes. This happens to me all the time with LLMs and code: the code is unusable and it saves no time, because it's a rabbit hole leading nowhere.
I also don't know if this is the right approach to the problem. Any sort of GUI interface would be easier. This is also hundreds of pages of handwritten stuff I want to change to text.
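For hundreds of pages, a small batch script may be easier than a GUI. The sketch below follows the usage shown on the Hugging Face TrOCR model card, assuming `transformers`, `torch` and `Pillow` are installed. The checkpoint name `microsoft/trocr-base-handwritten` is one public option, not necessarily the best for your handwriting, and the helper names are made up for illustration. As far as I know TrOCR is trained on images of single text lines, so full pages would likely need to be cut into lines first, which this sketch does not do.

```python
from pathlib import Path

# Assumption: one publicly available handwritten-text checkpoint.
MODEL_NAME = "microsoft/trocr-base-handwritten"
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def is_image_file(filename):
    """Return True if the file name has a supported image extension."""
    return Path(filename).suffix.lower() in IMAGE_SUFFIXES


def load_trocr(model_name=MODEL_NAME, device="cpu"):
    """Load the TrOCR processor and model (downloaded on first use, then cached)."""
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)
    return processor, model


def transcribe(image_path, processor, model, device="cpu"):
    """Return the text TrOCR reads from one image of a single text line."""
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values.to(device))
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


def transcribe_folder(folder, device="cpu"):
    """Transcribe every image in a folder, in file-name order."""
    processor, model = load_trocr(device=device)
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if is_image_file(path.name):
            results[path.name] = transcribe(path, processor, model, device)
    return results


# Example (hypothetical folder name):
# for name, text in transcribe_folder("line_images", device="cuda").items():
#     print(name, "->", text)
```

Everything runs locally once the model has been downloaded, so the scans never leave your machine.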
This error looks like it is saying a previous attempt aborted, and it needs you to clean up some file that was only partly downloaded.
Edit: The "please wait" makes me think I would try again in a couple hours.
So try again... in a couple of hours...
Why would that make a difference? It's a local model right?
If it is local only, then waiting probably won't help.
Another thought for you: pip behaves much better inside a virtual environment, using the Python venv module or uv. The instructions you have shared so far look more compatible with venv.
I don't understand what venv is or why this would work better. Will this make the compatibility issues go away? I could also just create a virtual Ubuntu environment that's fresh if that would be easier and try to give that environment access to my GPU but I don't know if that would work.
venv is a Python module that keeps a set of installed Python packages isolated from the packages of the system-installed Python.
No guarantees, but it often does.
I'm not sure either. But you've got the idea. Python packages install better when they're allowed to exist separately from the underlying operating system.
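To make the venv idea concrete, here is a small sketch using the standard library. The function names and the `.venv` folder name are just illustrative choices. The check relies on the fact that inside a virtual environment `sys.prefix` points at the environment while `sys.base_prefix` still points at the underlying Python.

```python
import sys
import venv


def in_virtualenv():
    """Return True when this interpreter is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def make_env(path=".venv"):
    """Create a virtual environment with pip at `path` (like `python -m venv .venv`)."""
    venv.create(path, with_pip=True)


print("Running inside a virtual environment:", in_virtualenv())
# To create one from Python, call make_env(), then activate it in your shell
# and run pip install there.
```

Packages installed with pip while the environment is active land inside that folder, so deleting the folder gives you a clean slate without touching the system Python.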