this post was submitted on 09 Dec 2025
Cool, thank you very much. I got k2pdf (courtesy of another dope-ass bear) to get the two columns + footnotes in the original pdf into a pdf that is just one column with footnotes clearly distinguishable. Now I need just what you're saying because the result of the k2pdf conversion is an image that I can't select text from (but the words are all in the right order, which is good).
Tesseract seems like a popular choice, I'll give that a try.
Tesseract doesn't support PDF input, so you'll need some other program like ocrmypdf (which I have used; it uses tesseract), or you can extract each page to its own image (which I have also done, but I forget how right now).
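For anyone following along, here is a rough sketch of both routes as small Python wrappers around the command-line tools. It assumes ocrmypdf, tesseract, and pdftoppm (from poppler-utils, my guess at a page-to-image tool, not necessarily what was used above) are installed and on your PATH. The function names, the `eng` language default, and the 300 DPI value are all made up for illustration.

```python
import subprocess
from pathlib import Path


def ocrmypdf_command(src: str, dst: str, lang: str = "eng") -> list[str]:
    """Build the ocrmypdf call that adds a text layer to an image-only PDF."""
    # -l picks the Tesseract language pack; "eng" is an assumed default.
    return ["ocrmypdf", "-l", lang, src, dst]


def pdftoppm_command(src: str, prefix: str, dpi: int = 300) -> list[str]:
    """Build the pdftoppm call that writes one PNG per page, named from prefix."""
    # -r sets the render resolution; 300 DPI is an assumption, not a requirement.
    return ["pdftoppm", "-png", "-r", str(dpi), src, prefix]


def tesseract_command(image: str, lang: str = "eng") -> list[str]:
    """Build the tesseract call; text is written to <image name without suffix>.txt."""
    out_base = str(Path(image).with_suffix(""))
    return ["tesseract", image, out_base, "-l", lang]


def run(cmd: list[str]) -> None:
    """Run one of the commands above and raise if it exits with an error."""
    subprocess.run(cmd, check=True)
```

With hypothetical file names, the ocrmypdf route would be `run(ocrmypdf_command("one_column.pdf", "one_column_ocr.pdf"))`. The image route would be `run(pdftoppm_command("one_column.pdf", "page"))` followed by `run(tesseract_command(...))` on each PNG it produces.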
Thanks again! You're the best :)
This looks like exactly what I need. After getting the formatting right with k2pdf, I can use ocrmypdf to get it back to text form, then just Ctrl+A, copy into Writer, and export as epub, since the pdf is like 15x the size of the epub.