Read a PDF (Is there an example somewhere of code that extracts text from a PDF?)
Read a PDF [message #58705] Sat, 30 July 2022 19:25
Chrisparr
No Message Body
Re: Read a PDF [message #58736 is a reply to message #58705] Thu, 11 August 2022 10:15
jjacksonRIAB
I don't think so. As far as I know, U++ cannot read PDF files; it can only produce them from other formats. You'll want to find an external library for that.
Re: Read a PDF [message #58739 is a reply to message #58736] Thu, 11 August 2022 15:48
Chrisparr
Hi,
I've managed to extract text from a PDF. To do that I needed zlib, which is part of Ultimate++; specifically, the ZDecompress function. It took a while, but it's working fine now.
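The reason zlib is involved: most PDF content streams are stored with the /FlateDecode filter, which is zlib/deflate data, so their bytes must be inflated before any text operators are visible. A minimal sketch of the step before that, locating the raw "stream ... endstream" bodies in a file read into memory, in standard C++ only. FindStreams is a hypothetical name and this is not the code from the attachment; it scans naively for the keywords, whereas a robust reader would honour each stream's /Length entry (which may be an indirect reference).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return the raw (still encoded) bytes of every "stream ... endstream"
// block in a PDF file held in memory. Naive keyword scan; a real reader
// would use the /Length entry of each stream dictionary instead.
std::vector<std::string> FindStreams(const std::string& pdf)
{
    std::vector<std::string> streams;
    std::size_t pos = 0;
    while ((pos = pdf.find("stream", pos)) != std::string::npos) {
        // Skip matches that are really the tail of "endstream".
        if (pos >= 3 && pdf.compare(pos - 3, 3, "end") == 0) {
            pos += 6;
            continue;
        }
        std::size_t start = pos + 6;
        // The "stream" keyword is followed by CRLF or LF before the data.
        if (pdf.compare(start, 2, "\r\n") == 0)
            start += 2;
        else if (start < pdf.size() && pdf[start] == '\n')
            start += 1;
        std::size_t end = pdf.find("endstream", start);
        if (end == std::string::npos)
            break;  // truncated file: stop
        // Drop the end-of-line marker that precedes "endstream".
        std::size_t data_end = end;
        if (data_end > start && pdf[data_end - 1] == '\n') --data_end;
        if (data_end > start && pdf[data_end - 1] == '\r') --data_end;
        streams.push_back(pdf.substr(start, data_end - start));
        pos = end + 9;  // continue after "endstream"
    }
    return streams;
}
```

Each returned body would then be inflated (in U++, presumably with ZDecompress) before its text operators can be read.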
Re: Read a PDF [message #58740 is a reply to message #58739] Thu, 11 August 2022 17:02
jjacksonRIAB
Cool.

How do you handle cases where the text you extract from the file isn't in the same order as it appears on the rendered page?
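One common way to deal with that, offered here as an assumption rather than anything described in this thread: record where each piece of text is placed (for example from the Td or Tm operators), then rebuild lines top to bottom and words left to right. TextRun and ReadingOrder are hypothetical names, coordinates are assumed to be in PDF user space (origin at the bottom left, y growing upwards), and the 2-unit line tolerance is an arbitrary choice. Multi-column layouts and rotated text are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one piece of text and the point where it starts on the page.
struct TextRun {
    double x;
    double y;
    std::string text;
};

// Rebuild a plausible reading order: lines from top to bottom, runs from
// left to right within a line. Runs whose baselines differ by at most
// `tolerance` are treated as being on the same line.
std::string ReadingOrder(std::vector<TextRun> runs, double tolerance = 2.0)
{
    // Highest baseline (top of the page) first.
    std::stable_sort(runs.begin(), runs.end(),
                     [](const TextRun& a, const TextRun& b) { return a.y > b.y; });

    // Give every run a line index, starting a new line on a large y jump.
    std::vector<std::pair<int, const TextRun*>> keyed;
    int line = -1;
    double line_y = 0;
    for (const TextRun& r : runs) {
        if (line < 0 || std::fabs(line_y - r.y) > tolerance) {
            ++line;
            line_y = r.y;
        }
        keyed.emplace_back(line, &r);
    }

    // Within each line, order the runs by x.
    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const auto& a, const auto& b) {
                         if (a.first != b.first) return a.first < b.first;
                         return a.second->x < b.second->x;
                     });

    // Join: space between runs on a line, newline between lines.
    std::string out;
    for (std::size_t i = 0; i < keyed.size(); ++i) {
        if (i > 0)
            out += (keyed[i].first != keyed[i - 1].first) ? '\n' : ' ';
        out += keyed[i].second->text;
    }
    return out;
}
```

Sorting by line first and position second keeps the comparator a strict ordering; putting the tolerance directly inside a single sort comparator would not.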
Re: Read a PDF [message #58741 is a reply to message #58740] Thu, 11 August 2022 17:16
Chrisparr
I had a very specific problem.
The PDFs were of a question-and-answer type.
A web page posed questions and the user provided answers.
When the questions were done, a PDF was generated.
When it is decoded, I pick up the question/answer pairs.
I am ignoring titles and various other stuff that doesn't concern me.
So far, all the PDFs of this type have decoded well.
I am making no claim to have created a generalised PDF translation tool.
Re: Read a PDF [message #58742 is a reply to message #58740] Fri, 12 August 2022 10:16
Chrisparr
Maybe you have something much better, but I've extracted the code that does the job for me.
No guarantees, but as I said, it does the job for me. That is all I can say.
  • Attachment: Extract.cpp (1.49 KB)
Re: Read a PDF [message #58743 is a reply to message #58742] Fri, 12 August 2022 10:39
jjacksonRIAB
If it works for you every time you need it to, and it does just what you need without importing a bunch of stuff you don't, then it's the best solution. I only raised the issue of using it as a general solution because the topic question was pretty broad, and I don't want to risk trivializing it for others reading, that's all.

Glad you got it working, and thanks for posting your code.
