How to grab content from a PDF

Liuc · June 1st, 2011, 04:50 AM

Hi,
I have to develop a "press review" web application to acquire, catalog, and search articles from newspapers.

The papers are in PDF format.
I'd like to give users the ability to select an article on a page (with some kind of mouse selection), then store it in a database, add keywords for searching, and so on.
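For the storage and keyword-search part, a minimal sketch using Python's built-in `sqlite3` module might look like the following. The table names, columns, and functions (`article`, `keyword`, `create_schema`, `store_article`, `find_by_keyword`) are hypothetical, chosen only for illustration; a real web application would probably use its own database server and schema.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create hypothetical tables for selected articles and their keywords."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS article (
            id INTEGER PRIMARY KEY,
            source_pdf TEXT NOT NULL,  -- file name of the newspaper PDF
            page INTEGER NOT NULL,     -- page the article was selected on
            body TEXT NOT NULL         -- text extracted from the selection
        );
        CREATE TABLE IF NOT EXISTS keyword (
            article_id INTEGER NOT NULL REFERENCES article(id),
            word TEXT NOT NULL,
            PRIMARY KEY (article_id, word)
        );
        """
    )


def store_article(conn, source_pdf, page, body, keywords):
    """Insert one article plus its keywords; return the new article id."""
    cur = conn.execute(
        "INSERT INTO article (source_pdf, page, body) VALUES (?, ?, ?)",
        (source_pdf, page, body),
    )
    article_id = cur.lastrowid
    # Keywords are stored lower-cased so searches are case-insensitive;
    # INSERT OR IGNORE skips duplicate keywords for the same article.
    conn.executemany(
        "INSERT OR IGNORE INTO keyword (article_id, word) VALUES (?, ?)",
        [(article_id, k.lower()) for k in keywords],
    )
    conn.commit()
    return article_id


def find_by_keyword(conn, word):
    """Return the ids of all articles tagged with `word`, in id order."""
    rows = conn.execute(
        "SELECT a.id FROM article AS a "
        "JOIN keyword AS k ON k.article_id = a.id "
        "WHERE k.word = ? ORDER BY a.id",
        (word.lower(),),
    )
    return [row[0] for row in rows]
```

Usage, with an in-memory database:

```python
conn = sqlite3.connect(":memory:")
create_schema(conn)
store_article(conn, "paper.pdf", 3, "City council approves budget...", ["Budget", "council"])
store_article(conn, "paper.pdf", 5, "Local team wins...", ["sport"])
print(find_by_keyword(conn, "budget"))  # [1]
```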

Are there libraries or third-party components that do this?

Do you have any suggestions? Alternative approaches?

Thanks.

Shasur · June 6th, 2011, 06:17 AM

There are various search engines on the market for searching PDFs. A Bing or Google search will get you a list of them.

Also check whether XML can help you. In any case, whatever you are trying to do can be done using ActionScript.

Cheers
Shasur

dc_dweller · August 21st, 2011, 10:41 PM

The product AspPDF.NET will do what you are looking for. See Chapter 9 of their manual, under Content Extraction. It isn't free, though.

http://www.asppdf.net

--- DC.Dweller

iphonec3 · November 25th, 2011, 05:52 AM

Hi...

Here's the cheap way of extracting text data from a PDF file:

Open the PDF file with Adobe Acrobat Reader, a free program available for download from Adobe at http://www.adobe.com/products/acrobat/readstep.html. If you're downloading Acrobat Reader for the first time, or upgrading to a newer version, read the download page carefully before you begin, because it sometimes includes checked boxes for extras you may not realize you're agreeing to receive. If you don't want what a checked box offers, uncheck it before you download.

With the PDF document open, select the text tool (it looks like a T on the toolbar; in newer versions it may look more like a capital I), drag it over the text to highlight it, and then choose Copy from the Edit menu.

Open the program you want to put the text into (for example, a word processor document), go to its Edit menu, and select Paste.
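The drag-to-highlight step above can also be done programmatically, which is roughly what the "mouse selection" in the original question needs. The sketch below assumes that some PDF library has already produced a list of text runs with their positions (pdfminer.six, for example, reports text boxes with coordinates); the `TextItem` type and `text_in_selection` function are hypothetical. It keeps the runs that fall inside the rectangle the user dragged and orders them for reading. In PDF coordinates the origin is normally the bottom-left of the page, so a larger y means higher up.

```python
from typing import List, NamedTuple


class TextItem(NamedTuple):
    """Hypothetical text run: position in PDF points plus its text.
    A real PDF extraction library would be the source of these."""
    x: float
    y: float
    text: str


def text_in_selection(items: List[TextItem], x0: float, y0: float,
                      x1: float, y1: float) -> str:
    """Join the text of runs whose anchor point lies inside the selection."""
    # Normalize the rectangle so the user can drag in any direction.
    left, right = min(x0, x1), max(x0, x1)
    bottom, top = min(y0, y1), max(y0, y1)
    inside = [it for it in items
              if left <= it.x <= right and bottom <= it.y <= top]
    # PDF y grows upward: sort top-to-bottom, then left-to-right.
    inside.sort(key=lambda it: (-it.y, it.x))
    return " ".join(it.text for it in inside)
```

For example:

```python
page = [
    TextItem(50, 700, "Budget approved"),
    TextItem(50, 680, "The council voted..."),
    TextItem(320, 700, "Sports: local team wins"),
]
print(text_in_selection(page, 40, 650, 300, 720))
# Budget approved The council voted...
```

Checking only each run's anchor point keeps the sketch simple; a fuller version would compare the run's whole bounding box against the selection.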

If you want to preserve the layout and data (this option does more, but costs money) and you plan to use the text in a Microsoft product, you might consider a program called BCL Drake. It automatically converts PDF documents into RTF (Microsoft's Rich Text Format) documents, and the resulting RTF page structure matches the page structure of the original PDF file.

Hope this helps.

Thanks

Jasmine
