Wrox Programmer Forums

You are currently viewing the General .NET section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
June 1st, 2011, 04:50 AM
Registered User
Join Date: Jun 2011
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts
How to grab content from PDF

I have to develop a "press review" web application to acquire, catalog, and search articles from newspapers.

The papers are in PDF format.
I'd like to let users select an article from the paper (with some kind of mouse selection), store it in a database, add keywords for searching, and so on.

Are there libraries or third-party components that do this?

Do you have any suggestions or alternative approaches?

June 6th, 2011, 06:17 AM
Friend of Wrox
Join Date: Sep 2005
Posts: 812
Thanks: 1
Thanked 53 Times in 49 Posts

There are various search engines available on the market for searching PDFs. A Bing or Google search will get you a list of them.

Also see whether XML can help you. In any case, what you are trying to do can be done using ActionScript.

C# Code Snippets (http://www.dotnetdud.blogspot.com)

VBA Tips & Tricks (http://www.vbadud.blogspot.com)
August 21st, 2011, 10:41 PM
Registered User
Join Date: Aug 2011
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts
Try AspPDF.NET

The product AspPDF.NET will do what you are looking for. See Chapter 9 of its manual, under Content Extraction. It isn't free, though.


--- DC.Dweller
November 25th, 2011, 05:52 AM
Registered User
Join Date: Nov 2011
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts


Here's the cheap way of extracting text data from a PDF file:

Open the PDF file with Adobe Acrobat Reader, a free program available for download from Adobe at http://www.adobe.com/products/acrobat/readstep.html. If you are downloading Acrobat Reader for the first time, or upgrading to a newer version, read the download page carefully before you begin, because sometimes you are agreeing to receive extras you may not realize. Look for boxes that are already checked and make sure you want what they are associated with; if you don't, uncheck those boxes before you download Acrobat Reader.

Open the PDF document on your computer and select the text tool (it looks like a T on the toolbar; in newer versions it may look more like a capital I). Drag it over the text in the PDF document to highlight it, then use the Copy command from the Edit menu.

Open the program you want to put the text into (for example, a word processor). Go to that program's Edit menu and select Paste.
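
For the web application in the original question, the same select-and-copy idea could be sketched in code. This is only a rough illustration in Python using the standard library, not a tested design. It assumes some PDF library has already given you each word with its bounding box; the `(text, x0, y0, x1, y1)` tuple format, the table layout, and all function names here are hypothetical.

```python
import sqlite3


def words_in_selection(words, selection):
    """Return the text of the words lying fully inside a selection rectangle.

    words: list of (text, x0, y0, x1, y1) tuples (hypothetical format,
           origin at the top-left of the page).
    selection: (x0, y0, x1, y1) rectangle drawn by the user with the mouse.
    """
    sx0, sy0, sx1, sy1 = selection
    inside = [w for w in words
              if w[1] >= sx0 and w[2] >= sy0 and w[3] <= sx1 and w[4] <= sy1]
    # Naive reading order: top to bottom, then left to right. This assumes
    # words on the same line share the same y0, which real PDFs may not.
    inside.sort(key=lambda w: (w[2], w[1]))
    return " ".join(w[0] for w in inside)


def open_store(path=":memory:"):
    """Open (or create) a small SQLite store for articles and keywords."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS articles ("
                 "id INTEGER PRIMARY KEY, source TEXT, body TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS keywords ("
                 "article_id INTEGER REFERENCES articles(id), keyword TEXT)")
    return conn


def save_article(conn, source, body, keywords):
    """Store one extracted article with its keywords; return the new row id."""
    cur = conn.execute("INSERT INTO articles (source, body) VALUES (?, ?)",
                       (source, body))
    article_id = cur.lastrowid
    # Keywords are normalized to lower case so lookups are case-insensitive.
    conn.executemany(
        "INSERT INTO keywords (article_id, keyword) VALUES (?, ?)",
        [(article_id, k.strip().lower()) for k in keywords])
    conn.commit()
    return article_id


def find_by_keyword(conn, keyword):
    """Return (id, source, body) rows for articles tagged with the keyword."""
    rows = conn.execute(
        "SELECT a.id, a.source, a.body FROM articles a "
        "JOIN keywords k ON k.article_id = a.id "
        "WHERE k.keyword = ? ORDER BY a.id",
        (keyword.strip().lower(),))
    return rows.fetchall()
```

In a real application the word boxes would come from whatever PDF component you choose, and the selection rectangle from the browser. Exact keyword matching is used here only to keep the sketch short.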

If you want to preserve the layout and data (this is the way that does more but costs money), and you plan to use the text in a Microsoft product, you might consider a program called BCL Drake. It automatically converts PDF documents into RTF (Microsoft's Rich Text Format) documents. The page structure of the resulting RTF file will match that of the original PDF file.

Hope it helps you...




Copyright (c) 2020 John Wiley & Sons, Inc.