Wrox Programmer Forums
Pro PHP Advanced PHP coding discussions. Beginning-level questions will be redirected to the Beginning PHP forum.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the Pro PHP section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
October 16th, 2003, 06:58 AM
Registered User
 
Join Date: Oct 2003
Location: Ernakulam, Kerala, India.
Posts: 2
Thanks: 0
Thanked 0 Times in 0 Posts
Count the number of words in a PDF document

Hello,

I have to develop an intranet application in PHP that reads documents and counts the number of words in each one. I have already done this for text, HTML, and Word documents by reading the file with the fopen() function.

But I don't know how to do the same for PDF documents. From the PHP manual I learned that, by installing the PDF libraries on the server, we can use the PDF-related functions. I have downloaded the PDF libraries and installed them on my web server. Here is what I have tried:

Since the function "pdf_open_pdi" is used to open an existing PDF document, I tried the following code:

$pdf = pdf_new();
pdf_open_file($pdf);
$pdi = pdf_open_pdi($pdf, "firstfile.pdf", "", 0);
$page= pdf_open_pdi_page($pdf, $pdi, 1, "");

But the script fails because the function "pdf_open_pdi" returned a handle of 0. The file "firstfile.pdf" is there, in the right path. I want to know why the function returns 0; because of this I cannot pass the document handle to "pdf_open_pdi_page".

If I am not on the right track, please advise me how to count the number of words in a PDF document stored on the web server.

Your thoughts and advice on this would be greatly appreciated.

Thanks in advance.

:)

 
October 17th, 2003, 05:03 PM
Friend of Wrox
 
Join Date: Jun 2003
Location: USA
Posts: 101
Thanks: 0
Thanked 1 Time in 1 Post

Hello there. You may want to check out http://php.net/manual/en/ref.pdf.php for more about PDF in PHP. As for the word count, you can use the explode() function (the older split() function was removed in PHP 7), but first you need to read the entire content of the PDF document into a variable. Consider the following:

<?php

$content = "this is a sample text this is a sample text this is a sample text";
// Split on single spaces and count the resulting pieces.
$word_count = count(explode(" ", $content));
echo $word_count;

?>
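
For the "read the entire content into a variable" step, one option that nobody in this thread tested is to shell out to an external text extractor. A minimal sketch, assuming the `pdftotext` command-line tool (shipped with Poppler and Xpdf) is installed on the server, that PHP is allowed to call `shell_exec()`, and that the server is Unix-like (the `2>/dev/null` redirect). The names `extract_pdf_text` and `count_words_in_text` are made up for illustration:

```php
<?php

// Hypothetical helper: returns the text of a PDF, or null on failure.
// Assumes the external `pdftotext` binary is on the PATH.
function extract_pdf_text(string $path): ?string
{
    if (!is_readable($path)) {
        return null;
    }
    // "-" as the output file tells pdftotext to write to stdout.
    $cmd = 'pdftotext ' . escapeshellarg($path) . ' - 2>/dev/null';
    $text = shell_exec($cmd);
    return is_string($text) ? $text : null;
}

// Counts whitespace-separated tokens; empty tokens are dropped.
function count_words_in_text(string $text): int
{
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $tokens === false ? 0 : count($tokens);
}

// "firstfile.pdf" is the file name from the original question.
$text = extract_pdf_text('firstfile.pdf');
if ($text === null) {
    echo "Could not extract text\n";
} else {
    echo count_words_in_text($text), "\n";
}
```

Keeping extraction and counting in separate functions means the counting part can be reused for the text, HTML, and Word documents as well. Scanned PDFs that contain only images will produce no text this way.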

hope that helped :D:D:D

the genuine genius
 
October 17th, 2003, 05:14 PM
Friend of Wrox
 
Join Date: Jun 2003
Location: USA
Posts: 101
Thanks: 0
Thanked 1 Time in 1 Post

Hey there again. Here's some more...

You can't really count single characters as words, the way the example above does. If you want to be more accurate, let's ignore single characters:

<?php

$content = "this is a sample text this is a sample text this is a sample text";
// split() was removed in PHP 7; explode() splits on a literal space.
$str_array = explode(" ", $content);
$word_count = 0;

// Count only the pieces that are longer than one character.
foreach ($str_array as $word)
{
    if (strlen($word) > 1)
    {
         $word_count++;
    }
}

echo $word_count;

?>
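
The length filter can also be wrapped in a function, and it helps to trim punctuation first so that a token like "a," is not counted as a two-character word. This is only a sketch: the function name and the `$minLength` parameter are illustrative, and `strlen()` counts bytes, so text with non-ASCII characters would need something like `mb_strlen()` instead.

```php
<?php

// Hypothetical helper: counts tokens of at least $minLength characters.
function count_words_min_length(string $text, int $minLength = 2): int
{
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($tokens === false) {
        return 0;
    }
    $count = 0;
    foreach ($tokens as $token) {
        // Trim common punctuation from both ends before measuring.
        $word = trim($token, ".,;:!?\"'()[]");
        if (strlen($word) >= $minLength) {
            $count++;
        }
    }
    return $count;
}

echo count_words_min_length("this is a sample text, a short one."), "\n";
```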

:D

the genuine genius
 
October 18th, 2003, 04:54 PM
Friend of Wrox
Points: 2,570, Level: 21
Activity: 0%
 
Join Date: Jun 2003
Location: San Diego, CA, USA
Posts: 836
Thanks: 0
Thanked 0 Times in 0 Posts

Well, counting the words in the document wasn't really the problem, it was getting the document open in the first place.

The PDI functions that ship with the PDF extension of PHP aren't enabled by default, if I remember correctly. I could be wrong, though... I suspect this because the basic (read: free) version of PDFLib doesn't include the PDI extension.

PHP's PDF extension would still have the functions defined, but they wouldn't actually work unless your version of PDFLib implemented the PDI functions.

I would suggest going through each line in your code to see which function is actually failing... is it pdf_new() or really pdf_open_pdi()?
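
Going through the calls one at a time, as suggested above, could start with something like the sketch below. It uses only generic PHP checks (`extension_loaded()` and `function_exists()`); the extension name `pdf` and the helper name `check_pdf_support` are assumptions. Note that a function being defined does not prove the underlying PDFLib build actually supports PDI, so a "yes" here only rules out the simplest failure.

```php
<?php

// Hypothetical helper: reports which of the given functions PHP knows about.
function check_pdf_support(array $functions): array
{
    // Assumption: the PDFLib binding registers itself under the name "pdf".
    $report = ['extension_loaded' => extension_loaded('pdf')];
    foreach ($functions as $name) {
        $report[$name] = function_exists($name);
    }
    return $report;
}

// The function names are the ones used in the original question.
$report = check_pdf_support(['pdf_new', 'pdf_open_file', 'pdf_open_pdi', 'pdf_open_pdi_page']);
foreach ($report as $name => $ok) {
    echo str_pad($name, 20), $ok ? "yes" : "no", "\n";
}
```

If every line says "yes", the next step would be to check the return value of each call in the original script in turn, to see whether the first failure is really in pdf_open_pdi().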



Take care,

Nik
http://www.bigaction.org/
 
July 19th, 2006, 11:06 PM
Registered User
 
Join Date: Jul 2006
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts

I'm looking into the same issue...
Were you able to read a PDF document and count the words in it?
I would appreciate your help. Thank you very much.

 
May 27th, 2009, 01:00 AM
Registered User
 
Join Date: May 2009
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts
Word count in pdf file

I am looking for the same thing.
If anybody has the solution,
please send it to me.
I would be very thankful.

Thanks in advance
Vikas
vikasp@saamarth.net









Powered by vBulletin®
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
Copyright (c) 2020 John Wiley & Sons, Inc.