Wrox Programmer Forums

You are currently viewing the Perl section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
  #1 (permalink)  
May 25th, 2009, 03:04 AM
Authorized User
 
Join Date: Apr 2009
Posts: 31
Thanks: 0
Thanked 0 Times in 0 Posts
Count and replace

Hi,

XML:
Code:
<tableset>
<table id="acprof-9780199226009-table-1" frame="none">
<tgroup cols="4">
<colspec colnum="1" colname="col1"/>
<colspec colnum="2" colname="col2"/>
<colspec colnum="3" colname="col3"/>
<colspec colnum="4" colname="col4"/>
<thead>
<row rowsep="1">
<entry colname="col1"/>
<entry colname="col2" align="center">
<p>
<b>UK</b>
</p>
</entry>
<entry colname="col3" align="center">
<p>
<b>France</b>
</p>
</entry>
<entry colname="col4" align="center">
<p>
<b>Germany</b>
</p>
</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>

</tbody>
</tgroup>
</table>
<table id="acprof-9780199226009-table-1" frame="none">
<tgroup cols="4">
<colspec colnum="1" colname="col1"/>
<colspec colnum="2" colname="col2"/>
<colspec colnum="3" colname="col3"/>
<colspec colnum="4" colname="col4"/>
<thead>
<row rowsep="1">
<entry colname="col1"/>
<entry colname="col2" align="center">
<p>
<b>UK</b>
</p>
</entry>
<entry colname="col3" align="center">
<p>
<b>France</b>
</p>
</entry>
<entry colname="col4" align="center">
<p>
<b>Germany</b>
</p>
</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>

<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>
</tbody>
</tgroup>
</table>
</tableset>

Need output xml:

Code:
<tableset>
<table id="acprof-9780199226009-table-1" totalrow="2" frame="none">
<tgroup cols="4">
<colspec colnum="1" colname="col1"/>
<colspec colnum="2" colname="col2"/>
<colspec colnum="3" colname="col3"/>
<colspec colnum="4" colname="col4"/>
<thead>
<row rowsep="1">
<entry colname="col1"/>
<entry colname="col2" align="center">
<p>
<b>UK</b>
</p>
</entry>
<entry colname="col3" align="center">
<p>
<b>France</b>
</p>
</entry>
<entry colname="col4" align="center">
<p>
<b>Germany</b>
</p>
</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>
</tbody>
</tgroup>
</table>
<table id="acprof-9780199226009-table-1" totalrow="3" frame="none">
<tgroup cols="4">
<colspec colnum="1" colname="col1"/>
<colspec colnum="2" colname="col2"/>
<colspec colnum="3" colname="col3"/>
<colspec colnum="4" colname="col4"/>
<thead>
<row rowsep="1">
<entry colname="col1"/>
<entry colname="col2" align="center">
<p>
<b>UK</b>
</p>
</entry>
<entry colname="col3" align="center">
<p>
<b>France</b>
</p>
</entry>
<entry colname="col4" align="center">
<p>
<b>Germany</b>
</p>
</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>
<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>
</tbody>
</tgroup>
</table>
</tableset>

I need to count the <row> tags inside each <table> (header rows included) and put that count into a new attribute on the table tag, for example totalrow="2". In the original post these attributes were highlighted in red.

Can anyone help me count the elements in a Perl script?

Regards,
Nagaraj
  #2 (permalink)  
June 18th, 2009, 11:50 AM
Friend of Wrox
Points: 1,515, Level: 15
Activity: 0%
 
Join Date: Dec 2003
Location: Oxford, United Kingdom.
Posts: 488
Thanks: 0
Thanked 3 Times in 3 Posts

HTML::Parser could do this. There's a pretty good tutorial; I don't have the link, but you should be able to google it. Otherwise, read the file into an array and walk the array backwards, so that you know how many row elements you've hit by the time you get to the table declaration.
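
The backwards walk could look roughly like the sketch below. It is only an illustration under a few assumptions: the sub name add_row_counts and the shortened sample lines are made up here, every tag is assumed to sit on its own line, and id is assumed to be the first attribute of each table tag.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): walk the lines from last to
# first, counting <row> tags, and stamp the running count onto each
# <table ...> opening tag when it is reached.
sub add_row_counts {
    my @lines = @_;
    my $rows  = 0;
    for my $i (reverse 0 .. $#lines) {
        # Count every opening row tag on this line: <row> or <row attr="...">.
        $rows++ while $lines[$i] =~ /<row[\s>]/g;
        # Reaching a table start means all of its rows have been seen.
        if ($lines[$i] =~ s/(<table\s+id="[^"]*")/$1 totalrow="$rows"/) {
            $rows = 0;    # reset for the table above this one
        }
    }
    return @lines;
}

# Made-up sample: one table with 2 rows, one with 3.
my @sample = (
    '<table id="t1" frame="none">',
    '<thead>', '<row rowsep="1">', '</row>', '</thead>',
    '<tbody>', '<row>', '</row>', '</tbody>',
    '</table>',
    '<table id="t2" frame="none">',
    '<tbody>', '<row>', '</row>', '<row>', '</row>', '<row>', '</row>', '</tbody>',
    '</table>',
);
print "$_\n" for add_row_counts(@sample);
```

Because the rows are counted before their table tag is reached, one pass is enough and no lookup table of counts is needed.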
__________________
--
Charlie Harvey's website - linux, perl, java, anarchism and punk rock: http://charlieharvey.org.uk
  #3 (permalink)  
June 19th, 2009, 11:52 AM
Authorized User
 
Join Date: Apr 2009
Posts: 31
Thanks: 0
Thanked 0 Times in 0 Posts

Hi Cider,

Code:
#!/usr/bin/perl

use strict;

my $directory = "D:/Nagaraj/oup/OUP-Brenkert/Preprocessed";

use File::Find;
use strict;
        
my $count;
my $s;
my $nb;
my $cou;
my $i;
my $ss;
my $tag;

#my $directory = "";

find (\&process, $directory);

sub process
{
    my @outLines;  #Data we are going to output
    my $Replace;      #Data we are reading line by line

        #print "processing $_ / $File::Find::name\n";

    # Only parse files that end in .xml
    if ( ($File::Find::name =~ /\.XML$/) || ($File::Find::name =~ /\.xml$/)) {

        open (FILE, $File::Find::name ) or die "Cannot open file: $!";

        while ( $Replace = <FILE> ) {

                $cou =~ s/<tableGroup([^>]+)<\/tableGroup>/<tableGroup$1<\/tableGroup>/gi;
                        for($i=0;$i<=$cou;$i++)
                                {
                                    if(/(<tableGroup([^>]+)<\/tableGroup>)/)
                                    {
                                            $ss=$1;
                                            $count = ($ss =~ s/<row/<row/g);
                                            $Replace =~ s/<tgroup cols\="(.*?)">/<tgroup cols\="$1"><SPiTable><SPiTable-body xmlns\:aid\="http:\/\/ns.adobe.com\/AdobeInDesign\/4.0\/" aid\:table\="table" aid\:trows\="$count" aid:tcols="$1">/g;
                                            print $count;
                                   }

                                }
              #$Replace =~ s/&ndash;/&#x2013;/g;
              push(@outLines, $Replace);
              }

        close FILE;
        open ( OUTFILE, ">$File::Find::name" ) or
        die "Cannot open file: $!";

        print ( OUTFILE @outLines );
        close ( OUTFILE );

        undef( @outLines );
           }
        }
I tried my best with the Perl script above, but it does not count the rows or replace the string.

Could you correct it?
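
For comparison, here is a minimal sketch of a directory-walking version that does the totalrow counting asked for in the first post. It is not a fix of the script above line by line: the SPiTable markup is left out, the sub name add_totalrow is made up, and id is assumed to be the first attribute of each table tag. The main differences are that each file is read in one go (a table spans many lines, so a line-by-line regex cannot see a whole table) and that the file, not $_, is what gets matched.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the XML text with totalrow="N" added to each
# <table ...> tag, where N is the number of <row> tags inside that table.
sub add_totalrow {
    my ($xml) = @_;
    $xml =~ s{(<table\b[^>]*>)(.*?</table>)}{
        my ($open, $rest) = ($1, $2);
        my $rows = () = $rest =~ /<row[\s>]/g;    # count opening row tags
        $open =~ s/(<table\s+id="[^"]*")/$1 totalrow="$rows"/;
        $open . $rest;
    }gse;
    return $xml;
}

# Rewrite one file in place; File::Find calls this for every entry it visits.
sub process {
    return unless -f $_ && /\.xml$/i;    # only regular files ending in .xml or .XML
    my $path = $File::Find::name;
    open(my $in, '<', $_) or die "Cannot open $path: $!";
    my $xml = do { local $/; <$in> };    # slurp the whole file
    close $in;
    open(my $out, '>', $_) or die "Cannot write $path: $!";
    print {$out} add_totalrow($xml);
    close $out;
}

# The directory comes from the command line; nothing happens without one.
find(\&process, $ARGV[0]) if @ARGV;
```

It would be run as something like `perl script.pl D:/path/to/xml` (the script name is a placeholder). Since it overwrites the files, it is worth trying on a copy of the folder first.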
  #4 (permalink)  
August 19th, 2009, 11:55 PM
Registered User
 
Join Date: Aug 2009
Posts: 6
Thanks: 0
Thanked 0 Times in 0 Posts

This will do what you have asked.
Code:
use strict;
use warnings;
# Define some file names.
my $infile="input.html";
my $outfile="output.html";

# Open the input file.
open (my $in, '<', $infile) or die "Cannot open $infile: $!\n";
my @input=<$in>;		# This can be memory hungry if you have a large file, but it's quick-n-dirty enough for the sake of a demo.
close $in;
chomp @input;			# Strip the newlines so that we don't print them twice later.

# Now parse it.
my %row_counts;			# This is where we will keep the count. Keyed on the line where we found the table start.
my $current_id;			# We keep track of the line we found the current table on.
# A C-style loop over the input.
for (my $i=0; $i<scalar(@input); $i++) {
	if ($input[$i]=~m/<table id="/) {
		# We have a match; remember the line number where this table starts.
		$current_id=$i;
	}
	# Is this a "<row>" or "<row ...>" line, and have we seen a table yet?
	if (defined $current_id && $input[$i]=~m/<row[\s>]/) {
		# Yes, increment the count for this table (a missing key starts at 0).
		$row_counts{$current_id}++;
	}
}
# We now have a hash which holds all our row counts. The key of the hash is the line where the table started.
# Now write the entire array, but whenever we have a line where a table starts, insert the count.
open (my $out, '>', $outfile) or die "Cannot open $outfile: $!\n";
for (my $i=0; $i<scalar(@input); $i++) {
	if ($row_counts{$i}) {
		# We are looking at a line where a table starts. We need to insert the row count.
		# split the line at whitespace.
		my @temp_array=split(/\s/, $input[$i]);
		# We should now have 3 elements:
		# ('<table', 'id="something"' and 'frame="none">'
		print $out "$temp_array[0] $temp_array[1] totalrow=\"$row_counts{$i}\" $temp_array[2]\n";
	} else {
		print $out "$input[$i]\n";
	}
}
close $out;

print "Result:\n";
foreach(sort { $a <=> $b } keys(%row_counts)) {
	print "$_=>$row_counts{$_}\n";
}
The comments should be self-explanatory.
It does, however, make a few assumptions:

It assumes that your HTML/XML is well formed.
It assumes that your input always follows quite a rigid layout (for example, the print statement that rebuilds the table tag after the whitespace split assumes that the tag has no attributes other than id and frame).

It needs some degree of work, but it does what you want.
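
One way to relax the attribute assumption, sketched here with a made-up helper name (insert_totalrow) and a made-up extra attribute in the second sample line, would be to insert the new attribute with a substitution instead of splitting on whitespace. It still assumes that id is the first attribute of the table tag.

```perl
use strict;
use warnings;

# Hypothetical helper: add totalrow="N" right after the id attribute,
# leaving indentation and any other attributes on the tag untouched.
sub insert_totalrow {
    my ($line, $count) = @_;
    $line =~ s/(<table\s+id="[^"]*")/$1 totalrow="$count"/;
    return $line;
}

print insert_totalrow('<table id="t1" frame="none">', 2), "\n";
print insert_totalrow('  <table id="t2" frame="none" pgwide="1">', 3), "\n";
```

Lines that do not contain a table tag with an id come back unchanged, so the helper can be applied to every line without checking first.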


Copyright (c) 2020 John Wiley & Sons, Inc.