p2p.wrox.com Forums


Nagaraj May 25th, 2009 03:04 AM

Count and replace
 
Hi,

XML:
Code:

<tableset>
<table id="acprof-9780199226009-table-1" frame="none">
<tgroup cols="4">
<colspec colnum="1" colname="col1"/>
<colspec colnum="2" colname="col2"/>
<colspec colnum="3" colname="col3"/>
<colspec colnum="4" colname="col4"/>
<thead>
<row rowsep="1">
<entry colname="col1"/>
<entry colname="col2" align="center">
<p>
<b>UK</b>
</p>
</entry>
<entry colname="col3" align="center">
<p>
<b>France</b>
</p>
</entry>
<entry colname="col4" align="center">
<p>
<b>Germany</b>
</p>
</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>

</tbody>
</tgroup>
</table>
<table id="acprof-9780199226009-table-1" frame="none">
<tgroup cols="4">
<colspec colnum="1" colname="col1"/>
<colspec colnum="2" colname="col2"/>
<colspec colnum="3" colname="col3"/>
<colspec colnum="4" colname="col4"/>
<thead>
<row rowsep="1">
<entry colname="col1"/>
<entry colname="col2" align="center">
<p>
<b>UK</b>
</p>
</entry>
<entry colname="col3" align="center">
<p>
<b>France</b>
</p>
</entry>
<entry colname="col4" align="center">
<p>
<b>Germany</b>
</p>
</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>

<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>
</tbody>
</tgroup>
</table>
</tableset>

Required output XML:

Code:

<tableset>
<table id="acprof-9780199226009-table-1" totalrow="2" frame="none">
<tgroup cols="4">
<colspec colnum="1" colname="col1"/>
<colspec colnum="2" colname="col2"/>
<colspec colnum="3" colname="col3"/>
<colspec colnum="4" colname="col4"/>
<thead>
<row rowsep="1">
<entry colname="col1"/>
<entry colname="col2" align="center">
<p>
<b>UK</b>
</p>
</entry>
<entry colname="col3" align="center">
<p>
<b>France</b>
</p>
</entry>
<entry colname="col4" align="center">
<p>
<b>Germany</b>
</p>
</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>
</tbody>
</tgroup>
</table>
<table id="acprof-9780199226009-table-1" totalrow="3" frame="none">
<tgroup cols="4">
<colspec colnum="1" colname="col1"/>
<colspec colnum="2" colname="col2"/>
<colspec colnum="3" colname="col3"/>
<colspec colnum="4" colname="col4"/>
<thead>
<row rowsep="1">
<entry colname="col1"/>
<entry colname="col2" align="center">
<p>
<b>UK</b>
</p>
</entry>
<entry colname="col3" align="center">
<p>
<b>France</b>
</p>
</entry>
<entry colname="col4" align="center">
<p>
<b>Germany</b>
</p>
</entry>
</row>
</thead>
<tbody>
<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>
<row>
<entry colname="col1" align="left">
<p>Industry</p>
</entry>
<entry colname="col2" align="char" char=".">
<p>17.5</p>
</entry>
<entry colname="col3" align="char" char=".">
<p>14.4</p>
</entry>
<entry colname="col4" align="char" char=".">
<p>25.4</p>
</entry>
</row>
</tbody>
</tgroup>
</table>
</tableset>

I need to count the <row> tags inside each <table> tag and place the count in an attribute of that table, like totalrow="2" (highlighted in red in the original post).

Can anyone help me count the elements in a Perl script?

Regards,
Nagaraj

ciderpunx June 18th, 2009 11:50 AM

HTML::Parser could do this. There's a pretty good tutorial; I don't have the link, but you should be able to google it. Otherwise, read the file into an array and walk the array backwards, so that you know how many row elements you've hit by the time you reach the table declaration.
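The backward pass described above can be sketched roughly as follows. This is only an illustration, assuming one tag per line as in the sample XML and an id attribute on every table; the sub name add_totalrow_backwards is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes the lines of the file and
# returns a copy in which each <table ...> start tag carries totalrow="N".
# Assumes one tag per line and an id attribute on every table.
sub add_totalrow_backwards {
    my @lines = @_;
    my $rows  = 0;    # opening <row> tags seen since the last <table> tag

    for my $i (reverse 0 .. $#lines) {
        # Count opening <row> and <row attr="..."> tags on this line.
        my $n = () = $lines[$i] =~ /<row[\s>]/g;
        $rows += $n;

        # On reaching the table's start tag, insert the count after the id
        # attribute and start counting afresh for the table before it.
        if ($lines[$i] =~ s/(<table\b[^>]*?\bid="[^"]*")/$1 totalrow="$rows"/) {
            $rows = 0;
        }
    }
    return @lines;
}
```

Because the rows are counted before the table tag is reached, a single pass is enough and no hash of counts is needed.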

Nagaraj June 19th, 2009 11:52 AM

Hi Cider,

Code:

#!/usr/bin/perl

use strict;

my $directory = "D:/Nagaraj/oup/OUP-Brenkert/Preprocessed";

use File::Find;
use strict;
       
my $count;
my $s;
my $nb;
my $cou;
my $i;
my $ss;
my $tag;

#my $directory = "";

find (\&process, $directory);

sub process
{
    my @outLines;  #Data we are going to output
    my $Replace;      #Data we are reading line by line

        #print "processing $_ / $File::Find::name\n";

    # Only parse files that end in .xml
    if ( ($File::Find::name =~ /\.XML$/) || ($File::Find::name =~ /\.xml$/)) {

        open (FILE, $File::Find::name ) or die "Cannot open file: $!";

        while ( $Replace = <FILE> ) {

                $cou =~ s/<tableGroup([^>]+)<\/tableGroup>/<tableGroup$1<\/tableGroup>/gi;
                        for($i=0;$i<=$cou;$i++)
                                {
                                    if(/(<tableGroup([^>]+)<\/tableGroup>)/)
                                    {
                                            $ss=$1;
                                            $count = ($ss =~ s/<row/<row/g);
                                            $Replace =~ s/<tgroup cols\="(.*?)">/<tgroup cols\="$1"><SPiTable><SPiTable-body xmlns\:aid\="http:\/\/ns.adobe.com\/AdobeInDesign\/4.0\/" aid\:table\="table" aid\:trows\="$count" aid:tcols="$1">/g;
                                            print $count;
                                  }

                                }
              #$Replace =~ s/&ndash;/&#x2013;/g;
              push(@outLines, $Replace);
              }

        close FILE;
        open ( OUTFILE, ">$File::Find::name" ) or
        die "Cannot open file: $!";

        print ( OUTFILE @outLines );
        close ( OUTFILE );

        undef( @outLines );
          }
        }

I tried my best with the Perl script above, but it does not count the rows or replace the string.

Could you correct it?

Leigh August 19th, 2009 11:55 PM

This will do what you have asked.
Code:

use strict;
use warnings;
# Define some file names.
my $infile="input.html";
my $outfile="output.html";        # Not used below: the script prints to STDOUT.

# Open the input file.
open (FILE, "<$infile") or die "$!\n";
my @input=<FILE>;                # This can be memory hungry if you have a large file, but it's quick-n-dirty enough for the sake of a demo.
close FILE;

# Now parse it.
my %row_counts;                        # This is where we will keep the count. Keyed on the index of the input line where the table starts.
my $current_id;                        # The index of the line where the current table starts.
# A C-style loop over the input.
for (my $i=0; $i<scalar(@input); $i++) {
        if ($input[$i]=~m/^\s*<table id="(.*?)"/) {
                # We have a match ($1 holds the ID). Remember the index of this line.
                $current_id=$i;
        }
        # Is this a "<row>" line? Rows with attributes, such as <row rowsep="1">, count too.
        if ($input[$i]=~m/<row[\s>]/) {
                # Yes. Do we already have a count for this table?
                if ($row_counts{$current_id}) {
                        # Yes, increment it.
                        $row_counts{$current_id}++;
                } else {
                        # No. This is the first row for this table, set the count to 1.
                        $row_counts{$current_id}=1;
                }
        }
}
# We now have a hash which holds all our row counts. The key of the hash is the index of the line where the table started.
# Now print the entire array, but whenever we reach a line where a table starts, insert the count.
for (my $i=0; $i<scalar(@input); $i++) {
        if ($row_counts{$i}) {
                # We are looking at a row where a table starts. We need to insert the row count.
                # split the line at whitespace.
                my @temp_array=split(/\s/, $input[$i]);
                # We should now have 3 elements:
                # ('<table', 'id="something"' and 'frame="none">'
                print "$temp_array[0] $temp_array[1] totalrow=\"$row_counts{$i}\" $temp_array[2]\n";
        } else {
                print $input[$i];        # The line still ends with its own newline.
        }
       
       
}


print "Result:\n";
foreach(keys(%row_counts)) {
        print "$_=>$row_counts{$_}\n";
}

The comments should be self-explanatory.
It does, however, make a few assumptions:

It assumes that your HTML/XML is well formed.
It assumes that your input always follows quite a rigid layout (see line 42 of the script, which assumes that the table tag has no attributes other than id and frame).

It needs some degree of work, but it does what you want.
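One possible way to relax the rigid-layout assumption is to treat the whole file as a single string and rewrite each table block with a substitution. The sketch below is not part of the script above; the sub name add_totalrow is hypothetical, and it assumes that tables are never nested and that the document is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): $xml is the whole document as one
# string. Returns the document with totalrow="N" added to every <table> tag.
# Assumes tables are not nested.
sub add_totalrow {
    my ($xml) = @_;

    # Match each <table ...> start tag together with the text up to </table>.
    $xml =~ s{(<table\b)([^>]*>)(.*?</table>)}{
        my ($open, $attrs, $body) = ($1, $2, $3);

        # Count opening <row> tags, with or without attributes.
        my $rows = () = $body =~ /<row[\s>]/g;

        # Put totalrow right after the id attribute if there is one,
        # otherwise directly after the tag name.
        $attrs =~ s/(\bid="[^"]*")/$1 totalrow="$rows"/
            or $attrs = qq{ totalrow="$rows"$attrs};

        "$open$attrs$body";
    }gse;

    return $xml;
}
```

Since the start tag is matched as a whole, the order and number of its attributes and the placement of line breaks no longer matter.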

