This will do what you have asked.
Code:
use strict;
use warnings;
# Define some file names.
my $infile="input.html";
my $outfile="output.html";
# Open the input file.
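open(my $fh, '<', $infile) or die "Cannot open $infile: $!\n";
my @input=<$fh>; # This can be memory hungry if you have a large file, but it's quick-n-dirty enough for the sake of a demo.
chomp(@input); # Strip the line endings; the print statements further down add them back.
close $fh;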
# Now parse it.
my %row_counts; # This is where we will keep the count. Keyed on the row where we found the table start.
my $current_id; # We keep track of the row we found the current table on.
# A C-style loop over the input.
for (my $i=0; $i<scalar(@input); $i++) {
    # Is this a "<table id=...>" line?
    if ($input[$i]=~m/<table\s+id="(.*?)"/) {
        # We have a match. $1 contains the ID, but we key on the line number instead.
        $current_id=$i;
    }
    # Is this a "<row>" line inside a table?
    if (defined($current_id) && $input[$i]=~m/<row>/) {
        # Yes. Increment the count for this table; the first increment of a missing entry sets it to 1.
        $row_counts{$current_id}++;
    }
}
# We now have a hash which holds all our row counts. The key of the hash is the row where the table started.
# Now print the entire array, but whenever we have a row where a table starts, insert the count.
for (my $i=0; $i<scalar(@input); $i++) {
    if ($row_counts{$i}) {
        # We are looking at a row where a table starts. We need to insert the row count.
        # Split the line at whitespace (the ' ' form also skips any leading indentation).
        my @temp_array=split(' ', $input[$i]);
        # We should now have 3 elements:
        # '<table', 'id="something"' and 'frame="none">'
        print "$temp_array[0] $temp_array[1] totalrow=\"$row_counts{$i}\" $temp_array[2]\n";
    } else {
        print "$input[$i]\n";
    }
}
print "Result:\n";
foreach (keys(%row_counts)) {
    print "$_=>$row_counts{$_}\n";
}
The comments should be self-explanatory.
It does, however, make a few assumptions:
It assumes that your HTML/XML is well-formed.
It assumes that your input always follows quite a rigid layout (for example, the print statement that rebuilds the <table> tag assumes the tag sits on a single line and has exactly two attributes, id and frame).
It needs some degree of work, but it does what you want.
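If the fixed-attribute assumption is a problem, one possible way around it is to insert the count with a substitution instead of splitting the line. The following is only a rough sketch: add_totalrow is a helper name made up for illustration (it is not part of the script above), and it still assumes the whole <table ...> tag is on one line with a double-quoted id attribute.

```perl
use strict;
use warnings;

# Hypothetical helper: add totalrow="N" right after the id attribute,
# leaving any other attributes on the tag untouched.
sub add_totalrow {
    my ($line, $count) = @_;
    # Capture everything from '<table' up to and including id="...",
    # then put the new attribute straight after it.
    $line =~ s/(<table\b[^>]*?\bid="[^"]*")/$1 totalrow="$count"/;
    return $line;
}

# Example: extra attributes after id are preserved.
print add_totalrow('<table id="t1" frame="none" class="wide">', 3), "\n";
```

In the second loop, the split and the hand-built print could then be replaced by printing add_totalrow($input[$i], $row_counts{$i}) followed by a newline. Lines that do not contain a matching <table> tag come back unchanged.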