August 19th, 2009, 11:55 PM
Leigh

This will do what you have asked.
Code:
use strict;
use warnings;
# Define some file names.
my $infile="input.html";
my $outfile="output.html";	# Not used below: the script prints to STDOUT, so redirect it to this file yourself.

# Open the input file.
open (my $in, '<', $infile) or die "Cannot open $infile: $!\n";
chomp(my @input=<$in>);		# Slurp the file and strip newlines. This can be memory hungry if you have a large file, but it's quick-n-dirty enough for the sake of a demo.
close $in;

# Now parse it.
my %row_counts;			# This is where we will keep the count. Keyed on the row where we found the table start.
my $current_id;			# We keep track of the row we found the current table on.
# A C-style loop over the input.
for (my $i=0; $i<scalar(@input); $i++) {
	if ($input[$i]=~m/<table id="(.*?)"/) {
		# We have a match. $1 contains the ID, but we key the count on the line index instead.
		$current_id=$i;
	}
	# Is this a "<row>" line?
	if ($input[$i]=~m/\<row\>/) {
		# Yes. Do we already have a count for this table?
		if ($row_counts{$current_id}) {
			# Yes, increment it.
			$row_counts{$current_id}++;
		} else {
			# No. This is the first row for this table, set the count to 1.
			$row_counts{$current_id}=1;
		}
	}
}
# We now have a hash which holds all our row counts. The key of the hash is the row where the table started.
# Now print the entire array, but whenever we have a row where a table starts, insert the count.
for (my $i=0; $i<scalar(@input); $i++) {
	if ($row_counts{$i}) {
		# We are looking at a row where a table starts. We need to insert the row count.
		# split the line at whitespace.
		my @temp_array=split(' ', $input[$i]);	# ' ' also skips leading whitespace, so indented lines work.
		# We should now have 3 elements:
		# ('<table', 'id="something"' and 'frame="none">')
		print "$temp_array[0] $temp_array[1] totalrow=\"$row_counts{$i}\" $temp_array[2]\n";
	} else {
		print "$input[$i]\n";
	}
	
	
}


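# Print a summary (line index => row count) to STDERR so it does not end up in the HTML output.
print STDERR "Result:\n";
foreach (sort { $a <=> $b } keys %row_counts) {
	print STDERR "$_=>$row_counts{$_}\n";
}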
The comments should be self-explanatory.
It does, however, make a few assumptions:

It assumes that your HTML/XML is well-formed.
It assumes that your input always follows a rigid layout (such as line 42, which assumes that the table tag has no attributes other than id and frame).
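
If the second assumption is a problem, one possible way to relax it is to insert the attribute with a substitution instead of splitting the line. This is only a sketch: add_totalrow is a made-up helper name that is not part of the script above, and it still assumes the whole table tag sits on one line with a double-quoted id.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): add a totalrow
# attribute right after id="..." and leave the rest of the line alone,
# so any extra attributes survive untouched.
sub add_totalrow {
    my ($line, $count) = @_;
    $line =~ s/(<table\b[^>]*?\bid="[^"]*")/$1 totalrow="$count"/;
    return $line;    # Returned unchanged if there is no <table id="...">.
}

print add_totalrow('<table id="t1" frame="none" class="wide">', 3), "\n";
# prints: <table id="t1" totalrow="3" frame="none" class="wide">
```

In the second loop you would then print add_totalrow($input[$i], $row_counts{$i}) instead of rebuilding the line from the split pieces.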

It needs some degree of work, but it does what you want.