I don't have any solid code to post for you, but I ran into a similar problem when I was trying to extract text records from their HTML output. Basically, what I ended up doing was the following:
Read the input one line at a time.
In each line, look for the first ">".
The character right after it is the first place you're likely to find data. If the next "<" comes immediately after the ">" (that is, their positions differ by only 1), then you know you're dealing with tags that run together, for example </TD><TD CLASS="bckGray">.
What I did was go through each line and check where the gap between the close of one tag and the opening of the next was greater than 1. Anything sitting in that gap is text, as in <B>Here is text</B>. In your case, it looks like you're going to need a little extra checking to take out all of the &nbsp; garbage. I hope this idea makes sense; feel free to e-mail me.
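For what it's worth, here is a rough sketch of that idea in Python. It is not the code I actually used, and the function name extract_text and the sample lines are made up for illustration. It assumes each piece of text starts and ends on the same line, and that the only entity you need to clean up is &nbsp;.

```python
def extract_text(line):
    """Return the text chunks found between a '>' and the next '<' in one line."""
    chunks = []
    pos = line.find(">")                  # first place data is likely to start
    while pos != -1:
        nxt = line.find("<", pos + 1)     # opening of the next tag
        if nxt == -1:
            break                         # no further tag on this line
        if nxt - pos > 1:                 # gap > 1: something sits between the tags
            # Assumption: &nbsp; is the only entity that needs cleaning up.
            text = line[pos + 1:nxt].replace("&nbsp;", " ").strip()
            if text:
                chunks.append(text)
        pos = line.find(">", nxt)         # move on to the close of that next tag
    return chunks


# Hypothetical sample input, processed line by line.
sample = '<TR><TD CLASS="bckGray">&nbsp;</TD><TD><B>Here is text</B></TD></TR>\n<TD>Smith&nbsp;</TD>'
for line in sample.splitlines():
    print(extract_text(line))
```

Tags that run together produce a gap of exactly 1 and are skipped, and a gap that holds nothing but &nbsp; is dropped after stripping. Text that wraps across lines, or a ">" inside an attribute value, would need more handling than this.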