PHP Web-browser DOM functionality

Philibuster · January 28th, 2005, 01:34 AM

I need the server-side equivalent of a client-side web browser with Document Object Model (DOM) functionality. The DOM should support one of the standards, with the usual properties (.document...), methods (.navigate...), collections (.anchors...), events (.onclick...), etc. One might think of this as the core of a more advanced web crawler; however, most PHP web crawlers I've seen are very primitive and don't even build a DOM from the HTML tags of the pages they navigate. I don't have enough time to write a full DOM parser, and DOMXML is just too limited to use. Does anyone have any ideas?

Philibuster

Philibuster · August 15th, 2006, 09:34 PM

In the end, I implemented a libcurl interface with a SAX parser to construct an object-oriented DOM. libcurl handled HTTP/HTTPS, cookies, and other HTTP requirements. The DOM was built from the SAX callback functions (begin, character, end). DOM construction was somewhat slow, so I needed several shortcuts: for example, using meta-references into a buffer/cache instead of keeping a separate copy of each segment, limiting the tag collections, etc. Handling HTML with syntax errors required additional techniques (stack recovery rules and precedences). The final results were acceptable, but there is still room for improvement.
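
For anyone trying the same approach, here is a minimal sketch of the SAX-to-DOM step only. It is not the actual code described above: the `Node` and `SaxDomBuilder` classes and the `$links` collection are made-up names for illustration, and it assumes PHP's bundled XML Parser (expat) extension and well-formed markup. Fetching the page (for example with the cURL extension) is left out; the markup is passed in as a string.

```php
<?php
// Hypothetical node type: tag name, attributes, child nodes, and direct text.
class Node
{
    public $name;
    public $attrs;
    public $children = [];
    public $text = '';

    public function __construct($name, array $attrs = [])
    {
        $this->name = $name;
        $this->attrs = $attrs;
    }
}

// Hypothetical builder: turns SAX callbacks (begin, character, end) into a tree.
class SaxDomBuilder
{
    public $links = [];   // tag collection: every <a> element that has an href
    private $stack = [];  // currently open elements, innermost last

    public function parse($markup)
    {
        $root = new Node('#document');
        $this->stack = [$root];
        $this->links = [];

        $parser = xml_parser_create('UTF-8');
        // Keep tag names as written instead of upper-casing them.
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler($parser, [$this, 'begin'], [$this, 'end']);
        xml_set_character_data_handler($parser, [$this, 'character']);

        if (!xml_parse($parser, $markup, true)) {
            // No recovery here: malformed input is simply reported as an error.
            throw new RuntimeException(
                xml_error_string(xml_get_error_code($parser))
            );
        }
        return $root;
    }

    // Start tag: create a node, attach it to the open element, then open it.
    public function begin($parser, $name, $attrs)
    {
        $node = new Node($name, $attrs);
        $parent = end($this->stack);
        $parent->children[] = $node;
        if ($name === 'a' && isset($attrs['href'])) {
            $this->links[] = $node;
        }
        $this->stack[] = $node;
    }

    // Character data: may arrive in several chunks, so append.
    public function character($parser, $data)
    {
        end($this->stack)->text .= $data;
    }

    // End tag: close the innermost open element.
    public function end($parser, $name)
    {
        array_pop($this->stack);
    }
}

$builder = new SaxDomBuilder();
$doc = $builder->parse(
    '<html><body><p>Hello <a href="/x">link</a></p></body></html>'
);
foreach ($builder->links as $link) {
    echo $link->attrs['href'], ' => ', $link->text, "\n";
}
```

Note that expat rejects markup that is not well-formed, so real-world HTML would need the kind of stack recovery rules mentioned above (or a more tolerant tokenizer) in front of this; the sketch does not attempt that.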

Philibuster