Screen scraping in PHP 5
Anyone? Which is the best approach? I want to load another site's web page and chop it up into little pieces using PHP 4.
Has anyone done this?
PHP 5's DOM extension has a method for loading HTML, DOMDocument::loadHTML(). I don't know how well it works, but that's where I would start.
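A minimal sketch of that starting point, assuming the DOM extension is available (it is enabled by default in PHP 5): `DOMDocument::loadHTML()` parses the markup and `DOMXPath` selects the pieces you want. The function name `extract_links()` and the sample HTML are made up for illustration.

```php
<?php
// Sketch: parse an HTML string with PHP's DOM extension and pull out
// each link's text and href via XPath. extract_links() is hypothetical.
function extract_links($html)
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect libxml's warnings internally
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = array();
    // Select every <a> element that has an href attribute.
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = array(
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        );
    }
    return $links;
}

$html = '<html><body><p>See <a href="/news">News</a> and '
      . '<a href="http://example.com/">Example</a>.</p></body></html>';

foreach (extract_links($html) as $link) {
    echo $link['text'], ' => ', $link['href'], "\n";
}
```

Note that this DOM API exists only in PHP 5 and later (PHP 4 had the older, differently named DOM XML extension). Also, loadHTML() treats input as ISO-8859-1 unless the page declares its charset, so UTF-8 pages without a charset declaration may come out garbled.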
preg_match_all() is also great for parsing tasks. You can pass it a regexp that matches all the tokens you want and then iterate over the results.
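As a hedged illustration: one pattern with named groups captures every price entry, and the `PREG_SET_ORDER` flag groups each match's captures together so the loop reads naturally. The markup and the pattern are invented for this example.

```php
<?php
// Sketch: match every token of interest in one pass with preg_match_all()
// and iterate the results. The markup and pattern are hypothetical.
$html = '<ul>'
      . '<li class="price">Apples: $1.20</li>'
      . '<li class="price">Pears: $0.95</li>'
      . '<li>Not a price</li>'
      . '</ul>';

// Named groups make each match easier to read than numeric indexes.
$pattern = '/<li class="price">(?P<item>[^:<]+):\s*\$(?P<amount>[0-9.]+)<\/li>/';

// PREG_SET_ORDER: $matches[0] holds all groups of the first match, and so on.
$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    printf("%s costs %s\n", $m['item'], $m['amount']);
}
// Prints: "Apples costs 1.20" then "Pears costs 0.95"
```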
As for actually getting the HTML from a site, that's pretty easy. You can just use fopen() with a URL (this needs allow_url_fopen enabled in php.ini) or the cURL extension, and I'm sure there are other ways too.
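A sketch of both routes, assuming allow_url_fopen is on for the first and the curl extension is installed for the second. The function names `fetch_with_fopen()` and `fetch_with_curl()` are hypothetical, and the 10-second timeout is an arbitrary choice.

```php
<?php
// 1) Streams: fopen() accepts a URL when allow_url_fopen is enabled.
//    fetch_with_fopen() is a hypothetical helper name.
function fetch_with_fopen($url)
{
    $handle = fopen($url, 'rb');
    if ($handle === false) {
        return false;
    }
    $body = stream_get_contents($handle); // read the whole response body
    fclose($handle);
    return $body;
}

// 2) cURL: more control (timeouts, redirects, headers).
//    fetch_with_curl() is a hypothetical helper name.
function fetch_with_curl($url)
{
    if (!function_exists('curl_init')) {
        return false; // curl extension not installed
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);                         // false on failure
    curl_close($ch);
    return $body;
}
```

Usage would look like `$html = fetch_with_curl('http://example.com/');`, with example.com standing in for the real site. For the simple case, file_get_contents($url) is a one-line shortcut for the fopen() route.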
Wayne
July 20th, 2007 1:02am
I speak jive! Err, I mean I can screen-scrape. I haven't done it with PHP 4, but I've done it in Perl and .NET.
.NET differs from Perl in that it fits the results into objects nicely. In Perl, you can read the whole file into a string and then parse it globally with one or more regexps.
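The same slurp-then-regex approach carries over to PHP almost directly. This sketch narrows the page in stages so each pattern only sees the part it cares about: first the target table, then its rows, then the cells. The function name `scrape_table_rows()`, the table id, and the markup are all hypothetical.

```php
<?php
// Sketch of "read it all into a string, then regex it" in stages.
// scrape_table_rows() and the sample markup are hypothetical.
function scrape_table_rows($html, $tableId)
{
    // Stage 1: cut out just the target table (i = case-insensitive,
    // s = dot also matches newlines).
    $tablePattern = '/<table[^>]*id="' . preg_quote($tableId, '/')
                  . '"[^>]*>(.*?)<\/table>/is';
    if (!preg_match($tablePattern, $html, $table)) {
        return array();
    }

    // Stage 2: split the table body into rows.
    preg_match_all('/<tr[^>]*>(.*?)<\/tr>/is', $table[1], $rows);

    // Stage 3: pull the text of each <td>/<th> cell, dropping inner tags.
    $result = array();
    foreach ($rows[1] as $row) {
        preg_match_all('/<t[dh][^>]*>(.*?)<\/t[dh]>/is', $row, $cells);
        $result[] = array_map('trim', array_map('strip_tags', $cells[1]));
    }
    return $result;
}

// In practice $html would come from file_get_contents() on the page.
$html = "<table id=\"other\"><tr><td>skip</td></tr></table>\n"
      . "<table id=\"scores\">\n"
      . "  <tr><th>Team</th><th>Points</th></tr>\n"
      . "  <tr><td>Reds</td><td><b>12</b></td></tr>\n"
      . "</table>";

print_r(scrape_table_rows($html, 'scores'));
```

Regexps like these are brittle: a table nested inside a cell would end stage 1 at the inner `</table>`, which is where a DOM parser handles things better.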
Either way, your main focus is deciding which fields to capture and how. The more experience you get, the more precisely you'll be able to target exactly the pieces you need.
In my experience, all the tools do is package things up nicely after you've done the dirty work of setting up the parsing. Incidentally, the dirty work gets remarkably easier over time.