
screen scraping in php 5

Anyone? Which is the best approach? I want to load another site's web page and chop it up into little pieces using PHP 4.


anyone done this?
worldsSmallestViolin
July 20th, 2007 12:32am
The DOM functions of PHP 5 include one for loading HTML, DOMDocument::loadHTML(). I don't know how well it works, but that's where I would start.
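
A minimal sketch of that DOM approach, assuming the page is already in a string. The sample markup and the helper name scrape_links() are made up for illustration; only DOMDocument and the libxml functions come from PHP itself.

```php
<?php
// Hypothetical helper: returns every link (href and text) found in an HTML string.
function scrape_links($html)
{
    $doc = new DOMDocument();

    // Real-world HTML is often malformed, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $links[] = array(
            'href' => $anchor->getAttribute('href'),
            'text' => trim($anchor->textContent),
        );
    }
    return $links;
}

// Made-up markup standing in for a fetched page.
$sample = '<html><body><a href="/a">First</a> <a href="/b"> Second </a></body></html>';
foreach (scrape_links($sample) as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

This needs the DOM extension, so it would not work on PHP 4 as written.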

preg_match_all() is also a great function for parsing tasks. You can pass it a regexp covering all the tokens you want to match and then iterate over the result.
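
Roughly what that looks like, as a sketch only. The markup, the pattern, and the function name match_items() are assumptions for the example; a real page would need its own pattern.

```php
<?php
// Hypothetical helper: pulls the text of each <li class="item"> out of an HTML string.
function match_items($html)
{
    // The pattern assumes the made-up markup below; the "s" modifier lets "."
    // span newlines and "i" ignores tag case.
    $pattern = '#<li class="item">(.*?)</li>#si';

    $matches = array();
    // PREG_SET_ORDER yields one array per match: [0] is the whole match,
    // [1] is the first capture group.
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $items = array();
    foreach ($matches as $match) {
        $items[] = trim($match[1]);
    }
    return $items;
}

// Made-up markup standing in for a fetched page.
$sample = '<ul><li class="item">Apples</li><li class="item"> Pears </li></ul>';
foreach (match_items($sample) as $item) {
    echo $item, "\n";
}
```

Regexps like this tend to break when the site changes its markup, so it is worth keeping each pattern as narrow as possible.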

As for actually getting the HTML from a site, that's pretty easy. You can just use fopen() with a URL or the cURL extension, and I'm sure there are other ways too.
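
A sketch of the fopen() route, assuming allow_url_fopen is enabled in php.ini. The function name fetch_page() and the URL in the usage comment are placeholders, not anything from a real site.

```php
<?php
// Hypothetical helper: reads a whole page (or local file) into a string.
// Returns false if the resource cannot be opened.
function fetch_page($url)
{
    // fopen() only accepts http:// URLs when allow_url_fopen is on.
    // The @ hides the warning so the caller can check the return value instead.
    $handle = @fopen($url, 'r');
    if ($handle === false) {
        return false;
    }

    $html = stream_get_contents($handle);
    fclose($handle);
    return $html;
}

// Usage with a placeholder URL:
// $html = fetch_page('http://example.com/');
// if ($html === false) { die("could not fetch page\n"); }
```

stream_get_contents() arrived in PHP 5; on PHP 4 you would have to loop over fread() until feof() instead.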
Wayne
July 20th, 2007 1:02am
I speak jive! Err, I mean I can screenscrape. I haven't done it with PHP 4, but I've done it in Perl and .NET.
LinuxOrBust
July 20th, 2007 3:23am
.NET is different from Perl: it fits the results into objects nicely. In Perl, the whole file can be read into a string and then parsed globally with one or more regexps.

Either way, your main focus is deciding which fields to capture and how. The more experience you get, the better you become at writing parsers that match exactly what you want.
LinuxOrBust
July 20th, 2007 3:29am
In my experience, all the tools do is package things up nicely after you have done the dirty work of setting up the parsing. Incidentally, the dirty work gets remarkably easier over time.
LinuxOrBust
July 20th, 2007 10:12am

