geol/snoopy

Snoopy library for PHP >= 8

v2.0.1 2023-04-19 08:56 UTC

This package is auto-updated.

Last update: 2024-09-19 11:47:23 UTC


README

Name

Snoopy - the PHP net client v2.0.1

Synopsis

include "Snoopy.class.php";
$snoopy = new Snoopy;

$snoopy->fetchtext("https://php.ac.cn/");
print $snoopy->results;

$snoopy->fetchlinks("http://www.phpbuilder.com/");
print_r($snoopy->results);	// results is an array of links here

$submit_url = "http://lnk.ispi.net/texis/scripts/msearch/netsearch.html";

$submit_vars["q"] = "amiga";
$submit_vars["submit"] = "Search!";
$submit_vars["searchhost"] = "Altavista";
	
$snoopy->submit($submit_url,$submit_vars);
print $snoopy->results;

$snoopy->maxframes=5;
$snoopy->fetch("http://www.ispi.net/");
echo "<PRE>\n";
echo htmlentities($snoopy->results[0]); 
echo htmlentities($snoopy->results[1]); 
echo htmlentities($snoopy->results[2]); 
echo "</PRE>\n";

$snoopy->fetchform("http://www.altavista.com");
print $snoopy->results;

Description

What is Snoopy?

Snoopy is a PHP class that simulates a web browser. It automates the
task of retrieving web page content and posting forms, for example.

Some of Snoopy's features:

* easily fetch the contents of a web page
* easily fetch the text from a web page (strip html tags)
* easily fetch the links from a web page
* supports proxy hosts
* supports basic user/pass authentication
* supports setting user_agent, referer, cookies and header content
* supports browser redirects, and controlled depth of redirects
* expands fetched links to fully qualified URLs (default)
* easily submit form data and retrieve the results
* supports following html frames (added v0.92)
* supports passing cookies on redirects (added v0.92)

Requirements

This package (v2.0.1) targets PHP 8 or later. Snoopy requires PHP with
PCRE (Perl Compatible Regular Expressions), which is bundled with every
current PHP release. The original class was developed and tested with
PHP 3.0.12, and its read timeout support needed PHP 4 Beta 4 or later.

Class Methods

fetch($URI)
-----------

This is the method used for fetching the contents of a web page.
$URI is the fully qualified URL of the page to fetch.
The results of the fetch are stored in $this->results.
If you are fetching frames, then $this->results
contains each frame fetched in an array.
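
Because $this->results is a string for an ordinary page and an array
when frames are followed, calling code may want to treat both cases
alike. The sketch below shows one way to do that.
snoopy_results_as_list() is a hypothetical helper, not part of Snoopy,
and the sample string stands in for $snoopy->results.

```php
<?php
// Hypothetical helper (not part of Snoopy): always return a list of
// documents, whether the results value is one string or an array of frames.
function snoopy_results_as_list($results): array
{
    if (is_array($results)) {
        // Re-index so callers can rely on keys 0, 1, 2, ...
        return array_values($results);
    }
    return [(string) $results];
}

// Usage sketch with a stand-in value; real code would pass $snoopy->results.
foreach (snoopy_results_as_list("<html>single page</html>") as $i => $doc) {
    echo "document $i: " . strlen($doc) . " bytes\n";
}
```

With this in place, the same loop works whether or not $maxframes is set.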
	
fetchtext($URI)
---------------	

This behaves exactly like fetch() except that it only returns
the text from the page, stripping out html tags and other
irrelevant data.		

fetchform($URI)
---------------	

This behaves exactly like fetch() except that it only returns
the form elements from the page, stripping out html tags and other
irrelevant data.		

fetchlinks($URI)
----------------

This behaves exactly like fetch() except that it only returns
the links from the page. By default, relative links are
converted to their fully qualified URL form.
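
To illustrate what converting a relative link to its fully qualified
form means, here is a minimal sketch. It is not Snoopy's own code:
expand_link() is a hypothetical function that covers only the common
cases (absolute, protocol-relative, root-relative and path-relative
links) and leaves "../" segments untouched. It may be useful as a
starting point if you set $expandlinks to false and resolve links
yourself.

```php
<?php
// Hypothetical helper, not Snoopy's implementation: resolve $link
// against the page URL $base. "../" segments are left as-is.
function expand_link(string $link, string $base): string
{
    // Already absolute: the link carries a scheme such as http: or mailto:.
    if (preg_match('/^[a-z][a-z0-9+.-]*:/i', $link)) {
        return $link;
    }

    $parts = parse_url($base);
    $root = $parts['scheme'] . '://' . $parts['host']
          . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative link: reuse the scheme of the base URL.
    if (strncmp($link, '//', 2) === 0) {
        return $parts['scheme'] . ':' . $link;
    }

    // Root-relative link: attach to scheme and host.
    if ($link !== '' && $link[0] === '/') {
        return $root . $link;
    }

    // Path-relative link: append to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

echo expand_link('page.html', 'http://www.example.com/docs/index.html'), "\n";
```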

submit($URI,$formvars)
----------------------

This submits a form to the specified $URI. $formvars is an
array of the form variables to pass.
	
	
submittext($URI,$formvars)
--------------------------

This behaves exactly like submit() except that it only returns
the text from the page, stripping out html tags and other
irrelevant data.		

submitlinks($URI,$formvars)
---------------------------

This behaves exactly like submit() except that it only returns
the links from the page. By default, relative links are
converted to their fully qualified URL form.

Class Variables (default value in parentheses)

$host			the host to connect to
$port			the port to connect to
$proxy_host		the proxy host to use, if any
$proxy_port		the proxy port to use, if any
$agent			the user agent to masquerade as (Snoopy v0.1)
$referer		referer information to pass, if any
$cookies		cookies to pass if any
$rawheaders		other header info to pass, if any
$maxredirs		maximum redirects to allow. 0=none allowed. (5)
$offsiteok		whether or not to allow redirects off-site. (true)
$expandlinks	whether or not to expand links to fully qualified URLs (true)
$user			authentication username, if any
$pass			authentication password, if any
$accept			http accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
$error			where errors are sent, if any
$response_code	response code returned from server
$headers		headers returned from server
$maxlength		max return data length
$read_timeout	timeout on read operations (requires PHP 4 Beta 4+)
				set to 0 to disable timeouts
$timed_out		true if a read operation timed out (requires PHP 4 Beta 4+)
$maxframes		number of frames we will follow
$status			http status of fetch
$temp_dir		temp directory that the webserver can write to. (/tmp)
$curl_path		system path to cURL binary, set to false if none
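
The $headers variable is easier to query once it is turned into a
name => value map. The sketch below assumes that $headers holds raw
response lines such as "Content-Type: text/html\r\n"; that layout is an
assumption, so check it against your copy of Snoopy.class.php.
parse_header_lines() is a hypothetical helper, and the sample lines
stand in for $snoopy->headers.

```php
<?php
// Hypothetical helper: turn raw header lines into a name => value map.
// Assumes each entry looks like "Name: value" with an optional line ending.
// Lines without a colon (for example the status line) are skipped, and a
// repeated header name keeps only its last value.
function parse_header_lines(array $lines): array
{
    $map = [];
    foreach ($lines as $line) {
        $line = trim($line);
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue;
        }
        // Header names are case-insensitive, so store them in lowercase.
        $name = strtolower(trim(substr($line, 0, $pos)));
        $map[$name] = trim(substr($line, $pos + 1));
    }
    return $map;
}

// Stand-in data; real code would pass $snoopy->headers.
$sample = ["HTTP/1.1 200 OK\r\n", "Content-Type: text/html\r\n", "Server: Apache\r\n"];
print_r(parse_header_lines($sample));
```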

Examples

Example: 	fetch a web page and display the return headers and
			the contents of the page (html-escaped):

include "Snoopy.class.php";
$snoopy = new Snoopy;

$snoopy->user = "joe";
$snoopy->pass = "bloe";

if($snoopy->fetch("http://www.slashdot.org/"))
{
	echo "response code: ".$snoopy->response_code."<br>\n";
	// each() was removed in PHP 8; foreach does the same job
	foreach($snoopy->headers as $key => $val)
		echo $key.": ".$val."<br>\n";
	echo "<p>\n";
	
	echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
}
else
	echo "error fetching document: ".$snoopy->error."\n";



Example:	submit a form and print out the result headers
			and html-escaped page:

include "Snoopy.class.php";
$snoopy = new Snoopy;

$submit_url = "http://lnk.ispi.net/texis/scripts/msearch/netsearch.html";

$submit_vars["q"] = "amiga";
$submit_vars["submit"] = "Search!";
$submit_vars["searchhost"] = "Altavista";

	
if($snoopy->submit($submit_url,$submit_vars))
{
	// each() was removed in PHP 8; foreach does the same job
	foreach($snoopy->headers as $key => $val)
		echo $key.": ".$val."<br>\n";
	echo "<p>\n";
	
	echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
}
else
	echo "error fetching document: ".$snoopy->error."\n";



Example:	showing functionality of all the variables:


include "Snoopy.class.php";
$snoopy = new Snoopy;

$snoopy->proxy_host = "my.proxy.host";
$snoopy->proxy_port = "8080";

$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
$snoopy->referer = "http://www.microsnot.com/";

$snoopy->cookies["SessionID"] = "238472834723489";
$snoopy->cookies["favoriteColor"] = "RED";

$snoopy->rawheaders["Pragma"] = "no-cache";

$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;

$snoopy->user = "joe";
$snoopy->pass = "bloe";

if($snoopy->fetchtext("http://www.phpbuilder.com"))
{
	// each() was removed in PHP 8; foreach does the same job
	foreach($snoopy->headers as $key => $val)
		echo $key.": ".$val."<br>\n";
	echo "<p>\n";
	
	echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
}
else
	echo "error fetching document: ".$snoopy->error."\n";


Example: 	fetch framed content and display the results:

include "Snoopy.class.php";
$snoopy = new Snoopy;

$snoopy->maxframes = 5;

if($snoopy->fetch("http://www.ispi.net/"))
{
	echo "<PRE>".htmlspecialchars($snoopy->results[0])."</PRE>\n";
	echo "<PRE>".htmlspecialchars($snoopy->results[1])."</PRE>\n";
	echo "<PRE>".htmlspecialchars($snoopy->results[2])."</PRE>\n";
}
else
	echo "error fetching document: ".$snoopy->error."\n";

Example:	load the class through Composer in a Laravel project:

# composer.json in Laravel (inside the "autoload" section)
"files": [
	"vendor/geol/snoopy/Snoopy.class.php"
]

# ExampleClass.php
use geol\Snoopy\Snoopy as Snoopy;

class ExampleClass {
	function test() {
		$url = 'https://www.example.com';
		$snoopy = new Snoopy();
		$snoopy->fetch($url);
	}
}

Copyright

Copyright(c) 1999-2023 ispi. All rights reserved.
This software is released under the GNU General Public License.
Please read the disclaimer at the top of the Snoopy.class.php file.

Thanks: special thanks to

- Peter Sorger <sorgo@cool.sk> help fixing a redirect bug
- Andrei Zmievski <andrei@ispi.net> implementing time out functionality
- Patric Sandelin <patric@kajen.com> help with fetchform debugging
- Carmelo <carmelo@meltingsoft.com> misc bug fixes with frames
- Geol	<big9401@gmail.com> fixing bugs