purc / snoopy
Snoopy - PHP 网络客户端
Requires
- ext-zlib: *
This package is auto-updated.
Last update: 2024-09-16 14:27:36 UTC
README
Snoopy - PHP 网络客户端 v2.0.0 (适配 PHP 8.2)
概述
$snoopy = new \Purc\Snoopy\Snoopy(); $snoopy->fetchtext('https://www.google.com/'); echo 'fetchtext: ' . print_r($snoopy->results, true); $snoopy->fetchlinks('https://www.phpbuilder.com/'); echo 'fetchlinks: ' . print_r($snoopy->results, true); $url = 'https://php.ac.cn/search.php'; $vars = [ 'show' => 'quickref', 'pattern' => 'PHP', ]; $snoopy->submit($url, $vars); echo 'submit: ' . print_r($snoopy->results, true); $snoopy->fetchform('https://www.altavista.com'); echo 'fetchform: ' . print_r($snoopy->results, true);
描述
什么是 Snoopy?
Snoopy 是一个 PHP 类,模拟网页浏览器。它可以自动化获取网页内容、提交表单等任务。
Snoopy 的一些特性
- 轻松获取网页内容
- 轻松从网页中获取文本(去除 HTML 标签)
- 轻松从网页中获取链接
- 支持代理主机
- 支持基本用户/密码认证
- 支持设置用户代理、引用、cookies 和头部内容
- 支持浏览器重定向,并控制重定向深度
- 将获取的链接扩展到完全限定的 URL(默认)
- 轻松提交表单数据并获取结果
- 支持跟随 HTML 框架(v0.92 添加)
- 支持在重定向时传递 cookies(v0.92 添加)
要求
Snoopy 需要带有 PCRE(Perl 兼容正则表达式)的 PHP,以及 OpenSSL 扩展来获取 HTTPS 请求。
类方法
fetch($uri)
这是用于获取网页内容的方法。 $uri
是要获取内容的页面的完全限定 URL。获取的结果存储在 $this->results 中。如果你正在获取框架,则 $this->results 包含每个获取到的框架,以数组形式。
fetchtext($uri)
这与 fetch()
的行为完全相同,但它只返回页面中的文本,去除 HTML 标签和其他无关数据。
fetchform($uri)
这与 fetch()
的行为完全相同,但它只返回页面中的表单元素,去除 HTML 标签和其他无关数据。
fetchlinks($uri)
这与 fetch()
的行为完全相同,但它只返回页面中的链接。默认情况下,相对链接会被转换为它们的完全限定 URL 形式。
submit($uri, $formVars)
这会将表单提交到指定的 $uri
。 $formVars
是要传递的表单变量的数组。
submittext($uri, $formVars)
这与 submit()
的行为完全相同,但它只返回页面中的文本,去除 HTML 标签和其他无关数据。
submitlinks($uri)
这与 submit()
的行为完全相同,但它只返回页面中的链接。默认情况下,相对链接会被转换为它们的完全限定 URL 形式。
类变量:(括号中为默认值)
$host the host to connect to $port the port to connect to $proxy_host the proxy host to use, if any $proxy_port the proxy port to use, if any proxy can only be used for http URLs, but not https $agent the user agent to masqerade as (Snoopy v0.1) $referer referer information to pass, if any $cookies cookies to pass if any $rawheaders other header info to pass, if any $maxredirs maximum redirects to allow. 0=none allowed. (5) $offsiteok whether or not to allow redirects off-site. (true) $expandlinks whether or not to expand links to fully qualified URLs (true) $user authentication username, if any $pass authentication password, if any $accept http accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*) $error where errors are sent, if any $response_code responde code returned from server $headers headers returned from server $maxlength max return data length $read_timeout timeout on read operations (requires PHP 4 Beta 4+) set to 0 to disallow timeouts $timed_out true if a read operation timed out (requires PHP 4 Beta 4+) $maxframes number of frames we will follow $status http status of fetch $temp_dir temp directory that the webserver can write to. (/tmp) $curl_path system path to cURL binary, set to false if none (this variable is ignored as of Snoopy v1.2.6) $cafile name of a file with CA certificate(s) $capath name of a correctly hashed directory with CA certificate(s) if either $cafile or $capath is set, SSL certificate verification is enabled
示例
示例:获取网页并显示返回的头部和页面内容(HTML 转义)
$snoopy = new \Purc\Snoopy\Snoopy(); $snoopy->user = "joe"; $snoopy->pass = "bloe"; if ($snoopy->fetch("https://www.slashdot.org/")) { echo "response code: " . $snoopy->response_code . "<br>\n"; while (list($key, $val) = each($snoopy->headers)) { echo $key . ": " . $val . "<br>\n"; } echo "<p>\n"; echo "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n"; } else { echo "error fetching document: " . $snoopy->error . "\n"; }
示例:提交表单并打印出结果头部和 HTML 转义的页面
$snoopy = new \Purc\Snoopy\Snoopy(); $submit_url = "https://lnk.ispi.net/texis/scripts/msearch/netsearch.html"; $submit_vars["q"] = "amiga"; $submit_vars["submit"] = "Search!"; $submit_vars["searchhost"] = "Altavista"; if ($snoopy->submit($submit_url, $submit_vars)) { while (list($key, $val) = each($snoopy->headers)) { echo $key . ": " . $val . "<br>\n"; } echo "<p>\n"; echo "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n"; } else { echo "error fetching document: " . $snoopy->error . "\n"; }
示例:展示所有变量的功能
$snoopy = new \Purc\Snoopy\Snoopy(); $snoopy->proxy_host = "my.proxy.host"; $snoopy->proxy_port = "8080"; $snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)"; $snoopy->referer = "https://www.microsnot.com/"; $snoopy->cookies["SessionID"] = 238472834723489l; $snoopy->cookies["favoriteColor"] = "RED"; $snoopy->rawheaders["Pragma"] = "no-cache"; $snoopy->maxredirs = 2; $snoopy->offsiteok = false; $snoopy->expandlinks = false; $snoopy->user = "joe"; $snoopy->pass = "bloe"; if ($snoopy->fetchtext("https://www.phpbuilder.com")) { while (list($key, $val) = each($snoopy->headers)) { echo $key . ": " . $val . "<br>\n"; } echo "<p>\n"; echo "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n"; } else { echo "error fetching document: " . $snoopy->error . "\n"; }
示例:获取框架内容并显示结果
$snoopy = new \Purc\Snoopy\Snoopy(); $snoopy->maxframes = 5; if ($snoopy->fetch("https://www.ispi.net/")) { echo "<pre>" . htmlspecialchars($snoopy->results[0]) . "</pre>\n"; echo "<pre>" . htmlspecialchars($snoopy->results[1]) . "</pre>\n"; echo "<pre>" . htmlspecialchars($snoopy->results[2]) . "</pre>\n"; } else { echo "error fetching document: " . $snoopy->error . "\n"; }
版权
版权(c) 1999,2000 ispi。保留所有权利。本软件在 GNU 通用公共许可证下发布。请阅读 Snoopy.class.php 文件顶部的免责声明。
感谢
特别感谢
- Peter Sorger sorgo@cool.sk 帮助修复重定向错误
- Andrei Zmievski andrei@ispi.net 实现超时功能
- 帕特里克·桑德林 patric@kajen.com 协助调试 fetchform
- 卡梅洛 carmelo@meltingsoft.com 处理与框架相关的杂项错误