purc/snoopy

Snoopy - PHP 网络客户端

安装: 2

依赖: 0

建议者: 0

安全性: 0

星星: 0

关注者: 2

分支: 0

开放问题: 0

类型:php-net-client

dev-master 2023-03-16 11:00 UTC

This package is auto-updated.

Last update: 2024-09-16 14:27:36 UTC


README

Snoopy - PHP 网络客户端 v2.0.0 (适配 PHP 8.2)

概述

$snoopy = new \Purc\Snoopy\Snoopy();

$snoopy->fetchtext('https://www.google.com/');
echo 'fetchtext: ' . print_r($snoopy->results, true);

$snoopy->fetchlinks('https://www.phpbuilder.com/');
echo 'fetchlinks: ' . print_r($snoopy->results, true);

$url = 'https://php.ac.cn/search.php';
$vars = [
    'show' => 'quickref',
    'pattern' => 'PHP',
];
$snoopy->submit($url, $vars);
echo 'submit: ' . print_r($snoopy->results, true);

$snoopy->fetchform('https://www.altavista.com');
echo 'fetchform: ' . print_r($snoopy->results, true);

描述

什么是 Snoopy?

Snoopy 是一个 PHP 类,模拟网页浏览器。它可以自动化获取网页内容、提交表单等任务。

Snoopy 的一些特性

  • 轻松获取网页内容
  • 轻松从网页中获取文本(去除 HTML 标签)
  • 轻松从网页中获取链接
  • 支持代理主机
  • 支持基本用户/密码认证
  • 支持设置用户代理、引用、cookies 和头部内容
  • 支持浏览器重定向,并控制重定向深度
  • 将获取的链接扩展到完全限定的 URL(默认)
  • 轻松提交表单数据并获取结果
  • 支持跟随 HTML 框架(v0.92 添加)
  • 支持在重定向时传递 cookies(v0.92 添加)

要求

Snoopy 需要带有 PCRE(Perl 兼容正则表达式)的 PHP,以及 OpenSSL 扩展来获取 HTTPS 请求。

类方法

fetch($uri)

这是用于获取网页内容的方法。 $uri 是要获取内容的页面的完全限定 URL。获取的结果存储在 $this->results 中。如果你正在获取框架,则 $this->results 包含每个获取到的框架,以数组形式。

fetchtext($uri)

这与 fetch() 的行为完全相同,但它只返回页面中的文本,去除 HTML 标签和其他无关数据。

fetchform($uri)

这与 fetch() 的行为完全相同,但它只返回页面中的表单元素,去除 HTML 标签和其他无关数据。

fetchlinks($uri)

这与 fetch() 的行为完全相同,但它只返回页面中的链接。默认情况下,相对链接会被转换为它们的完全限定 URL 形式。

submit($uri, $formVars)

这会将表单提交到指定的 $uri$formVars 是要传递的表单变量的数组。

submittext($uri, $formVars)

这与 submit() 的行为完全相同,但它只返回页面中的文本,去除 HTML 标签和其他无关数据。

submitlinks($uri)

这与 submit() 的行为完全相同,但它只返回页面中的链接。默认情况下,相对链接会被转换为它们的完全限定 URL 形式。

类变量:(括号中为默认值)

$host             the host to connect to
$port             the port to connect to
$proxy_host       the proxy host to use, if any
$proxy_port       the proxy port to use, if any
                  proxy can only be used for http URLs, but not https
$agent            the user agent to masqerade as (Snoopy v0.1)
$referer          referer information to pass, if any
$cookies          cookies to pass if any
$rawheaders       other header info to pass, if any
$maxredirs        maximum redirects to allow. 0=none allowed. (5)
$offsiteok        whether or not to allow redirects off-site. (true)
$expandlinks      whether or not to expand links to fully qualified URLs (true)
$user             authentication username, if any
$pass             authentication password, if any
$accept           http accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
$error            where errors are sent, if any
$response_code    responde code returned from server
$headers          headers returned from server
$maxlength        max return data length
$read_timeout     timeout on read operations (requires PHP 4 Beta 4+)
                  set to 0 to disallow timeouts
$timed_out        true if a read operation timed out (requires PHP 4 Beta 4+)
$maxframes        number of frames we will follow
$status           http status of fetch
$temp_dir         temp directory that the webserver can write to. (/tmp)
$curl_path        system path to cURL binary, set to false if none
                  (this variable is ignored as of Snoopy v1.2.6)
$cafile           name of a file with CA certificate(s)
$capath           name of a correctly hashed directory with CA certificate(s)
                  if either $cafile or $capath is set, SSL certificate
                  verification is enabled

示例

示例:获取网页并显示返回的头部和页面内容(HTML 转义)

$snoopy = new \Purc\Snoopy\Snoopy();

$snoopy->user = "joe";
$snoopy->pass = "bloe";

if ($snoopy->fetch("https://www.slashdot.org/"))
{
    echo "response code: " . $snoopy->response_code . "<br>\n";
    while (list($key, $val) = each($snoopy->headers))
    {
        echo $key . ": " . $val . "<br>\n";
    }
    echo "<p>\n";

    echo "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n";
}
else
{
    echo "error fetching document: " . $snoopy->error . "\n";
}

示例:提交表单并打印出结果头部和 HTML 转义的页面

$snoopy = new \Purc\Snoopy\Snoopy();

$submit_url = "https://lnk.ispi.net/texis/scripts/msearch/netsearch.html";

$submit_vars["q"] = "amiga";
$submit_vars["submit"] = "Search!";
$submit_vars["searchhost"] = "Altavista";

if ($snoopy->submit($submit_url, $submit_vars))
{
    while (list($key, $val) = each($snoopy->headers))
    {
        echo $key . ": " . $val . "<br>\n";
    }
    echo "<p>\n";

    echo "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n";
}
else
{
    echo "error fetching document: " . $snoopy->error . "\n";
}

示例:展示所有变量的功能

$snoopy = new \Purc\Snoopy\Snoopy();

$snoopy->proxy_host = "my.proxy.host";
$snoopy->proxy_port = "8080";

$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
$snoopy->referer = "https://www.microsnot.com/";

$snoopy->cookies["SessionID"] = 238472834723489l;
$snoopy->cookies["favoriteColor"] = "RED";

$snoopy->rawheaders["Pragma"] = "no-cache";

$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;

$snoopy->user = "joe";
$snoopy->pass = "bloe";

if ($snoopy->fetchtext("https://www.phpbuilder.com"))
{
    while (list($key, $val) = each($snoopy->headers))
    {
        echo $key . ": " . $val . "<br>\n";
    }
    echo "<p>\n";

    echo "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n";
}
else
{
    echo "error fetching document: " . $snoopy->error . "\n";
}

示例:获取框架内容并显示结果

$snoopy = new \Purc\Snoopy\Snoopy();

$snoopy->maxframes = 5;

if ($snoopy->fetch("https://www.ispi.net/"))
{
    echo "<pre>" . htmlspecialchars($snoopy->results[0]) . "</pre>\n";
    echo "<pre>" . htmlspecialchars($snoopy->results[1]) . "</pre>\n";
    echo "<pre>" . htmlspecialchars($snoopy->results[2]) . "</pre>\n";
}
else
{
    echo "error fetching document: " . $snoopy->error . "\n";
}

版权

版权(c) 1999,2000 ispi。保留所有权利。本软件在 GNU 通用公共许可证下发布。请阅读 Snoopy.class.php 文件顶部的免责声明。

感谢

特别感谢