denis-kisel/casper-curl

A PhantomJS-based cURL for fetching page content from hard-to-access sites

v0.2.4 2019-11-08 09:10 UTC

This package is auto-updated.

Last update: 2024-08-29 05:07:11 UTC


README

Built on the casperjs and phantomjs libraries, for fetching page content from hard-to-access sites.

Installation

1 Install casperjs and phantomjs globally

npm install -g casperjs
npm install -g phantomjs

# If phantomjs is running with errors
npm install -g phantomjs --ignore-scripts

2 Install the CasperCURL package

composer require denis-kisel/casper-curl

Publish the config file (if using Laravel)

Skip this step if you use another framework or plain PHP.

php artisan vendor:publish --provider="DenisKisel\CasperCURL\ServiceProvider" --tag="config"

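Usage

Simple example

// Returns the page content.
// $storageDir is the directory the package uses for storage (cookies, debug output).
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')->request();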

Set the method

method($method)
Available methods: GET|POST|PUT|DELETE
GET is used by default

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->method('POST')
    ->request();

Set data

withData($arrayData)

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->withData([
        'login' => '***',
        'pass' => '***'
    ])
    ->method('POST')
    ->request();

Set headers

withHeaders($arrayHeaders)

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->withHeaders([
        'User-Agent' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0',
        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    ])
    ->request();

Set the user agent

userAgent($userAgent)
Default: Mozilla/5.0 (Windows NT 10.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->userAgent('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0')
    ->request();

Use a proxy

withProxy($ip, $port [, $method = 'http'] [, $login = null] [, $pass = null])

Available methods: http|socks5|none

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->withProxy($ip, $port)
    ->request();
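Proxy settings are often stored as a single URL rather than as separate values. The sketch below splits such a URL into the argument order shown in the withProxy signature above. The helper name proxyArgsFromUrl and the sample address are hypothetical and not part of the package; only PHP's built-in parse_url is used.

```php
<?php
/**
 * Hypothetical helper (not part of casper-curl): converts a proxy URL into
 * the argument list [$ip, $port, $method, $login, $pass].
 */
function proxyArgsFromUrl(string $proxyUrl): array
{
    $parts = parse_url($proxyUrl);
    if ($parts === false || !isset($parts['host'], $parts['port'])) {
        throw new InvalidArgumentException("Invalid proxy URL: {$proxyUrl}");
    }

    // Fall back to 'http', the default method in the signature above.
    $method = $parts['scheme'] ?? 'http';
    if (!in_array($method, ['http', 'socks5', 'none'], true)) {
        throw new InvalidArgumentException("Unsupported proxy method: {$method}");
    }

    return [
        $parts['host'],
        $parts['port'],
        $method,
        $parts['user'] ?? null,
        $parts['pass'] ?? null,
    ];
}

// Sample address for illustration only.
$args = proxyArgsFromUrl('socks5://user:secret@127.0.0.1:1080');
```

With the package installed, the result could then be spread into the call, for example ->withProxy(...$args).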

Use cookies

withCookie($fileName [, $dir])
Cookies are disabled by default.
By default, the cookie file is stored in the storage directory.

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->withCookie('cookie.txt')
    ->request();

Set the window size (viewport size)

windowSize($width, $height)
Default width/height: 1920/1080 px

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->windowSize(320, 600)
    ->request();

Phantom CLI options

Set custom phantom CLI options
List of available options: Phantom options documentation

withPhantomOptions($arrayOptions)
Option keys must not include the -- prefix

$options = [
    'debug' => 'true',
    'ignore-ssl-errors' => 'true'
];

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->withPhantomOptions($options)
    ->request();
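Options copied from a phantomjs command line usually carry the -- prefix, which withPhantomOptions does not accept in its keys. A minimal sketch of a normalizing step follows. The helper normalizePhantomOptions is hypothetical and not part of the package.

```php
<?php
/**
 * Hypothetical helper (not part of casper-curl): strips leading dashes from
 * each option key so the array matches the "no -- prefix" rule.
 */
function normalizePhantomOptions(array $options): array
{
    $normalized = [];
    foreach ($options as $key => $value) {
        // "--debug" becomes "debug"; keys without a prefix are unchanged.
        $normalized[ltrim((string) $key, '-')] = $value;
    }
    return $normalized;
}

$options = normalizePhantomOptions([
    '--debug' => 'true',
    'ignore-ssl-errors' => 'true',
]);
```

The normalized array can then be passed to withPhantomOptions as in the example above.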

CasperJS

For working with dynamic content
Casper documentation

Use Casper Then

casperThen($jsScript)
Documentation

// A nowdoc keeps the single quotes inside the JavaScript intact.
$script = <<<'JS'
this.fill('form[action="/search"]', { q: 'casperjs' }, true);
this.wait(2000, function () {
    this.capture('step_1.png');
});
JS;

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('http://google.fr')
    ->casperThen($script)
    ->request();

Use custom Casper JS

Custom casper body JS
Documentation

// A nowdoc keeps the single quotes inside the JavaScript intact.
$script = <<<'JS'
casper.then(function() {
    this.fill('form[action="/search"]', { q: 'casperjs' }, true);
    this.wait(2000, function () {
        this.capture('step_1.png');
    });
});
JS;

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('http://google.fr')
    ->customCasper($script)
    ->request();

Debugging

enableDebug()
Stores the response data and screen captures in the storage directory

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('http://google.com')
    ->enableDebug()
    ->request();

Response

The response is an object with the following fields:

  • status (e.g. 200|404|500)
  • content (string: html|dom|txt)

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('http://google.com')
    ->request();

$response->status;
$response->content;
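Callers will usually want to check the status before using the content. The sketch below shows one way to do that. The helper contentOrNull is hypothetical, and the objects are stand-ins that only mimic the status and content fields described above. Whether the package returns the status as an integer or a string is not stated here, so the helper casts it.

```php
<?php
/**
 * Hypothetical helper (not part of casper-curl): returns the content of a
 * 2xx response and null for any other status.
 */
function contentOrNull(object $response): ?string
{
    // Cast defensively, since the exact type of the status field is an assumption.
    $status = (int) $response->status;
    if ($status >= 200 && $status < 300) {
        return (string) $response->content;
    }
    return null;
}

// Stand-in objects with the same two fields as a CasperCURL response.
$ok = (object) ['status' => 200, 'content' => '<html></html>'];
$missing = (object) ['status' => 404, 'content' => 'Not Found'];

$page = contentOrNull($ok);       // content of the successful response
$none = contentOrNull($missing);  // null, because the status is not 2xx
```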

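Response content

By default, the content of the whole page is returned
Documentation

The response can, however, be overridden through the output variable

// A nowdoc keeps the single quotes inside the JavaScript intact.
$script = <<<'JS'
this.fill('form[action="/search"]', { q: 'casperjs' }, true);
this.wait(2000, function () {
    this.capture('step_1.png');
});

output = console.log('Override default output!');
JS;

$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('http://google.fr')
    ->casperThen($script)
    ->request();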

Use in Laravel

$response = \DenisKisel\CasperCURL\LCasperCURL::to('https://google.com')->request();

License

This package is open-source software licensed under the MIT license

Contact

Developer: Denis Kisel