denis-kisel / casper-curl
A PhantomJS CURL for fetching page content from hard-to-access websites
Requires
- php: >=7.2
- ext-json: *
- rap2hpoutre/similar-text-finder: ^1.1
README
Based on the casperjs and phantomjs libraries, for fetching page content from hard-to-access websites.
Installation
1. Install casperjs and phantomjs globally
npm install -g casperjs
npm install -g phantomjs
# If phantomjs is running with errors
npm install -g phantomjs --ignore-scripts
2. Install the CasperCURL package
composer require denis-kisel/casper-curl
Publish the config file (if you use Laravel)
If you use another framework or plain PHP, skip this step.
php artisan vendor:publish --provider="DenisKisel\CasperCURL\ServiceProvider" --tag="config"
Usage
Simple example
// Returns the page content
// $storageDir: writable storage directory (used for cookies and debug output)
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')->request();
Set the method
method($method)
Available methods: GET|POST|PUT|DELETE
Default: GET
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->method('POST')
    ->request();
Set data
withData($arrayData)
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->withData([
        'login' => '***',
        'pass' => '***',
    ])
    ->method('POST')
    ->request();
Set headers
withHeaders($arrayHeaders)
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->withHeaders([
        'User-Agent' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0',
        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    ])
    ->request();
Set the user agent
userAgent($userAgent)
Default: Mozilla/5.0 (Windows NT 10.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->userAgent('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0')
    ->request();
Use a proxy
withProxy($ip, $port [, $method = 'http'] [, $login = null] [, $pass = null])
Available proxy types ($method): http|socks5|none
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->withProxy($ip, $port)
    ->request();
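For orientation: PhantomJS itself takes proxy settings through its --proxy, --proxy-type and --proxy-auth command-line options, and its proxy types are the same http|socks5|none listed above. The sketch below shows one plausible way the withProxy() arguments could map onto those flags. The function proxyToPhantomArgs() is hypothetical and not part of CasperCURL; the package's real internal mapping may differ.

```php
<?php

/**
 * Hypothetical helper (NOT part of CasperCURL): maps withProxy()-style
 * arguments onto PhantomJS proxy command-line options.
 */
function proxyToPhantomArgs(
    string $ip,
    int $port,
    string $method = 'http',
    ?string $login = null,
    ?string $pass = null
): array {
    // Only the proxy types documented for withProxy() are accepted.
    if (!in_array($method, ['http', 'socks5', 'none'], true)) {
        throw new InvalidArgumentException("Unsupported proxy type: {$method}");
    }

    $args = [
        "--proxy={$ip}:{$port}",
        "--proxy-type={$method}",
    ];

    // Credentials are only added when both login and password are given.
    if ($login !== null && $pass !== null) {
        $args[] = "--proxy-auth={$login}:{$pass}";
    }

    return $args;
}

print_r(proxyToPhantomArgs('127.0.0.1', 9050, 'socks5'));
```

With the package itself you would simply pass the same values positionally, e.g. withProxy($ip, $port, 'socks5', $login, $pass), following the signature above.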
Use cookies
withCookie($fileName [, $dir])
Cookies are disabled by default.
By default, the cookie file is stored in the storage directory.
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->withCookie('cookie.txt')
    ->request();
Set the window size (viewport size)
windowSize($width, $height)
Default width/height: 1920/1080 px
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->windowSize(320, 600)
    ->request();
Phantom CLI options
Set custom phantom CLI options
Available options: see the Phantom options documentation
withPhantomOptions($arrayOptions)
Option keys must not include the -- prefix.
$options = [
    'debug' => 'true',
    'ignore-ssl-errors' => 'true',
];
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('https://google.com')
    ->withPhantomOptions($options)
    ->request();
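The "no -- prefix" rule suggests the prefix is added for you when the options are turned into command-line flags. The sketch below assumes exactly that. phantomOptionsToArgs() is a hypothetical helper, not the package's code, and it shows why a key written as '--debug' would end up with a doubled prefix. --debug and --ignore-ssl-errors are real PhantomJS options.

```php
<?php

/**
 * Hypothetical helper (NOT part of CasperCURL): turns a
 * withPhantomOptions()-style array into PhantomJS CLI arguments,
 * ASSUMING the library prepends "--" to every key.
 */
function phantomOptionsToArgs(array $options): array
{
    $args = [];
    foreach ($options as $key => $value) {
        // The "--" prefix is added here, so a key such as '--debug'
        // would come out as '----debug'.
        // Note: values are not shell-escaped in this sketch.
        $args[] = '--' . $key . '=' . $value;
    }
    return $args;
}

$options = [
    'debug' => 'true',
    'ignore-ssl-errors' => 'true',
];

echo implode(' ', phantomOptionsToArgs($options)), PHP_EOL;
```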
CasperJS
For processing content dynamically
Casper documentation
Using Casper Then
casperThen($jsScript)
Documentation
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
// A nowdoc keeps the quotes in the JS from terminating the PHP string
$response = $casperCURL->to('http://google.fr')
    ->casperThen(<<<'JS'
        this.fill('form[action="/search"]', { q: 'casperjs' }, true);
        this.wait(2000, function () {
            this.capture('step_1.png');
        });
JS
    )
    ->request();
Using custom Casper JS
Custom Casper body JS
Documentation
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('http://google.fr')
    ->customCasper(<<<'JS'
        casper.then(function () {
            this.fill('form[action="/search"]', { q: 'casperjs' }, true);
            this.wait(2000, function () {
                this.capture('step_1.png');
            });
        });
JS
    )
    ->request();
Debug
enableDebug()
Saves the response data and a page capture to the storage directory.
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('http://google.com')
    ->enableDebug()
    ->request();
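Response
The response is an object with the following fields:
- status (e.g. 200|404|500)
- content (string: html|dom|txt)
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('http://google.com')->request();
$status = $response->status;   // e.g. 200
$content = $response->content; // page content as a string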
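A minimal sketch of handling the response, relying only on the status and content fields listed above. extractBody() is a hypothetical helper, and a stdClass stands in for the real response object so the snippet runs without PhantomJS. Whether status arrives as an int or a numeric string is not documented here, so it is cast.

```php
<?php

/**
 * Hypothetical helper: returns the page content for a 2xx response,
 * or throws with the status code otherwise.
 */
function extractBody(object $response): string
{
    // Cast defensively: the exact type of status is an assumption.
    $status = (int) $response->status;

    if ($status < 200 || $status >= 300) {
        throw new RuntimeException("Request failed with status {$status}");
    }

    return (string) $response->content;
}

// Stand-in for the object returned by ->request().
$response = new stdClass();
$response->status = 200;
$response->content = '<html><body>Hello</body></html>';

echo extractBody($response), PHP_EOL;
```

With the package, $response would come from ->request() as in the example above.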
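Response content
By default, the request returns the content of the whole page.
Documentation
The response can be overridden by assigning a value to the output variable.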
$casperCURL = new \DenisKisel\CasperCURL\CasperCURL($storageDir);
$response = $casperCURL->to('http://google.fr')
    ->casperThen(<<<'JS'
        this.fill('form[action="/search"]', { q: 'casperjs' }, true);
        this.wait(2000, function () {
            this.capture('step_1.png');
        });
        // Assigning to output replaces the default page content in the response
        output = 'Override default output!';
JS
    )
    ->request();
Usage in Laravel
$response = \DenisKisel\CasperCURL\LCasperCURL::to('https://google.com')->request();
License
This package is open-source software licensed under the MIT license.
Contact
Developer: Denis Kisel
- Email: denis.kisel92@gmail.com
- Skype: live:denis.kisel92