README

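A very simple single-file PHP scraper class that uses the cURL library to fetch web page content. It can fetch pages with GET or POST, and it can also fetch content from ASP.NET-based sites by posting forms.
Supports
1. GET method
2. POST method
3. ASP calls
4. Retrieving page content by tag name
5. Retrieving values from form fields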
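
As a rough illustration of item 5, and of why it matters for ASP.NET pages, the sketch below reads the form fields (including hidden ones such as `__VIEWSTATE`) out of a page so they can be sent back in a POST. It uses PHP's built-in DOMDocument rather than this library; the function name `extractFormFields` and the sample HTML are assumptions made for the example, not part of the package.

```php
<?php
/**
 * Hypothetical helper (not part of web-scraper): collect name => value
 * pairs for every named <input> element found in an HTML string.
 */
function extractFormFields(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($dom->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Sample markup standing in for an ASP.NET page (made up for this example).
$html = '<html><body><form method="post" action="search.aspx">'
      . '<input type="hidden" name="__VIEWSTATE" value="abc123">'
      . '<input type="hidden" name="__EVENTVALIDATION" value="xyz789">'
      . '<input type="text" name="query" value="">'
      . '</form></body></html>';

$fields = extractFormFields($html);
$fields['query'] = 'php';              // fill in the visible field
$postBody = http_build_query($fields); // URL-encoded body for a form POST
```

The resulting `$postBody` string is what would be sent as the body of the follow-up POST request.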

Installation

composer require juyal-ahmed/web-scraper

Get full web page content

<?php
require 'vendor/autoload.php';

// Create a Scraper instance with only the URL specified
$scraper = new \PhpFarmer\WebScraper\Scraper('https://example.com');
$pageHtmlContent = $scraper->getPageContent('https://example.com/page.html');
?>

Get full web page content with caching

<?php
require 'vendor/autoload.php';

// Create a Scraper instance with custom cache settings
$scraperWithCache = new \PhpFarmer\WebScraper\Scraper('https://example.com', true, './custom_cache/', 600);
$pageHtmlContent = $scraperWithCache->getPageContent('https://example.com/page.html');
?>

Get full web page content through a proxy IP

<?php
require 'vendor/autoload.php';

// Create a Scraper instance with only the URL specified
$scraper = new \PhpFarmer\WebScraper\Scraper('https://example.com');
$pageHtmlContent = $scraper->curl('https://example.com/page.html', "93.118.xx.141:8800", "6USERR:8PASS1");
?>
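
For readers who want to see what a proxied request involves at the cURL level, here is a minimal sketch using PHP's cURL extension directly. Whether the library's `curl()` method sets exactly these options is an assumption; `buildProxyOptions` and the placeholder proxy address are hypothetical.

```php
<?php
/**
 * Hypothetical helper (not part of web-scraper): build cURL options for
 * fetching $url through an HTTP proxy, with optional "user:password" auth.
 */
function buildProxyOptions(string $url, string $proxy, ?string $proxyAuth = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_TIMEOUT        => 30,     // seconds
        CURLOPT_PROXY          => $proxy, // "host:port"
    ];
    if ($proxyAuth !== null) {
        $options[CURLOPT_PROXYUSERPWD] = $proxyAuth; // "user:password"
    }
    return $options;
}

$options = buildProxyOptions('https://example.com/page.html', '127.0.0.1:8800', 'user:pass');

// To run the request, apply the options to a handle:
// $ch = curl_init();
// curl_setopt_array($ch, $options);
// $html = curl_exec($ch);
```

Building the options as an array keeps the sketch testable without network access; the commented lines show where the actual request would happen.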

Parse page HTML content

<?php
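// The second and third arguments are the start and end tag strings that delimit the content you want
$subHtmlContent = $scraper->getHtmlContentBetweenTags($pageHtmlContent, '', '');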
?>

How it works

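Include the scraper.php class at the top of your page.
Set some default settings.
Fetch the page content using one of its existing methods.
If you need a single piece of content, use the getHtmlContentBetweenTags method to extract it.
If you need grid data, split the content on a needle (for example with explode()).
Then loop over the pieces, calling getHtmlContentBetweenTags again on each one, to build the final array of grid data.
That's it.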
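
The grid-data steps above can be sketched as follows. So that the example runs on its own, `contentBetween` is a hypothetical stand-in for the library's `getHtmlContentBetweenTags` (its real behavior may differ), and the sample HTML is made up.

```php
<?php
/**
 * Hypothetical stand-in for the library's getHtmlContentBetweenTags():
 * return the text between the first $start and the next $end, or '' if
 * either marker is missing.
 */
function contentBetween(string $html, string $start, string $end): string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return '';
    }
    $from += strlen($start);
    $to = strpos($html, $end, $from);
    return $to === false ? '' : substr($html, $from, $to - $from);
}

// Sample page fragment (made up for this example).
$page = '<table id="prices">'
      . '<tr><td class="name">Apple</td><td class="price">1.20</td></tr>'
      . '<tr><td class="name">Pear</td><td class="price">0.95</td></tr>'
      . '</table>';

// 1. Isolate the single block that holds the grid.
$table = contentBetween($page, '<table id="prices">', '</table>');

// 2. Split it into rows on a needle; the first piece precedes the first row.
$rows = array_slice(explode('<tr>', $table), 1);

// 3. Loop over the rows and pull each cell out again.
$grid = [];
foreach ($rows as $row) {
    $grid[] = [
        'name'  => contentBetween($row, '<td class="name">', '</td>'),
        'price' => contentBetween($row, '<td class="price">', '</td>'),
    ];
}
```

After the loop, `$grid` holds one associative array per table row, which is the "final array of grid data" the steps describe.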

Thanks
