blanchonvincent / simple-page-crawler

ZF2 module v0.3.0 - Provides a crawler to retrieve web page information: title, metadata, heading tags, and images

The canonical repository for this package appears to be gone, so the package has been frozen.

Maintainers

Details

github.com/blanchonvincent/SimplePageCrawler

Open Issues: 1

Type: module

0.3.0 2013-01-22 13:19 UTC

Requires

php: >=5.3.3
zendframework/zendframework: 2.*

Requires (Dev)

None

Suggests

None

Provides

None

Conflicts

None

Replaces

None

MIT f79e7cc6f0bf2c21e74a8e5847cbe4270e5e6bca

Vincent Blanchon <blanchon.vincent.woop@gmail.com>

dev-master
0.3.0

This package is not auto-updated.

Last update: 2019-04-29 00:41:49 UTC

README

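Version 0.3.0, created by Vincent Blanchon

Introduction

SimplePageCrawler is a web page crawler. It can retrieve the following information from a page:

Title
Metadata (description, Open Graph, etc.)
Headings (H1, H2, etc.)
List of images
List of links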
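
To make the list above concrete, here is a minimal sketch of how such fields can be pulled out of an HTML string with PHP's built-in DOM extension. This is not SimplePageCrawler's own code: the function name `extractPageInfo` and the shape of the returned array are assumptions made purely for illustration.

```php
<?php
// Illustrative only: NOT SimplePageCrawler's implementation, just a sketch of
// the kind of extraction described above, using PHP's DOM extension.

/**
 * Hypothetical helper: collect the title, meta tags, headings, images and
 * links from a non-empty HTML string.
 */
function extractPageInfo($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $info = array(
        'title'    => null,
        'meta'     => array(),
        'headings' => array(),
        'images'   => array(),
        'links'    => array(),
    );

    $titleNode = $xpath->query('//title')->item(0);
    if ($titleNode !== null) {
        $info['title'] = trim($titleNode->textContent);
    }

    // Meta tags are keyed by "name" (e.g. description) or "property" (e.g. og:title).
    foreach ($xpath->query('//meta[@content]') as $meta) {
        $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
        if ($key !== '') {
            $info['meta'][$key] = $meta->getAttribute('content');
        }
    }

    // Group heading text by tag name (h1, h2, ...).
    foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//h6') as $heading) {
        $info['headings'][$heading->nodeName][] = trim($heading->textContent);
    }

    foreach ($xpath->query('//img[@src]') as $img) {
        $info['images'][] = $img->getAttribute('src');
    }

    foreach ($xpath->query('//a[@href]') as $a) {
        $info['links'][] = $a->getAttribute('href');
    }

    return $info;
}
```

The sketch works on a string that has already been downloaded; fetching the page, following redirects, and detecting the character encoding are left out.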

Usage

Get page information

$crawler = $this->getServiceLocator()->get('SimplePageCrawler');
$page = $crawler->get('http://www.nytimes.com');

echo sprintf('The title is "%s"', $page->getTitle());
echo sprintf('The description is "%s"', $page->getMeta('description'));

You can also use the action helper

$page = $this->simplePageCrawler('http://www.nytimes.com');

echo sprintf('The title is "%s"', $page->getTitle());
echo sprintf('The description is "%s"', $page->getMeta('description'));
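
The title and description come from a remote page, so it may be worth escaping them before echoing them into HTML. The sketch below assumes the values are strings or null; this README does not say what `getMeta('description')` returns when the tag is missing, and `formatPageField` is a hypothetical helper, not part of the module.

```php
<?php
// Hypothetical helper, not part of SimplePageCrawler.
function formatPageField($label, $value)
{
    // Assumption: a missing value arrives as null or an empty string.
    if ($value === null || $value === '') {
        return sprintf('The %s is not available', $label);
    }

    // Escape remote content so it cannot inject markup into the output.
    return sprintf(
        'The %s is "%s"',
        $label,
        htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8')
    );
}

echo formatPageField('title', '<b>News</b> & more'), PHP_EOL;
echo formatPageField('description', null), PHP_EOL;
```

In a controller, the values passed in would be `$page->getTitle()` and `$page->getMeta('description')` from the examples above.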

Advanced usage

You can retrieve the Open Graph metadata

$page = $this->simplePageCrawler('http://www.nytimes.com');
$metas = $page->getMeta()->getOpenGraph();
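
The README does not show what `getOpenGraph()` returns. As a rough illustration of the underlying idea, the sketch below collects `og:` properties from an HTML string with PHP's DOM extension. The function `extractOpenGraph` and its return format (an array keyed by property name without the `og:` prefix) are assumptions, and the module's own handling may differ.

```php
<?php
// Sketch only; SimplePageCrawler's own Open Graph handling may differ.

/**
 * Hypothetical helper: return the Open Graph properties found in a non-empty
 * HTML string, keyed by name without the "og:" prefix.
 */
function extractOpenGraph($html)
{
    $doc = new DOMDocument();
    // Buffer parser warnings caused by imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $openGraph = array();

    // Open Graph tags look like <meta property="og:title" content="...">.
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        $name = (string) substr($meta->getAttribute('property'), 3);
        if ($name === '') {
            continue; // Skip a bare "og:" property with no name.
        }
        $openGraph[$name] = $meta->getAttribute('content');
    }

    return $openGraph;
}

$html = '<html><head>'
    . '<meta property="og:title" content="Example">'
    . '<meta property="og:type" content="website">'
    . '<meta name="description" content="Not Open Graph">'
    . '</head><body></body></html>';

print_r(extractOpenGraph($html));
```

If a property appears more than once (for example several `og:image` tags), this sketch keeps only the last value.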