simplon/spider
Retrieves the essential information from a website
0.1.13
2015-11-11 22:06 UTC
Requires
- php: >=5.4
- simplon/request: 0.4.*
Requires (Dev)
- phpunit/phpunit: 4.8.*
README
Simplon/Spider
Intro
What is simplon/spider?
Spider parses a given HTML document and collects all the essential data:
- title
- description
- keywords
- contents of all h1 tags
- open-graph tags
- twitter tags
- all images
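To illustrate what gathering these fields involves, here is a minimal sketch built on PHP's bundled DOM extension (`DOMDocument`, `DOMXPath`). It is not simplon/spider's actual implementation: the function name `extractPageData` and the array keys are assumptions chosen to mirror the Spider response shown further down. Scalar type hints are left out so the sketch should also run on PHP 5.4.

```php
<?php
// Hypothetical helper, NOT part of simplon/spider: collects title, description,
// keywords, h1 texts, open-graph tags, twitter tags and image sources from HTML.
function extractPageData($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // Keys are assumptions modelled on the Spider response in this README.
    $data = array(
        'title'       => '',
        'description' => '',
        'keywords'    => '',
        'h1'          => array(),
        'images'      => array(),
        'openGraph'   => array(),
        'twitter'     => array(),
    );

    $title = $doc->getElementsByTagName('title')->item(0);
    if ($title !== null) {
        $data['title'] = trim($title->textContent);
    }

    // <meta name="..."> carries description, keywords and usually twitter:* tags;
    // <meta property="og:..."> carries open-graph tags.
    foreach ($xpath->query('//meta') as $meta) {
        $key = strtolower($meta->getAttribute('property') ?: $meta->getAttribute('name'));
        $content = $meta->getAttribute('content');
        if ($key === 'description' || $key === 'keywords') {
            $data[$key] = $content;
        } elseif (strpos($key, 'og:') === 0) {
            $data['openGraph'][substr($key, 3)] = $content; // "og:title" -> "title"
        } elseif (strpos($key, 'twitter:') === 0) {
            $data['twitter'][substr($key, 8)] = $content;   // "twitter:card" -> "card"
        }
    }

    foreach ($doc->getElementsByTagName('h1') as $h1) {
        $data['h1'][] = trim($h1->textContent);
    }

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $data['images'][] = $src; // may still be a relative path at this point
        }
    }

    return $data;
}
```

Image sources are returned exactly as written in the markup, so relative paths still need to be resolved against the page URL afterwards.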
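It basically returns the same kind of response as the Facebook scraper. However, Facebook's scraper does not provide all the data we need: compare the two responses below, where, for example, keywords and images are missing from Facebook's.
Facebook scraper response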
    {
        "og_object": {
            "id": "379786107965",
            "description": "Find the latest breaking news and information on the top stories, weather, business, entertainment, politics, and more. For in-depth coverage, CNN provides special reports, video, audio, photo galleries, and interactive guides",
            "title": "Breaking News, U.S., World, Weather, Entertainment & Video News - CNN.com",
            "type": "website",
            "updated_time": "2015-09-01T13:15:53+0000",
            "url": "http:\/\/www.cnn.com\/"
        },
        "share": {
            "comment_count": 0,
            "share_count": 1340555
        },
        "id": "http:\/\/cnn.com"
    }
Spider response
    {
        "title": "Breaking News, U.S., World, Weather, Entertainment & Video News - CNN.com",
        "description": "Find the latest breaking news and information on the top stories, weather, business, entertainment, politics, and more. For in-depth coverage, CNN provides special reports, video, audio, photo galleries, and interactive guides",
        "keywords": "breaking news, news online, U.S. news, world news, developing story, news video, CNN news, weather, business, money, politics, law, technology, entertainment, education, travel, health, special reports, autos, CNN TV",
        "url": "http:\/\/www.cnn.com\/",
        "images": [
            "http://i2.cdn.turner.com/cnnnext/dam/assets/150901143136-budapest-migrant-protest-fists-large-169.jpg",
            "http://i2.cdn.turner.com/cnnnext/dam/assets/110902115913-gates-of-auschwitz-large-169.jpg"
        ],
        "openGraph": {
            "pubdate": "2014-02-24T14:45:54Z",
            "url": "http://www.cnn.com",
            "title": "Breaking News, U.S., World, Weather, Entertainment & Video News - CNN.com",
            "description": "Find the latest breaking news and information on the top stories, weather, business, entertainment, politics, and more. For in-depth coverage, CNN provides special reports, video, audio, photo galleries, and interactive guides",
            "site_name": "CNN",
            "type": "website"
        },
        "twitter": {
            "card": "summary_large_image"
        }
    }
Any dependencies?
- PHP 5.4
- CURL
Install
Easy install via Composer. Not familiar with Composer yet? Read more about it on the Composer website.
    {
        "require": {
            "simplon/spider": "*"
        }
    }
Examples
The following examples should be self-explanatory.
Fetch a page and parse it
    <?php
    require 'vendor/autoload.php'; // Composer autoloader

    use Simplon\Spider\Spider;

    // fetch and parse
    $data = Spider::fetchParse('http://cnn.com');

    echo json_encode($data); // json encode result
Parse existing HTML
    <?php
    require 'vendor/autoload.php'; // Composer autoloader

    use Simplon\Spider\Spider;

    // page html
    $html = '...';

    // parse only; the URL is needed to rebuild absolute image paths
    $data = Spider::parse($html, 'http://cnn.com');

    echo json_encode($data); // json encode result
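The page URL matters because sites often reference images with relative paths. Below is a simplified sketch of how such paths could be turned into absolute URLs with `parse_url()`. `toAbsoluteUrl` is a hypothetical helper, not part of simplon/spider, and it deliberately skips normalising `./` and `../` segments as well as query-only (`?x`) and fragment-only (`#x`) references; RFC 3986 describes the full resolution algorithm.

```php
<?php
// Hypothetical helper, NOT part of simplon/spider: resolves an <img src> against
// the page URL. Simplified: dot segments and "?"/"#"-only references are not handled.
function toAbsoluteUrl($src, $pageUrl)
{
    // Already carries a scheme (http:, https:, data:, ...): keep as-is.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
        return $src;
    }

    $parts  = parse_url($pageUrl);
    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $host   = isset($parts['host']) ? $parts['host'] : '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';

    // Protocol-relative, e.g. //cdn.example.com/img.jpg: borrow the page's scheme.
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }

    $origin = $scheme . '://' . $host . $port;

    // Root-relative, e.g. /img/logo.png: append to the origin.
    if (strpos($src, '/') === 0) {
        return $origin . $src;
    }

    // Path-relative, e.g. pic.jpg: resolve against the directory of the page path.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $src;
}
```

For example, `toAbsoluteUrl('/img/logo.png', 'http://cnn.com')` returns `http://cnn.com/img/logo.png`. Checking the protocol-relative case before the root-relative one matters, since both start with `/`.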
Result in both cases
    {
        "title": "Breaking News, U.S., World, Weather, Entertainment & Video News - CNN.com",
        "description": "Find the latest breaking news and information on the top stories, weather, business, entertainment, politics, and more. For in-depth coverage, CNN provides special reports, video, audio, photo galleries, and interactive guides",
        "keywords": "breaking news, news online, U.S. news, world news, developing story, news video, CNN news, weather, business, money, politics, law, technology, entertainment, education, travel, health, special reports, autos, CNN TV",
        "url": "http:\/\/www.cnn.com\/",
        "images": [
            "http://i2.cdn.turner.com/cnnnext/dam/assets/150901143136-budapest-migrant-protest-fists-large-169.jpg",
            "http://i2.cdn.turner.com/cnnnext/dam/assets/110902115913-gates-of-auschwitz-large-169.jpg"
        ],
        "openGraph": {
            "pubdate": "2014-02-24T14:45:54Z",
            "url": "http://www.cnn.com",
            "title": "Breaking News, U.S., World, Weather, Entertainment & Video News - CNN.com",
            "description": "Find the latest breaking news and information on the top stories, weather, business, entertainment, politics, and more. For in-depth coverage, CNN provides special reports, video, audio, photo galleries, and interactive guides",
            "site_name": "CNN",
            "type": "website"
        },
        "twitter": {
            "card": "summary_large_image"
        }
    }
License
simplon/spider is freely distributable under the terms of the MIT license.
Copyright (c) 2015 Tino Ehrich (tino@bigpun.me)
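Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.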