piedweb/crawler


CLI Seo Pocket Crawler

A web crawler to check a few SEO basics.

Use the collected data in your favorite spreadsheet software, or retrieve it with your favorite programming language.

Documentation is available in French: https://piedweb.com/seo/crawler
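
For instance, here is a minimal PHP sketch that reads the crawl output back into memory. It assumes the crawl produced a data.csv file (the file referenced in the Page Rank section below); the exact location and column names depend on your crawl.

<?php
// Minimal sketch: load the crawl results into PHP.
// Assumes a data.csv file produced by a previous crawl; adjust the path to yours.
$handle = fopen('data.csv', 'r');
$header = fgetcsv($handle);                  // first row: column names
$pages = [];
while (($row = fgetcsv($handle)) !== false) {
    $pages[] = array_combine($header, $row); // one crawled URL per row
}
fclose($handle);
// $pages can now be filtered, aggregated or exported as needed.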

Installation

Via Packagist

$ composer create-project piedweb/crawler

Usage

Crawler CLI

$ bin/console crawler:go $start

Arguments

  start                            Define where the crawl starts. Eg: https://piedweb.com
                                   You can specify an id from a previous crawl; other options will then be ignored.
                                   You can use `last` to resume the most recent crawl (e.g. one that was just stopped).

Options

  -l, --limit=LIMIT                Define a depth limit for the crawl [default: 5]
  -i, --ignore=IGNORE              Virtual robots.txt to respect (can be a string or a URL).
  -u, --user-agent=USER-AGENT      Define the user-agent used during the crawl. [default: "SEO Pocket Crawler - PiedWeb.com/seo/crawler"]
  -w, --wait=WAIT                  Time to wait between two requests, in microseconds; the default (100000) is 0.1 s. [default: 100000]
  -c, --cache-method=CACHE-METHOD  Cache method to use during the crawl. [default: 2]
  -r, --restart=RESTART            Restart a previous crawl. Values: 1 = fresh restart, 2 = restart from cache
  -h, --help                       Display this help message
  -q, --quiet                      Do not output any message
  -V, --version                    Display this application version
      --ansi                       Force ANSI output
      --no-ansi                    Disable ANSI output
  -n, --no-interaction             Do not ask any interactive question
  -v|vv|vvv, --verbose             Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
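
For example (a sketch using only the options documented above, with illustrative values), a crawl limited to depth 3 with a 0.5 s pause between requests, then resuming the most recent crawl after an interruption:

$ bin/console crawler:go https://piedweb.com --limit=3 --wait=500000
$ bin/console crawler:go last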



Extract all external links from a previous crawl in one second

$ bin/console crawler:external $id [--host]
    --id
        id from a previous crawl
        You can use `last` to show external links from the last crawl.

    --host -ho
        flag to return only the hosts
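
For example, to list only the hosts of the external links found by the most recent crawl, using the `last` shortcut and the --host flag described above:

$ bin/console crawler:external last --host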

Calculate the Page Rank

This will update the previously generated data.csv. You can then use the PoC pagerank.html (served with npx http-server -c-1 --port 3000) to explore your website.

$ bin/console crawler:pagerank $id
    --id
        id from a previous crawl
        You can use `last` to calculate the Page Rank from the last crawl.
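
For example, assuming a crawl has already been run, compute the Page Rank for the most recent crawl and then serve the PoC visualizer locally, as described above:

$ bin/console crawler:pagerank last
$ npx http-server -c-1 --port 3000

Then open pagerank.html from the started server in your browser.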

Testing

$ composer test

Todo

  • Better link harvesting and recording (record the context: list, nav, sentence...)
  • Transform the PoC (Page Rank visualizer)
  • Complex Page Rank calculator (taking into account 301s, canonicals, nofollow, etc.)

Contributing

Please see the contributing guide.

Credits

License

The MIT License (MIT). Please see the License File for more information.
