imarc/crawler

Crawl a website and collect all of its links.

Version 0.2.0, released 2017-11-01 19:57 UTC



README

Crawls a website and does something with each URL. By default, it outputs one URL per line to a txt file.

This can easily be extended to do many other things: just create a new Observer (src/Observer).
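As a rough illustration only, a custom observer might look like the sketch below. It assumes an SplObserver-style contract and a hypothetical getCurrentUrl() accessor on the crawler; the actual interface in src/Observer may differ, so check the source before implementing.

<?php

// Hypothetical observer that logs each URL the crawler visits.
// Assumes PHP's built-in SplObserver/SplSubject pattern; the real
// contract in src/Observer may define different methods.
class LoggingObserver implements SplObserver
{
    public function update(SplSubject $subject): void
    {
        // getCurrentUrl() is a guessed accessor name, not a confirmed API.
        if (method_exists($subject, 'getCurrentUrl')) {
            error_log('Crawled: ' . $subject->getCurrentUrl());
        }
    }
}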

Installation

composer require imarc/crawler

Usage

From within your project directory: ./vendor/bin/crawler csv URL DESTINATION

From a checkout of the repository: ./crawler.php csv URL DESTINATION
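For example, to crawl a site and write the results to a file (the URL and filename here are placeholders):

./vendor/bin/crawler csv https://example.com links.csv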

Options

crawler --help

Usage:
  csv [options] [--] <url> <destination>

Arguments:
  url                    URL to crawl.
  destination            Write CSV to FILE

Options:
  -s, --show-progress    Show the crawl's progress
  -e, --crawl-external   Crawl external URLs
  -q, --quiet            Do not output any message
      --exclude=EXCLUDE  Exclude certain extensions [default: ["css","gif","ico","jpg","jpg","js","pdf","pdf","png","rss","txt"]] (multiple values allowed)
  -h, --help             Display this help message
  -V, --version          Display this application version
      --ansi             Force ANSI output
      --no-ansi          Disable ANSI output
  -n, --no-interaction   Do not ask any interactive question
  -v|vv|vvv, --verbose   Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
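For example, combining the options above to display progress and also follow external URLs (again with placeholder arguments):

./vendor/bin/crawler csv --show-progress --crawl-external https://example.com links.csv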

Testing

codecept run