haikson/sitemap-generator

Sitemap crawler and generator class

Installs: 0

Dependents: 0

Suggesters: 0

Security: 0

Stars: 36

Watchers: 5

Forks: 18

Open Issues: 3

Language: Python

dev-master 2018-11-22 19:04 UTC

This package is not auto-updated.

Last update: 2020-01-10 15:16:35 UTC


README

Sitemap Generator

Installation

pip install sitemap-generator

Gevent

Sitemap-generator uses gevent for its concurrent ("multiprocessing") crawling mode. Install gevent:

pip install gevent

Example

import pysitemap


if __name__ == '__main__':
    url = 'http://www.example.com/'  # URL to start crawling from
    logfile = 'errlog.log'  # path to logfile
    oformat = 'xml'  # output format
    crawl = pysitemap.Crawler(url=url, logfile=logfile, oformat=oformat)
    crawl.crawl()
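
For orientation, with oformat = 'xml' the crawl results are presumably written as a sitemap file. The sketch below shows what a minimal sitemap in the sitemaps.org format looks like when built with the standard library only. It is an illustration, not pysitemap's own code: the function name build_sitemap and the sample URLs are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string for an iterable of absolute URLs.

    Hypothetical helper for illustration; not part of pysitemap.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # De-duplicate and sort so the output is stable between runs.
    for u in sorted(set(urls)):
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        "http://www.example.com/",
        "http://www.example.com/about",
    ]))
```

Each crawled page becomes one url element with a loc child; optional fields such as lastmod are omitted here to keep the sketch short.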

Multiprocessing example

import pysitemap


if __name__ == '__main__':
    url = 'http://www.example.com/'  # URL to start crawling from
    logfile = 'errlog.log'  # path to logfile
    oformat = 'xml'  # output format
    crawl = pysitemap.Crawler(url=url, logfile=logfile, oformat=oformat)
    crawl.crawl(pool_size=10)  # 10 parsing processes
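
To give a rough idea of what a pool size means, the sketch below caps the number of pages processed at the same time. It deliberately uses the standard library's ThreadPoolExecutor as a stand-in for a gevent pool so that it runs without extra dependencies. It does not reflect how pysitemap is implemented, and parse_page and crawl_concurrently are hypothetical names introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_page(url):
    """Hypothetical stand-in for fetching and parsing one page."""
    # A real crawler would download `url` and extract its links here;
    # this placeholder just returns the URL and its length.
    return url, len(url)


def crawl_concurrently(urls, pool_size=10):
    """Process urls with at most pool_size workers running at once."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() returns results in input order even though the
        # work itself runs concurrently.
        return list(pool.map(parse_page, urls))


if __name__ == "__main__":
    pages = ["http://www.example.com/page%d" % i for i in range(5)]
    for url, size in crawl_concurrently(pages, pool_size=2):
        print(url, size)
```

A larger pool lets more pages be handled in parallel, at the cost of more simultaneous requests against the target site.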