README

This repository contains a JSON file listing the HTTP user agents used by robots, crawlers, and spiders.

NPM package: [https://npmjs.net.cn/package/crawler-user-agents](https://npmjs.net.cn/package/crawler-user-agents)
Go package: [https://pkg.go.dev/github.com/monperrus/crawler-user-agents](https://pkg.go.dev/github.com/monperrus/crawler-user-agents)
PyPI package: [https://pypi.ac.cn/project/crawler-user-agents/](https://pypi.ac.cn/project/crawler-user-agents/)

Each pattern is a regular expression. It should work out of the box with your favorite regex library.
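As a minimal sketch of matching a User-Agent string against the patterns with a standard regex library, the Python snippet below uses the built-in `re` module. The embedded sample is hypothetical: it mimics the shape of crawler-user-agents.json with only two entries, and the `totobot` entry, its URL, and the helper name `find_crawler` are made up for illustration.

```python
import json
import re

# Hypothetical two-entry excerpt shaped like crawler-user-agents.json.
# In real use, load the downloaded file instead of this string.
SAMPLE_JSON = r"""
[
  {"pattern": "rogerbot", "url": "http://moz.com/help/pro/what-is-rogerbot-"},
  {"pattern": "totobot\\/", "url": "https://example.com/totobot"}
]
"""

entries = json.loads(SAMPLE_JSON)


def find_crawler(user_agent, entries):
    """Return the first entry whose pattern is found in user_agent, else None."""
    for entry in entries:
        # Patterns are meant to match anywhere in the string, so use search().
        if re.search(entry["pattern"], user_agent):
            return entry
    return None


match = find_crawler("Mozilla/5.0 (compatible; rogerbot/2.3)", entries)
print(match["url"] if match else "not a known crawler")
```

`re.search` is used rather than `re.match` because a pattern is typically a fragment that can appear anywhere in the User-Agent string.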

If you use this project in a commercial product, please sponsor it.

Installation

Direct download

Download the crawler-user-agents.json file directly from this repository.

JavaScript

crawler-user-agents is deployed on npmjs.com: [https://npmjs.net.cn/package/crawler-user-agents](https://npmjs.net.cn/package/crawler-user-agents)

To use it, install the package with npm or yarn:

npm install --save crawler-user-agents
# OR
yarn add crawler-user-agents

In Node.js, you can require the package to get an array of crawler user agents.

const crawlers = require('crawler-user-agents');
console.log(crawlers);

Python

Install with pip install crawler-user-agents

Then:

import crawleruseragents
if crawleruseragents.is_crawler("Googlebot/"):
    # do something
    pass

Or:

import crawleruseragents
indices = crawleruseragents.matching_crawlers("bingbot/2.0")
print("crawlers' indices:", indices)
print(
    "crawler's URL:",
    crawleruseragents.CRAWLER_USER_AGENTS_DATA[indices[0]]["url"]
)

Note that matching_crawlers is much slower than is_crawler when the given User-Agent does match a crawler.
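One plausible reason for this difference, sketched here under assumptions: a yes/no check can run a single combined regular expression, whereas listing every match needs one search per pattern. The function names and sample patterns below are hypothetical and are not the package's actual implementation.

```python
import re

# Hypothetical sample patterns; the real list lives in crawler-user-agents.json.
SAMPLE_PATTERNS = ["rogerbot", "totobot\\/", "examplebot"]

# One combined regex answers the yes/no question with a single search.
COMBINED = re.compile("|".join(f"(?:{p})" for p in SAMPLE_PATTERNS))
# Separate compiled regexes are needed to learn which entries matched.
COMPILED = [re.compile(p) for p in SAMPLE_PATTERNS]


def is_crawler_sketch(user_agent):
    """Return True if any sample pattern matches (one regex search)."""
    return COMBINED.search(user_agent) is not None


def matching_crawlers_sketch(user_agent):
    """Return the indices of all matching sample patterns."""
    if not is_crawler_sketch(user_agent):
        return []  # cheap exit for ordinary browsers
    # Only reached for crawlers: one extra search per pattern.
    return [i for i, rx in enumerate(COMPILED) if rx.search(user_agent)]


print(matching_crawlers_sketch("Mozilla/5.0 (compatible; totobot/1.0)"))
```

In this sketch the per-pattern loop only runs when the combined check succeeds, so the extra cost appears exactly when the User-Agent belongs to a crawler.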

Go

Use the Go package, which provides the global variable Crawlers (kept in sync with crawler-user-agents.json) and the functions IsCrawler and MatchingCrawlers.

Example Go program:

package main

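import (
	"fmt"

	// The package name is "agents", which differs from the last element
	// of the import path, so the alias is spelled out explicitly.
	agents "github.com/monperrus/crawler-user-agents"
)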

func main() {
	userAgent := "Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)"

	isCrawler := agents.IsCrawler(userAgent)
	fmt.Println("isCrawler:", isCrawler)

	indices := agents.MatchingCrawlers(userAgent)
	fmt.Println("crawlers' indices:", indices)
	fmt.Println("crawler's URL:", agents.Crawlers[indices[0]].URL)
}

Output:

isCrawler: true
crawlers' indices: [237]
crawler's URL: https://discordapp.com

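Contributing

I welcome contributions as pull requests.

Pull requests should:

contain only one addition
specify a discriminating, relevant syntactic fragment (for example "totobot" rather than "Mozilla/5 totobot v20131212.alpha1")
contain the pattern (a generic regular expression), the discovery date (year/month/day), and the official URL of the robot
result in a valid JSON file (don't forget the comma between items)

Example: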

{
  "pattern": "rogerbot",
  "addition_date": "2014/02/28",
  "url": "http://moz.com/help/pro/what-is-rogerbot-",
  "instances" : ["rogerbot/2.3 example UA"]
}
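Before opening a pull request, a new entry can be checked locally. The script below is a minimal sketch and not part of this repository: the helper name `check_entry` is hypothetical, and it assumes that every string in `instances` is meant to match the pattern.

```python
import json
import re
from datetime import datetime

REQUIRED_KEYS = ("pattern", "addition_date", "url")


def check_entry(entry):
    """Return a list of problems found in one entry (empty list means OK)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in entry]
    if problems:
        return problems
    try:
        regex = re.compile(entry["pattern"])
    except re.error as exc:
        return [f"invalid regex: {exc}"]
    try:
        # The discovery date is expected as year/month/day.
        datetime.strptime(entry["addition_date"], "%Y/%m/%d")
    except ValueError:
        problems.append("addition_date is not year/month/day")
    # Assumption: every sample user agent in "instances" should match the pattern.
    for ua in entry.get("instances", []):
        if not regex.search(ua):
            problems.append(f"instance does not match pattern: {ua}")
    return problems


# json.loads raises an error on malformed JSON, such as a missing comma.
entry = json.loads("""
{
  "pattern": "rogerbot",
  "addition_date": "2014/02/28",
  "url": "http://moz.com/help/pro/what-is-rogerbot-",
  "instances" : ["rogerbot/2.3 example UA"]
}
""")
print(check_entry(entry))
```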

License

This list is released under the MIT License. Versions prior to November 7, 2016 were released under a CC-SA license.
