bramus / mixed-content-scan
Scan your HTTPS-enabled website for Mixed Content
Requires
- php: >=5.4
- bramus/monolog-colored-line-formatter: ~2.0
- monolog/monolog: ~1.11
- vanilla/garden-cli: ~1.3
README
Scan your HTTPS-enabled website for Mixed Content.
Built by Bramus! (https://www.bram.us/) and contributors
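About
Mixed Content Scan is a command-line script that crawls and scans HTTPS-enabled websites for Mixed Content.
The script starts at a given URL and processes it as follows:
- The page is checked for img[src|srcset|data-src], iframe[src], script[src], link[href][rel="stylesheet"], object[data], form[action], embed[src], video[src], audio[src], source[src|srcset], and params[name="movie"][value] elements, and each one is tested for being Mixed Content.
- All a[href] elements that link to the same or a deeper level are then processed in turn and checked for Mixed Content.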
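The two steps above can be illustrated with a minimal Python sketch. It is not the tool's PHP implementation: the class name MixedContentFinder, the helper is_same_or_deeper, the simplified srcset splitting, and the interpretation of "same or deeper level" as "same host, under the start URL's directory" are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# (tag -> attributes) taken from the element list above; srcset, link and
# param/params need extra conditions and are handled separately below.
RESOURCE_ATTRS = {
    "img": ("src", "data-src"), "iframe": ("src",), "script": ("src",),
    "object": ("data",), "form": ("action",), "embed": ("src",),
    "video": ("src",), "audio": ("src",), "source": ("src",),
}


class MixedContentFinder(HTMLParser):
    """Hypothetical helper: collects insecure resources and a[href] targets."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.mixed = []  # resource URLs that resolve to plain http://
        self.links = []  # absolute a[href] targets, candidates for crawling

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        candidates = [attrs.get(name) for name in RESOURCE_ATTRS.get(tag, ())]
        if tag in ("img", "source") and attrs.get("srcset"):
            # Simplified: each comma-separated candidate is "URL [descriptor]".
            candidates += [part.split()[0]
                           for part in attrs["srcset"].split(",") if part.strip()]
        if tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            candidates.append(attrs.get("href"))
        if tag in ("param", "params") and attrs.get("name") == "movie":
            candidates.append(attrs.get("value"))
        for value in candidates:
            if value:
                absolute = urljoin(self.page_url, value.strip())
                # Relative and protocol-relative URLs inherit https from the page.
                if absolute.lower().startswith("http://"):
                    self.mixed.append(absolute)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.page_url, attrs["href"]))


def is_same_or_deeper(root_url, link):
    """Assumed rule: same scheme and host, path under the root URL's directory."""
    parts = urlsplit(root_url)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return link.startswith(f"{parts.scheme}://{parts.netloc}{directory}")
```

Feeding a page's HTML to MixedContentFinder(page_url).feed(html) fills mixed with the offending URLs, and filtering links through is_same_or_deeper gives the pages a crawler of this kind would visit next.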
Installation
Mixed Content Scan can be installed with Composer:
composer global require bramus/mixed-content-scan:~2.9
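New to Composer? It is a command-line tool for dependency management in PHP. On Linux/Unix/OSX you need to download and run the installer and then (recommended) move composer.phar to a global location. On Windows you need to run the installer.
Usage
Run the script from the CLI, for example: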
$ mixed-content-scan https://www.bram.us/
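The script starts scanning and gives feedback while it runs. When Mixed Content is found, the URLs that cause the mixed content warnings are shown on screen: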
$ mixed-content-scan https://www.bram.us/
[2015-01-07 12:54:20] MCS.NOTICE: Scanning https://www.bram.us/ [] []
[2015-01-07 12:54:21] MCS.INFO: 00000 - https://www.bram.us/ [] []
[2015-01-07 12:54:22] MCS.INFO: 00001 - https://www.bram.us/projects/ [] []
[2015-01-07 12:54:22] MCS.INFO: 00002 - https://www.bram.us/projects/mint-custom-title/ [] []
[2015-01-07 12:54:23] MCS.INFO: 00003 - https://www.bram.us/projects/bramusicq/ [] []
[2015-01-07 12:54:24] MCS.INFO: 00004 - https://www.bram.us/projects/gm_bramus/ [] []
[2015-01-07 12:54:24] MCS.INFO: 00005 - https://www.bram.us/projects/js_bramus/ [] []
[2015-01-07 12:54:26] MCS.INFO: 00006 - https://www.bram.us/projects/js_bramus/jsprogressbarhandler/ [] []
[2015-01-07 12:54:27] MCS.INFO: 00007 - https://www.bram.us/projects/js_bramus/lazierload/ [] []
[2015-01-07 12:54:27] MCS.INFO: 00008 - https://www.bram.us/projects/the-box-office/ [] []
[2015-01-07 12:54:28] MCS.INFO: 00009 - https://www.bram.us/projects/tinymce-plugins/ [] []
[2015-01-07 12:54:29] MCS.INFO: 00010 - https://www.bram.us/projects/tinymce-plugins/tinymce-classes-and-ids-plugin-bramus_cssextras/ [] []
[2015-01-07 12:54:30] MCS.INFO: 00011 - https://www.bram.us/projects/flashlightboxinjector/ [] []
...
[2015-01-07 12:54:45] MCS.INFO: 00036 - https://www.bram.us/2007/06/04/accessible-expanding-and-collapsing-menu/ [] []
[2015-01-07 12:54:45] MCS.ERROR: 00037 - https://www.bram.us/demo/projects/jsprogressbarhandler/ [] []
[2015-01-07 12:54:45] MCS.WARNING: https://#/urchin.js [] []
[2015-01-07 12:54:46] MCS.INFO: 00038 - https://www.bram.us/2008/07/11/ror-progress-bar-helper/ [] []
[2015-01-07 12:54:46] MCS.INFO: 00039 - https://www.bram.us/2008/11/10/jsprogressbarhandler-033/ [] []
[2015-01-07 12:54:47] MCS.ERROR: 00040 - https://www.bram.us/demo/projects/lazierload/ [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1212/1285026452_0aeb38b6e6.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1074/1273115418_a77357040a.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1096/1273106588_91f7a736c6.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1324/1216309045_31ca82f9d9.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1262/1217169586_e4b2bfa7df.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1149/1216304291_63fd48d9c4.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1366/1216301505_51b3c590ff.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1184/1216299847_c57975bed2.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1085/1217158084_a9b059d25b.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1040/1216293529_3b7c044815.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1029/1084232736_5b8c023f46.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1318/1043062251_17071a8cc7.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: http://farm2.static.flickr.com/1221/1043059543_05713e6156.jpg [] []
[2015-01-07 12:54:47] MCS.WARNING: https://#/urchin.js [] []
[2015-01-07 12:54:47] MCS.INFO: 00041 - https://www.bram.us/2011/09/30/css-regions-and-css-exclusions/ [] []
[2015-01-07 12:54:47] MCS.INFO: 00042 - https://www.bram.us/2014/06/04/good-looking-shapes-gallery/ [] []
...
Mixed Content Scan uses ANSI coloring, provided by bramus/ansi-php, so errors can easily be spotted by their color.
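Because the log excerpt above follows a regular line layout, the results can be post-processed with a small script. The Python sketch below groups each WARNING URL under the page line that precedes it. It assumes the uncolored line layout shown in the excerpt (color escape codes would need stripping first) and that warnings always follow the line of the page they belong to, as they do in the excerpt; the function name group_warnings is hypothetical.

```python
import re
from collections import defaultdict

# "[timestamp] MCS.LEVEL: message [] []", as in the excerpt above.
LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] MCS\.(?P<level>[A-Z]+): (?P<message>.*?) \[\] \[\]$"
)
# Page lines carry a counter and the page URL: "00040 - https://..."
PAGE = re.compile(r"^\d+ - (?P<url>\S+)$")


def group_warnings(lines):
    """Map each scanned page URL to the mixed-content URLs reported for it."""
    report = defaultdict(list)
    current_page = None
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skips "..." and anything that is not a log line
        page = PAGE.match(match["message"])
        if page:
            current_page = page["url"]
        elif match["level"] == "WARNING" and current_page is not None:
            report[current_page].append(match["message"])
    return dict(report)
```

Calling group_warnings(open("results.txt")) on a saved log returns a dictionary that contains only pages with at least one warning.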
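Advanced Usage / CLI Options
Mixed Content Scan supports several CLI options that modify its behavior:
- --output=path/to/file: File to write the results to. Defaults to php://stdout (= show on screen).
- --format=ansi|no-ansi|json: Formatter used to output the results:
  - ansi (default): ANSI-colored line formatter
  - no-ansi: Monolog line formatter
  - json: Monolog JSON formatter
- --no-crawl: Do not crawl scanned pages for new pages.
- --no-check-certificate: Do not check the certificate for validity (e.g. allow self-signed or missing certificates).
- --timeout=value-in-milliseconds: How long to wait for each request to complete. Defaults to 10000ms.
- --delay=value-in-seconds: How long to wait between requests. Defaults to 0s.
- --input=path/to/file: Use a file containing a list of links as the source instead of parsing the passed-in URL. Automatically enables --no-crawl.
- --ignore=path/to/file: File containing URL patterns to ignore. See Ignoring links below for how to build this file.
- --loglevel=level: Monolog log level to use for logging. Defaults to 200 (= info). Both numeric and (lowercase) string values are accepted. See the Monolog log levels for more information.
- --user-agent='user-agent': User agent to use when crawling.
Example: mixed-content-scan https://www.bram.us/ --ignore=./wordpress.txt --output=./results.txt --format=no-ansi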
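The --loglevel option accepts either a number or a lowercase name. The Python sketch below shows one way such a value could be normalized, using the standard Monolog level numbers. The function name normalize_loglevel and the decision to reject uppercase names and unknown numbers are assumptions for illustration, not the tool's documented behavior.

```python
# Monolog's level names and their numeric values.
MONOLOG_LEVELS = {
    "debug": 100,
    "info": 200,
    "notice": 250,
    "warning": 300,
    "error": 400,
    "critical": 500,
    "alert": 550,
    "emergency": 600,
}


def normalize_loglevel(value="200"):
    """Return the numeric level for a numeric or lowercase string value."""
    text = str(value).strip()
    if text.isdigit():
        level = int(text)
        if level not in MONOLOG_LEVELS.values():
            raise ValueError(f"unknown log level: {text}")
        return level
    if text not in MONOLOG_LEVELS:
        # Assumption: only lowercase names are valid, as stated above.
        raise ValueError(f"unknown log level: {text}")
    return MONOLOG_LEVELS[text]
```

With this mapping, --loglevel=info and --loglevel=200 describe the same threshold, and --loglevel=warning would hide the per-page INFO lines while keeping warnings and errors.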
Handling errors
Mixed Content Scan uses Curl internally to perform its requests. When an error is encountered (e.g. a lost connection), it is shown on screen:
...
[2015-01-07 12:56:43] MCS.INFO: 00003 - https://www.bram.us/projects/bramusicq/ [] []
[2015-01-07 12:56:53] MCS.CRITICAL: cURL Error (28): SSL connection timeout [] []
...
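Ignoring links
It is possible to define a list of patterns to ignore. To do so, create a text file with one PCRE pattern per line and pass the path to that file with the --ignore option. Lines starting with # are treated as comments and are skipped.
For a WordPress installation, the ignore patterns file (distributed with Mixed Content Scan in ignorepattens/wordpress.txt) looks like this: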
# Paginated Overview Links
^{$rootUrl}/page/(\d+)/$
# Single Post Links
# ^{$rootUrl}/(\d+)/(\d+)/
# Tag Overview Links
^{$rootUrl}/tag/
# Author Overview Links
^{$rootUrl}/author/
# Category Overview Links
^{$rootUrl}/category/
# Monthly Overview Links
^{$rootUrl}/(\d+)/(\d+)/$
# Year Overview Links
^{$rootUrl}/(\d+)/$
# Comment Subscription Link
^{$rootUrl}/comment-subscriptions
# Wordpress Core File Links
^{$rootUrl}/(.*)?wp\-(.*)\.php
# Archive Links
^{$rootUrl}/archive/
# Replyto Links
\?replytocom\=
The {$rootUrl} token in each pattern is replaced with the (root) URL passed to the script.
Note: the PHP PCRE Cheat Sheet may come in handy.
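The handling of an ignore file described above (skip comments, substitute {$rootUrl}, match URLs) can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, Python's re module stands in for PCRE, and both the regex-escaping of the root URL and the removal of its trailing slash are assumptions about how the substitution works.

```python
import re


def load_ignore_patterns(lines, root_url):
    """Compile the non-comment lines of an ignore file into regex objects."""
    # Assumption: the root URL is inserted without its trailing slash and
    # escaped, so that the dots in the host name match literally.
    root = re.escape(root_url.rstrip("/"))
    patterns = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        patterns.append(re.compile(line.replace("{$rootUrl}", root)))
    return patterns


def is_ignored(url, patterns):
    """True when any ignore pattern matches somewhere in the URL."""
    return any(pattern.search(url) for pattern in patterns)
```

For example, with the WordPress file above and a root URL of https://example.com/ (a placeholder), a tag overview such as https://example.com/tag/css/ would be skipped, while a single post URL would still be scanned because its pattern is commented out.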
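Known issues
Mixed Content Scan:
- Does not take <base href="..."> tags into account (but who uses that, right?)
- Does not scan linked .css or .js files themselves for Mixed Content
- Does not scan inline <script> or <style> elements for Mixed Content
If you run into a problem, please file an issue (or fix it and open a pull request ;))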