mirazmac / html-sanitizer
A lightweight library that makes sanitizing HTML in PHP easier.
1.0.0
2022-01-10 11:09 UTC
Requires
- php: >=7.0
Requires (Dev)
README
HTMLSanitizer
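A super lightweight PHP library for whitelist-based sanitization of HTML strings. It has the features you would expect from an HTML sanitizer: tag-based whitelisting, custom tags and attributes, and the option to treat custom attributes as boolean or URL values.
Preface
Almost every PHP application needs to sanitize HTML now and then, whether it is a simple comment or the output of a full WYSIWYG editor. It is crucial that only the allowed HTML passes through. There are plenty of PHP HTML sanitizer libraries out there, but most of them are rather bloated. That is understandable: they aim to guarantee absolute safety for their users, and that can get quite complicated. Most of us do not need that much functionality. HtmlSanitizer does not validate or repair HTML; it treats the HTML as is, matches it against the defined whitelist of tags and attributes, and escapes where necessary. It also lets you define the type of those attributes; currently URL and boolean are supported. You can also define allowed hosts for a specific tag. For example, if you only want to allow youtube.com URLs in an iframe, that is easy to set up.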
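To make the idea of per-tag allowed hosts concrete, here is a minimal standalone sketch of a host check built on PHP's `parse_url()`. It is not HtmlSanitizer's own implementation: the function name `isHostAllowed` is hypothetical, and accepting subdomains of an allowed host is an assumption of this sketch rather than documented library behavior.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT part of HtmlSanitizer.
 * Returns true when the URL's host equals one of the allowed hosts
 * or is a subdomain of one (subdomain matching is an assumption here).
 */
function isHostAllowed(string $url, array $allowedHosts): bool
{
    // parse_url() also handles protocol-relative URLs such as //youtube.com/x
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // relative or malformed URL: there is no host to compare
    }

    $host = strtolower($host);
    foreach ($allowedHosts as $allowed) {
        $allowed = strtolower($allowed);
        $suffix  = '.' . $allowed;
        // Exact match, or the host ends with ".<allowed>" (a subdomain)
        if ($host === $allowed || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }

    return false;
}
```

Comparing the parsed host, rather than searching the whole URL for the string `youtube.com`, is what keeps a URL like `https://evil.com/youtube.com` from slipping through.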
Requirements
HtmlSanitizer has no external dependencies, only native PHP extensions. Most of them are very common and are bundled with PHP in almost all cases.
- PHP >=7.0
- mbstring
- libxml
- dom
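If you are unsure whether your environment meets the list above, a quick check along these lines may help. It only uses PHP's built-in `extension_loaded()` and `version_compare()`; the variable names are illustrative, and the script avoids PHP 7 syntax so that it can still report an outdated interpreter.

```php
<?php
// Extensions named in the requirements list above.
$required = ['mbstring', 'libxml', 'dom'];

// Collect the names of any extensions that are not loaded.
$missing = array_values(array_filter($required, function ($ext) {
    return !extension_loaded($ext);
}));

if (version_compare(PHP_VERSION, '7.0.0', '<')) {
    echo 'PHP 7.0 or newer is required, found ' . PHP_VERSION . PHP_EOL;
}

echo count($missing) === 0
    ? 'All required extensions are loaded.' . PHP_EOL
    : 'Missing extensions: ' . implode(', ', $missing) . PHP_EOL;
```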
Installation
composer require mirazmac/html-sanitizer dev-main
Usage
use MirazMac\HtmlSanitizer\Whitelist;
use MirazMac\HtmlSanitizer\Sanitizer;

require_once '../vendor/autoload.php';

$whitelist = new Whitelist;

// Allow the anchor tag with specific attributes
$whitelist->allowTag('a', ['href', 'title', 'download', 'data-url', 'data-loaded']);

// You can add multiple tags at once as well if that's what you prefer
$whitelist->setTags(
    [
        // allows the `abbr` tag and its title attribute
        'abbr' => ['title'],
        // allows only the em tag, any attributes would be stripped off
        'em' => [],
    ],
    true
);

// Set allowed hosts for the URL attributes on the `a` tag
$whitelist->setAllowedHosts('a', ['google.com', 'facebook.com']);

// Set the allowed protocols for this document
$whitelist->setProtocols(['http', '//', 'https']);

// Set a list of allowed values for an attribute of a tag
$whitelist->setAllowedValues('abbr', 'title', ['one', 'two', 'three']);

// Set a list of custom attributes to be treated as URL
// (i.e. to use the host & protocol filter)
$whitelist->treatAttributesAsUrl(['data-url']);

// Set a list of custom attributes to be treated as HTML boolean (not true/false),
// i.e. their values would be set to blank or the name of the attribute itself
$whitelist->treatAttributesAsBoolean(['data-load']);

// Create the sanitizer instance that uses this whitelist
$htmlsanitizer = new Sanitizer($whitelist);

// returns sanitized string
$sanitizedHTML = $htmlsanitizer->sanitize('<a href="//google.com" data-download="">Google</a> <a href="https://bing.com" data-url="https://bing.com">My URL would be removed</a>');

echo "HTML Source Output: <pre>";
echo htmlspecialchars($sanitizedHTML);
echo "</pre><br>Rendered Output:<br>" . $sanitizedHTML;
Quirks
- URL filtering is currently not supported on attributes that contain multiple URLs, e.g. srcset
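Until multi-URL attributes are supported, one possible workaround is to filter `srcset` values yourself. The sketch below is a standalone illustration and not part of HtmlSanitizer: the function name `filterSrcset` is hypothetical, and it makes two simplifying assumptions. It splits candidates on every comma (so URLs that themselves contain commas are not handled), and it drops relative URLs because they have no host to check.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT part of HtmlSanitizer.
 * Keeps only the srcset candidates whose URL host is in $allowedHosts.
 */
function filterSrcset(string $srcset, array $allowedHosts): string
{
    $kept = [];

    // Simplification: treat every comma as a candidate separator.
    foreach (explode(',', $srcset) as $candidate) {
        $candidate = trim($candidate);
        if ($candidate === '') {
            continue;
        }

        // The first whitespace-separated token is the URL; the rest is the
        // descriptor (e.g. "2x" or "480w").
        $parts = preg_split('/\s+/', $candidate);
        $host  = parse_url($parts[0], PHP_URL_HOST);

        // Relative or malformed URLs yield no host and are dropped.
        if (is_string($host) && in_array(strtolower($host), $allowedHosts, true)) {
            $kept[] = $candidate;
        }
    }

    return implode(', ', $kept);
}

// Usage: only the candidate served from the allowed host survives.
echo filterSrcset(
    'https://cdn.example.com/a.jpg 1x, https://evil.test/b.jpg 2x',
    ['cdn.example.com']
) . PHP_EOL;
```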
To-Do
- Full test coverage
- Write extended documentation