mirazmac/html-sanitizer

A lightweight library that makes sanitizing HTML in PHP easier.

1.0.0 2022-01-10 11:09 UTC

This package is auto-updated.

Last update: 2024-09-10 16:56:48 UTC



HTMLSanitizer

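A super lightweight PHP library for sanitizing HTML strings against a whitelist. It has all the features an HTML sanitizer should have, including tag-based whitelisting, support for custom tags and attributes, and even the ability to treat custom attributes as booleans or URLs.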

Prelude

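Almost every PHP application needs to sanitize HTML once in a while, whether it is a simple comment or the output of a full WYSIWYG editor. It is essential that only the allowed HTML gets through. There are plenty of HTML sanitizing libraries for PHP out there, but most of them are pretty bloated. I get it: they need to guarantee the user's absolute safety, and that can get quite complicated. Most of us, however, do not need that much. HtmlSanitizer does not care about validating or fixing your HTML; it takes the HTML as-is, matches it against the defined whitelist of tags and attributes, and escapes where necessary. It also lets you define the type of those attributes. Currently supported types are URL and boolean. You can also define allowed hosts for a specific tag. For example, you may want to allow only youtube.com URLs in an iframe, which is very easy to do.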
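To make the host and protocol filtering idea concrete, here is a minimal, self-contained sketch of how a URL could be checked against a list of allowed protocols and hosts using PHP's native parse_url(). It is not HtmlSanitizer's internal code: the function name isUrlAllowed() is hypothetical, and the decisions to accept subdomains of an allowed host and to reject relative URLs are assumptions made for illustration only.

```php
<?php
// Illustrative sketch only; NOT part of HtmlSanitizer.
// isUrlAllowed() is a hypothetical helper name.

function isUrlAllowed(string $url, array $allowedHosts, array $allowedProtocols): bool
{
    $url = trim($url);

    if (strncmp($url, '//', 2) === 0) {
        // Protocol-relative URL: only accept it when '//' is explicitly allowed.
        if (!in_array('//', $allowedProtocols, true)) {
            return false;
        }
        // Prepend a placeholder scheme so parse_url() extracts the host reliably.
        $parts = parse_url('https:' . $url);
    } else {
        $parts = parse_url($url);
        // Assumption: URLs without a scheme (relative URLs) are rejected.
        if (!is_array($parts) || !isset($parts['scheme'])) {
            return false;
        }
        if (!in_array(strtolower($parts['scheme']), $allowedProtocols, true)) {
            return false;
        }
    }

    if (!is_array($parts) || !isset($parts['host'])) {
        return false;
    }

    $host = strtolower($parts['host']);
    foreach ($allowedHosts as $allowed) {
        $suffix = '.' . strtolower($allowed);
        // Assumption: an allowed host also covers its subdomains.
        if ($host === strtolower($allowed) || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }

    return false;
}

$protocols = ['http', 'https', '//'];
var_dump(isUrlAllowed('https://www.youtube.com/embed/abc', ['youtube.com'], $protocols)); // bool(true)
var_dump(isUrlAllowed('https://notyoutube.com/embed/abc', ['youtube.com'], $protocols));  // bool(false)
var_dump(isUrlAllowed('javascript:alert(1)', ['youtube.com'], $protocols));               // bool(false)
```

Comparing the parsed host, rather than searching the raw string for "youtube.com", is what prevents inputs such as https://evil.com/?youtube.com from slipping through.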

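Requirements

HtmlSanitizer has no external dependencies, only native PHP extensions. Most of them are very common and come bundled with PHP in almost all cases.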

  • PHP >=7.0
  • mbstring
  • libxml
  • dom
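If you want to verify these requirements on a given server before installing, a short script along the following lines may help. It is only a sketch: missingRequirements() is a hypothetical helper and not something the package provides; it relies solely on PHP's built-in version_compare() and extension_loaded().

```php
<?php
// Illustrative sketch only; missingRequirements() is a hypothetical helper,
// not part of HtmlSanitizer.

function missingRequirements(string $minPhp, array $extensions): array
{
    $missing = [];

    // Compare the running PHP version against the required minimum.
    if (version_compare(PHP_VERSION, $minPhp, '<')) {
        $missing[] = 'PHP >= ' . $minPhp;
    }

    // Collect every required extension that is not loaded.
    foreach ($extensions as $extension) {
        if (!extension_loaded($extension)) {
            $missing[] = $extension;
        }
    }

    return $missing;
}

$missing = missingRequirements('7.0.0', ['mbstring', 'libxml', 'dom']);

if ($missing === []) {
    echo 'All requirements are met.' . PHP_EOL;
} else {
    echo 'Missing: ' . implode(', ', $missing) . PHP_EOL;
}
```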

Installation

composer require mirazmac/html-sanitizer dev-main

Usage

use MirazMac\HtmlSanitizer\Whitelist;
use MirazMac\HtmlSanitizer\Sanitizer;

require_once '../vendor/autoload.php';

$whitelist = new Whitelist;

// Allow the anchor tag with specific attributes
$whitelist->allowTag('a', ['href', 'title', 'download', 'data-url', 'data-loaded']);

// You can add multiple tags at once as well if that's what you prefer
$whitelist->setTags(
    [
        // allows the `abbr` tag and its title attribute
        'abbr' =>  ['title'],
        // allows only the em tag, any attributes would be stripped off
        'em'   =>  [],
    ],
    true
);

// Set allowed hosts for the URL attributes on the `a` tag
$whitelist->setAllowedHosts('a', ['google.com', 'facebook.com']);

// Set the allowed protocols for this document
$whitelist->setProtocols(['http', '//', 'https']);

// Set a list of allowed values for a tag's attribute
$whitelist->setAllowedValues('abbr', 'title', ['one', 'two', 'three']);

// Set a list of custom attributes to be treated as URL (i.e to use the host & protocol filter)
$whitelist->treatAttributesAsUrl(['data-url']);

// Set a list of custom attributes to be treated as HTML boolean attributes
// (not true/false: their value is set to blank or to the attribute's own name)
$whitelist->treatAttributesAsBoolean(['data-loaded']);

// Create the sanitizer instance that uses this whitelist
$htmlsanitizer = new Sanitizer($whitelist);

// returns sanitized string
$sanitizedHTML = $htmlsanitizer->sanitize('<a href="//google.com" data-download="">Google</a> <a href="https://bing.com" data-url="https://bing.com">My URL would be removed</a>');

echo "HTML Source Output: <pre>";
echo htmlspecialchars($sanitizedHTML);
echo "</pre><br>Rendered Output:<br>" . $sanitizedHTML;

Quirks

  • URL filtering is currently not supported on attributes that contain multiple URLs, e.g. srcset
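Until that is supported, one possible workaround is to check such attributes yourself before passing the HTML on. The sketch below is an assumption-laden illustration, not library code: srcsetUrls() and srcsetHostsAllowed() are hypothetical names, the parsing is deliberately naive (it splits on commas, so URLs that themselves contain commas are not handled), hosts must match exactly, and relative URLs are rejected.

```php
<?php
// Illustrative sketch only; these helpers are hypothetical and NOT part of
// HtmlSanitizer. Naive parsing: URLs containing commas are not supported.

function srcsetUrls(string $srcset): array
{
    $urls = [];

    foreach (explode(',', $srcset) as $candidate) {
        $candidate = trim($candidate);
        if ($candidate === '') {
            continue;
        }
        // The URL is the first whitespace-delimited token;
        // anything after it is the descriptor (e.g. "2x" or "480w").
        $tokens = preg_split('/\s+/', $candidate);
        $urls[] = $tokens[0];
    }

    return $urls;
}

function srcsetHostsAllowed(string $srcset, array $allowedHosts): bool
{
    foreach (srcsetUrls($srcset) as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        // Assumption: URLs without a host (relative URLs) are rejected,
        // and hosts must match an allowed entry exactly.
        if (!is_string($host) || !in_array(strtolower($host), $allowedHosts, true)) {
            return false;
        }
    }

    return true;
}

var_dump(srcsetHostsAllowed('https://a.com/1.png 1x, https://a.com/2.png 2x', ['a.com']));    // bool(true)
var_dump(srcsetHostsAllowed('https://a.com/1.png 1x, https://evil.com/2.png 2x', ['a.com'])); // bool(false)
```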

TODO

  • Full test coverage
  • Write extended documentation