README

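Regex is a PHP library of lightweight wrappers around regular expression libraries and extensions for day-to-day use. We try to resolve the error-related issues those libraries expose and the cases they handle in rather peculiar ways. We have also tried to unify the APIs they provide, so that this library can serve as a drop-in replacement for most usages.

The most popular regular expression library today is PCRE with its preg_* functions. The Mbstring extension with its mb_ereg_* functions is optional and is not available everywhere. The POSIX Extended implementation with its ereg_* functions has been deprecated since PHP 5.3.0 and should not be used.

The Regex library implements wrappers for:

the PCRE library, via the PcreRegex class
the mbstring extension, via the MbRegex class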

Installation

Download and install composer from http://www.getcomposer.org/download

Add the following to your project's composer.json file

{
    "require": {
        "gobie/regex": "dev-master"
    }
}

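When you are done, just run php composer.phar install and the package is ready to use.

Unified API

Regular expression libraries offer a wide variety of functions, but they are not interchangeable. So we tried to unify the API across these libraries. A few methods with a basic signature are implemented in every wrapper. The pattern syntax differs between drivers, because PCRE's is Perl-like while mbstring can change its syntax dynamically through options.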

match($pattern, $subject)
get($pattern, $subject)
getAll($pattern, $subject)
replace($pattern, $replacement, $subject)
split($pattern, $subject)
grep($pattern, $subject)
filter($pattern, $replacement, $subject)

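Drivers may add further parameters to a signature or handle more types.

For example, PcreRegex::get() adds $flags and $offset parameters, giving the signature PcreRegex::get($pattern, $subject, $flags, $offset), but the basic parameters stay the same.

To implement this API, some methods had to be written in userland code.

For example, mbstring has no counterpart to the PCRE functions preg_match_all(), preg_grep() or preg_filter(), so methods such as MbRegex::getAll(), MbRegex::grep() and MbRegex::filter() had to be built from scratch on top of the primitives mbstring does provide.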
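
As a rough illustration of what such a userland method can look like, here is a minimal sketch of a grep built from mb_ereg() alone. The function name mbGrepSketch is hypothetical and this is not the library's actual MbRegex::grep() implementation; it only assumes that the mbstring extension is loaded, and it mirrors preg_grep() in preserving the keys of the input array.

```php
<?php
// Hypothetical helper for illustration only, not part of the library.
// Returns the entries of $subjects that match $pattern, keeping their keys.
function mbGrepSketch($pattern, array $subjects)
{
    $result = array();

    foreach ($subjects as $key => $subject) {
        // mb_ereg() returns false when there is no match; any other
        // return value (true, or an integer on older PHP) means a match.
        if (mb_ereg($pattern, $subject) !== false) {
            $result[$key] = $subject;
        }
    }

    return $result;
}
```

Note that mbstring patterns are written without delimiters, so a call looks like mbGrepSketch('^a', array('apple', 'banana', 'avocado')).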

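Examples

This library solves the error handling problem by doing all the heavy lifting in a reusable way. Every error is reported through an exception derived from \Gobie\Regex\RegexException.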

use Gobie\Regex\Wrappers\Pcre\PcreRegex;
use Gobie\Regex\Wrappers\Pcre\PcreRegexException;

// matching
if (PcreRegex::match($pattern, $subject)) {
    // do something
}

// matching and parsing
if ($matches = PcreRegex::get($pattern, $subject)) {
    // do something with $matches
}

// replace with callback
if ($res = PcreRegex::replace($pattern, $callback, $subject)) {
    // do something with $res
}

// error handling
try {
    // matching and parsing
    if ($matches = PcreRegex::getAll($pattern, $subject)) {
        // do something with $matches
    }
} catch (PcreRegexException $e) {
    // handle error
}

use Gobie\Regex\Wrappers\Mb\MbRegex;

// grepping
if ($res = MbRegex::grep($pattern, $subject)) {
    // do something with $res
}

// splitting
if ($res = MbRegex::split($pattern, $subject)) {
    // do something with $res
}

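Readable once again, with all the necessary error handling taken care of.

Requirements

PHP 5.3.3 or later. Unit tests are run regularly against the latest 5.3, 5.4, 5.5 and HHVM. Using mb_ereg_replace_callback(), and therefore MbRegex::replace(), requires PHP 5.4 or later.

Note on HHVM

The functions preg_filter() and mb_ereg_replace_callback() are currently not supported. Some error messages have a different format, mostly with the pattern added, which makes unit tests fail. Backtrack and recursion error messages are formatted completely differently and are more descriptive. You can find these differences in the unit test reports on travis-ci.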

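FAQ

Why should I care? / What problem does it solve?

Regular expression libraries give us a set of functions we have to deal with. They are broadly similar in functionality, but differ slightly in the features they cover and in how they handle errors.

Take the PCRE library as an example. Code like this is common in applications: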

if (preg_match($pattern, $subject, $matches)) {
    // do something with $matches if used at all
}

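This code is correct as long as $pattern is not created dynamically, matching never hits the backtrack or recursion limit, and both $subject and $pattern are well-formed UTF-8 strings (when UTF-8 is used).

Two kinds of errors can occur here. Compilation errors, such as a typo in the pattern, trigger an E_WARNING. Runtime errors include hitting the backtrack or recursion limit, or encoding problems. Runtime errors can be detected with the preg_last_error() function. However, that function is reliable only when no compilation error has occurred, because it does not clear its state.

A more robust and less error-prone version: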

set_error_handler(function () {
    // deal with compilation error
});

if (preg_match($pattern, $subject, $matches)) {
    // do something with $matches if used at all
}

restore_error_handler();

if (preg_last_error()) {
    // deal with runtime error
}
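
The outline above can be made runnable in a few lines. The following is only a sketch under stated assumptions: safePregMatch is a hypothetical helper (not part of this library), and the choice of InvalidArgumentException for compilation errors and RuntimeException for runtime errors is ours, made purely to keep the two cases distinguishable.

```php
<?php
// Hypothetical helper for illustration only, not part of the library.
// Returns the matches array, or an empty array when nothing matched.
function safePregMatch($pattern, $subject)
{
    // Compilation errors arrive as E_WARNING; turn them into an exception.
    set_error_handler(function ($severity, $message) {
        throw new InvalidArgumentException($message);
    });

    try {
        $result = preg_match($pattern, $subject, $matches);
    } catch (Exception $e) {
        // Always restore the previous handler before leaving.
        restore_error_handler();
        throw $e;
    }
    restore_error_handler();

    // Runtime errors make preg_match() return false and set an error code.
    if ($result === false) {
        throw new RuntimeException('PCRE runtime error, code ' . preg_last_error());
    }

    return $result === 1 ? $matches : array();
}
```

With this helper, safePregMatch('/(/', 'abc') raises the compilation exception, while a malformed UTF-8 subject matched against a /u pattern raises the runtime one.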

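That is a lot of error handling to take into account, and the problem gets more complex still.

For example, naive use of preg_replace_callback() can make your life harder and put your debugging skills to the test. The simplest way to use it is usually: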

if ($res = preg_replace_callback($pattern, $callback, $subject)) {
    // do something with $res
}

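A lot can happen here: the compilation and runtime errors mentioned above, plus errors triggered from inside $callback. We cannot wrap the call in an error handler as before, because errors raised inside the callback should not be caught by the regex error handling. So a correct solution, one that catches compilation and runtime errors but lets other errors through, might look like this: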

set_error_handler(function () {
    // deal with compilation error
});

preg_match($pattern, '');

restore_error_handler();

$res = preg_replace_callback($pattern, $callback, $subject);

if ($res === null && preg_last_error()) {
    // deal with runtime error
}

Not to mention handling more complex cases, such as arrays of patterns.
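
The two-step approach above can be sketched as a reusable function. safePregReplaceCallback is a hypothetical name, the exception classes are our own choice, and the sketch deliberately handles a single string pattern only, not arrays of patterns. The point it illustrates is that the temporary error handler is active only while the pattern is compiled, so anything triggered inside the callback reaches whatever handler the application already has.

```php
<?php
// Hypothetical helper for illustration only, not part of the library.
function safePregReplaceCallback($pattern, $callback, $subject)
{
    // Step 1: compile the pattern against an empty subject, with a
    // temporary handler that converts the E_WARNING into an exception.
    set_error_handler(function ($severity, $message) {
        throw new InvalidArgumentException($message);
    });
    try {
        preg_match($pattern, '');
    } catch (Exception $e) {
        restore_error_handler();
        throw $e;
    }
    restore_error_handler();

    // Step 2: run the real call without our handler installed, so
    // errors raised inside $callback are not swallowed here.
    $result = preg_replace_callback($pattern, $callback, $subject);

    if ($result === null) {
        throw new RuntimeException('PCRE runtime error, code ' . preg_last_error());
    }

    return $result;
}
```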

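Why is it implemented with static methods rather than the nice object-oriented way we all use and love?

It is meant as a drop-in replacement for the library and extension functions currently in use.

I want to use it in an object-oriented way, as a dependency. Can I?

No problem. For that there is RegexFacade, which simply redirects object calls to the given wrapper.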

$regex = new RegexFacade(RegexFacade::PCRE);
if ($regex->match($pattern, $subject)) {
    // do something
}

// is equivalent to

if (PcreRegex::match($pattern, $subject)) {
    // do something
}
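
To show the general idea of redirecting object calls to static methods, here is a minimal self-contained sketch. Both classes are hypothetical stand-ins written for this example: StaticWrapperStandIn plays the role of a wrapper such as PcreRegex, and FacadeSketch forwards calls through __call(). How the real RegexFacade is implemented is not shown in this README, so treat this as one possible approach rather than the library's code.

```php
<?php
// Hypothetical stand-in for a static wrapper class.
class StaticWrapperStandIn
{
    public static function match($pattern, $subject)
    {
        return preg_match($pattern, $subject) === 1;
    }
}

// Hypothetical facade: forwards every instance call to a static
// method of the same name on the configured wrapper class.
class FacadeSketch
{
    private $wrapperClass;

    public function __construct($wrapperClass)
    {
        $this->wrapperClass = $wrapperClass;
    }

    public function __call($name, $arguments)
    {
        return call_user_func_array(array($this->wrapperClass, $name), $arguments);
    }
}

$regex = new FacadeSketch('StaticWrapperStandIn');
$found = $regex->match('/b/', 'abc'); // same as StaticWrapperStandIn::match('/b/', 'abc')
```

Because the facade is an ordinary object, it can be injected as a dependency and replaced with a test double.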

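I don't want to use exceptions for regex errors. What can I do?

The wrappers are ready to be extended so you can override whatever you need. For example, triggering errors instead of throwing exceptions can be done this way: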

class MyErrorHandlingPcreRegex extends PcreRegex
{
    protected static function setUp($pattern)
    {
        set_error_handler(function ($_, $errstr) use ($pattern) {
            static::tearDown(); // or restore_error_handler() for PHP 5.3
            trigger_error($errstr . '; ' . $pattern, E_USER_WARNING);
        });
    }

    protected static function handleError($pattern)
    {
        if ($error = preg_last_error()) {
            trigger_error(PcreRegexException::$messages[$error] . '; ' . $pattern, E_USER_WARNING);
        }
    }
}

I just want the unified API, without any error handling.

The wrappers are easy to extend to fit any request.

class NoErrorHandlingPcreRegex extends PcreRegex
{
    protected static function setUp($pattern) {}
    protected static function tearDown() {}
    protected static function handleError($pattern) {}
}

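Performance

Performance is one of the main reasons to avoid regular expressions where possible. When they suffice, string functions such as strpos, substr, str_replace and explode are far more efficient.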
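
For simple, literal cases the string functions give the same result as the regex equivalent. The snippet below is a small illustration with made-up sample data; it shows equivalence only and makes no timing claim.

```php
<?php
$line = 'error: disk full';

// Prefix check: anchored regex versus strpos().
$startsWithRegex  = preg_match('/^error:/', $line) === 1;
$startsWithString = strpos($line, 'error:') === 0;

// Splitting on a fixed delimiter: preg_split() versus explode().
$partsRegex  = preg_split('/,/', 'a,b,c');
$partsString = explode(',', 'a,b,c');

// Literal replacement: preg_replace() versus str_replace().
$replacedRegex  = preg_replace('/full/', 'empty', $line);
$replacedString = str_replace('full', 'empty', $line);
```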

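Sometimes, however, regular expressions are needed. So we ran some benchmarks to show how much performance our abstraction takes away compared with the native functions. We think the added functionality and usable error handling largely make up for the lost performance, but judge for yourself.

MbBench lacks some native function benchmarks because those functions have no native implementation.

StringBench was added for comparison. It does roughly the same job in the test scenarios.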

Gobie\Bench\MbBench
    Method Name              Iterations    Average Time      Ops/second
    ----------------------  ------------  --------------    -------------
    libraryMatch          : [10,000    ] [0.0000252296209] [39,635.95024]
    libraryGet            : [10,000    ] [0.0000270466089] [36,973.21179]
    libraryGetAll         : [10,000    ] [0.0000665599585] [15,024.04784]
    libraryReplace        : [10,000    ] [0.0000590241909] [16,942.20598]
    libraryReplaceCallback: [10,000    ] [0.0000805493355] [12,414.75171]
    libraryGrep           : [10,000    ] [0.0000354112625] [28,239.60314]
    libraryFilter         : [10,000    ] [0.0000871953726] [11,468.49850]
    librarySplit          : [10,000    ] [0.0000985779762] [10,144.25370]

    nativeMatch           : [10,000    ] [0.0000034259081] [291,893.41165]
    nativeGet             : [10,000    ] [0.0000050207376] [199,173.92027]
    nativeReplace         : [10,000    ] [0.0000085228920] [117,331.06558]
    nativeReplaceCallback : [10,000    ] [0.0000239482880] [ 41,756.63837]
    nativeSplit           : [10,000    ] [0.0000068708181] [145,543.07506]


Gobie\Bench\PcreBench
    Method Name              Iterations    Average Time      Ops/second
    ----------------------  ------------  --------------    -------------
    libraryMatch          : [10,000    ] [0.0000344554901] [29,022.95096]
    libraryGet            : [10,000    ] [0.0000327244282] [30,558.21158]
    libraryGetAll         : [10,000    ] [0.0000381940365] [26,182.09784]
    libraryReplace        : [10,000    ] [0.0000430052996] [23,252.94813]
    libraryReplaceCallback: [10,000    ] [0.0000728453159] [13,727.71862]
    libraryGrep           : [10,000    ] [0.0000342110157] [29,230.35109]
    libraryFilter         : [10,000    ] [0.0000352864027] [28,339.52807]
    librarySplit          : [10,000    ] [0.0000331200361] [30,193.20378]

    nativeMatch           : [10,000    ] [0.0000061690331] [162,099.95826]
    nativeGet             : [10,000    ] [0.0000071261883] [140,327.47395]
    nativeGetAll          : [10,000    ] [0.0000095671892] [104,523.90743]
    nativeReplace         : [10,000    ] [0.0000074643373] [133,970.36512]
    nativeReplaceCallback : [10,000    ] [0.0000232770443] [ 42,960.78090]
    nativeGrep            : [10,000    ] [0.0000091099977] [109,769.51120]
    nativeFilter          : [10,000    ] [0.0000103915691] [ 96,231.85746]
    nativeSplit           : [10,000    ] [0.0000077021599] [129,833.71098]


Gobie\Bench\StringBench
    Method Name             Iterations    Average Time      Ops/second
    ---------------------  ------------  --------------    -------------
    stringMatch          : [10,000    ] [0.0000034402609] [290,675.62979]
    stringGet            : [10,000    ] [0.0000050993204] [196,104.56282]
    stringGetAll         : [10,000    ] [0.0000067927361] [147,216.08379]
    stringReplace        : [10,000    ] [0.0000036442518] [274,404.74711]
    stringReplaceCallback: [10,000    ] [0.0000126019001] [ 79,353.11279]
    stringGrep           : [10,000    ] [0.0000082664728] [120,970.57865]
    stringFilter         : [10,000    ] [0.0000143237352] [ 69,814.19186]
    stringSplit          : [10,000    ] [0.0000041101217] [243,301.79650]

You can run the benchmarks yourself:

$ cd project_root
$ composer install
$ php vendor/athletic/athletic/bin/athletic -p tests/Gobie/Bench -b tests/bootstrap.php

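Contributing

Contributions are welcome, as are any questions or issues.

Unit and integration tests are run with phpunit, using the configuration file tests/complete.phpunit.xml.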
