README

This is a BibTeX parser written in PHP.

You are browsing the documentation of BibTeX Parser 2.x, the latest version.

Installation

composer require renanbr/bibtex-parser

Usage

use RenanBr\BibTexParser\Listener;
use RenanBr\BibTexParser\Parser;
use RenanBr\BibTexParser\Processor;

require 'vendor/autoload.php';

$bibtex = <<<BIBTEX
@article{einstein1916relativity,
  title={Relativity: The Special and General Theory},
  author={Einstein, Albert},
  year={1916}
}
BIBTEX;

// Create and configure a Listener
$listener = new Listener();
$listener->addProcessor(new Processor\TagNameCaseProcessor(CASE_LOWER));
// $listener->addProcessor(new Processor\NamesProcessor());
// $listener->addProcessor(new Processor\KeywordsProcessor());
// $listener->addProcessor(new Processor\DateProcessor());
// $listener->addProcessor(new Processor\FillMissingProcessor([/* ... */]));
// $listener->addProcessor(new Processor\TrimProcessor());
// $listener->addProcessor(new Processor\UrlFromDoiProcessor());
// $listener->addProcessor(new Processor\LatexToUnicodeProcessor());
// ... you can append as many Processors as you want

// Create a Parser and attach the listener
$parser = new Parser();
$parser->addListener($listener);

// Parse the content, then read processed data from the Listener
$parser->parseString($bibtex); // or parseFile('/path/to/file.bib')
$entries = $listener->export();

print_r($entries);

This will output:

Array
(
    [0] => Array
        (
            [_type] => article
            [citation-key] => einstein1916relativity
            [title] => Relativity: The Special and General Theory
            [author] => Einstein, Albert
            [year] => 1916
        )
)

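Vocabulary

BibTeX is all about "entry", "tag's name" and "tag's content".

A BibTeX entry consists of the type (the word after @), a citation-key and a number of tags which define various characteristics of the specific BibTeX entry. (...) A BibTeX tag is specified by its name followed by an equals sign, and the content.

Source: http://www.bibtex.org/Format/

Note: This library treats "type" and "citation-key" as tags. You can change this behavior by implementing your own Listener.

Processors

A Processor is a callable that receives an entry as its argument and returns a modified entry.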

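This library contains three main parts:

the Parser class, responsible for detecting units inside a BibTeX input;
the Listener class, responsible for gathering units and transforming them into a list of entries;
the Processor classes, responsible for manipulating entries.

Although the Parser cannot be configured, you can append as many Processors as you want to the Listener via Listener::addProcessor() before exporting the contents. Be aware that the Listener provides these features by default:

found entries are reachable via the Listener::export() method;
tag content concatenation;
- e.g. the tag content hello # " world" generates the string hello world
tag content abbreviation handling;
- e.g. @string{foo="bar"} @misc{bar=foo} makes $entries[1]['bar'] hold the value bar
the publication's type is exposed as the _type tag;
the citation key is exposed as the citation-key tag;
the original entry text is exposed as the _original tag.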
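
To make the idea of chaining processors concrete, here is a minimal sketch in plain PHP that does not use the library. It assumes processors are applied one after another, in the order they were added; applyProcessors() and the two closures are hypothetical names introduced only for this illustration.

```php
<?php
// Hypothetical helper: runs every processor over every entry, in order.
function applyProcessors(array $entries, array $processors)
{
    foreach ($processors as $processor) {
        $entries = array_map($processor, $entries);
    }

    return $entries;
}

// Two small processors: each receives an entry and returns a modified entry.
$trim = static function (array $entry) {
    $entry['title'] = trim($entry['title']);

    return $entry;
};
$suffix = static function (array $entry) {
    $entry['title'] .= '!';

    return $entry;
};

$entries = [['title' => ' BibTeX rocks ']];

$a = applyProcessors($entries, [$trim, $suffix]);
$b = applyProcessors($entries, [$suffix, $trim]);

echo $a[0]['title'], PHP_EOL; // BibTeX rocks!
echo $b[0]['title'], PHP_EOL; // BibTeX rocks !
```

The two results differ only because the processors ran in a different order, so the order of the addProcessor() calls is worth deciding deliberately.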

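This project ships some useful processors.

Tag name case

In BibTeX, tag names are not case-sensitive. This library exposes entries as arrays, whose keys are case-sensitive. To avoid lookups that fail because of this mismatch, you can force the character case of tag names using TagNameCaseProcessor.

Usage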

use RenanBr\BibTexParser\Processor\TagNameCaseProcessor;

$listener->addProcessor(new TagNameCaseProcessor(CASE_UPPER)); // or CASE_LOWER

@article{
  title={BibTeX rocks}
}

Array
(
    [0] => Array
        (
            [TYPE] => article
            [TITLE] => BibTeX rocks
        )
)
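
To see why forcing a single case helps, the following sketch uses plain PHP arrays instead of the library. The entries are made-up data standing in for what an export might contain when tag names were typed inconsistently. array_change_key_case() is PHP's built-in function that accepts the same CASE_LOWER and CASE_UPPER constants; whether TagNameCaseProcessor uses it internally is an assumption.

```php
<?php
// The same tag written with different letter case in two entries.
$entries = [
    ['type' => 'article', 'Title' => 'BibTeX rocks'],
    ['type' => 'article', 'TITLE' => 'PHP rocks'],
];

// Array keys are case-sensitive, so looking up "title" finds nothing.
$withTitle = array_filter($entries, static function (array $entry) {
    return isset($entry['title']);
});
echo count($withTitle), PHP_EOL; // 0

// After forcing one case, every entry exposes the tag under the same key.
$normalized = array_map(static function (array $entry) {
    return array_change_key_case($entry, CASE_LOWER);
}, $entries);
echo $normalized[0]['title'], ' / ', $normalized[1]['title'], PHP_EOL; // BibTeX rocks / PHP rocks
```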

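Authors and editors

BibTeX recognizes four parts of an author's name: First, Von, Last, and Jr. If you would like to parse the author and editor tags included in your entries, you can use the NamesProcessor class.

Usage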

use RenanBr\BibTexParser\Processor\NamesProcessor;

$listener->addProcessor(new NamesProcessor());

@article{
  title={Relativity: The Special and General Theory},
  author={Einstein, Albert}
}

Array
(
    [0] => Array
        (
            [type] => article
            [title] => Relativity: The Special and General Theory
            [author] => Array
                (
                    [0] => Array
                        (
                            [first] => Albert
                            [von] =>
                            [last] => Einstein
                            [jr] =>
                        )
                )
        )
)

Keywords

The keywords tag contains a list of expressions represented as a single string. You might want to read them as an array instead.

Usage

use RenanBr\BibTexParser\Processor\KeywordsProcessor;

$listener->addProcessor(new KeywordsProcessor());

@misc{
  title={The End of Theory: The Data Deluge Makes the Scientific Method Obsolete},
  keywords={big data, data deluge, scientific method}
}

Array
(
    [0] => Array
        (
            [type] => misc
            [title] => The End of Theory: The Data Deluge Makes the Scientific Method Obsolete
            [keywords] => Array
                (
                    [0] => big data
                    [1] => data deluge
                    [2] => scientific method
                )
        )
)
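
The example above uses commas as separators. If your files also separate keywords with semicolons, one option is a small custom processor. The sketch below is plain PHP and is not the implementation of KeywordsProcessor; the closure name and the choice of separators are assumptions made for illustration.

```php
<?php
// Hypothetical processor: splits the keywords tag on commas or semicolons.
$splitKeywords = static function (array $entry) {
    if (isset($entry['keywords']) && is_string($entry['keywords'])) {
        $entry['keywords'] = preg_split(
            '/\s*[;,]\s*/',
            trim($entry['keywords']),
            -1,
            PREG_SPLIT_NO_EMPTY // drop empty items such as a trailing separator
        );
    }

    return $entry;
};

$entry = $splitKeywords([
    'type' => 'misc',
    'keywords' => 'big data; data deluge, scientific method',
]);
print_r($entry['keywords']);
```

Because a processor is just a callable, it can be registered with $listener->addProcessor($splitKeywords).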

Date

Adds a new _date tag holding a DateTimeImmutable object. The processor adds the tag only when both the month and year tags are present.

Usage

use RenanBr\BibTexParser\Processor\DateProcessor;

$listener->addProcessor(new DateProcessor());

@misc{
  month="1~oct",
  year=2000
}

Array
(
    [0] => Array
        (
            [type] => misc
            [month] => 1~oct
            [year] => 2000
            [_date] => DateTimeImmutable Object
                (
                    [date] => 2000-10-01 00:00:00.000000
                    [timezone_type] => 3
                    [timezone] => UTC
                )
        )
)
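
Since _date is only present on some entries, code that reads it should check for it first. The following sketch shows a hypothetical follow-up processor that stores a formatted copy of the date. The key name date_iso is an assumption introduced here, and the entry is built by hand so the snippet runs without the library.

```php
<?php
// Hypothetical processor: adds an ISO-formatted string next to _date.
$formatDate = static function (array $entry) {
    if (isset($entry['_date']) && $entry['_date'] instanceof DateTimeInterface) {
        $entry['date_iso'] = $entry['_date']->format('Y-m-d');
    }

    // Entries without _date are returned unchanged.
    return $entry;
};

// Hand-built entry shaped like the output shown above.
$entry = $formatDate([
    'type' => 'misc',
    'month' => '1~oct',
    'year' => '2000',
    '_date' => new DateTimeImmutable('2000-10-01', new DateTimeZone('UTC')),
]);
echo $entry['date_iso'], PHP_EOL; // 2000-10-01
```

In a real setup such a processor would be added after DateProcessor, so that _date already exists when it runs.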

Fill missing tag

Sets a default value for each configured tag that is missing from an entry.

Usage

use RenanBr\BibTexParser\Processor\FillMissingProcessor;

$listener->addProcessor(new FillMissingProcessor([
    'title' => 'This entry has no title',
    'year' => 1970,
]));

@misc{
}

@misc{
    title="I do exist"
}

Array
(
    [0] => Array
        (
            [type] => misc
            [title] => This entry has no title
            [year] => 1970
        )
    [1] => Array
        (
            [type] => misc
            [title] => I do exist
            [year] => 1970
        )
)

Trim tags

Applies trim() to all tags.

Usage

use RenanBr\BibTexParser\Processor\TrimProcessor;

$listener->addProcessor(new TrimProcessor());

@misc{
  title=" too much space  "
}

Array
(
    [0] => Array
        (
            [type] => misc
            [title] => too much space
        )

)

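Determine URL from the DOI

If the doi tag is present and the url tag is missing, sets the url tag to a URL built from the DOI (https://doi.org/ followed by the DOI).

Usage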

use RenanBr\BibTexParser\Processor\UrlFromDoiProcessor;

$listener->addProcessor(new UrlFromDoiProcessor());

@misc{
  doi="qwerty"
}

@misc{
  doi="azerty",
  url="http://example.org"
}

Array
(
    [0] => Array
        (
            [type] => misc
            [doi] => qwerty
            [url] => https://doi.org/qwerty
        )

    [1] => Array
        (
            [type] => misc
            [doi] => azerty
            [url] => http://example.org
        )
)
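
The rule can be written down in a few lines of plain PHP. This is a sketch of the behavior described above, not the source of UrlFromDoiProcessor; the closure name is hypothetical and the URL prefix is taken from the example output.

```php
<?php
// Sketch of the rule: only fill url when doi exists and url does not.
$urlFromDoi = static function (array $entry) {
    if (!empty($entry['doi']) && empty($entry['url'])) {
        $entry['url'] = 'https://doi.org/' . $entry['doi'];
    }

    return $entry;
};

$first = $urlFromDoi(['type' => 'misc', 'doi' => 'qwerty']);
$second = $urlFromDoi(['type' => 'misc', 'doi' => 'azerty', 'url' => 'http://example.org']);

echo $first['url'], PHP_EOL;  // https://doi.org/qwerty
echo $second['url'], PHP_EOL; // http://example.org
```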

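LaTeX to unicode

BibTeX files store LaTeX contents. You might want to read them as unicode instead. The LatexToUnicodeProcessor class solves this problem, but before adding the processor to the listener you must:

install Pandoc on your system; and
add ryakad/pandoc-php or ueberdosis/pandoc as a dependency of your project.

Usage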

use RenanBr\BibTexParser\Processor\LatexToUnicodeProcessor;

$listener->addProcessor(new LatexToUnicodeProcessor());

@article{
  title={Caf\'{e}s and bars}
}

Array
(
    [0] => Array
        (
            [type] => article
            [title] => Cafés and bars
        )
)

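Note: Order matters, so add this processor last.

Custom

The Listener::addProcessor() method expects a callable as its argument. In the example below, we append the text "with laser" to the title tag of every entry.

Usage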

$listener->addProcessor(static function (array $entry) {
    $entry['title'] .= ' with laser';
    return $entry;
});

@article{
  title={BibTeX rocks}
}

Array
(
    [0] => Array
        (
            [type] => article
            [title] => BibTeX rocks with laser
        )
)
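
A closure is not the only kind of callable. When a processor needs configuration, an object with an __invoke() method may be easier to reuse. The class below is a hypothetical example, not part of the library, and it is exercised directly on a hand-built entry so the snippet runs on its own.

```php
<?php
// Hypothetical processor: appends a configurable suffix to the title tag.
final class TitleSuffixProcessor
{
    private $suffix;

    public function __construct($suffix)
    {
        $this->suffix = $suffix;
    }

    public function __invoke(array $entry)
    {
        // Skip entries without a title instead of appending to an undefined key.
        if (isset($entry['title'])) {
            $entry['title'] .= $this->suffix;
        }

        return $entry;
    }
}

$processor = new TitleSuffixProcessor(' with laser');
$entry = $processor(['type' => 'article', 'title' => 'BibTeX rocks']);
echo $entry['title'], PHP_EOL; // BibTeX rocks with laser
```

Since objects that define __invoke() are callables in PHP, an instance could be registered with $listener->addProcessor(new TitleSuffixProcessor(' with laser')).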

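Handling errors

This library throws two types of exceptions: ParserException and ProcessorException. The first may be thrown during data extraction, and it usually means the parsed BibTeX is not valid. The second may be thrown during data processing, and it means one of the Listener's processors could not handle the data found. Both implement ExceptionInterface.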

use RenanBr\BibTexParser\Exception\ExceptionInterface;
use RenanBr\BibTexParser\Exception\ParserException;
use RenanBr\BibTexParser\Exception\ProcessorException;

try {
    // ... parser and listener configuration

    $parser->parseFile('/path/to/file.bib');
    $entries = $listener->export();
} catch (ParserException $exception) {
    // The BibTeX isn't valid
} catch (ProcessorException $exception) {
    // Listener's processors aren't able to handle data found
} catch (ExceptionInterface $exception) {
    // Alternatively, you can use this exception to catch all of them at once
}

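Advanced usage

The core of this library contains these main classes:

RenanBr\BibTexParser\Parser, responsible for detecting units inside a BibTeX input;
RenanBr\BibTexParser\ListenerInterface, responsible for handling the units found.

You can attach listeners to the parser using Parser::addListener(). The parser is able to detect BibTeX units, such as "type", "tag's name" and "tag's content". Whenever the parser finds a unit, it triggers the listeners attached to it.

You can code your own listener! All you have to do is handle units.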

namespace RenanBr\BibTexParser;

interface ListenerInterface
{
    /**
     * Called when a unit is found.
     *
     * @param string $text    The original content of the unit found.
     *                        Escape characters will not be sent.
     * @param string $type    The type of unit found.
     *                        It can be one of the Parser's constant values.
     * @param array  $context Contains details of the unit found.
     */
    public function bibTexUnitFound($text, $type, array $context);
}

$type may assume one of these values:

Parser::TYPE
Parser::CITATION_KEY
Parser::TAG_NAME
Parser::RAW_TAG_CONTENT
Parser::BRACED_TAG_CONTENT
Parser::QUOTED_TAG_CONTENT
Parser::ENTRY

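$context is an array with these keys:

offset contains the beginning position of $text. It may be useful, for example, to seek on a file pointer;
length contains the original length of $text. It may differ from the length of the string sent to the listener because escape characters are not sent.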
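
As an illustration of the interface and of the $context keys, here is a minimal sketch of a custom listener that records each citation key together with its position. CitationKeyCollector is a hypothetical class written for this example. The snippet assumes the library is installed via Composer and that offset and length are byte positions in the parsed string.

```php
<?php
use RenanBr\BibTexParser\ListenerInterface;
use RenanBr\BibTexParser\Parser;

require 'vendor/autoload.php';

// Hypothetical listener: keeps citation keys and where they were found.
final class CitationKeyCollector implements ListenerInterface
{
    public $found = [];

    public function bibTexUnitFound($text, $type, array $context)
    {
        // Ignore every unit except citation keys.
        if (Parser::CITATION_KEY === $type) {
            $this->found[] = [
                'key' => $text,
                'offset' => $context['offset'],
                'length' => $context['length'],
            ];
        }
    }
}

$bibtex = '@article{einstein1916relativity, title={Relativity}}';

$collector = new CitationKeyCollector();
$parser = new Parser();
$parser->addListener($collector);
$parser->parseString($bibtex);

foreach ($collector->found as $unit) {
    // offset and length point back into the original input.
    $original = substr($bibtex, $unit['offset'], $unit['length']);
    echo $unit['key'], ' at offset ', $unit['offset'], ': ', $original, PHP_EOL;
}
```

A listener like this skips the work of building full entries, which may be enough when only a few kinds of units matter.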
