apemsel/attributed-string

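A collection of fast, easy-to-use attributed string classes for PHP. An attributed string can carry multiple attributes on each character of the string, as used in word processors and natural language processing.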

v3.0.0 2024-03-29 16:02 UTC

README

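A collection of classes for working with attributed strings in PHP. An attributed string is a string in which every character can have multiple attributes. Each attribute is a bitmap or boolean array with the same length as the string. This simple data structure can be used to implement many interesting things, such as:

  • Text decoration, color, font and so on in a word processor (for example, setting the "bold" attribute on a range of the string)
  • Semantic text analysis systems (with attributes such as "verb" and "noun")
  • Core text extraction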
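The "one boolean array per attribute" idea can be sketched in a few lines of PHP. This is an illustration only, not the library's implementation: the class `BitmapSketch` and its methods `attributesAt` and `combineAnd` are hypothetical names, and the sketch assumes single-byte characters because it measures the string with `strlen`.

```php
<?php
declare(strict_types=1);

// Hypothetical illustration class; NOT part of apemsel/attributed-string.
final class BitmapSketch
{
    private string $text;

    /** @var array<string, bool[]> attribute name => one flag per character */
    private array $attributes = [];

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    // Mark $length characters starting at offset $from with $attribute.
    public function setLength(int $from, int $length, string $attribute): void
    {
        $size = strlen($this->text);
        $map = $this->attributes[$attribute] ?? array_fill(0, $size, false);
        for ($i = max(0, $from); $i < $from + $length && $i < $size; $i++) {
            $map[$i] = true;
        }
        $this->attributes[$attribute] = $map;
    }

    // True if the character at $pos carries $attribute.
    public function is(string $attribute, int $pos): bool
    {
        return $this->attributes[$attribute][$pos] ?? false;
    }

    // Names of all attributes set at $pos, in the order they were created.
    public function attributesAt(int $pos): array
    {
        return array_keys(array_filter(
            $this->attributes,
            fn(array $map): bool => $map[$pos] ?? false
        ));
    }

    // Store the element-wise AND of attributes $a and $b under $target.
    public function combineAnd(string $a, string $b, string $target): void
    {
        $result = [];
        for ($i = 0, $size = strlen($this->text); $i < $size; $i++) {
            $result[$i] = $this->is($a, $i) && $this->is($b, $i);
        }
        $this->attributes[$target] = $result;
    }
}

$s = new BitmapSketch("The quick brown fox");
$s->setLength(10, 5, "color");  // "brown"
$s->setLength(12, 1, "vowel");  // the "o" in "brown"
$s->combineAnd("color", "vowel", "colored-vowel");

var_dump($s->is("colored-vowel", 12)); // bool(true)
var_dump($s->is("colored-vowel", 11)); // bool(false)
```

Because every attribute is just a flag array of the same length, combining two attributes reduces to an element-wise boolean operation, and a lookup at a given offset is a plain array access.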

Example

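  use apemsel\AttributedString\AttributedString;
  // The two classes below are used further down; they are assumed to live in the same namespace
  use apemsel\AttributedString\MutableAttributedString;
  use apemsel\AttributedString\TokenizedAttributedString;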

  // ...

  $as = new AttributedString("The quick brown fox");

  $as->setLength(10, 5, "color"); // "brown" has attribute "color"
  $as->is("color", 12); // == true
  $as->toHtml(); // "The quick <span class=\"color\">brown</span> fox"

  $as->setPattern("/[aeiou]/", "vowel"); // vowels have attribute "vowel"
  $as->getAttributes(12); // char at offset 12 has attributes ["color", "vowel"]

  $as->combineAttributes("and", "color", "vowel", "colored-vowel"); // also use "or", "not", "xor" to combine attributes
  $as->is("colored-vowel", 12); // "o" of "brown" is a color vowel ;-)

  $as->setSubstring("fox", "noun"); // all instances of "fox" have attribute "noun"
  $as->is("noun", 16); // true, char at offset 16 is part of a noun

  $as->searchAttribute("vowel"); // 2, first vowel starts at offset 2
  $as->searchAttribute("vowel", 0, true); // [2, 1], first vowel starting at offset 0 is at offset 2 with length 1

  // MutableAttributedString can be modified after creation and tries to be smart about the attributes
  $mas = new MutableAttributedString("The brown fox");
  $mas->setLength(0, 13, "bold");
  $mas->insert(4, "quick "); // "The quick brown fox"
  $mas->is("bold", 6); // true, "quick" is now also bold since the inserted text was inside the "bold" attribute
  $mas->delete(10, 6); // "The quick fox"

  // TokenizedAttributedString tokenizes the given string, can set attributes by token
  // and maintains the tokens' offsets in the original string.
  $tas = new TokenizedAttributedString("The quick brown fox"); // tokenize using the default whitespace tokenizer
  $tas->getToken(2); // "brown"
  $tas->setTokenAttribute(2, "bold"); // "brown" is "bold"
  $tas->getTokenOffset(2); // 10, "brown" starts at offset 10
  $tas->getTokenOffsets(); // [0, 4, 10, 16], start offsets of the tokens in the string
  $tas->setTokenRangeAttribute(2, 3, "underlined"); // set tokens 2 to 3 to "underlined"
  $tas->getAttributesAtToken(2); // ["bold", "underlined"]
  $tas->lowercaseTokens(); // convert tokens to lowercase
  $tas->setTokenDictionaryAttribute(["a", "an", "the"], "article"); // set all tokens contained in given dictionary to an attribute
  $tas->getAttributesAtToken(0); // ["article"]
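
The token offsets shown above ([0, 4, 10, 16]) can be reproduced with plain PHP. The following is a minimal sketch of a whitespace tokenizer that records each token's start offset; `tokenizeWithOffsets` is a hypothetical helper written for this illustration and is not claimed to be how the library tokenizes internally. Offsets are byte offsets, as reported by `preg_split`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper; NOT the library's tokenizer.
 * Splits $text on runs of whitespace and returns [tokens, offsets],
 * where offsets[i] is the byte offset of tokens[i] in $text.
 */
function tokenizeWithOffsets(string $text): array
{
    // PREG_SPLIT_OFFSET_CAPTURE makes each piece a [string, offset] pair;
    // PREG_SPLIT_NO_EMPTY drops empty pieces from leading/trailing whitespace.
    $parts = preg_split(
        '/\s+/',
        $text,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE
    );

    return [array_column($parts, 0), array_column($parts, 1)];
}

[$tokens, $offsets] = tokenizeWithOffsets("The quick brown fox");

// Token 2 is "brown"; its character range follows from offset and length.
$i = 2;
printf(
    "token %d = %s spans bytes %d..%d\n",
    $i,
    $tokens[$i],
    $offsets[$i],
    $offsets[$i] + strlen($tokens[$i]) - 1
); // prints: token 2 = brown spans bytes 10..14
```

With the offset and length of a token known, setting an attribute on a token amounts to setting it on that character range of the underlying string.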

Installation

With Composer (recommended)

composer require apemsel/attributed-string

Documentation

See the generated phpdoc API documentation in the doc/ directory, or try http://htmlpreview.github.io/?https://raw.githubusercontent.com/apemsel/AttributedString/master/doc/index.html