sajjadh47/bpe-encoder-php

BPE (Byte Pair Encoding) encoder/decoder for OpenAI's GPT-2 / GPT-3, implemented in pure PHP with no dependencies and with multibyte support.

v1.0.0 2023-02-27 20:41 UTC

This package is auto-updated.

Last update: 2024-09-28 00:00:58 UTC


README

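A PHP implementation of the GPT-2 / GPT-3 BPE encoder/decoder. It is a PHP port of OpenAI's original Python encoder/decoder, and it follows the algorithm of the OpenAI Python implementation about 99% exactly, with adjustments for speed and multibyte encoding.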
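For readers unfamiliar with byte pair encoding, the core merge loop can be sketched roughly as follows. This is a toy illustration only, not this package's code: the function name `toy_bpe_merge` and the tiny `$ranks` table are hypothetical, whereas the real GPT-2 / GPT-3 encoder works on byte-level symbols with a merge table of tens of thousands of entries.

```php
<?php

/**
 * Toy BPE merge loop (illustrative sketch, not the package's implementation).
 *
 * @param string[]          $symbols Word split into its initial symbols.
 * @param array<string,int> $ranks   Hypothetical merge table: "left right" => priority
 *                                   (a lower number is merged first).
 * @return string[] Symbols left after all applicable merges.
 */
function toy_bpe_merge(array $symbols, array $ranks): array
{
    while (count($symbols) > 1) {
        // Find the adjacent pair with the lowest rank in the merge table.
        $bestRank = null;
        $bestPair = null;
        for ($i = 0; $i < count($symbols) - 1; $i++) {
            $key = $symbols[$i] . ' ' . $symbols[$i + 1];
            if (isset($ranks[$key]) && ($bestRank === null || $ranks[$key] < $bestRank)) {
                $bestRank = $ranks[$key];
                $bestPair = [$symbols[$i], $symbols[$i + 1]];
            }
        }

        if ($bestPair === null) {
            break; // No known pair is left to merge.
        }

        // Merge every occurrence of the best pair, scanning left to right.
        $merged = [];
        $i = 0;
        $n = count($symbols);
        while ($i < $n) {
            if ($i < $n - 1 && $symbols[$i] === $bestPair[0] && $symbols[$i + 1] === $bestPair[1]) {
                $merged[] = $bestPair[0] . $bestPair[1];
                $i += 2;
            } else {
                $merged[] = $symbols[$i];
                $i++;
            }
        }
        $symbols = $merged;
    }

    return $symbols;
}

// Made-up merge table for the demo.
$ranks = ['l l' => 0, 'h e' => 1, 'll o' => 2];
echo implode(' ', toy_bpe_merge(['h', 'e', 'l', 'l', 'o'], $ranks)), PHP_EOL; // prints "he llo"
```

In a full encoder, each resulting symbol would then be looked up in a vocabulary to obtain its integer token ID.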

Install with Composer

composer require sajjadh47/bpe-encoder-php

Usage

This package requires PHP >= 7.4, and the mb_* functions (the mbstring extension) must be enabled to support multibyte encoding.
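If it is unclear whether an environment meets these requirements, a quick check along the following lines may help. This is a minimal sketch: the helper name `bpe_requirement_problems` is hypothetical and is not part of this package. It relies only on PHP's built-in `version_compare()` and `extension_loaded()`.

```php
<?php

/**
 * Lists unmet requirements (hypothetical helper, not provided by the package).
 *
 * @param string    $phpVersion  Version to check; defaults to the running PHP version.
 * @param bool|null $hasMbstring Whether mbstring is available; null means detect it.
 * @return string[] Problem descriptions; an empty array means nothing was found.
 */
function bpe_requirement_problems(string $phpVersion = PHP_VERSION, ?bool $hasMbstring = null): array
{
    if ($hasMbstring === null) {
        $hasMbstring = extension_loaded('mbstring');
    }

    $problems = [];
    if (version_compare($phpVersion, '7.4.0', '<')) {
        $problems[] = 'PHP >= 7.4 is required, found ' . $phpVersion;
    }
    if (!$hasMbstring) {
        $problems[] = 'The mbstring extension (mb_* functions) is not enabled';
    }

    return $problems;
}

// Report any problems found in the current environment.
foreach (bpe_requirement_problems() as $problem) {
    echo $problem, PHP_EOL;
}
```

The optional parameters exist only so the check can be exercised with fixed values; calling the function without arguments inspects the running interpreter.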

<?php

// Load Composer's autoloader.
require __DIR__ . '/vendor/autoload.php';

use sajjadh47\BPE;

$BPE = new BPE();

$encoded = $BPE->encode( "Hello!! I'm Sajjad Hossain Sagor. It's 2023, Nice To Meet You. What's Up? :) 🤗" );

print_r( $encoded );

//Outputs
Array
(
    [0] => 15496
    [1] => 3228
    [2] => 314
    [3] => 1101
    [4] => 220
    [5] => 50
    [6] => 64
    [7] => 73
    [8] => 73
    [9] => 64
    [10] => 67
    [11] => 220
    [12] => 39
    [13] => 78
    [14] => 82
    [15] => 82
    [16] => 64
    [17] => 72
    [18] => 77
    [19] => 220
    [20] => 50
    [21] => 64
    [22] => 70
    [23] => 78
    [24] => 81
    [25] => 13
    [26] => 632
    [27] => 338
    [28] => 220
    [29] => 17
    [30] => 15
    [31] => 17
    [32] => 18
    [33] => 11
    [34] => 18460
    [35] => 1675
    [36] => 21167
    [37] => 921
    [38] => 13
    [39] => 1867
    [40] => 338
    [41] => 3205
    [42] => 30
    [43] => 14373
    [44] => 3467
    [45] => 463
    [46] => 5999
    [47] => 68
    [48] => 59
    [49] => 4185
    [50] => 1558
)

echo $BPE->decode( $encoded );

//Outputs
Hello!! I'm Sajjad Hossain Sagor. It's 2023, Nice To Meet You. What's Up? :) 🤗