yui-ezic/z99-lexer

PHP lexer for Z99, a custom Pascal-like programming language.

dev-master 2020-04-26 17:14 UTC


Last update: 2024-09-27 02:59:39 UTC


README

Z99 is a Pascal-like programming language developed for educational purposes. Here is an example of a program written in Z99:

program first
var i: int;
    sum, value : real;
begin
    sum = 0.0;
    i = 1;
    repeat
        read (value);
        sum = sum + value;
        write(i, sum);
        i = i + 1;
    until i <= 100;

    sum = sum / 100;
    write(sum);
end.

Grammar initialization

The lexer is built on a finite state machine (FSM), represented by the Z99Lexer\FSM\FSM class.

require 'vendor/autoload.php';

$fsm = new Z99Lexer\FSM\FSM();

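The state diagram is initialized in the file "create_fsm.php". You can view it as an image by calling the visualize() method. The edge labels in the diagram mean: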

dgt - digit
chr - character
def - default
WS - whitespace


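For convenience, all final states are named with a leading minus sign (negative numbers) and highlighted in blue. State 0 is the start state.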
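To make the edge labels concrete, here is a small standalone sketch of how a single character could be mapped to one of these classes. The function name charClass() and the exact character sets (ASCII letters and digits via ctype, and space, tab, newline and carriage return as whitespace) are assumptions for illustration; the library's own TriggerTypes may classify characters differently.

```php
<?php
// Hypothetical helper: map one character to an edge label from the legend.
// ASCII letters and digits are assumed; the library may classify differently.
function charClass(string $ch): string
{
    if (ctype_digit($ch)) {
        return 'dgt';
    }
    if (ctype_alpha($ch)) {
        return 'chr';
    }
    if (in_array($ch, [' ', "\t", "\n", "\r"], true)) {
        return 'WS';
    }
    return 'def'; // anything else follows the default edge
}

foreach (['7', 'x', ' ', ';'] as $ch) {
    echo json_encode($ch), ' => ', charClass($ch), PHP_EOL;
}
```

A 'def' result means no specific edge applies, so the FSM would follow the default edge of the current state.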

Creating your own grammar

First, create the start state:

$fsm->addStart(0);

Then add some intermediate states:

$fsm->addState(1);
$fsm->addState(2);

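Next, add a final state with a callback that processes the matched substring and adds a token to the token table. The last argument tells the lexer whether to take the next character when it returns to the start state.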

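// LexerWriterInterface and LexerException are classes from this package;
// import them with `use` statements for their namespaces.
$keywords = [
    'program', 'var', 'begin',
    'read', 'write', 'repeat',
    'until', 'if', 'then', 'fi'
];
$types = ['int', 'real', 'bool'];
$boolConstants = ['true', 'false'];

// Final state -2: a complete word (keyword, type, bool constant or identifier).
$fsm->addFinalState(-2,
    static function (LexerWriterInterface $writer, string $string, int $line)
    use ($keywords, $types, $boolConstants) {
        $index = null;
        // Drop the last character: the one that triggered the default edge into this state.
        $string = substr($string, 0, -1);
        if (in_array($string, $keywords, true)) {
            $token = 'Keyword';
        } elseif (in_array($string, $types, true)) {
            $token = 'Type';
        } elseif (in_array($string, $boolConstants, true)) {
            $token = 'BoolConst';
        } else {
            $token = 'Ident';
            // Only identifiers get an entry (and an index) in the identifier table.
            $index = $writer->addIdentifier($string);
        }

        $writer->addToken($line, $string, $token, $index);
    }, false); // false: do not take the next character when returning to state 0

// Final state 'error': reached on a character that cannot start a token.
$fsm->addFinalState('error',
    static function (LexerWriterInterface $writer, string $string, int $line) {
        throw new LexerException('Unknown char.', $string, $line);
    });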
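The decision made by the -2 callback can be tried out without the library. The sketch below uses plain PHP arrays in place of LexerWriterInterface; the function classifyWord() and the reuse of an existing index for a repeated identifier are assumptions for this example, and the README does not say whether addIdentifier() behaves that way.

```php
<?php
// Hypothetical stand-alone version of the -2 callback: plain arrays replace
// LexerWriterInterface, and repeated identifiers reuse their first index.
function classifyWord(string $word, array &$tokens, array &$identifiers): void
{
    $keywords = ['program', 'var', 'begin', 'read', 'write', 'repeat', 'until', 'if', 'then', 'fi'];
    $types = ['int', 'real', 'bool'];
    $boolConstants = ['true', 'false'];

    $index = null;
    if (in_array($word, $keywords, true)) {
        $token = 'Keyword';
    } elseif (in_array($word, $types, true)) {
        $token = 'Type';
    } elseif (in_array($word, $boolConstants, true)) {
        $token = 'BoolConst';
    } else {
        $token = 'Ident';
        $index = array_search($word, $identifiers, true);
        if ($index === false) { // first occurrence: append to the identifier table
            $identifiers[] = $word;
            $index = count($identifiers) - 1;
        }
    }
    $tokens[] = [$word, $token, $index];
}

$tokens = [];
$identifiers = [];
foreach (['var', 'sum', 'real', 'true', 'sum'] as $word) {
    classifyWord($word, $tokens, $identifiers);
}
echo json_encode($tokens), PHP_EOL;
```

Only the Ident branch touches the identifier table, matching the callback above; keywords, types and bool constants are recorded with a null index.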

Then add the triggers (the edges of the graph):

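// State 0: a letter starts a word; any other character leads to 'error'.
$fsm->addTrigger(TriggerTypes::LETTER, 0, 1);
$fsm->addTrigger(FSM::DEFAULT_STATE, 0, 'error');

// State 1: more letters stay here, a digit moves to 2, anything else ends the word.
$fsm->addTrigger(TriggerTypes::LETTER, 1, 1);
$fsm->addTrigger(FSM::DEFAULT_STATE, 1, -2);
$fsm->addTrigger(TriggerTypes::DIGIT, 1, 2);

// State 2: letters and digits stay here, anything else ends the word.
$fsm->addTrigger(FSM::DEFAULT_STATE, 2, -2);
$fsm->addTrigger(TriggerTypes::LETTER, 2, 2);
$fsm->addTrigger(TriggerTypes::DIGIT, 2, 2);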
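The triggers above describe a small automaton for identifiers: a letter, followed by any mix of letters and digits, ended by any other character. The sketch below encodes the same edges as a plain PHP array, with a 'def' key standing in for FSM::DEFAULT_STATE, and runs it over a string. The table layout and the functions classOf() and run() are illustrative assumptions, not part of Z99Lexer. It also shows why the -2 callback strips one character: the string read so far includes the character that triggered the default edge.

```php
<?php
// The identifier automaton from the triggers above, as a plain table.
// 'def' stands in for FSM::DEFAULT_STATE (taken when no other edge matches).
const TRANSITIONS = [
    0 => ['chr' => 1, 'def' => 'error'],
    1 => ['chr' => 1, 'dgt' => 2, 'def' => -2],
    2 => ['chr' => 2, 'dgt' => 2, 'def' => -2],
];

function classOf(string $ch): string
{
    if (ctype_digit($ch)) {
        return 'dgt';
    }
    return ctype_alpha($ch) ? 'chr' : 'def';
}

// Follow edges from state 0 until a state with no outgoing edges (a final
// state) is reached. Returns [final state, characters read]; the last
// character read is the one that triggered the final transition.
function run(string $input): array
{
    $state = 0;
    $read = '';
    $pos = 0;
    while (array_key_exists($state, TRANSITIONS)) {
        // Assumption: end of input behaves like a separator character.
        $ch = $pos < strlen($input) ? $input[$pos] : ' ';
        $edges = TRANSITIONS[$state];
        $state = $edges[classOf($ch)] ?? $edges['def'];
        $read .= $ch;
        $pos++;
    }
    return [$state, $read];
}

echo json_encode(run('sum1;')), PHP_EOL; // [-2,"sum1;"]
echo json_encode(run('1abc')), PHP_EOL;  // ["error","1"]
```

Applying substr($string, 0, -1) to "sum1;" yields "sum1", which is exactly what the -2 callback passes on as the token text.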

Finally, display the state diagram:

$fsm->visualize();
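visualize() renders the diagram as an image for you. If you also want a text form of such a graph, for example to keep it under version control, a transition table equivalent to the triggers above can be written out in Graphviz DOT format. The helper toDot() below is hypothetical and makes no claim about how visualize() works internally; it follows the README's convention of highlighting final states in blue.

```php
<?php
// Hypothetical helper: render a transition table as Graphviz DOT text.
// Final states are drawn as blue double circles.
function toDot(array $transitions, array $finalStates): string
{
    $lines = ['digraph fsm {', '    rankdir=LR;'];
    foreach ($finalStates as $state) {
        $lines[] = sprintf('    "%s" [shape=doublecircle, color=blue];', $state);
    }
    foreach ($transitions as $from => $edges) {
        foreach ($edges as $label => $to) {
            $lines[] = sprintf('    "%s" -> "%s" [label="%s"];', $from, $to, $label);
        }
    }
    $lines[] = '}';
    return implode("\n", $lines) . "\n";
}

// Same edges as the triggers above; 'def' marks the default edge.
$transitions = [
    0 => ['chr' => 1, 'def' => 'error'],
    1 => ['chr' => 1, 'dgt' => 2, 'def' => -2],
    2 => ['chr' => 2, 'dgt' => 2, 'def' => -2],
];
echo toDot($transitions, [-2, 'error']);
```

Piping the output through Graphviz, e.g. `php fsm_dot.php | dot -Tpng -o fsm.png` (file names are placeholders), produces an image.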


Lexer

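To build the tables of tokens, constants and identifiers, create a Lexer instance, passing it a CharStreamInterface implementation and the FSM holding our grammar: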

$stream = new FileStream('example.z99'); // implements CharStreamInterface
$lexer = new Lexer($stream, $fsm);

Then call the tokenize() method:

try {
    $lexer->tokenize();

    foreach ($lexer->getTokens() as $token) {
        echo $token . PHP_EOL;
    }

    foreach ($lexer->getConstants() as $const) {
        echo $const . PHP_EOL;
    }

    foreach ($lexer->getIdentifiers() as $identifier) {
        echo $identifier . PHP_EOL;
    }

} catch (LexerException $e) {
    echo $e->getMessage() .
        "\n With string: '" . $e->getString() . '\'' .
        "\n in line " . $e->getLine();
}
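For intuition about what tokenize() does with the FSM, here is a self-contained sketch of such a driver loop. It walks the input and follows edges; on reaching a final state it either consumes the triggering character or leaves it to be re-scanned from state 0, which is the role of the last addFinalState() argument. The state numbers, token names, the single-character state -3 and the $final map are all hypothetical, and the real Lexer also builds a constants table, which is omitted here.

```php
<?php
// Minimal sketch of a tokenize() loop driving a transition table.
// All names here are hypothetical; this is not Z99Lexer's implementation.
function classOf(string $ch): string
{
    if (ctype_digit($ch)) {
        return 'dgt';
    }
    if (ctype_alpha($ch)) {
        return 'chr';
    }
    return ($ch === ' ' || $ch === "\t" || $ch === "\n") ? 'WS' : 'def';
}

function tokenize(string $source): array
{
    $transitions = [
        0 => ['WS' => 0, 'chr' => 1, 'dgt' => 'error', 'def' => -3],
        1 => ['chr' => 1, 'dgt' => 1, 'def' => -2],
    ];
    // Final state => "take the next character?" (the addFinalState flag).
    $final = [-2 => false, -3 => true];
    $keywords = ['program', 'var', 'begin', 'end'];

    $tokens = [];
    $identifiers = [];
    $state = 0;
    $lexeme = '';
    $line = 1;
    $source .= "\n"; // sentinel so the last word always reaches a final state

    for ($pos = 0; $pos < strlen($source);) {
        $ch = $source[$pos];
        $edges = $transitions[$state];
        $state = $edges[classOf($ch)] ?? $edges['def'];

        if ($state === 'error') {
            throw new RuntimeException("Unknown char '$ch' in line $line");
        }
        if ($state === 0) { // whitespace between tokens: skip it
            $line += ($ch === "\n") ? 1 : 0;
            $pos++;
            continue;
        }
        $lexeme .= $ch;
        if (!array_key_exists($state, $final)) { // intermediate state: keep reading
            $pos++;
            continue;
        }
        if ($final[$state]) {
            $pos++; // consume the triggering character (single-char token)
            $tokens[] = [$line, $lexeme, 'Punct', null];
        } else {
            // Drop the lookahead; it is not consumed and is re-scanned from state 0.
            $word = substr($lexeme, 0, -1);
            if (in_array($word, $keywords, true)) {
                $tokens[] = [$line, $word, 'Keyword', null];
            } else {
                $index = array_search($word, $identifiers, true);
                if ($index === false) {
                    $identifiers[] = $word;
                    $index = count($identifiers) - 1;
                }
                $tokens[] = [$line, $word, 'Ident', $index];
            }
        }
        $state = 0;
        $lexeme = '';
    }
    return [$tokens, $identifiers];
}

[$tokens, $identifiers] = tokenize("program first\nbegin x1;\nend.");
foreach ($tokens as [$line, $text, $type, $index]) {
    echo $line, "\t", $text, "\t", $type, "\t", $index ?? '-', PHP_EOL;
}
```

With the flag set to false (state -2), the character that ends a word, such as the ';' after x1, is re-read from state 0 and becomes its own token; with true (state -3) the character is consumed as part of the token.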