README

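Typing LLM completions

Best-in-class LLMs can output JSON that follows a schema you provide, usually JSON-Schema. This significantly expands the ways you can leverage LLMs in your application!

Think of the input as:

A context: anything that is text or can be converted to text, such as emails/PDFs/HTML/xlsx
A schema: "Here is the form you need to fill in to complete your task"
An optional prompt, giving a specific task, rules, etc.

The output/outcome is whichever structure best matches your use case and domain.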
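
To make the "form to fill in" idea concrete, here is a minimal sketch of a JSON-Schema written as a plain PHP array, together with a completion that fits it. The property names (sender, urgency, actionItems) and the sample completion are made up for illustration, and the check at the end is only a rough structural test, not a JSON-Schema validator.

```php
<?php

// Hypothetical schema: the "form" an LLM would fill in after reading an email.
$schema = [
    'type' => 'object',
    'properties' => [
        'sender' => ['type' => 'string', 'description' => 'Full name of the author.'],
        'urgency' => ['type' => 'string', 'enum' => ['low', 'normal', 'high']],
        'actionItems' => ['type' => 'array', 'items' => ['type' => 'string']],
    ],
    'required' => ['sender', 'urgency', 'actionItems'],
];

// A completion that fits the schema, as a model might return it (invented sample).
$completion = '{"sender":"Ada Lovelace","urgency":"high","actionItems":["Review the draft"]}';
$data = json_decode($completion, true, 512, JSON_THROW_ON_ERROR);

// Rough structural check: every required key is present and the enum is respected.
$missing = array_diff($schema['required'], array_keys($data));
$urgencyIsValid = in_array($data['urgency'], $schema['properties']['urgency']['enum'], true);

// The schema is what gets sent to the model alongside the context and prompt.
echo json_encode($schema, JSON_PRETTY_PRINT), PHP_EOL;
```

The same array shape is what the "Dynamic Schema" usage further down passes as the type argument.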

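The python instructor cookbook has interesting examples.

Introduction

Instructrice is a PHP library that simplifies working with structured output from LLMs in a type-safe manner.

Features

Flexible schema options:
- Classes using api-platform/json-schema
- Dynamically generated types with PSL\Type
- Or a JSON-Schema array generated by a third-party library, or written in plain PHP
symfony/serializer integration to deserialize LLM outputs
Streaming first:
- As a developer you can be more productive with faster feedback loops than waiting for outputs to complete. This also makes slower local models more usable.
- You can provide a much better and snappier UX to your users.
- The headaches of parsing incomplete JSON are handled for you.
A set of pre-configured LLMs with the best available settings. Set your API keys and switch between different providers and models without having to think about the model name, json mode, function calling, etc.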
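
While a completion is streaming, the JSON received so far is usually cut off in the middle of a value. One naive way to make such a chunk decodable is to close whatever is still open. The sketch below only illustrates the problem; it is not how Instructrice parses streams, and completePartialJson is a hypothetical name.

```php
<?php

/**
 * Closes the strings, arrays and objects still open in a truncated JSON
 * document so that json_decode() can read what has arrived so far.
 * Illustrative only: it does not handle a chunk that ends right after a key,
 * a colon, a comma, or in the middle of a literal such as `tru`.
 */
function completePartialJson(string $partial): string
{
    $closers = [];     // stack of closing characters still owed
    $inString = false;
    $escaped = false;

    foreach (str_split($partial) as $char) {
        if ($inString) {
            if ($escaped) {
                $escaped = false;
            } elseif ($char === '\\') {
                $escaped = true;
            } elseif ($char === '"') {
                $inString = false;
            }
            continue;
        }

        if ($char === '"') {
            $inString = true;
        } elseif ($char === '{') {
            $closers[] = '}';
        } elseif ($char === '[') {
            $closers[] = ']';
        } elseif ($char === '}' || $char === ']') {
            array_pop($closers);
        }
    }

    // A chunk cut right after a backslash would escape the quote we append.
    if ($escaped) {
        $partial = substr($partial, 0, -1);
    }

    return $partial . ($inString ? '"' : '') . implode('', array_reverse($closers));
}

// A stream interrupted in the middle of the second character's name.
$chunk = '[{"name":"Jack","rank":"Colonel"},{"name":"Sam';
$characters = json_decode(completePartialJson($chunk), true);
```

Here $characters holds one complete entry and one partial entry, which is enough to render progressive results while the model is still generating.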

A Symfony Bundle is also available.

Installation and Usage

composer require kargnas/instructrice

use AdrienBrault\Instructrice\InstructriceFactory;
use AdrienBrault\Instructrice\LLM\Provider\Ollama;
use AdrienBrault\Instructrice\LLM\Provider\OpenAi;
use AdrienBrault\Instructrice\LLM\Provider\Anthropic;

$instructrice = InstructriceFactory::create(
    defaultLlm: Ollama::HERMES2THETA_LLAMA3_8B,
    apiKeys: [ // Unless you inject keys here, api keys will be fetched from environment variables
        OpenAi::class => $openAiApiKey,
        Anthropic::class => $anthropicApiKey,
    ],
);

List of objects

use AdrienBrault\Instructrice\Attribute\Prompt;

class Character
{
    // The prompt annotation lets you add instructions specific to a property
    #[Prompt('Just the first name.')]
    public string $name;
    public ?string $rank = null;
}

$characters = $instructrice->getList(
    Character::class,
    'Colonel Jack O\'Neil walks into a bar and meets Major Samanta Carter. They call Teal\'c to join them.',
);

/*
dump($characters);
array:3 [
  0 => Character^ {
    +name: "Jack"
    +rank: "Colonel"
  }
  1 => Character^ {
    +name: "Samanta"
    +rank: "Major"
  }
  2 => Character^ {
    +name: "Teal'c"
    +rank: null
  }
]
*/

Object

$character = $instructrice->get(
    type: Character::class,
    context: 'Colonel Jack O\'Neil.',
);

/*
dump($character);
Character^ {
  +name: "Jack"
  +rank: "Colonel"
}
*/

Dynamic Schema

$label = $instructrice->get(
    type: [
        'type' => 'string',
        'enum' => ['positive', 'neutral', 'negative'],
    ],
    context: 'Amazing great cool nice',
    prompt: 'Sentiment analysis',
);

/*
dump($label);
"positive"
*/

You can also use third-party JSON schema libraries such as goldspecdigital/oooas to generate the schema:

examples/oooas.php


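Supported providers

Supported providers are enums whose cases can be passed as the defaultLlm argument of InstructriceFactory::create or as the llm argument of get: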

use AdrienBrault\Instructrice\InstructriceFactory;
use AdrienBrault\Instructrice\LLM\Provider\OpenAi;

$instructrice->get(
    ...,
    llm: OpenAi::GPT_4T, // API Key will be fetched from the OPENAI_API_KEY environment variable
);
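
The precedence described in the comments above (an injected key wins, otherwise an environment variable is read) can be pictured with a small standalone sketch. resolveApiKey and the $envNames map are hypothetical and not part of the library; only OPENAI_API_KEY is named in this README, and ANTHROPIC_API_KEY is an assumption.

```php
<?php

/**
 * Returns the injected key for a provider when there is one, otherwise
 * falls back to the environment variable mapped to that provider.
 * Hypothetical helper for illustration only.
 */
function resolveApiKey(string $provider, array $injectedKeys, array $envNames): ?string
{
    if (isset($injectedKeys[$provider])) {
        return $injectedKeys[$provider];
    }

    $name = $envNames[$provider] ?? null;
    if ($name === null) {
        return null; // unknown provider: nothing to look up
    }

    $value = getenv($name);

    return $value === false || $value === '' ? null : $value;
}

// Assumed mapping; only OPENAI_API_KEY appears in this README.
$envNames = ['openai' => 'OPENAI_API_KEY', 'anthropic' => 'ANTHROPIC_API_KEY'];

// Simulate an environment variable for the demonstration.
putenv('OPENAI_API_KEY=key-from-env');

$fromEnv = resolveApiKey('openai', [], $envNames);
$injected = resolveApiKey('openai', ['openai' => 'key-from-code'], $envNames);
```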

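Supported models

Open Weights

Foundation

Throughputs from https://artificialanalysis.ai/leaderboards/providers .

Fine Tune

Proprietary

Throughputs from https://artificialanalysis.ai/leaderboards/providers .

Automate updating these tables by scraping https://artificialanalysis.ai , along with Chatbot Arena Elo? That would be a good use case / showcase for this library/CLI?

Custom Models

Ollama

If you want to use an Ollama model that is not available in the enum, you can use the Ollama::create static method: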

use AdrienBrault\Instructrice\LLM\Provider\Ollama;

$instructrice->get(
    ...,
    llm: Ollama::create(
        'codestral:22b-v0.1-q5_K_M', // check its license first!
        32000,
    ),
);

OpenAI

You can also use any OpenAI-compatible API by passing an LLMConfig:

use AdrienBrault\Instructrice\LLM\LLMConfig;
use AdrienBrault\Instructrice\LLM\Cost;
use AdrienBrault\Instructrice\LLM\OpenAiJsonStrategy;

$instructrice->get(
    ...,
    llm: new LLMConfig(
        uri: 'https://api.together.xyz/v1/chat/completions',
        model: 'meta-llama/Llama-3-70b-chat-hf',
        contextWindow: 8000,
        label: 'Llama 3 70B',
        provider: 'Together',
        cost: Cost::create(0.9),
        strategy: OpenAiJsonStrategy::JSON,
        headers: [
            'Authorization' => 'Bearer ' . $apiKey,
        ]
    ),
);

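DSN

You may configure the LLM using a DSN:

The scheme is the provider: openai, openai-http, anthropic, google
The password is the API key
The host, port and path are the API endpoint, without the scheme
The query string:
- model is the model name
- context is the context window
- strategy is the strategy to use:
  - json for JSON mode with the schema in the prompt only
  - json_with_schema for JSON mode, where the completion is probably constrained exactly to the schema
  - tool_any
  - tool_auto
  - tool_function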
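
To see how the DSN parts listed above map onto a URL-like string, here is a minimal sketch that uses only PHP's parse_url() and parse_str(). describeLlmDsn is a hypothetical helper; the library's own DSN parsing may differ in validation and defaults.

```php
<?php

/**
 * Splits an LLM DSN into provider, API key, endpoint and query options.
 * Hypothetical helper for illustration; not the library's parser.
 *
 * @return array{provider: string, apiKey: ?string, endpoint: string, model: ?string, context: ?int, strategy: ?string}
 */
function describeLlmDsn(string $dsn): array
{
    $parts = parse_url($dsn);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException('Not a valid DSN: ' . $dsn);
    }

    // The query string carries model, context and strategy.
    parse_str($parts['query'] ?? '', $query);

    // Host, optional port and path form the endpoint (without the scheme).
    $endpoint = $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '');

    return [
        'provider' => $parts['scheme'],
        'apiKey' => $parts['pass'] ?? null, // the password slot holds the API key
        'endpoint' => $endpoint,
        'model' => $query['model'] ?? null,
        'context' => isset($query['context']) ? (int) $query['context'] : null,
        'strategy' => $query['strategy'] ?? null,
    ];
}

$config = describeLlmDsn(
    'openai://:api_key@api.openai.com/v1/chat/completions?model=gpt-3.5-turbo&strategy=tool_auto&context=16000'
);
```

Note that the user part before the colon is left empty in these DSNs, which is why the key is read from the password slot.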

Examples

use AdrienBrault\Instructrice\InstructriceFactory;

$instructrice = InstructriceFactory::create(
    defaultLlm: 'openai://:api_key@api.openai.com/v1/chat/completions?model=gpt-3.5-turbo&strategy=tool_auto&context=16000'
);

$instructrice->get(
    ...,
    llm: 'openai-http://localhost:11434?model=adrienbrault/nous-hermes2theta-llama3-8b&strategy=json&context=8000'
);

$instructrice->get(
    ...,
    llm: 'openai://:api_key@api.fireworks.ai/inference/v1/chat/completions?model=accounts/fireworks/models/llama-v3-70b-instruct&context=8000&strategy=json_with_schema'
);

$instructrice->get(
    ...,
    llm: 'google://:api_key@generativelanguage.googleapis.com/v1beta/models?model=gemini-1.5-flash&context=1000000'
);

$instructrice->get(
    ...,
    llm: 'anthropic://:api_key@api.anthropic.com?model=claude-3-haiku-20240307&context=200000'
);

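LLMInterface

You may also implement LLMInterface.

Acknowledgements

Obviously inspired by instructor-php and instructor.

How is it different from instructor php?

Both libraries essentially do the same thing:

Automatic schema generation from classes.
Abstraction over, and support for, multiple LLMs/providers.
Many strategies to extract data: function calling, JSON mode, etc.
Automatic deserialization/hydration.
Maybe validation/retries later for this library.

However, instructrice differs in the following ways:

Streaming first.
Preconfigured providers + LLMs, so you do not have to worry about:
- JSON mode, function calling, etc.
- The best prompt format to use.
- Your options for local models.
- Whether streaming works. For example, groq can only stream when json-mode/function calling is not used.
PSR-3 logging.
Guzzle + symfony/http-client support.
No messages. You just pass a context and a prompt.
- I am hoping that this choice enables cool things later, like supporting few-shot examples, evals, etc.
More flexible schema options.
Higher-level abstraction. You cannot provide a list of messages, whereas instructor-php allows it.

Notes/Ideas

Things to look into:

Unstructured
Llama Parse
EMLs
jina-ai/reader -> This is awesome: $client->request('GET', 'https://r.jina.ai/' . $url)
firecrawl

DSPy is very interesting. There are great ideas to be inspired by.

Ideally this library is good to prototype with, but can also support more advanced extraction workflows with few-shot examples, some sort of eval system, generating samples/outputs like DSPy, etc.

It would be cool to have a CLI that accepts an FQCN and a context.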

instructrice get "App\Entity\Customer" "$(cat some_email_body.md)"

Autosave all inputs/schemas/outputs in a sqlite db. Like llm? Leverage that to test examples, add few-shot examples, evals?

kargnas / instructrice
