webscraping-ai / webscraping-ai-php
WebScraping.AI scraping API providing GPT-powered tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing.
dev-master
2024-01-04 02:56 UTC
Requires
- php: ^7.4 || ^8.0
- ext-curl: *
- ext-json: *
- ext-mbstring: *
- guzzlehttp/guzzle: ^7.3
- guzzlehttp/psr7: ^1.7 || ^2.0
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.5
- phpunit/phpunit: ^8.0 || ^9.0
This package is auto-updated.
Last update: 2024-09-04 04:47:12 UTC
README
WebScraping.AI scraping API providing GPT-powered tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing.
For more information, please visit https://webscraping.ai.
Installation & Usage
Requirements
PHP 7.4 and later. Should also work with PHP 8.0.
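The Requires list at the top of this page asks for php ^7.4 || ^8.0 plus the curl, json and mbstring extensions. Below is a minimal sketch of a pre-install check for those requirements. The function name missingRequirements() is hypothetical and is not part of the package. It only checks what that list states.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of webscraping-ai-php): reports which of the
// listed requirements (php ^7.4 || ^8.0, ext-curl, ext-json, ext-mbstring)
// are not met. Returns an empty array when everything is satisfied.
function missingRequirements(string $phpVersion, array $loadedExtensions): array
{
    $missing = [];
    // ^7.4 || ^8.0 means >= 7.4.0 and < 9.0.0
    if (version_compare($phpVersion, '7.4.0', '<') || version_compare($phpVersion, '9.0.0', '>=')) {
        $missing[] = "php ^7.4 || ^8.0 (found $phpVersion)";
    }
    // Extension names are compared case-insensitively
    $loaded = array_map('strtolower', $loadedExtensions);
    foreach (['curl', 'json', 'mbstring'] as $ext) {
        if (!in_array($ext, $loaded, true)) {
            $missing[] = "ext-$ext";
        }
    }
    return $missing;
}

// Usage: check the interpreter this script runs on
foreach (missingRequirements(PHP_VERSION, get_loaded_extensions()) as $problem) {
    echo "Missing requirement: $problem", PHP_EOL;
}
```

Composer performs the same platform checks during composer install. A script like this only helps when you install the package manually.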
Composer
To install the bindings via Composer, add the following to composer.json:
{
"repositories": [
{
"type": "vcs",
"url": "https://github.com/webscraping-ai/webscraping-ai-php.git"
}
],
"require": {
"webscraping-ai/webscraping-ai-php": "*@dev"
}
}
Then run composer install
Manual Installation
Download the files and include autoload.php:
<?php require_once('/path/to/WebScrapingAI/vendor/autoload.php');
Getting Started
Please follow the installation procedure and then run the following:
<?php
require_once(__DIR__ . '/vendor/autoload.php');

// Configure API key authorization: api_key
$config = OpenAPI\Client\Configuration::getDefaultConfiguration()->setApiKey('api_key', 'YOUR_API_KEY');
// Uncomment below to set up a prefix (e.g. Bearer) for the API key, if needed
// $config = OpenAPI\Client\Configuration::getDefaultConfiguration()->setApiKeyPrefix('api_key', 'Bearer');

$apiInstance = new OpenAPI\Client\Api\AIApi(
    // If you want to use a custom HTTP client, pass your client which implements `GuzzleHttp\ClientInterface`.
    // This is optional, `GuzzleHttp\Client` will be used as default.
    new GuzzleHttp\Client(),
    $config
);

$url = 'https://example.com'; // string | URL of the target page.
$question = 'What is the summary of this page content?'; // string | Question or instructions to ask the LLM model about the target page.
$context_limit = 4000; // int | Maximum number of tokens to use as context for the LLM model (4000 by default).
$response_tokens = 100; // int | Maximum number of tokens to return in the LLM model response. The total context size (context_limit) includes the question, the target page content and the response, so this parameter reserves tokens for the response (see also on_context_limit).
$on_context_limit = 'truncate'; // string | What to do if the context_limit parameter is exceeded (truncate by default). The context is exceeded when the target page content is too long.
$headers = ['Cookie' => 'session=some_id']; // array<string,string> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={"One": "value1", "Another": "value2"}).
$timeout = 10000; // int | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000).
$js = true; // bool | Execute on-page JavaScript using a headless browser (true by default).
$js_timeout = 2000; // int | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page.
$proxy = 'datacenter'; // string | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details.
$country = 'us'; // string | Country of the proxy to use (US by default). Only available on Startup and Custom plans.
$device = 'desktop'; // string | Type of device emulation.
$error_on_404 = false; // bool | Return error on 404 HTTP status on the target page (false by default).
$error_on_redirect = false; // bool | Return error on redirect on the target page (false by default).
$js_script = "document.querySelector('button').click();"; // string | Custom JavaScript code to execute on the target page.

try {
    $result = $apiInstance->getQuestion($url, $question, $context_limit, $response_tokens, $on_context_limit, $headers, $timeout, $js, $js_timeout, $proxy, $country, $device, $error_on_404, $error_on_redirect, $js_script);
    print_r($result);
} catch (Exception $e) {
    echo 'Exception when calling AIApi->getQuestion: ', $e->getMessage(), PHP_EOL;
}
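The $response_tokens comment in the example says that context_limit covers three things: the question, the target page content, and the response. The sketch below works through that arithmetic to estimate how many tokens are left for page content.

It makes two assumptions:
- You already have a token count for your question.
- The API tokenizes server-side, so the result is only an estimate.

The helper name pageContentBudget() is hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of the generated client): estimates the
// tokens left for page content, assuming
// context_limit = question + page content + response.
function pageContentBudget(int $contextLimit, int $responseTokens, int $questionTokens): int
{
    if ($contextLimit <= 0 || $responseTokens < 0 || $questionTokens < 0) {
        throw new InvalidArgumentException('context_limit must be positive and token counts non-negative');
    }
    // Whatever remains after reserving the question and the response
    return max(0, $contextLimit - $questionTokens - $responseTokens);
}

// Values from the example (context_limit = 4000, response_tokens = 100)
// with an assumed 10-token question:
echo pageContentBudget(4000, 100, 10), PHP_EOL; // 3890
```

If the page is longer than this budget, on_context_limit decides what happens (truncate by default).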
API Endpoints
All URIs are relative to https://api.webscraping.ai
Models
Authorization
Authentication schemes defined for the API:
api_key
- Type: API key
- API key parameter name: api_key
- Location: URL query string
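The generated client adds the key for you. The scheme is simple, though: the key travels as the api_key query parameter on URLs relative to https://api.webscraping.ai. The sketch below only builds such a URL and does not send a request.

Several pieces are assumptions or come from elsewhere in this page:
- The path /ai/question is an illustrative assumption; check the API reference for the real endpoint paths.
- The parameter names url, question and headers come from the Getting Started example.
- The two headers encodings follow its comment.
- buildRequestUrl() is a hypothetical helper.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds a WebScraping.AI request URL with the API key
// in the query string. ASSUMPTION: the path passed in (e.g. '/ai/question')
// is illustrative; only the base URL and the api_key name come from the README.
function buildRequestUrl(string $path, string $apiKey, array $params, bool $headersAsJson = false): string
{
    $query = $params;
    $query['api_key'] = $apiKey;

    if ($headersAsJson && isset($query['headers'])) {
        // Alternative form: headers as a JSON-encoded object (...&headers={"One":"value1"})
        $query['headers'] = json_encode($query['headers'], JSON_THROW_ON_ERROR);
    }
    // Default form: http_build_query() turns a nested array into
    // headers[One]=value1 parameters (brackets are percent-encoded).
    return 'https://api.webscraping.ai' . $path . '?' . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
}

$requestUrl = buildRequestUrl('/ai/question', 'YOUR_API_KEY', [
    'url' => 'https://example.com',
    'question' => 'What is the summary of this page content?',
    'headers' => ['Cookie' => 'session=some_id'],
]);
echo $requestUrl, PHP_EOL;
```

Because the key sits in the URL, avoid logging full request URLs.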
Tests
To run the tests, use:
composer install
vendor/bin/phpunit
Author
About this package
This PHP package is automatically generated by the OpenAPI Generator project:
- API version: 3.1.3
- Package version: 3.1.3
- Build package: org.openapitools.codegen.languages.PhpClientCodegen