dagstuhl / swh-archive-client
PHP client for the SoftwareHeritage Web API
Requires
- php: ^8.1
- ext-json: *
- guzzlehttp/guzzle: ^7.9
- kevinrob/guzzle-cache-middleware: ^5.1
- league/flysystem: ^3.28
- nesbot/carbon: ^2||^3
Last update: 2024-09-25 07:41:00 UTC
README
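PHP client for the SoftwareHeritage Web API
This project provides a PHP wrapper around the SoftwareHeritage Web API. Dagstuhl Publishing uses it to integrate automatic archiving of its authors' software projects into its publishing workflow.
At https://github.com/dagstuhl-publishing/swh-deposit-client, we also provide a PHP client for the SoftwareHeritage Deposit API.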
Installation
composer require dagstuhl/swh-archive-client
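The client is designed to work seamlessly with Laravel's configuration mechanism. In a Laravel project, configure it by creating a file swh.php in the config folder with the following contents: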
<?php

return [
    'web-api' => [
        'token' => env('SWH_WEB_API_TOKEN'),
        'url' => env('SWH_WEB_API_URL'),
        'cache-folder' => env('SWH_WEB_API_CACHE_FOLDER'), // absolute path to cache folder
        'cache-ttl' => env('SWH_WEB_API_CACHE_TTL'),
    ]
];
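Based on this configuration, a default client is initialized and used whenever you request a SwhObject. In a non-Laravel environment, simply implement a global configuration function config such that config('swh.web-api.token') returns your token, config('swh.web-api.url') returns the API URL, and so on.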
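For a non-Laravel project, the global config function described above could look roughly like the following. This is a minimal sketch, not part of the package: it assumes the settings come from the same environment variables used in the Laravel example, and that dot-separated keys are enough for the lookups the client performs. Whether the client expects particular value types (for example an integer for cache-ttl) is not stated here, so the values are passed through as strings.

```php
<?php

// Sketch of a global config() helper for non-Laravel environments.
// Assumption: the client only needs the four 'swh.web-api.*' keys shown above.
if (!function_exists('config')) {
    function config(string $key, mixed $default = null): mixed
    {
        static $settings = null;

        if ($settings === null) {
            // getenv() returns false for unset variables; map that to null.
            $settings = [
                'swh' => [
                    'web-api' => [
                        'token' => getenv('SWH_WEB_API_TOKEN') ?: null,
                        'url' => getenv('SWH_WEB_API_URL') ?: null,
                        'cache-folder' => getenv('SWH_WEB_API_CACHE_FOLDER') ?: null,
                        'cache-ttl' => getenv('SWH_WEB_API_CACHE_TTL') ?: null,
                    ],
                ],
            ];
        }

        // Walk the nested array along the dot-separated key.
        $value = $settings;
        foreach (explode('.', $key) as $segment) {
            if (!is_array($value) || !array_key_exists($segment, $value)) {
                return $default;
            }
            $value = $value[$segment];
        }

        return $value;
    }
}
```

The function must be defined before the first SwhObject is requested, since the default client reads its settings at that point.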
Important: to reduce traffic, requests concerning archived repositories are cached; see the section on caching and rate limits below.
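Code examples
1) Browsing the archive
Searching the archive is intuitive because the API stays out of sight. You simply request the objects you need, for example: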
// start with a url
$repo = Repository::fromNodeUrl('https://github.com/dagstuhl-publishing/styles');

// create the corresponding origin object
$origin = Origin::fromRepository($repo);

// ask the origin for the SoftwareHeritage visits
$visits = $origin->getVisits();

// get the snapshot object from a specific visit
$snapshot = $visits[0]->getSnapshot();

// get the list of branches from a snapshot
$branches = $snapshot->getBranches();
Further supported objects are Revision, Release, Directory, and Content. To fetch an object by its id, call the byId method:
$revision = Revision::byId('60476b518914683d35ef08dd6cfdc7809e280c75');
To identify a directory or file inside a snapshot, use the Context class. Continuing the example above:
// take the last snapshot
$snapshot = $visits[0]->getSnapshot();

// take a "path" to a file/directory inside the repo
$repoNode = new RepositoryNode('https://github.com/dagstuhl-publishing/styles/blob/master/LIPIcs/authors/lipics-v2021.cls');

// identify this node inside the snapshot (i.e., get the context)
$context = $snapshot->getContext($repoNode);

// display the full identifier
dd($context->getIdentifier());
2) Archiving a repository
- In the first step, a SaveRequest must be created:
$swhClient = SwhWebApiClient::getCurrent();

// create a repository instance from a url that points to a repo
// or a specific file/directory inside the repo
$repo = Repository::fromNodeUrl('https://github.com/.../...');

// submit a save request to Software Heritage
$origin = Origin::fromRepository($repo);
$saveRequest = $origin->postSaveRequest();

if ($saveRequest === null) {
    // connection or network error
    dd('Internal server error', $swhClient->getException(), $swhClient->getLastResponse());
}
else {
    dd('SaveRequest created by SoftwareHeritage, SaveRequestId: '.$saveRequest->id);
    // store $saveRequest->id in local DB to track the status of this request
}
- In the second step, the status of the SaveRequest must be monitored (in a loop or cron job). $saveRequestId is the id obtained at the end of the first step.
$saveRequest = SaveRequest::byId($saveRequestId);

if ($saveRequest->saveRequestStatus == SaveRequestStatus::REJECTED) {
    dd('save request rejected -> abort');
}
elseif ($saveRequest->saveTaskStatus == SaveTaskStatus::SUCCEEDED) {
    if ($saveRequest->snapshotSwhId === null) {
        dd('no snapshot though request succeeded -> this should actually not happen');
    }
    else {
        $snapshot = $saveRequest->getSnapshot();
        $repoNode = new RepositoryNode($repoNodeUrl ?? $saveRequest->originUrl);
        $context = $snapshot->getContext($repoNode);
        dd('success', $snapshot, $context, $context->getIdentifier());
    }
}
else {
    dd('pending -> loop this code block again', $saveRequest);
}
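When the status check runs in a loop rather than a cron job, the "pending -> loop again" branch can be wrapped in a small polling helper. The sketch below is generic PHP and not part of the package: pollUntilSettled is a hypothetical name, and the callback stands in for the status check of step two. It is expected to return null while the request is pending and any other value once the request is rejected or has succeeded.

```php
<?php

/**
 * Hypothetical helper: calls $check until it returns a non-null value
 * or $maxAttempts is reached, sleeping $delaySeconds between attempts.
 * Returns the first non-null result, or null if every attempt was pending.
 */
function pollUntilSettled(callable $check, int $maxAttempts = 10, int $delaySeconds = 60): mixed
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $check($attempt);
        if ($result !== null) {
            return $result;
        }
        // Do not sleep after the final attempt.
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }

    return null;
}

// Demo with a stand-in check that reports success on the third attempt.
$status = pollUntilSettled(
    fn (int $attempt) => $attempt >= 3 ? 'succeeded' : null,
    maxAttempts: 5,
    delaySeconds: 0
);

echo $status, PHP_EOL; // prints "succeeded"
```

In real use the callback would fetch the SaveRequest by its id and map the rejected and succeeded cases to return values. A generous delay keeps the number of uncached requests, and thus the rate-limit usage, low.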
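3) Error handling
If null is returned instead of an object of the requested type, an error has occurred. More information about the error can be obtained from the current SwhWebApiClient instance as follows: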
$snapshot = Snapshot::byId('non-existing-or-invalid-id'); // to provoke an error

if ($snapshot === null) {
    $swhClient = SwhWebApiClient::getCurrent();
    dd(
        $swhClient->getException(),   // last exception (e.g., in case of a network issue)
        $swhClient->getLastResponse() // access the last HTTP response (incl. status code, headers) for debugging purposes
    );
}
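4) Caching and rate limits
To reduce the number of requests, all requests are cached except those to the /origin/ and /stat/counters/ endpoints. The cache folder must be specified as an absolute path in config('swh.web-api.cache-folder'). To clear the cache, use the clearCache method: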
$swhClient = SwhWebApiClient::getCurrent();
$swhClient->clearCache('2024-09-07'); // clears the cache for a specific date
$swhClient->clearCache();             // clears the whole cache
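Since clearCache accepts a single date, clearing several consecutive days means calling it once per day. The following sketch only generates the date strings. cacheDatesBetween is a hypothetical helper, and it assumes that the cache is keyed by dates in the Y-m-d format used in the example above. Which time zone the cache uses for its dates is not documented here, so UTC is an assumption.

```php
<?php

/**
 * Hypothetical helper: returns one 'Y-m-d' string per day
 * from $from to $to (both inclusive), or an empty array if $from is after $to.
 */
function cacheDatesBetween(string $from, string $to): array
{
    $utc = new DateTimeZone('UTC'); // assumption: cache dates are in UTC
    $start = new DateTimeImmutable($from, $utc);
    $end = new DateTimeImmutable($to, $utc);

    $dates = [];
    for ($day = $start; $day <= $end; $day = $day->modify('+1 day')) {
        $dates[] = $day->format('Y-m-d');
    }

    return $dates;
}
```

Each returned string can then be passed to $swhClient->clearCache() in a foreach loop.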
To get information about your rate limits, call $swhClient->getRateLimits();. This returns an array of the following form:
[
    'X-RateLimit-Limit' => 1200,      // max. number of permitted requests per hour
    'X-RateLimit-Remaining' => 1138,  // remaining in current period
    'X-RateLimit-Reset' => 1620639052 // at this timestamp, the rate-limit will be refreshed
]
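An array of this shape can be turned into figures that are easier to act on, such as the number of requests already used and the seconds left until the limit is refreshed. The sketch below is plain PHP operating on such an array. summarizeRateLimits is a hypothetical helper, not a method of the client, and it assumes X-RateLimit-Reset is a Unix timestamp, as the comment above suggests.

```php
<?php

/**
 * Hypothetical helper: derives usage figures from a rate-limit array.
 * $now is injectable to keep the function deterministic; defaults to time().
 */
function summarizeRateLimits(array $rateLimits, ?int $now = null): array
{
    $now ??= time();

    $limit = (int) ($rateLimits['X-RateLimit-Limit'] ?? 0);
    $remaining = (int) ($rateLimits['X-RateLimit-Remaining'] ?? 0);
    $reset = (int) ($rateLimits['X-RateLimit-Reset'] ?? $now);

    return [
        'used' => max(0, $limit - $remaining),
        'remaining' => $remaining,
        'secondsUntilReset' => max(0, $reset - $now),
    ];
}
```

A caller could, for instance, pause a batch job for secondsUntilReset seconds once remaining drops to zero.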