dagstuhl/swh-archive-client

PHP client for the SoftwareHeritage Web API

1.0 2024-09-06 00:00 UTC

This package is not auto-updated.

Last update: 2024-09-25 07:41:00 UTC


README

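PHP client for the SoftwareHeritage Web API

This project provides a PHP wrapper around the SoftwareHeritage Web API. Dagstuhl Publishing uses it to integrate the automated archiving of its authors' software projects into its publishing workflow.

A PHP client for the SoftwareHeritage Deposit API is also available at https://github.com/dagstuhl-publishing/swh-deposit-client.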

Installation

composer require dagstuhl/swh-archive-client

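The client is designed to work seamlessly with Laravel's configuration mechanism. In a Laravel project, you can configure it by creating a file swh.php in the config folder with the following contents: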

<?php

return [
    'web-api' => [
        'token' => env('SWH_WEB_API_TOKEN'),
        'url' => env('SWH_WEB_API_URL'),
        'cache-folder' => env('SWH_WEB_API_CACHE_FOLDER'), // absolute path to cache folder
        'cache-ttl' => env('SWH_WEB_API_CACHE_TTL'),
    ]
];

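Based on this configuration, a default client is initialized and used whenever you request a SwhObject. In a non-Laravel environment, simply implement a global function config such that config('swh.web-api.token') returns your token, config('swh.web-api.url') returns the API URL, and so on.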
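For non-Laravel projects, the global function described above could look roughly like the following. This is a minimal sketch, not part of the package: it assumes the settings come from the same environment variables as in the Laravel config file, and it resolves dot-separated keys by walking a nested array. The optional $default parameter is an assumption borrowed from Laravel's helper.

```php
<?php

// Hypothetical stand-in for Laravel's config() helper (not shipped with the package).
if (!function_exists('config')) {
    function config(string $key, $default = null)
    {
        // Assumption: settings are read from environment variables on each call.
        $settings = [
            'swh' => [
                'web-api' => [
                    'token' => getenv('SWH_WEB_API_TOKEN') ?: null,
                    'url' => getenv('SWH_WEB_API_URL') ?: null,
                    'cache-folder' => getenv('SWH_WEB_API_CACHE_FOLDER') ?: null, // absolute path
                    'cache-ttl' => getenv('SWH_WEB_API_CACHE_TTL') ?: null,
                ],
            ],
        ];

        // Walk the nested array along the dot-separated key, e.g. "swh.web-api.token".
        $value = $settings;
        foreach (explode('.', $key) as $segment) {
            if (!is_array($value) || !array_key_exists($segment, $value)) {
                return $default;
            }
            $value = $value[$segment];
        }

        // Unset or empty settings fall back to the default.
        return $value ?? $default;
    }
}
```

The function has to be defined before the first SwhObject is requested, for example in a bootstrap file that is loaded right after Composer's autoloader.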

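Important: to reduce traffic, requests concerning archived repositories are cached, see below.

Code examples

1) Browsing the archive

Browsing the archive feels intuitive because you never deal with the API directly. You simply ask for the objects you need, for example: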

// start with a url
$repo = Repository::fromNodeUrl('https://github.com/dagstuhl-publishing/styles');

// create the corresponding origin object
$origin = Origin::fromRepository($repo);

// ask the origin for the SoftwareHeritage visits
$visits = $origin->getVisits();

// get the snapshot object from a specific visit 
$snapshot = $visits[0]->getSnapshot();

// get the list of branches from a snapshot
$branches = $snapshot->getBranches();

Further supported objects are Revision, Release, Directory, and Content. To fetch an object by its id, simply call the byId method:

$revision = Revision::byId('60476b518914683d35ef08dd6cfdc7809e280c75');

To identify a directory or file inside a snapshot, use the Context class. Continuing the example above, we can do the following:

// take the last snapshot
$snapshot = $visits[0]->getSnapshot();

// take a "path" to a file/directory inside the repo
$repoNode = new RepositoryNode('https://github.com/dagstuhl-publishing/styles/blob/master/LIPIcs/authors/lipics-v2021.cls');

// identify this node inside the snapshot (i.e., get the context) 
$context = $snapshot->getContext($repoNode);

// display the full identifier
dd($context->getIdentifier());

2) Archiving a repository

  • In the first step, a SaveRequest has to be created:
$swhClient = SwhWebApiClient::getCurrent();

// create a repository instance from a url that points to a repo or a specific file/directory inside the repo
$repo = Repository::fromNodeUrl('https://github.com/.../...');

// submit a save request to Software Heritage 
$origin = Origin::fromRepository($repo);
$saveRequest = $origin->postSaveRequest();

if ($saveRequest === null) {
    // connection or network error
    dd('Internal server error', $swhClient->getException(), $swhClient->getLastResponse());
}
else {
    dd('SaveRequest created by SoftwareHeritage, SaveRequestId: '.$saveRequest->id);
    // store $saveRequest->id in local DB to track the status of this request
}
  • In the second step, the status of the SaveRequest has to be monitored (in a loop or cron job). $saveRequestId is the id obtained at the end of the first step.
$saveRequest = SaveRequest::byId($saveRequestId);

if ($saveRequest->saveRequestStatus == SaveRequestStatus::REJECTED) {
    dd('save request rejected -> abort');
}
elseif ($saveRequest->saveTaskStatus == SaveTaskStatus::SUCCEEDED) {
    if ($saveRequest->snapshotSwhId === null) {
        dd('no snapshot though request succeeded -> this should actually not happen');
    }
    else {
        $snapshot = $saveRequest->getSnapshot();
        $repoNode = new RepositoryNode($repoNodeUrl ?? $saveRequest->originUrl);
        $context = $snapshot->getContext($repoNode);
        dd('success', $snapshot, $context, $context->getIdentifier());
    }
}
else {
    dd('pending -> loop this code block again', $saveRequest);
}
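The "loop this code block again" part of the second step can be sketched as a small polling helper. This is a hypothetical helper, not part of the package: the function name pollSaveRequest, the state labels 'pending' and 'timeout', and the default attempt count and delay are all assumptions chosen for illustration.

```php
<?php

/**
 * Hypothetical polling helper (not part of the package).
 *
 * Calls $check until it returns anything other than 'pending',
 * or gives up after $maxAttempts and returns 'timeout'.
 */
function pollSaveRequest(callable $check, int $maxAttempts = 10, int $delaySeconds = 60): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $state = $check();

        if ($state !== 'pending') {
            return $state; // final state reported by the callback
        }

        // Wait before the next attempt, but not after the last one.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }

    return 'timeout';
}
```

In practice, $check would wrap the status block shown above: load the request with SaveRequest::byId, return a label such as 'rejected' or 'succeeded' once the corresponding condition holds, and return 'pending' otherwise. For long-running save requests a cron job that stores the SaveRequest id in a local database is likely the better fit, since a blocking loop ties up a PHP process.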

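3) Error handling

If null is returned instead of an object of the requested type, an error has occurred. More information about the error can be obtained from the current SwhWebApiClient instance as follows: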

$snapshot = Snapshot::byId('non-existing-or-invalid-id'); // to provoke an error 

if ($snapshot === null) {
    $swhClient = SwhWebApiClient::getCurrent();
    dd(
        $swhClient->getException(),     // last exception (e.g., in case of a network issue)
        $swhClient->getLastResponse()   // access the last HTTP response (incl. status code, headers) for debugging purposes 
    );
}

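4) Caching and rate limits

To reduce the number of requests, all requests except those to the /origin/ and /stat/counters/ endpoints are cached. The cache folder must be specified as an absolute path in config('swh.web-api.cache-folder'). To clear the cache, use the clearCache method: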

$swhClient = SwhWebApiClient::getCurrent();
$swhClient->clearCache('2024-09-07'); // clears the cache for a specific date
$swhClient->clearCache(); // clears the whole cache

To get information about your rate limits, call $swhClient->getRateLimits(). This returns an array of the following form:

[
    'X-RateLimit-Limit' => 1200,      // max. number of permitted requests per hour
    'X-RateLimit-Remaining' => 1138,  // remaining in current period
    'X-RateLimit-Reset' => 1620639052 // at this timestamp, the rate-limit will be refreshed
]
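An array of this shape can be used to decide whether to pause before sending further requests. The following is a minimal sketch under stated assumptions: the function name secondsUntilNextRequest and the $minRemaining threshold are hypothetical, and the array is assumed to carry exactly the keys shown above.

```php
<?php

/**
 * Hypothetical helper (not part of the package): returns how many seconds
 * to wait before sending the next request, based on a rate-limit array.
 */
function secondsUntilNextRequest(array $rateLimits, int $now, int $minRemaining = 1): int
{
    $remaining = (int) ($rateLimits['X-RateLimit-Remaining'] ?? 0);
    $reset = (int) ($rateLimits['X-RateLimit-Reset'] ?? $now);

    // Enough requests left in the current period: no need to wait.
    if ($remaining >= $minRemaining) {
        return 0;
    }

    // Otherwise wait until the reset timestamp (never a negative duration).
    return max(0, $reset - $now);
}
```

A caller could pass the result of $swhClient->getRateLimits() together with time() and sleep for the returned number of seconds before continuing.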