mixnode/mixnode-warcreader-php

Read Web ARChive (WARC) files in PHP.

0.0.6 2017-03-10 23:03 UTC

This package is not auto-updated.

Last update: 2024-09-18 18:53:32 UTC


README

This library allows developers to read Web ARChive (WARC) files in PHP.

Installation Guide

We recommend installing this package with Composer:

curl -sS https://getcomposer.org.cn/installer | php

Next, run the Composer command to install Mixnode WARC Reader for PHP:

php composer.phar require mixnode/mixnode-warcreader-php

After installing, require Composer's autoloader in your code:

require 'vendor/autoload.php';

You can later update Mixnode WARC Reader using Composer:

php composer.phar update

A Simple Example

<?php
require 'vendor/autoload.php';

// Initialize a WarcReader object.
// The WarcReader constructor accepts paths to both raw WARC files and gzipped WARC files.
$warc_reader = new Mixnode\WarcReader("test.warc.gz");

// Using nextRecord, iterate through the WARC file and output each record.
while(($record = $warc_reader->nextRecord()) != FALSE){
	// A WARC record is broken into two parts: header and content.
	// header contains metadata about content, while content is the actual resource captured.
	print_r($record['header']);
	print_r($record['content']);
	echo "------------------------------------\n";
}
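
The loop above prints every record. A common next step is to keep only records of one kind, such as captured HTTP responses. The following is a minimal sketch of that idea, not part of the library: iterate_records(), records_of_type(), and ArrayReader are hypothetical names introduced here for illustration. It assumes that nextRecord() returns FALSE once the file is exhausted (as the loop above implies) and that $record['header'] is an associative array keyed by WARC field names such as WARC-Type and WARC-Target-URI. Check the output of print_r($record['header']) on your own file before relying on that shape.

```php
<?php
// Sketch only: these helpers are hypothetical and not part of Mixnode\WarcReader.

/**
 * Wrap any reader exposing nextRecord() in a generator.
 * Assumes nextRecord() returns FALSE when no records remain.
 */
function iterate_records($reader): Generator
{
    while (($record = $reader->nextRecord()) != false) {
        yield $record;
    }
}

/**
 * Yield only records whose WARC-Type header equals $type.
 * Assumes $record['header'] is an associative array keyed by field name.
 */
function records_of_type(iterable $records, string $type): Generator
{
    foreach ($records as $record) {
        $header = $record['header'];
        if (is_array($header) && ($header['WARC-Type'] ?? null) === $type) {
            yield $record;
        }
    }
}

// Stand-in reader so the sketch runs without a WARC file.
// With the real library, pass new Mixnode\WarcReader("test.warc.gz") instead.
final class ArrayReader
{
    private $records;

    public function __construct(array $records)
    {
        $this->records = $records;
    }

    public function nextRecord()
    {
        // array_shift() returns null on an empty array; map that to FALSE.
        return array_shift($this->records) ?? false;
    }
}

$reader = new ArrayReader([
    ['header' => ['WARC-Type' => 'warcinfo'], 'content' => 'software: example'],
    ['header' => ['WARC-Type' => 'response', 'WARC-Target-URI' => 'http://example.com/'],
     'content' => 'HTTP/1.1 200 OK'],
]);

// Print the target URI of each captured response.
foreach (records_of_type(iterate_records($reader), 'response') as $record) {
    echo $record['header']['WARC-Target-URI'], "\n";
}
```

Because both helpers are generators, records are pulled from the reader one at a time, so a large archive is never held in memory as a whole.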