marcelomx/tika-client

This package is abandoned and no longer maintained. No replacement package was suggested.

Apache Tika PHP client

dev-master 2013-09-24 19:43 UTC

This package is not auto-updated.

Last update: 2020-09-18 18:34:18 UTC


README

Usage

// Path to the Tika app JAR
$path = __DIR__ . '/../bin/tika-app-1.4.jar';
$tika = new TikaClient($path);

// Get text
$text = $tika->getText('file.doc');

// Get html text
$html = $tika->getHtml('file.doc');

// Get xhtml text
$xhtml = $tika->getXhtml('file.doc');

// Get language
$lang = $tika->getLanguage('file.doc');

// Get content type
$type = $tika->getContentType('file.doc');

// Extract all attachments from the doc file into a target directory
$target = '/tmp/';
$tika->extract('file.doc', $target);
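
The constructor above takes the path to the Tika app JAR, which suggests that the client runs the Tika command line under the hood. The README does not confirm this, so the following is only a minimal sketch of how such a command could be built. `buildTikaCommand` is a hypothetical helper, not part of this package. The flags are those of the Tika app command line (`--text`, `--html`, `--xml`, `--language`, `--detect`).

```php
<?php
// Hypothetical helper for illustration only; not part of marcelomx/tika-client.
// Builds a shell command that runs the Tika app JAR against one file.
function buildTikaCommand(string $jarPath, string $flag, string $file): string
{
    // Output-mode flags of the Tika app command line.
    $allowed = ['--text', '--html', '--xml', '--language', '--detect'];

    if (!in_array($flag, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported Tika flag: $flag");
    }

    // Escape both paths so spaces or shell metacharacters cannot break the command.
    return sprintf(
        'java -jar %s %s %s',
        escapeshellarg($jarPath),
        $flag,
        escapeshellarg($file)
    );
}

$command = buildTikaCommand(__DIR__ . '/../bin/tika-app-1.4.jar', '--text', 'file.doc');
echo $command, PHP_EOL;

// To actually run it (requires Java and the JAR to be present):
// $text = shell_exec($command);
```

Restricting the flag to a fixed list and escaping both paths keeps user-supplied file names from being interpreted by the shell.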

If you prefer, you can use TikaWrapper to wrap all operations on a single file. For example:

// Reuse the TikaClient instance created above
$wrapper = new TikaWrapper('file.doc', $tika);

// Get text
$text = $wrapper->getText();

// Get html text
$html = $wrapper->getHtml();

// Get xhtml text
$xhtml = $wrapper->getXhtml();

// Get language
$lang = $wrapper->getLanguage();

// Get content type
$type = $wrapper->getContentType();