wangoviridans / tesseract_ocr
A wrapper for working with TesseractOCR from PHP scripts
dev-master
2014-07-26 13:40 UTC
Requires
- adambrett/shell-wrapper: 0.4.*
- wangoviridans/config: dev-master
Requires (Dev)
- phpunit/phpunit: 4.1.*
This package is not auto-updated.
Last update: 2024-09-24 03:04:36 UTC
README
A wrapper for working with TesseractOCR from PHP scripts. Based on https://github.com/wangoviridans/tesseract-ocr-for-php
Installation
Via composer (https://packagist.org.cn/packages/wangoviridans/tesseract_ocr)
{
"require": {
"wangoviridans/tesseract_ocr": ">= 0.0.1"
}
}
Or simply clone the repository and place it somewhere in your project folder.
$ cd myapp/vendor
$ git clone https://github.com/wangoviridans/tesseract-ocr-for-php.git
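Dependencies
Important: make sure the tesseract binary is in your $PATH. If you run PHP on a web server, the process user is probably not you but _www or a similar account, and its $PATH may differ from yours. If needed, you can always modify $PATH from within the script: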
$path = getenv('PATH');
putenv("PATH=$path:/usr/local/bin");
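Before extending $PATH, it can help to confirm whether the binary is already reachable for the user PHP runs as. The following is a minimal sketch, assuming PHP 7.1+ and a Unix-like system; findInPath is a hypothetical helper written for this illustration and is not part of the package.

```php
<?php
// Hypothetical helper (not part of this package): search a PATH-style
// string for an executable file with the given name.
function findInPath(string $binary, string $path): ?string
{
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // ignore empty PATH entries
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $binary;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

$path = (string) getenv('PATH');
$found = findInPath('tesseract', $path);
if ($found === null) {
    // /usr/local/bin is the example directory used above; adjust as needed.
    putenv("PATH=$path" . PATH_SEPARATOR . '/usr/local/bin');
    $found = findInPath('tesseract', (string) getenv('PATH'));
}
echo $found === null ? "tesseract not found\n" : "tesseract found at $found\n";
```

Running this through the web server rather than the command line shows what the web server user actually sees.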
Usage
Basic usage
<?php
require_once '/path/to/src/TesseractOCR.php';
//or require_once 'vendor/autoload.php' if you are using composer
$tesseract = new TesseractOCR();
$tesseract->setImage('images/some-words.jpg');
echo $tesseract->recognize();
or
<?php
require_once '/path/to/src/TesseractOCR.php';
//or require_once 'vendor/autoload.php' if you are using composer
$tesseract = new TesseractOCR(array(
'file.input' => 'images/some-words.jpg'
));
echo $tesseract->recognize();
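Defining the language
Tesseract provides trained data for several languages; selecting the language that matches your image improves recognition accuracy.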
<?php
require_once '/path/to/src/TesseractOCR.php';
//or require_once 'vendor/autoload.php' if you are using composer
$tesseract = new TesseractOCR('images/sind-sie-deutsch.jpg');
$tesseract->setLanguage('deu'); //same 3-letter code as the tesseract training data packages
echo $tesseract->recognize();
or
<?php
require_once '/path/to/src/TesseractOCR.php';
//or require_once 'vendor/autoload.php' if you are using composer
$tesseract = new TesseractOCR(array(
'file.input' => 'images/sind-sie-deutsch.jpg',
'language' => 'deu' //same 3-letter code as the tesseract training data packages
));
echo $tesseract->recognize();
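A language code only helps if the matching trained data is installed. As a rough sketch, the tesseract CLI's --list-langs option can be used to check this before calling setLanguage. parseLanguageList is a hypothetical helper, not part of the package, and the header line format is assumed to start with "List of".

```php
<?php
// Hypothetical helper (not part of this package): extract language codes
// from the text printed by `tesseract --list-langs`.
function parseLanguageList(string $output): array
{
    $languages = [];
    foreach (preg_split('/\R/', $output) as $line) {
        $line = trim($line);
        // Skip blank lines and the "List of available languages ..." header.
        if ($line === '' || stripos($line, 'List of') === 0) {
            continue;
        }
        $languages[] = $line;
    }
    return $languages;
}

// Some tesseract versions print the list to stderr, so merge both streams.
// If tesseract is not installed, the shell error text ends up in $output
// and 'deu' will simply not be found.
$output = (string) shell_exec('tesseract --list-langs 2>&1');
$languages = parseLanguageList($output);
echo in_array('deu', $languages, true)
    ? "deu is installed\n"
    : "deu is missing; install the German trained data first\n";
```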
Inducing recognition
Sometimes tesseract confuses similar-looking characters, for example
0 - O
1 - l
j - ,
etc ...
However, you can improve recognition accuracy by specifying which kinds of characters the image contains, for example
<?php
$tesseract = new TesseractOCR();
$tesseract->setImage('my-image.jpg')
->setWhitelist(range('a','z')); //tesseract will treat everything as lowercase letters
echo $tesseract->recognize();
$tesseract = new TesseractOCR();
$tesseract->setImage('my-image.jpg')
->setWhitelist(range('A','Z'), range(0,9), '_-@.'); //you can pass as many ranges as you need
echo $tesseract->recognize();
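To make the second call above more concrete, here is one plausible way a mix of ranges and strings could be flattened into a single character list, such as the value Tesseract accepts for its tessedit_char_whitelist variable. buildWhitelist is a hypothetical helper for illustration only; the package's own implementation may differ.

```php
<?php
// Hypothetical helper (not the package's code): flatten any mix of arrays
// and strings into one string of allowed characters.
function buildWhitelist(...$parts): string
{
    $chars = '';
    foreach ($parts as $part) {
        // range() yields arrays; literal character sets arrive as strings.
        $chars .= is_array($part) ? implode('', $part) : (string) $part;
    }
    return $chars;
}

echo buildWhitelist(range('A', 'Z'), range(0, 9), '_-@.'), "\n";
// prints "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-@."
```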
You can even do some cool things, like this:
<?php
$tesseract = new TesseractOCR();
$tesseract->setImage('my-image.jpg')
->setWhitelist(range('A','Z'));
echo $tesseract->recognize(); //will return "GIT"
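Troubleshooting
Warnings such as "Permission denied" or "No such file or directory"
To work around this, you can specify a custom directory for temporary files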
<?php
$tesseract = new TesseractOCR();
$tesseract->setImage('my-image.jpg')
->setTempDir('./my-temp-dir');
or
<?php
$tesseract = new TesseractOCR(array(
'file.input' => 'my-image.jpg',
'tempDir' => './my-temp-dir'
));
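A custom temporary directory only helps if it exists and is writable by the user PHP runs as. The sketch below checks both conditions up front. ensureTempDir is a hypothetical helper, not part of the package, and the location under sys_get_temp_dir() is just an example.

```php
<?php
// Hypothetical helper (not part of this package): create the directory if
// needed and confirm that the current user can write to it.
function ensureTempDir(string $dir): string
{
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create temp directory: $dir");
    }
    if (!is_writable($dir)) {
        throw new RuntimeException("Temp directory is not writable: $dir");
    }
    return $dir;
}

$tempDir = ensureTempDir(sys_get_temp_dir() . '/my-temp-dir');
echo "Using temp directory: $tempDir\n";
```

The returned path can then be passed to setTempDir() or to the 'tempDir' option shown above.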