README

PDF to HTML PHP Class

This class is provided so that you can convert PDF files to HTML files using PHP and poppler-utils.

Important Notes

Please read the usage instructions below: the package has been upgraded and things in it have changed.

Installation

From your application's working directory, run this command to add the package to your app:

	composer require gufy/pdftohtml-php:~2

Or add this package to the require section of your composer.json:

{
	"require": {
		"gufy/pdftohtml-php": "~2"
	}
}

Requirements

Poppler-Utils (on Ubuntu, install it from apt: sudo apt-get install poppler-utils)
PHP configured with shell access
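
The shell access requirement can be checked up front. The following is a minimal sketch, not part of the package: the helper name isShellFunctionUsable is hypothetical, and which PHP function the package actually uses to run poppler is an assumption (exec is used here only as an illustration).

```php
<?php
// Hypothetical helper, not part of gufy/pdftohtml-php.
// Reports whether a process-execution function such as exec() can be called.
function isShellFunctionUsable(string $name, ?string $disabled = null): bool
{
    // Fall back to the live ini setting when no list is supplied.
    $disabled = $disabled ?? (string) ini_get('disable_functions');

    // disable_functions is a comma-separated list; normalise it to clean names.
    $blocked = array_filter(array_map('trim', explode(',', $disabled)), 'strlen');

    return function_exists($name) && !in_array($name, $blocked, true);
}

// Assumption: exec() stands in for whatever the package calls internally.
echo isShellFunctionUsable('exec')
    ? 'Shell access looks available.'
    : 'Shell access appears to be disabled in this PHP configuration.';
echo PHP_EOL;
```

If the check fails, the usual place to look is the disable_functions setting in php.ini.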

Usage

Here is an example.

<?php
// if you are using composer, just use this
include 'vendor/autoload.php';

// initiate
$pdf = new Gufy\PdfToHtml\Pdf('file.pdf');

// convert to html string
$html = $pdf->html();

// convert a specific page to html string
$page = $pdf->html(3);

// convert to html and return it as a DOM object (https://github.com/paquettg/php-html-parser)
$dom = $pdf->getDom();

// get the total number of pages in the PDF
$total_pages = $pdf->getPages();

// if the PDF has more than one page, switch the DOM's current page, here to page 3
$dom->goToPage(3);

// and then you can do as you please with that dom, you can find any element you want
$paragraphs = $dom->find('body > p');

// change pdftohtml bin location
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');

// change pdfinfo bin location
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');
?>
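
Building on the example above, getPages() and html($page) can be combined to collect the HTML of every page. This is only a sketch: the helper name htmlByPage is hypothetical, and it assumes that page numbers start at 1 and that getPages() returns the total page count.

```php
<?php
// Hypothetical helper, not part of gufy/pdftohtml-php.
// $pdf is assumed to expose getPages() and html($page) as in the example above.
function htmlByPage($pdf): array
{
    $pages = [];
    $total = (int) $pdf->getPages();

    // Assumption: pages are numbered from 1 up to the total.
    for ($page = 1; $page <= $total; $page++) {
        $pages[$page] = $pdf->html($page);
    }

    return $pages; // page number => HTML string
}
```

With the package installed, it would be called as htmlByPage(new Gufy\PdfToHtml\Pdf('file.pdf')).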

### Passing options to getDom()

By default, getDom() extracts all images and creates an HTML file for each page. You can pass options when extracting the HTML:

<?php
$pdfDom = $pdf->getDom(['ignoreImages' => true]);

### Available options

singlePage, default: false
imageJpeg, default: false
ignoreImages, default: false
zoom, default: 1.5
noFrames, default: true
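
To show how the defaults listed above combine with caller-supplied values, here is a small sketch. The constant DOCUMENTED_DEFAULTS and the helper buildDomOptions are hypothetical and do not mirror the package's internals; only the option names and default values come from this README.

```php
<?php
// Defaults as listed in this README; everything else here is illustrative.
const DOCUMENTED_DEFAULTS = [
    'singlePage'   => false,
    'imageJpeg'    => false,
    'ignoreImages' => false,
    'zoom'         => 1.5,
    'noFrames'     => true,
];

// Hypothetical helper: rejects misspelled option names, then applies overrides.
function buildDomOptions(array $overrides): array
{
    $unknown = array_diff_key($overrides, DOCUMENTED_DEFAULTS);
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            'Unknown option(s): ' . implode(', ', array_keys($unknown))
        );
    }

    // Later arrays win for string keys, so overrides replace the defaults.
    return array_merge(DOCUMENTED_DEFAULTS, $overrides);
}

$options = buildDomOptions(['ignoreImages' => true, 'zoom' => 2]);
```

The resulting array could then be passed as $pdf->getDom($options). Validating the names first catches typos such as ignoreImage, which would otherwise be passed along unnoticed.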

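Usage notes for Windows users

There is a way to use this package on Windows. First, download the latest poppler-utils binaries for Windows from http://blog.alivate.com.au/poppler-windows/

After downloading, extract the archive. It contains a directory called bin, which is the one we need. Then change your code as follows: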

<?php
// if you are using composer, just use this
include 'vendor/autoload.php';
use Gufy\PdfToHtml\Config;
// change pdftohtml bin location
Config::set('pdftohtml.bin', 'C:/poppler-0.37/bin/pdftohtml.exe');

// change pdfinfo bin location
Config::set('pdfinfo.bin', 'C:/poppler-0.37/bin/pdfinfo.exe');
// initiate
$pdf = new Gufy\PdfToHtml\Pdf('file.pdf');

// convert to html and return it as a DOM object (https://github.com/paquettg/php-html-parser)
$dom = $pdf->getDom();

// get the total number of pages in the PDF
$total_pages = $pdf->getPages();

// if the PDF has more than one page, switch the DOM's current page, here to page 3
$dom->goToPage(3);

// and then you can do as you please with that dom, you can find any element you want
$paragraphs = $dom->find('body > p');

?>
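
If the same code has to run on Windows and on other systems, the two binary paths can be derived from a single bin directory. This is a sketch under assumptions: the helper popplerBinaryPaths is hypothetical, it relies on the PHP_OS_FAMILY constant (PHP 7.2 or later), and the directory shown is just the one from the example above.

```php
<?php
// Hypothetical helper, not part of gufy/pdftohtml-php.
// Builds the two Config keys used in this README from a poppler bin directory.
function popplerBinaryPaths(string $binDir, string $osFamily = PHP_OS_FAMILY): array
{
    // Windows builds of poppler ship .exe files; other platforms use no suffix.
    $suffix = $osFamily === 'Windows' ? '.exe' : '';
    $binDir = rtrim($binDir, '/\\');

    return [
        'pdftohtml.bin' => $binDir . '/pdftohtml' . $suffix,
        'pdfinfo.bin'   => $binDir . '/pdfinfo' . $suffix,
    ];
}

foreach (popplerBinaryPaths('C:/poppler-0.37/bin', 'Windows') as $key => $path) {
    // With the package installed, this line would be: Config::set($key, $path);
    echo $key, ' => ', $path, PHP_EOL;
}
```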

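Usage notes for OS/X users

Thanks to @kaleidoscopique for trying this package and getting it to run on OS/X.

1. Install brew

Brew is a well-known package manager for OS/X: https://brew.sh.cn/ (similar in style to aptitude).

2. Install poppler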

brew install poppler

3. Verify the paths of pdfinfo and pdftohtml

$ which pdfinfo
/usr/local/bin/pdfinfo

$ which pdftohtml
/usr/local/bin/pdftohtml

4. Whatever the paths are, set them in your PHP code with Gufy\PdfToHtml\Config::set, using exactly the paths reported by the which commands:

<?php
// if you are using composer, just use this
include 'vendor/autoload.php';

// change pdftohtml bin location
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');

// change pdfinfo bin location
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');

// initiate
$pdf = new Gufy\PdfToHtml\Pdf('file.pdf');

// convert to an html string
$html = $pdf->html();
?>
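
The lookup from step 3 can also be done from PHP instead of typing the paths by hand. The sketch below is an illustration only: locateBinary is a hypothetical helper, and it assumes a POSIX shell (OS/X or Linux) and that shell_exec() is enabled.

```php
<?php
// Hypothetical helper, not part of gufy/pdftohtml-php.
// Asks the shell where a binary lives, much like the which command does.
function locateBinary(string $name): ?string
{
    $output = shell_exec('command -v ' . escapeshellarg($name) . ' 2>/dev/null');
    if (!is_string($output)) {
        return null; // no output at all: the binary was not found
    }

    $path = trim($output);

    return $path === '' ? null : $path;
}

foreach (['pdftohtml.bin' => 'pdftohtml', 'pdfinfo.bin' => 'pdfinfo'] as $key => $binary) {
    $path = locateBinary($binary);
    // With the package installed and a path found: Config::set($key, $path);
    echo $key, ' => ', $path ?? '(not found)', PHP_EOL;
}
```

Hard-coded paths remain the simpler choice when the install location is known; a lookup like this mainly helps when the code runs on machines with different brew prefixes.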

Feedback & Contribute

Open an issue for any improvement or bug. I love to help and solve other people's problems. Thanks 👍
