workable-cv / extract
打包提取PDF文件。
Requires
- php: >=7
- dompdf/dompdf: 0.8.5
- illuminate/support: ^5.5|^6
- knplabs/knp-snappy: 1.2
- pelago/emogrifier: ^2.0.0
- symfony/process: 4.3.4
Requires (Dev)
- ext-com_dotnet: *
- ext-dom: *
- ext-gd: *
- dev-master
- v1.0.48
- v1.0.47
- v1.0.46
- v1.0.45
- v1.0.44
- v1.0.43
- v1.0.42
- v1.0.41
- v1.0.40
- v1.0.39
- v1.0.38
- v1.0.37
- v1.0.36
- v1.0.35
- v1.0.34
- v1.0.33
- v1.0.32
- v1.0.31
- v1.0.30
- v1.0.29
- v1.0.28
- v1.0.27
- v1.0.26
- v1.0.25
- v1.0.24
- v1.0.23
- v1.0.22
- v1.0.21
- v1.0.20
- v1.0.19
- v1.0.18
- v1.0.17
- v1.0.16
- v1.0.15
- v1.0.14
- v1.0.13
- v1.0.12
- v1.0.11
- v1.0.10
- v1.0.9
- v1.0.8
- v1.0.7
- v1.0.6
- v1.0.5
- v1.0.4
- v1.0.3
- v1.0.2
- v1.0.1
- v1.0.0
This package is auto-updated.
Last update: 2024-09-08 12:34:29 UTC
README
为Laravel提供PDF转HTML、HTML转PDF、DOC转PDF、HTML转图片的功能
在composer.json中添加此包并更新composer。
composer require workable-cv/extract
安装
更新composer后,将ServiceProvider添加到config/app.php文件中的providers数组中
1. Laravel
WorkableCV\Extract\ExtractServiceProvider::class
2. Lumen
$app->register(\WorkableCV\Extract\ExtractServiceProvider::class);
配置
1. Laravel
php artisan vendor:publish --provider="WorkableCV\Extract\ExtractServiceProvider"
2. Lumen
- 第一步
composer require laravelista/lumen-vendor-publish
-
第二步
为了使用它,您必须将其添加到您的
app/Console/Kernel.php
文件中protected $commands = [ \Laravelista\LumenVendorPublish\VendorPublishCommand::class ];
-
最后一步
php artisan vendor:publish --provider="WorkableCV\Extract\ExtractServiceProvider"
Laravel 5.x
I. PDF转HTML
1. 安装Poppler-Utils
Debian/Ubuntu
sudo apt-get install poppler-utils
Mac OS X
brew install poppler
Window
对于需要在Windows上使用此包的用户,有办法。首先在此处下载poppler-utils for windows http://blog.alivate.com.au/poppler-windows/。并下载最新的二进制文件。
下载后,解压它。
2. 我们需要知道工具所在的位置
Debian/Ubuntu
$ whereis pdftohtml
pdftohtml: /usr/bin/pdftohtml
$ whereis pdfinfo
pdfinfo: /usr/bin/pdfinfo
Mac OS X
/usr/local/bin/pdfinfo
$ which pdftohtml
/usr/local/bin/pdfinfo
Window
进入解压目录。将有一个名为bin
的目录。我们需要这个。
3. PHP配置,启用shell访问
用法
在存储文件夹中创建文件文件夹
示例
use WorkableCV\Extract\core\PdfToHtml; $name_file = 'cv24'; $file = $name_file . ".pdf"; $id = uniqid(); $options = [ 'pdftohtml_path' => '/usr/bin/pdftohtml', 'pdfinfo_path' => '/usr/bin/pdfinfo' 'clearAfter' => false, 'outputDir' => storage_path('files/'.$id), ]; //example for Window //$options = [ // 'pdftohtml_path' => '/path/to/poppler/bin/pdftohtml.exe', // 'pdfinfo_path' => '/path/to/poppler/bin/pdfinfo.exe', // 'clearAfter' => false, // 'outputDir' => storage_path('files/'.$id), // ]; $pdf = new PdfToHtml($file, $options, 'pdf'); $pdf->generateHTML($name_file, $id);
完整选项
$full_settings = [ 'pdftohtml_path' => '/usr/bin/pdftohtml', // path to pdftohtml 'pdfinfo_path' => '/usr/bin/pdfinfo', // path to pdfinfo 'generate' => [ // settings for generating html 'singlePage' => false, // we want separate pages 'imageJpeg' => false, // we want png image 'ignoreImages' => false, // we need images 'zoom' => 1.5, // scale pdf 'noFrames' => false, // we want separate pages ], 'clearAfter' => true, // auto clear output dir (if removeOutputDir==false then output dir will remain) 'removeOutputDir' => true, // remove output dir 'outputDir' => '/tmp/'.uniqid(), // output dir 'html' => [ // settings for processing html 'inlineCss' => true, // replaces css classes to inline css rules 'inlineImages' => true, // looks for images in html and replaces the src attribute to base64 hash 'onlyContent' => true, // takes from html body content only ] ]
II. HTML转PDF
用法
示例
use WorkableCV\Extract\core\HtmlToPdf; $html = new HtmlToPdf(); $file = 'test.html'; $option = [ 'dpi' => 120 ]; $pdf_generate = $html->generatePDF($path_file, $option); return $pdf_generate;
完整选项
rootDir: "{app_directory}/vendor/dompdf/dompdf"
tempDir: "/tmp" (available in config/dompdf.php)
fontDir: "{app_directory}/storage/fonts/" (available in config/dompdf.php)
fontCache: "{app_directory}/storage/fonts/" (available in config/dompdf.php)
chroot: "{app_directory}" (available in config/dompdf.php)
logOutputFile: "/tmp/log.htm"
defaultMediaType: "screen" (available in config/dompdf.php)
defaultPaperSize: "a4" (available in config/dompdf.php)
defaultFont: "serif" (available in config/dompdf.php)
dpi: 96 (available in config/dompdf.php)
fontHeightRatio: 1.1 (available in config/dompdf.php)
isPhpEnabled: false (available in config/dompdf.php)
isRemoteEnabled: true (available in config/dompdf.php)
isJavascriptEnabled: true (available in config/dompdf.php)
isHtml5ParserEnabled: false (available in config/dompdf.php)
isFontSubsettingEnabled: false (available in config/dompdf.php)
debugPng: false
debugKeepTemp: false
debugCss: false
debugLayout: false
debugLayoutLines: true
debugLayoutBlocks: true
debugLayoutInline: true
debugLayoutPaddingBox: true
pdfBackend: "CPDF" (available in config/dompdf.php)
pdflibLicense: ""
adminUsername: "user"
adminPassword: "password"
III. HTML转图像
用法
在浏览器中复制结构下的html文件路径$id .'_image.html ' ($id由pdf转html生成)
并运行文件。打开F12,切换到控制台标签将会有以base64为底图的图像
IV. DOC转PDF
用法
参考https://php.ac.cn/manual/en/book.com.php
要求
-
Window
-
添加到php.ini
[PHP_COM_DOTNET] extension=php_com_dotnet.dll
-
-
macOs, Linux
snap install libreoffice sudo add-apt-repository ppa:libreoffice/ppa sudo apt-get update sudo apt-get install libreoffice sudo apt-get update -y sudo apt-get install -y libreoffice-java-common sudo apt-get install default-jre libreoffice-java-common
示例
use WorkableCV\Extract\core\DocToPdf; //test.doc is a convert to pdf file. //test.pdf is the path to save the test.pdf file $doc = new DocToPdf(); //window $doc->generatePDF('test.doc', 'test.pdf'); //linux, masOs $doc->generatePDFLinux('test.doc');
V. PDF OCR
用法
安装OCRmyPDF : https://ocrmypdf.cn/en/latest/
注意
- 支持Linux、macOs。不支持window,因为与权限有关。
示例
use WorkableCV\Extract\core\PdfOCR; //test.pdf is a convert to path pdf file. //test.ocr.pdf is the path pdf file to ouput $pdfOCR = new PdfOCR(); $pdfOCR->pdfOCR('test.pdf', 'test.ocr.pdf');
VI. PDF受保护
注意
-
在任何人可以这样做之前,您必须阅读上述内容并安装上述所有所需的包。
-
支持Linux、macOs。不支持window,因为与权限有关。
-
所有配置选项都写在extract.php文件中
示例
use WorkableCV\Extract\core\PdfOCR; use WorkableCV\Extract\core\PdfProtected; use WorkableCV\Extract\core\PdfToHtml; $name_file = 'cv105'; $file = storage_path('cv1/' . $name_file . '.pdf'); $options_check = config('extract.options_extract'); $pdf = new PdfToHtml($file, $options_check); $output_dir = config('extract.options_extract.outputDir'); //Check pdf file scan from dom element or image $checkPdf = $pdf->checkPdf($output_dir, $name_file); if ($checkPdf) { $pdfOCR = new PdfOCR(); $result_pdfOCR = $pdfOCR->pdfOCR($file); if (!$result_pdfOCR) exit('File not convert. Try again'); $path_file_ocr = $result_pdfOCR[1]; $pdfProtected = new PdfProtected($path_file_ocr, $options_check); $path_cv_protected = $pdfProtected->pdfProtected($name_file, $output_dir, false, false, 'pdf', 1, config('extract.output_cv_protected'), false); unlink($path_file_ocr); } else { //Check pdf file exist email or phone $checkExistEmailPhone = $pdf->checkExitsEmailPhone($output_dir, $name_file); if (!$checkExistEmailPhone) { $pdfOCR = new PdfOCR(); $result_pdfOCR = $pdfOCR->pdfOCR($file); if (!$result_pdfOCR) exit('File not convert. Try again'); $path_file_ocr = $result_pdfOCR[1]; $pdfProtected = new PdfProtected($path_file_ocr, $options_check); $path_cv_protected = $pdfProtected->pdfProtected($name_file, $output_dir, false, false, 'pdf', 1, config('extract.output_cv_protected'), false); unlink($path_file_ocr); } else { $pdfProtected = new PdfProtected($file, $options_check); $path_cv_protected = $pdfProtected->pdfProtected($name_file, $output_dir, false, false, 'pdf', 1, config('extract.output_cv_protected'), false); } } return $path_cv_protected;
VII. 许可证
此Extract Pdf for Laravel是开源软件,根据MIT许可证授权