README

A PHP library for filtering PDF documents stored in Google Drive, written for Daniel Fischl.

To add it to your project, use Composer:

composer require tiefan/google-pdf-scraper

Extract text from a PDF document

$text = PdfScraper::textFromDriveId($fileId); // string $fileId: Google Drive file ID

$text = PdfScraper::textFromDriveUrl($url);   // string $url: Google Drive file URL
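The README does not say how textFromDriveUrl() gets from a URL to a file ID. As a sketch only, assuming the two common Drive link shapes (`https://drive.google.com/file/d/<ID>/view` and `https://drive.google.com/open?id=<ID>`), a hypothetical helper `driveIdFromUrl()` could pull the ID out like this. It is not part of tiefan/google-pdf-scraper and may not match what the library actually does.

```php
<?php
// Hypothetical helper, NOT part of tiefan/google-pdf-scraper.
// Assumes the two common Google Drive link shapes:
//   https://drive.google.com/file/d/<ID>/view...
//   https://drive.google.com/open?id=<ID>
function driveIdFromUrl(string $url): ?string
{
    // Shape 1: the ID is the path segment right after "/d/"
    if (preg_match('#/d/([A-Za-z0-9_-]+)#', $url, $m) === 1) {
        return $m[1];
    }

    // Shape 2: the ID is the "id" query-string parameter
    $query = parse_url($url, PHP_URL_QUERY);
    if (is_string($query)) {
        parse_str($query, $params);
        if (isset($params['id']) && is_string($params['id'])) {
            return $params['id'];
        }
    }

    return null; // not a recognised Drive link
}
```

For example, `driveIdFromUrl('https://drive.google.com/open?id=XYZ123')` yields `'XYZ123'`, and an ID obtained this way is the kind of value textFromDriveId() expects.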

Check a document using "begin" and "end" keywords

// string $begin; string|null $end is optional (defaults to null)
$isThatDocument = PdfScraper::checkKeywordsFromDriveId($fileId, $begin, $end);

$isThatDocument = PdfScraper::checkKeywordsFromDriveUrl($url, $begin, $end);

$scraper = new PdfScraper($doc, true); // second argument $isURL: true if $doc is a URL, false if it is a file ID
$isThatDocument = $scraper->checkKeywords($begin, $end); // $end is optional
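The README does not define exactly when a document counts as a match. A minimal sketch of one plausible reading, assuming a document matches when the begin keyword occurs in its text and, if an end keyword is given, that keyword appears somewhere after it. The function `textMatchesKeywords()` is hypothetical and works on an already-extracted string; it is not the library's checkKeywords().

```php
<?php
// Hypothetical sketch, NOT the library's implementation.
// Assumption: match = $begin occurs in $text and, when $end is given,
// $end occurs somewhere after the first occurrence of $begin.
function textMatchesKeywords(string $text, string $begin, ?string $end = null): bool
{
    $start = strpos($text, $begin); // case-sensitive search
    if ($start === false) {
        return false; // begin keyword missing
    }
    if ($end === null) {
        return true; // only the begin keyword was required
    }
    // Look for $end only after the end of the first $begin match
    return strpos($text, $end, $start + strlen($begin)) !== false;
}
```

Under this assumption, order matters: a text reading "INVOICE ... TOTAL" matches `('INVOICE', 'TOTAL')`, while "TOTAL ... INVOICE" does not. Whether the library's own check is order-sensitive or case-sensitive is not stated in the README.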

Process data in one pass with MySQL or MariaDB

The following code uses the database schema in Sample\db_pdf_scraper.sql.

$pdfDB = new PdfDB($host, $username, $password, $database);
$processed_count = $pdfDB->checkPdfs();
