This package is abandoned and no longer maintained. No replacement package was suggested.

PHP-based Docx parser

3.9 2020-08-27 11:25 UTC

This package is auto-updated.

Last update: 2023-08-26 20:20:18 UTC


README


Installation

Composer (command line): composer require philgale92/docx:3.*

Composer (file): add the following to your composer.json file

    "require": {
        "philgale92/docx": "3.*"
    }

Manual: the files inside the src directory follow the PSR-0 format, so any PSR-0 autoloader can load them.
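
For a manual install, a PSR-0 autoloader has to be registered before the parser classes are used. The following is a minimal sketch rather than part of the package: the `$docxSrcDir` location and the `psr0RelativePath()` helper are hypothetical names chosen for illustration, and the sketch assumes the classes sit under `src` in the standard PSR-0 layout.

```php
<?php
// Hypothetical location of the package's src directory; adjust to your layout.
$docxSrcDir = __DIR__ . '/docx/src';

/**
 * Map a fully qualified class name to its PSR-0 relative file path.
 * Namespace separators become directory separators; underscores in the
 * class name (not in the namespace) also become directory separators.
 */
function psr0RelativePath($class)
{
    $class = ltrim($class, '\\');
    $path = '';
    $lastSeparator = strrpos($class, '\\');
    if ($lastSeparator !== false) {
        $namespace = substr($class, 0, $lastSeparator);
        $class = substr($class, $lastSeparator + 1);
        $path = str_replace('\\', DIRECTORY_SEPARATOR, $namespace) . DIRECTORY_SEPARATOR;
    }
    return $path . str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
}

// Load a class only when its file exists under the assumed src directory,
// so other registered autoloaders still get a chance for everything else.
spl_autoload_register(function ($class) use ($docxSrcDir) {
    $file = $docxSrcDir . DIRECTORY_SEPARATOR . psr0RelativePath($class);
    if (is_file($file)) {
        require $file;
    }
});
```

With this in place, `new \PhilGale92Docx\Docx(...)` would be looked up as `PhilGale92Docx/Docx.php` below the configured directory.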

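Supports

  • Paragraphs (basic text)
  • Text attributes (bold, underline, italic, tabs, subscript & superscript)
  • Images
  • Lists
  • Hyperlinks
  • Tables (column spans, vertically merged cells, etc.)
  • Composer support
  • Word styles
  • Custom attribute loading (see Usage)
  • Text boxes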

Usage

    
/*
 * Create the parser object, which converts the file into internal objects.
 * $absolutePathToDocxFile must hold the absolute path to the .docx file.
 */
$parser = new \PhilGale92Docx\Docx($absolutePathToDocxFile);

/*
 * Attach style info (if any)
 */
$parser->addStyle(
    (new \PhilGale92Docx\Style())
    ->setStyleId('standardPara')
    ->setHtmlClass('custom')
    ->setHtmlTag('p') // 'p' is default behaviour
);


/*
 * Here is an example of a metaData style
 * (let's pull out the title style directly)
 */
$parser->addStyle(
    (new \PhilGale92Docx\Style())
    ->setStyleId('0TitleName')

    // Flagging this style as metaData lets us fetch any content
    // that uses it in a separate call.
    // It also removes that content from the standard render.
    ->setIsMetaData(true)

    // By default the metaData is rendered as HTML,
    // but the literal plain text is available too.
    ->setMetaDataRenderMode(\PhilGale92Docx\Docx::RENDER_MODE_PLAIN)
);

/*
 * Here is an example of a heading style
 */
$parser->addStyle(
    (new \PhilGale92Docx\Style())
    ->setStyleId('1HeadingStyle')
    ->setHtmlTag('h2')
    ->setHtmlClass('custom')
);

/*
 * Here is an example where we want to wrap all adjacent styles
 * of this name with a div
 */
$parser->addStyle(
    (new \PhilGale92Docx\Style())
    ->setStyleId('3Boxgreytint')
    ->setBoxSimilarSiblings(true) // enable boxing behaviour
    ->setBoxClassName('box-style-tint-grey') // class of wrapping div
);

/*
 * You can also create Word styles that produce text lists;
 * this is in addition to the standard list detection
 */
$parser->addStyle(
    (new \PhilGale92Docx\Style())
    ->setStyleId('4Numberedlist')
    ->setListHtmlTag('ol') // Takes 'ul' or 'ol'. 'ul' is default behaviour 
    ->setListLevel(1) // the indentation level, must be > 0 
);

/*
 * Now parse the XML into internal objects
 */
$parser->parse(); // Optional, run automatically by ->render() if not run yet

/*
 * Now render the parsed document to an HTML string
 */
echo $parser
    ->render(\PhilGale92Docx\Docx::RENDER_MODE_HTML)
;

/*
 * We can grab any metaData content after ->parse() has been performed;
 * ->getMetaData() runs ->parse() if it has not been run yet
 */
var_dump(
    $parser->getMetaData() // also takes the $styleId of a metaData style as an argument
);

/*
 * Are there any styles we forgot to declare?
 */
var_dump($parser->getDetectedStyles());
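
The calls above can be wrapped in a small helper that rejects an unreadable path before the parser is constructed. This is a sketch under stated assumptions: `renderDocxFile()` is a hypothetical name, not part of the package; it uses only the constructor and `->render()` call shown above, and it assumes `->render()` returns a string, as the `echo` in the usage suggests. Whether the parser validates the path itself is not stated in this README, which is the reason for the explicit check.

```php
<?php
/**
 * Hypothetical helper: validate the path, then render the document to HTML.
 * Throws InvalidArgumentException before the parser class is ever touched.
 */
function renderDocxFile($absolutePath)
{
    if (!is_file($absolutePath) || !is_readable($absolutePath)) {
        throw new InvalidArgumentException('Cannot read docx file: ' . $absolutePath);
    }

    // Same constructor and render call as in the usage section above.
    $parser = new \PhilGale92Docx\Docx($absolutePath);

    // ->render() runs ->parse() itself if it has not been run yet.
    return $parser->render(\PhilGale92Docx\Docx::RENDER_MODE_HTML);
}
```

A call such as `echo renderDocxFile($absolutePathToDocxFile);` would then replace the direct `echo $parser->render(...)` for documents that need no custom styles.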

Recommended CSS

Here are some basic CSS styles that can serve as a starting point.

table {
    border-collapse: collapse;
}
th {
    text-align: left;
    text-transform: none;
}
td, th {
    vertical-align: top;
    background-clip: padding-box;
    border: 1px solid #000000;
    color: #414042;
    height: 34px;
    padding-left: 6px;
    position: relative;
}
td.has_subcell {
    padding-left: 0;
}
table table {
    width: 100%;
}
td td {
    height: 72px;
    border: none;
    border-bottom: 1px solid black;
    min-width: 110px;
}
td table tr:last-of-type td {
    border-bottom: 0;
}
span.indent {
    padding-left: 36px;
}

====

Requirements

  • PHP >= 5.4

====

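Changelog (v1 -> v2)

  • Integrated into Composer (PSR-0)
  • Rebuilt the architecture so that it is easier to maintain and properly object-oriented.
  • Correct use of private/protected/public visibility.
  • Removed dynamically set properties from all objects
  • All domElements are now treated equally, so order is preserved in all cases.
  • RenderMode is now propagated correctly throughout, so conversion to other formats is better supported.
  • Adding custom tag rendering is now easier to handle
  • The pre-processing stage no longer uses the outdated arrays, so it is easier to see how it works.
  • Tidied up the PHPDocs throughout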