krane_mora/html2text

HTML to text converter

v1.0.0 2019-08-28 17:59 UTC

Last update: 2024-09-19 05:53:16 UTC


README

Converts HTML documents to plain text.

Installation

composer require kranemora/html2text

Basic usage

$html = <<<EOF
<p>Welcome to <strong>html2text</strong></p>
<p>The <em>best</em> html to text converter!</p>
EOF;

$html2Text = new \kranemora\Html2Text\Html2Text;
$text = $html2Text->convert($html);

Output

Welcome to html2text

The best html to text converter!

Examples

Default settings

$html = <<<EOF
<!DOCTYPE html>
<html lang="es">
    <head>
        <title>Test Html2Text</title>
    </head>
    <body>
        <header>
            <h1>Test Document</h1>
        </header>
        <main>
            <div>
                <div>
                    <h2>Lorem ipsum</h2>
                    <p><strong>Lorem ipsum</strong> dolor sit <em>amet</em>, consectetur adipiscing elit.
                    Curabitur porttitor nisi nec finibus bibendum. Donec at elementum leo. Donec eu felis
                    vehicula, efficitur est at, fringilla nisi. Donec congue tortor vel pulvinar mattis.
                    Etiam id ornare magna. In dapibus et nisl eget convallis. Etiam eu feugiat ante.
                    Phasellus vulputate nec velit nec sagittis. Vestibulum ante ipsum primis in faucibus orci
                    luctus et ultrices posuere cubilia Curae; Ut gravida accumsan lorem, id viverra nunc
                    ultrices quis. Duis in tristique ligula, vel semper urna.</p>
                    <dl>
                        <dt>Dolor sit amet</dt>
                        <dd>consectetur adipiscing elit.</dd>
                        <dt>Curabitur porttitor nisi nec finibus bibendum</dt>
                        <dd>Donec at elementum leo.</dd>
                        <dt>Donec eu felis vehicula</dt>
                        <dd>Efficitur est at.</dd>
                    </dl>
                </div>
                <table>
                    <tr>
                        <th rowspan="2">Position</th>
                        <th colspan="2">Gender</th>
                        <th rowspan="2">Total</th>
                    </tr>
                    <tr>
                        <th>Male</th>
                        <th>Female</th>
                    </tr>
                    <tr>
                        <th>Tutor</th>
                        <td>5</td>
                        <td>8</td>
                        <td>13</td>
                    </tr>
                    <tr>
                        <th>Professor</th>
                        <td>10</td>
                        <td>8</td>
                        <td>18</td>
                    </tr>
                </table>
                <h2>Aenean a massa convallis</h2>
                <ul>
                    <li class="item">Ultrices magna vitae</li>
                    <li class="item">Gravida velit</li>
                    <li class="item">Nunc lobortis</li>
                    <li class="item">Tortor nec auctor ultricies</li>
                </ul>
                <h3>Curabitur bibendum eu diam et venenatis</h3>
                <ol>
                    <li>Donec vitae enim suscipit</li>
                    <li>Porta nunc tincidunt</li>
                    <li>Consequat leo</li>
                    <li>Nunc eu risus rutrum</li>
                </ol>
            </div>
        </main>
        <footer>
            <aside>
                <h2>Lorem ipsum</h2>
                <ul>
                    <li><a href="https://#">Facebook</a></li>
                    <li><a href="https://www.twitter.com">Twitter</a></li>
                    <li><a href="https://www.linkedin.com/">Linkedin</a></li>
                    <li><a href="https://www.instagram.com">Instagram</a></li>
                </ul>
            </aside>
        </footer>
        Lorem ipsum
    </body>
</html>
EOF;

$html2Text = new \kranemora\Html2Text\Html2Text;
$text = $html2Text->convert($html);

Output

Test Document
Lorem ipsum

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur porttitor nisi nec finibus bibendum. Donec at elementum leo. Donec eu felis vehicula, efficitur est at, fringilla nisi. Donec congue tortor vel pulvinar mattis. Etiam id ornare magna. In dapibus et nisl eget convallis. Etiam eu feugiat ante. Phasellus vulputate nec velit nec sagittis. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Ut gravida accumsan lorem, id viverra nunc ultrices quis. Duis in tristique ligula, vel semper urna.

Dolor sit amet
consectetur adipiscing elit.

Curabitur porttitor nisi nec finibus bibendum
Donec at elementum leo.

Donec eu felis vehicula
Efficitur est at.

+-----------+---------------+-------+
| Position  | Gender        | Total |
|           |---------------|       |
|           | Male | Female |       |
+-----------+------+--------+-------+
| Tutor     |    5 |      8 |    13 |
+-----------+------+--------+-------+
| Professor |   10 |      8 |    18 |
+-----------+------+--------+-------+

Aenean a massa convallis

- Ultrices magna vitae
- Gravida velit
- Nunc lobortis
- Tortor nec auctor ultricies

Curabitur bibendum eu diam et venenatis

- Donec vitae enim suscipit
- Porta nunc tincidunt
- Consequat leo
- Nunc eu risus rutrum

Lorem ipsum

- Facebook [https://#]
- Twitter [https://www.twitter.com]
- Linkedin [https://www.linkedin.com/]
- Instagram [https://www.instagram.com]

Lorem ipsum

Custom settings

$html = <<<EOF
<ul>
    <li>Ultrices magna vitae</li>
    <li>Gravida velit</li>
    <li>Nunc lobortis</li>
    <li>Tortor nec auctor ultricies</li>
</ul>
<ul>
    <li>Tortor nec auctor ultricies</li>
    <li>Nunc lobortis</li>
    <li>Gravida velit</li>
    <li>Ultrices magna vitae</li>
</ul>
EOF;

$options = [
    // You can set only one option ...
    'ul' => [
        'break' => "\n"
    ],
    // ... or set them all
    'li' => [
        'break' => '',
        'prepend' => '[',
        'append' => ']',
        'between' => ', '
    ]
];

$html2Text = new \kranemora\Html2Text\Html2Text;
$html2Text->setDefaultOptions($options);
$text = $html2Text->convert($html);

Output

[Ultrices magna vitae], [Gravida velit], [Nunc lobortis], [Tortor nec auctor ultricies]
[Tortor nec auctor ultricies], [Nunc lobortis], [Gravida velit], [Ultrices magna vitae]
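
The output above suggests how the four option keys combine. The following is a minimal sketch of one plausible reading, not the library's actual implementation: `renderItems` is a hypothetical helper, and the assumption is that `prepend` and `append` wrap each item, `between` joins neighbouring items, and each element's `break` is emitted after it.

```php
<?php
// Hypothetical helper, NOT part of kranemora/html2text. It only illustrates
// an assumed meaning of the 'prepend', 'append', 'between' and 'break' keys.
function renderItems(array $items, array $li, array $ul): string
{
    $rendered = [];
    foreach ($items as $item) {
        // Wrap each item, then add the item-level break (empty in this example).
        $rendered[] = $li['prepend'] . $item . $li['append'] . $li['break'];
    }

    // Join the items with the separator and close the list with its own break.
    return implode($li['between'], $rendered) . $ul['break'];
}

$li = ['break' => '', 'prepend' => '[', 'append' => ']', 'between' => ', '];
$ul = ['break' => "\n"];

echo renderItems(['Ultrices magna vitae', 'Gravida velit'], $li, $ul);
// prints: [Ultrices magna vitae], [Gravida velit]
```

Under these assumptions, each list lands on its own line because only the `ul` break contains a newline.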

Custom parser

./src/Parsers/OlParser.php

<?php
namespace kranemora\Html2Text\Parsers;

use DOMElement;

class OlParser extends BaseParser
{
    // Override this method to return the node as plain text
    public function getText(DOMElement $node)
    {
        $options = $this->getOptions(); // Gets the options that were set with Html2Text::setDefaultOptions

        // Implement the conversion of the node to plain text here

        return "node in plain text";
    }
    }
}
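
As an illustration of what the body of `getText` could contain, here is a minimal sketch that numbers the items of an ordered list. It is written as a standalone function so it runs without the package: `olToText` is a hypothetical name, and treating a `reverse` option as "reverse the item order" is an assumption, since the README does not define it.

```php
<?php
// Hypothetical standalone function, NOT part of kranemora/html2text.
// In a real parser this logic would sit inside getText().
function olToText(DOMElement $node, array $options = []): string
{
    $items = [];
    foreach ($node->childNodes as $child) {
        // Keep only direct <li> children; skip whitespace text nodes.
        if ($child instanceof DOMElement && strtolower($child->nodeName) === 'li') {
            // Collapse runs of whitespace into single spaces.
            $items[] = trim(preg_replace('/\s+/', ' ', $child->textContent));
        }
    }

    // Assumed meaning of 'reverse': output the items in reverse order.
    if (!empty($options['reverse'])) {
        $items = array_reverse($items);
    }

    $lines = [];
    foreach ($items as $index => $item) {
        $lines[] = ($index + 1) . '. ' . $item;
    }

    return implode("\n", $lines);
}

$doc = new DOMDocument();
$doc->loadHTML('<ol><li>Donec vitae enim suscipit</li><li>Porta nunc tincidunt</li></ol>');
$ol = $doc->getElementsByTagName('ol')->item(0);

echo olToText($ol), "\n";
// 1. Donec vitae enim suscipit
// 2. Porta nunc tincidunt
```

The sketch reads only direct `li` children, so nested lists would be flattened into their parent item's text. A fuller parser would need to recurse.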

Set the parser for an HTML element

$options = [
    'ol' => [
        'break' => "\n",
        'parser' => [
            'class' => '\kranemora\Html2Text\Parsers\OlParser',
            'options' => [
                'reverse' => 0
            ]
        ]
    ]
];

Author

License

This project is licensed under the MIT License.