crwlr/html-2-text

Convert HTML to formatted plain text.

v0.1.1 2024-02-21 22:39 UTC

Last update: 2024-09-21 23:55:52 UTC


README

HTML to Formatted Plain Text

This easy-to-use package helps you convert HTML to formatted plain text.

Demo

use Crwlr\Html2Text\Html2Text;

$html = <<<HTML
<!DOCTYPE html>
<html lang="en">
<head><title>Example Website Title</title></head>
<body>
    <script>console.log('test');</script>
    <style>#app { background-color: #fff; }</style>
    <article>
        <h1>Article Headline</h1>
        <h2>A Subheading</h2>

        <p>Some text containing <a href="https://www.crwl.io">a link</a>.</p>

        <ul>
            <li>list item</li>
            <li>another list item</li>
            <li>and one more
                <ul>
                    <li>second level
                        <ul>
                            <li>third level</li>
                        </ul>
                    </li>
                </ul>
            </li>
        </ul>

        <table>
            <thead>
            <tr><th>column 1</th><th>column 2</th><th>column 3</th></tr>
            </thead>
            <tbody>
            <tr><td>value 1</td><td>value 2</td><td>value 3</td></tr>
            <tr><td>value 1</td><td colspan="2">value 2 + 3</td></tr>
            <tr><td colspan="2">value 1 and 2</td><td>value 3</td></tr>
            <tr><td>value 1</td><td>value 2</td><td>value 3</td></tr>
            </tbody>
        </table>
    </article>
</body>
</html>
HTML;

$text = Html2Text::convert($html);

Resulting text

# Article Headline

## A Subheading

Some text containing [a link](https://www.crwl.io).

* list item
* another list item
* and one more
  * second level
    * third level

| column 1 | column 2 | column 3 |
| -------- | -------- | -------- |
| value 1  | value 2  | value 3  |
| value 1  | value 2 + 3         |
| value 1 and 2       | value 3  |
| value 1  | value 2  | value 3  |
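
The demo converts an inline string. The same call can be pointed at a file on disk. The following is a minimal sketch, not part of the package: it assumes the package was installed with Composer under the name shown at the top of this page (`crwlr/html-2-text`) and that the script sits next to the generated `vendor/` directory. The helper `convertHtmlFile()` and the file names in the usage note are hypothetical. The only package call used is `Html2Text::convert()`, as in the demo.

```php
<?php

declare(strict_types=1);

// Assumption: installed via `composer require crwlr/html-2-text`,
// so Composer's autoloader is available next to this script.
require __DIR__ . '/vendor/autoload.php';

use Crwlr\Html2Text\Html2Text;

/**
 * Hypothetical helper (not provided by the package): reads an HTML file,
 * converts it, and writes the resulting text to another file.
 */
function convertHtmlFile(string $htmlPath, string $textPath): void
{
    $html = file_get_contents($htmlPath);

    if ($html === false) {
        throw new RuntimeException("Could not read {$htmlPath}");
    }

    // Same static call as in the demo above.
    $text = Html2Text::convert($html);

    if (file_put_contents($textPath, $text) === false) {
        throw new RuntimeException("Could not write {$textPath}");
    }
}
```

Calling `convertHtmlFile(__DIR__ . '/page.html', __DIR__ . '/page.txt');` would then write the converted text next to the source file. Keeping the file handling outside the conversion call means the input could just as well come from an HTTP response or a database field.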

Documentation

You can find the full documentation at crwlr.software.

Contributing

If you are considering contributing to this package, please read the contribution guide (CONTRIBUTING.md).