initial commit
@@ -0,0 +1,112 @@
Metadata-Version: 2.1
Name: html2text
Version: 2024.2.26
Summary: Turn HTML into equivalent Markdown-structured text.
Home-page: https://github.com/Alir3z4/html2text/
Author: Aaron Swartz
Author-email: me@aaronsw.com
Maintainer: Alireza Savand
Maintainer-email: alireza.savand@gmail.com
License: GNU GPL 3
Platform: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: COPYING
License-File: AUTHORS.rst

# html2text
[Build status](https://github.com/Alir3z4/html2text/actions/workflows/main.yml)
[Coverage](https://codecov.io/gh/Alir3z4/html2text)

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: `html2text [filename [encoding]]`

| Option                 | Description
|------------------------|---------------------------------------------------
| `--version`            | Show program's version number and exit
| `-h`, `--help`         | Show this help message and exit
| `--ignore-links`       | Don't include any formatting for links
| `--escape-all`         | Escape all special characters. Output is less readable, but avoids corner-case formatting issues.
| `--reference-links`    | Use reference-style links instead of inline links to create Markdown
| `--mark-code`          | Mark preformatted and code blocks with `[code]...[/code]`

For a complete list of options, see the [docs](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md).

Or you can use it from within `Python`:
```
>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.

```

Or with some configuration options:

```
>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!

>>> # Don't ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!

```
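Most flags in the options table map to attributes on the same `HTML2Text` object used in the example above. For instance, `--ignore-links` corresponds to `ignore_links`, and `--reference-links` turns `inline_links` off. The standard-library sketch below mirrors four of those flags with `argparse`. The `dest` names and defaults are assumptions based on html2text's `cli.py`, so check them against your installed version. `build_parser` and `settings_from_argv` are hypothetical helpers and are not part of html2text.

```python
import argparse
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of part of html2text's CLI parser.

    The dest names are assumed to match HTML2Text attribute names, and the
    defaults are assumed to match html2text's config module.
    """
    p = argparse.ArgumentParser(prog="html2text")
    p.add_argument("--ignore-links", dest="ignore_links",
                   action="store_true", default=False)
    p.add_argument("--escape-all", dest="escape_snob",
                   action="store_true", default=False)
    # store_false: passing --reference-links turns inline links *off*.
    p.add_argument("--reference-links", dest="inline_links",
                   action="store_false", default=True)
    p.add_argument("--mark-code", dest="mark_code",
                   action="store_true", default=False)
    # Positional inputs from "html2text [filename [encoding]]".
    p.add_argument("filename", nargs="?")
    p.add_argument("encoding", nargs="?")
    return p


def settings_from_argv(argv: List[str]) -> Dict[str, bool]:
    """Return only the converter settings implied by the given flags."""
    args = vars(build_parser().parse_args(argv))
    return {k: v for k, v in args.items() if k not in ("filename", "encoding")}


print(settings_from_argv(["--reference-links", "--mark-code", "page.html"]))
# {'ignore_links': False, 'escape_snob': False, 'inline_links': False, 'mark_code': True}
```

With html2text installed, you could copy the parsed values onto a converter: `for key, value in settings_from_argv(argv).items(): setattr(h, key, value)`.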
*Originally written by Aaron Swartz. This code is distributed under the GPLv3.*

## How to install

`html2text` is available on [PyPI](https://pypi.org/project/html2text/):

```
$ pip install html2text
```

## How to run unit tests

Run the test suite with `tox`.

To see the coverage results, run `coverage html`, then open the `./htmlcov/index.html` file in your browser.

## Documentation

Documentation lives [here](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md).
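The `PKG-INFO` file added above stores the package metadata as email-style (RFC 822) header fields. The README is the body after the first blank line. Because of that layout, the standard library's email parser can read the file. The sketch below uses a trimmed copy of the file as an illustration. Most `Classifier` lines and the long description are omitted.

```python
from email.parser import HeaderParser

# A trimmed copy of the PKG-INFO added in this commit; most Classifier lines
# and the long description are omitted for brevity.
PKG_INFO = """\
Metadata-Version: 2.1
Name: html2text
Version: 2024.2.26
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# html2text
"""

# HeaderParser parses only the headers; everything after the first blank
# line is kept as an unparsed string payload (here, the Markdown README).
msg = HeaderParser().parsestr(PKG_INFO)

name, version = msg["Name"], msg["Version"]

# Repeated fields such as Classifier must be read with get_all().
python_versions = [
    classifier.rsplit(" :: ", 1)[1]
    for classifier in msg.get_all("Classifier", [])
    if classifier.startswith("Programming Language :: Python :: 3.")
]

description = msg.get_payload()

print(name, version, msg["Requires-Python"])  # html2text 2024.2.26 >=3.8
print(python_versions)                          # ['3.8', '3.12']
print(description.strip())                      # # html2text
```

Indexing with `msg["..."]` returns only the first occurrence of a field, so multi-valued fields like `Classifier` and `License-File` need `get_all()`.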
@@ -0,0 +1,173 @@
AUTHORS.rst
COPYING
ChangeLog.rst
MANIFEST.in
README.md
setup.cfg
setup.py
tox.ini
html2text/__init__.py
html2text/__main__.py
html2text/_typing.py
html2text/cli.py
html2text/config.py
html2text/elements.py
html2text/py.typed
html2text/utils.py
html2text.egg-info/PKG-INFO
html2text.egg-info/SOURCES.txt
html2text.egg-info/dependency_links.txt
html2text.egg-info/entry_points.txt
html2text.egg-info/not-zip-safe
html2text.egg-info/top_level.txt
test/GoogleDocMassDownload.html
test/GoogleDocMassDownload.md
test/GoogleDocSaved.html
test/GoogleDocSaved.md
test/GoogleDocSaved_two.html
test/GoogleDocSaved_two.md
test/__init__.py
test/abbr_tag.html
test/abbr_tag.md
test/anchors.html
test/anchors.md
test/apos_element.html
test/apos_element.md
test/blockquote_example.html
test/blockquote_example.md
test/bodywidth_newline.html
test/bodywidth_newline.md
test/bold_inside_link.html
test/bold_inside_link.md
test/bold_long_line.html
test/bold_long_line.md
test/br_inside_a.md
test/break_preserved_in_blockquote.html
test/break_preserved_in_blockquote.md
test/css_import_no_semicolon.html
test/css_import_no_semicolon.md
test/decript_tage.html
test/decript_tage.md
test/default_image_alt.html
test/default_image_alt.md
test/doc_with_table.html
test/doc_with_table.md
test/doc_with_table_bypass.html
test/doc_with_table_bypass.md
test/emdash-para.html
test/emdash-para.md
test/emphasis_preserved_whitespace.html
test/emphasis_preserved_whitespace.md
test/emphasis_whitespace.html
test/emphasis_whitespace.md
test/empty-img-src.html
test/empty-img-src.md
test/empty-link.html
test/empty-link.md
test/empty-title-tag.html
test/empty-title-tag.md
test/flip_emphasis.html
test/flip_emphasis.md
test/google-like_font-properties.html
test/google-like_font-properties.md
test/header_tags.html
test/header_tags.md
test/horizontal_rule.html
test/horizontal_rule.md
test/html-escaping.html
test/html-escaping.md
test/html_entities_out_of_text.html
test/html_entities_out_of_text.md
test/images_as_html.html
test/images_as_html.md
test/images_to_alt.html
test/images_to_alt.md
test/images_with_div_wrap.html
test/images_with_div_wrap.md
test/images_with_size.html
test/images_with_size.md
test/img-tag-with-link.html
test/img-tag-with-link.md
test/inplace_baseurl_substitution.html
test/inplace_baseurl_substitution.md
test/invalid_start.html
test/invalid_start.md
test/invalid_unicode.html
test/invalid_unicode.md
test/kbd_tag.html
test/kbd_tag.md
test/link_titles.html
test/link_titles.md
test/list_tags_example.html
test/list_tags_example.md
test/long_lines.html
test/long_lines.md
test/lrm_after_b.html
test/lrm_after_b.md
test/lrm_after_i.html
test/lrm_after_i.md
test/lrm_inside_i.html
test/lrm_inside_i.md
test/mark_code.html
test/mark_code.md
test/mixed_nested_lists.html
test/mixed_nested_lists.md
test/nbsp.html
test/nbsp.md
test/nbsp_unicode.html
test/nbsp_unicode.md
test/no_inline_links_example.html
test/no_inline_links_example.md
test/no_inline_links_images_to_alt.html
test/no_inline_links_images_to_alt.md
test/no_inline_links_nested.html
test/no_inline_links_nested.md
test/no_mailto_links.html
test/no_mailto_links.md
test/no_p_in_table.html
test/no_p_in_table.md
test/no_wrap_links.html
test/no_wrap_links.md
test/no_wrap_links_no_inline_links.html
test/no_wrap_links_no_inline_links.md
test/normal.html
test/normal.md
test/normal_escape_snob.html
test/normal_escape_snob.md
test/pad_table.html
test/pad_table.md
test/pad_table_empty.html
test/pad_table_empty.md
test/pad_table_no_closed_tr.html
test/pad_table_no_closed_tr.md
test/pre.html
test/pre.md
test/preformatted_in_list.html
test/preformatted_in_list.md
test/protect_links.html
test/protect_links.md
test/q_tag.html
test/q_tag.md
test/rlm_inside_strong.html
test/rlm_inside_strong.md
test/single_line_break.html
test/single_line_break.md
test/stressed_with_html_entities.html
test/stressed_with_html_entities.md
test/sub_tag.html
test/sub_tag.md
test/sup_tag.html
test/sup_tag.md
test/table_ignore.html
test/table_ignore.md
test/test_html2text.py
test/test_memleak.py
test/test_newlines_on_multiple_calls.py
test/text_after_list.html
test/text_after_list.md
test/url-escaping.html
test/url-escaping.md
test/wrap_list_items_example.html
test/wrap_list_items_example.md
test/wrap_tables.html
test/wrap_tables.md
@@ -0,0 +1 @@

@@ -0,0 +1,2 @@
[console_scripts]
html2text = html2text.cli:main
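The `[console_scripts]` entry above gives the package its `html2text` command. At install time, the installer writes a small launcher (the `../../../../bin/html2text` file listed in the next hunk) that roughly imports `html2text.cli` and calls `main()`. The sketch below shows how a `name = module:attr` entry can be parsed and resolved with the standard library. It assumes the simple `module:attr` form without extras and is an illustration, not setuptools' or pip's actual code. `parse_console_scripts` and `resolve` are hypothetical helper names.

```python
import importlib
from configparser import ConfigParser

# The entry_points.txt content added in this commit.
ENTRY_POINTS_TXT = """\
[console_scripts]
html2text = html2text.cli:main
"""


def parse_console_scripts(text):
    """Map each console script name to a (module, attribute) pair."""
    parser = ConfigParser(interpolation=None)  # values are not %-templates
    parser.optionxform = str  # keep script names case-sensitive
    parser.read_string(text)
    scripts = {}
    for name, spec in parser.items("console_scripts"):
        # "pkg.module:attr" -> ("pkg.module", "attr"); extras such as
        # "[extra]" are not handled in this sketch.
        module, _, attr = spec.partition(":")
        scripts[name] = (module.strip(), attr.strip())
    return scripts


def resolve(module, attr):
    """Import `module` and follow a possibly dotted attribute path."""
    obj = importlib.import_module(module)
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj


scripts = parse_console_scripts(ENTRY_POINTS_TXT)
print(scripts)  # {'html2text': ('html2text.cli', 'main')}
# With html2text installed, resolve(*scripts["html2text"]) would return
# html2text.cli.main; a stdlib target shows the same mechanism:
print(resolve("json", "dumps")([1, 2]))  # [1, 2]
```

Setting `optionxform = str` matters because `ConfigParser` lowercases keys by default, which would silently rename any script with uppercase letters.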
@@ -0,0 +1,22 @@
../../../../bin/html2text
../html2text/__init__.py
../html2text/__main__.py
../html2text/__pycache__/__init__.cpython-311.pyc
../html2text/__pycache__/__main__.cpython-311.pyc
../html2text/__pycache__/_typing.cpython-311.pyc
../html2text/__pycache__/cli.cpython-311.pyc
../html2text/__pycache__/config.cpython-311.pyc
../html2text/__pycache__/elements.cpython-311.pyc
../html2text/__pycache__/utils.cpython-311.pyc
../html2text/_typing.py
../html2text/cli.py
../html2text/config.py
../html2text/elements.py
../html2text/py.typed
../html2text/utils.py
PKG-INFO
SOURCES.txt
dependency_links.txt
entry_points.txt
not-zip-safe
top_level.txt
@@ -0,0 +1 @@

@@ -0,0 +1 @@
html2text