initial commit

klein panic
2024-09-29 01:45:31 -04:00
commit 242841c44b
8018 changed files with 1426958 additions and 0 deletions

@@ -0,0 +1,112 @@
Metadata-Version: 2.1
Name: html2text
Version: 2024.2.26
Summary: Turn HTML into equivalent Markdown-structured text.
Home-page: https://github.com/Alir3z4/html2text/
Author: Aaron Swartz
Author-email: me@aaronsw.com
Maintainer: Alireza Savand
Maintainer-email: alireza.savand@gmail.com
License: GNU GPL 3
Platform: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: COPYING
License-File: AUTHORS.rst
# html2text
[![CI](https://github.com/Alir3z4/html2text/actions/workflows/main.yml/badge.svg?branch=master)](https://github.com/Alir3z4/html2text/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/Alir3z4/html2text/graph/badge.svg?token=OoxiyymjgU)](https://codecov.io/gh/Alir3z4/html2text)
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Usage: `html2text [filename [encoding]]`
| Option              | Description                                                                                        |
|---------------------|----------------------------------------------------------------------------------------------------|
| `--version`         | Show program's version number and exit                                                             |
| `-h`, `--help`      | Show this help message and exit                                                                    |
| `--ignore-links`    | Don't include any formatting for links                                                             |
| `--escape-all`      | Escape all special characters. Output is less readable, but avoids corner case formatting issues.  |
| `--reference-links` | Use reference links instead of inline links to create markdown                                     |
| `--mark-code`       | Mark preformatted and code blocks with [code]...[/code]                                            |
For a complete list of options, see the [docs](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md).
Or you can use it from within `Python`:
```
>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.
```
Or with some configuration options:
```
>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!
>>> # Don't ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!
```
*Originally written by Aaron Swartz. This code is distributed under the GPLv3.*
## How to install
`html2text` is available on PyPI:
https://pypi.org/project/html2text/
```
$ pip install html2text
```
## How to run unit tests
Run `tox`.
To see the coverage results, run `coverage html`, then open the `./htmlcov/index.html` file in your browser.
## Documentation
Documentation lives [here](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md)
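## Reading this metadata from Python
The header block at the top of this file uses the RFC 822 style `Key: value` layout of Python core metadata, so the standard library's `email` parser can read it. The following is a minimal sketch, not part of html2text: `PKG_INFO` is a shortened, illustrative copy of the fields above, and `read_metadata` is a hypothetical helper name.

```python
from email.parser import Parser

# Shortened copy of the metadata fields shown above (illustrative only).
PKG_INFO = """\
Metadata-Version: 2.1
Name: html2text
Version: 2024.2.26
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8

# html2text
"""


def read_metadata(text):
    """Parse PKG-INFO style text into its header fields and description body."""
    msg = Parser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        # Repeated fields such as Classifier are collected with get_all().
        "classifiers": msg.get_all("Classifier"),
        # Everything after the first blank line is the long description.
        "description": msg.get_payload(),
    }


meta = read_metadata(PKG_INFO)
print(meta["name"], meta["version"], meta["requires_python"])
```

Note that the parser treats the first blank line as the end of the headers, so the sketch only works on text where a blank line separates the fields from the README body.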

@@ -0,0 +1,173 @@
AUTHORS.rst
COPYING
ChangeLog.rst
MANIFEST.in
README.md
setup.cfg
setup.py
tox.ini
html2text/__init__.py
html2text/__main__.py
html2text/_typing.py
html2text/cli.py
html2text/config.py
html2text/elements.py
html2text/py.typed
html2text/utils.py
html2text.egg-info/PKG-INFO
html2text.egg-info/SOURCES.txt
html2text.egg-info/dependency_links.txt
html2text.egg-info/entry_points.txt
html2text.egg-info/not-zip-safe
html2text.egg-info/top_level.txt
test/GoogleDocMassDownload.html
test/GoogleDocMassDownload.md
test/GoogleDocSaved.html
test/GoogleDocSaved.md
test/GoogleDocSaved_two.html
test/GoogleDocSaved_two.md
test/__init__.py
test/abbr_tag.html
test/abbr_tag.md
test/anchors.html
test/anchors.md
test/apos_element.html
test/apos_element.md
test/blockquote_example.html
test/blockquote_example.md
test/bodywidth_newline.html
test/bodywidth_newline.md
test/bold_inside_link.html
test/bold_inside_link.md
test/bold_long_line.html
test/bold_long_line.md
test/br_inside_a.md
test/break_preserved_in_blockquote.html
test/break_preserved_in_blockquote.md
test/css_import_no_semicolon.html
test/css_import_no_semicolon.md
test/decript_tage.html
test/decript_tage.md
test/default_image_alt.html
test/default_image_alt.md
test/doc_with_table.html
test/doc_with_table.md
test/doc_with_table_bypass.html
test/doc_with_table_bypass.md
test/emdash-para.html
test/emdash-para.md
test/emphasis_preserved_whitespace.html
test/emphasis_preserved_whitespace.md
test/emphasis_whitespace.html
test/emphasis_whitespace.md
test/empty-img-src.html
test/empty-img-src.md
test/empty-link.html
test/empty-link.md
test/empty-title-tag.html
test/empty-title-tag.md
test/flip_emphasis.html
test/flip_emphasis.md
test/google-like_font-properties.html
test/google-like_font-properties.md
test/header_tags.html
test/header_tags.md
test/horizontal_rule.html
test/horizontal_rule.md
test/html-escaping.html
test/html-escaping.md
test/html_entities_out_of_text.html
test/html_entities_out_of_text.md
test/images_as_html.html
test/images_as_html.md
test/images_to_alt.html
test/images_to_alt.md
test/images_with_div_wrap.html
test/images_with_div_wrap.md
test/images_with_size.html
test/images_with_size.md
test/img-tag-with-link.html
test/img-tag-with-link.md
test/inplace_baseurl_substitution.html
test/inplace_baseurl_substitution.md
test/invalid_start.html
test/invalid_start.md
test/invalid_unicode.html
test/invalid_unicode.md
test/kbd_tag.html
test/kbd_tag.md
test/link_titles.html
test/link_titles.md
test/list_tags_example.html
test/list_tags_example.md
test/long_lines.html
test/long_lines.md
test/lrm_after_b.html
test/lrm_after_b.md
test/lrm_after_i.html
test/lrm_after_i.md
test/lrm_inside_i.html
test/lrm_inside_i.md
test/mark_code.html
test/mark_code.md
test/mixed_nested_lists.html
test/mixed_nested_lists.md
test/nbsp.html
test/nbsp.md
test/nbsp_unicode.html
test/nbsp_unicode.md
test/no_inline_links_example.html
test/no_inline_links_example.md
test/no_inline_links_images_to_alt.html
test/no_inline_links_images_to_alt.md
test/no_inline_links_nested.html
test/no_inline_links_nested.md
test/no_mailto_links.html
test/no_mailto_links.md
test/no_p_in_table.html
test/no_p_in_table.md
test/no_wrap_links.html
test/no_wrap_links.md
test/no_wrap_links_no_inline_links.html
test/no_wrap_links_no_inline_links.md
test/normal.html
test/normal.md
test/normal_escape_snob.html
test/normal_escape_snob.md
test/pad_table.html
test/pad_table.md
test/pad_table_empty.html
test/pad_table_empty.md
test/pad_table_no_closed_tr.html
test/pad_table_no_closed_tr.md
test/pre.html
test/pre.md
test/preformatted_in_list.html
test/preformatted_in_list.md
test/protect_links.html
test/protect_links.md
test/q_tag.html
test/q_tag.md
test/rlm_inside_strong.html
test/rlm_inside_strong.md
test/single_line_break.html
test/single_line_break.md
test/stressed_with_html_entities.html
test/stressed_with_html_entities.md
test/sub_tag.html
test/sub_tag.md
test/sup_tag.html
test/sup_tag.md
test/table_ignore.html
test/table_ignore.md
test/test_html2text.py
test/test_memleak.py
test/test_newlines_on_multiple_calls.py
test/text_after_list.html
test/text_after_list.md
test/url-escaping.html
test/url-escaping.md
test/wrap_list_items_example.html
test/wrap_list_items_example.md
test/wrap_tables.html
test/wrap_tables.md

@@ -0,0 +1,2 @@
[console_scripts]
html2text = html2text.cli:main
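
The `[console_scripts]` entry above maps a command name to a `module:attribute` target. As a rough sketch of how that line can be read with the standard library (the `ENTRY_POINTS_TXT` string is just a copy of the two lines above, and nothing here is taken from html2text's own code):

```python
import configparser
from importlib.metadata import EntryPoint

ENTRY_POINTS_TXT = """\
[console_scripts]
html2text = html2text.cli:main
"""

# The file uses INI syntax; restrict the delimiter to "=" so the ":" in the
# target is never mistaken for a key/value separator.
parser = configparser.ConfigParser(delimiters=("=",))
parser.read_string(ENTRY_POINTS_TXT)
value = parser["console_scripts"]["html2text"]

# EntryPoint splits the "module:attribute" target (Python 3.9+ for .module
# and .attr). Nothing is imported until .load() is called, so html2text does
# not need to be installed for this sketch.
ep = EntryPoint(name="html2text", value=value, group="console_scripts")
print(ep.module, ep.attr)
```

In other words, running the `html2text` command would amount to importing `html2text.cli` and calling its `main` attribute.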

@@ -0,0 +1,22 @@
../../../../bin/html2text
../html2text/__init__.py
../html2text/__main__.py
../html2text/__pycache__/__init__.cpython-311.pyc
../html2text/__pycache__/__main__.cpython-311.pyc
../html2text/__pycache__/_typing.cpython-311.pyc
../html2text/__pycache__/cli.cpython-311.pyc
../html2text/__pycache__/config.cpython-311.pyc
../html2text/__pycache__/elements.cpython-311.pyc
../html2text/__pycache__/utils.cpython-311.pyc
../html2text/_typing.py
../html2text/cli.py
../html2text/config.py
../html2text/elements.py
../html2text/py.typed
../html2text/utils.py
PKG-INFO
SOURCES.txt
dependency_links.txt
entry_points.txt
not-zip-safe
top_level.txt
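
The entries above appear to be relative to the directory that holds this list (the bare names such as `PKG-INFO` sit next to it). A small sketch of resolving them, assuming a hypothetical install location: `EGG_INFO_DIR` is made up for illustration, and `resolve` is not part of any packaging tool.

```python
import posixpath

# Hypothetical location of the egg-info directory; adjust for a real install.
EGG_INFO_DIR = "/venv/lib/python3.11/site-packages/html2text-2024.2.26.egg-info"


def resolve(relative_path, base=EGG_INFO_DIR):
    """Join a listed path onto the base directory and collapse any '..' parts."""
    return posixpath.normpath(posixpath.join(base, relative_path))


for entry in ("../../../../bin/html2text", "../html2text/cli.py", "PKG-INFO"):
    print(resolve(entry))
```

Under that assumed layout, the four `..` segments of the first entry climb out of `site-packages` to the environment root, which is why the script lands in its `bin` directory.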