pdfminer_six extracts the text of a page directly from the source code of the PDF.
Author: tavan
Version: 0.0.1
Type: Tool plugin
pdfminer.six is a PDF parsing tool that focuses on text extraction and analysis. It extracts text directly from the PDF source code and can also report the exact position, font, and color of the text.
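To illustrate the position and font information mentioned above, here is a minimal sketch that calls pdfminer.six directly rather than going through the plugin. `CharInfo` and `iter_chars` are hypothetical names made up for this example, and color is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class CharInfo:
    """One character with layout details (illustrative, not part of the plugin)."""
    text: str
    fontname: str
    size: float
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def iter_chars(pdf_path: str) -> Iterator[CharInfo]:
    """Yield every character that pdfminer.six layout analysis finds."""
    # Imported lazily so this module still loads without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page in extract_pages(pdf_path):
        for element in page:
            # Skip figures, lines, rectangles and other non-text objects.
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                for obj in line:
                    # LTAnno objects (inserted spaces/newlines) carry no font data.
                    if isinstance(obj, LTChar):
                        yield CharInfo(obj.get_text(), obj.fontname, obj.size, obj.bbox)
```

Calling `list(iter_chars("sample.pdf"))` on a PDF of your own (the path is a placeholder) would give one record per character, which can then be grouped by font or position as needed.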
The plugin supports the following output formats:
- markdown: Markdown format (MIME: text/markdown)
- html: HTML format (MIME: text/html)
- text: Plain text format (MIME: text/plain)
- tag: Tagged text format (MIME: text/plain)
- xml: XML format (MIME: application/xml)
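The format list above can be read as a simple lookup table. The sketch below encodes it that way; `MIME_TYPES` and `mime_for` are hypothetical names for illustration and are not taken from the plugin's code.

```python
from typing import Dict

# Output type -> MIME type, copied from the list above.
MIME_TYPES: Dict[str, str] = {
    "markdown": "text/markdown",
    "html": "text/html",
    "text": "text/plain",
    "tag": "text/plain",
    "xml": "application/xml",
}


def mime_for(output_type: str) -> str:
    """Return the MIME type for an output type, or raise ValueError."""
    try:
        return MIME_TYPES[output_type]
    except KeyError:
        raise ValueError(f"unsupported output_type: {output_type!r}") from None
```

Note that `text` and `tag` share the same MIME type, so a caller that needs to tell them apart has to keep track of the requested output type itself.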
Select the format with the output_type parameter, for example hocr.
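For orientation, this is roughly how an output type could be handed to pdfminer.six itself. It is a sketch under assumptions, not the plugin's implementation: `convert_pdf` and `NATIVE_TYPES` are hypothetical names, and the helper only covers types that pdfminer.six's `extract_text_to_fp` handles in this example. Markdown is not a native pdfminer.six output, so the plugin presumably produces it some other way, and hocr is omitted because its availability may depend on the installed pdfminer.six version.

```python
import io

# Output types this sketch passes straight through to pdfminer.six.
NATIVE_TYPES = ("text", "html", "xml", "tag")


def convert_pdf(pdf_path: str, output_type: str = "text") -> str:
    """Convert a PDF to the requested format and return it as a string."""
    if output_type not in NATIVE_TYPES:
        raise ValueError(f"unsupported output_type: {output_type!r}")

    # Imported lazily so this module still loads without pdfminer.six installed.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    # With a codec set, the converters write encoded bytes, so use a bytes buffer.
    buffer = io.BytesIO()
    with open(pdf_path, "rb") as pdf_file:
        extract_text_to_fp(
            pdf_file,
            buffer,
            output_type=output_type,
            codec="utf-8",
            laparams=LAParams(),  # enable layout analysis for readable text
        )
    return buffer.getvalue().decode("utf-8")
```

For instance, `convert_pdf("sample.pdf", "html")` would return the HTML rendering of a local file; the path is a placeholder.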
This project is licensed under the MIT License.