pdfminer_six
0.0.1

pdfminer_six extracts text from a page directly from the source code of the PDF.

tavan/pdfminer_six · 371 installs

pdfminer_six

Author: tavan
Version: 0.0.1
Type: Tool plugin

Introduction

pdfminer.six is a PDF parsing tool focused on text extraction and analysis. It extracts text directly from the PDF source code and can report the exact position, font, and color of the text.
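As a rough illustration of the position and font information mentioned above, the sketch below walks the layout tree that pdfminer.six produces and yields one record per character. It calls pdfminer.six directly rather than going through the plugin; `CharInfo` and `iter_chars` are hypothetical names introduced here, and color is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class CharInfo:
    """One character with its font and position (hypothetical helper type)."""
    text: str
    fontname: str
    size: float
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points

    def describe(self) -> str:
        x0, y0, _, _ = self.bbox
        return f"{self.text!r} {self.fontname} {self.size:.1f}pt at ({x0:.1f}, {y0:.1f})"


def iter_chars(pdf_path: str) -> Iterator[CharInfo]:
    """Yield every character found in text containers of the PDF."""
    # Imported lazily so this module still loads where pdfminer.six is absent.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer

    for page in extract_pages(pdf_path):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue  # figures, lines, rectangles, ...
            for line in element:
                # Children of a text box are normally text lines; be defensive anyway.
                if not hasattr(line, "__iter__"):
                    continue
                for obj in line:
                    # LTAnno objects (virtual spaces/newlines) carry no font or bbox.
                    if isinstance(obj, LTChar):
                        yield CharInfo(obj.get_text(), obj.fontname, obj.size, obj.bbox)
```

With pdfminer.six installed, `for c in iter_chars("sample.pdf"): print(c.describe())` would print one line per character; `sample.pdf` is a placeholder path.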

Main features

  • Support PDF-1.7 specification
  • Support CJK languages and vertical writing scripts
  • Support multiple font types (Type1, TrueType, Type3, CID)
  • Support RC4 and AES encryption
  • Support form extraction
  • Support table of contents (outline) extraction
  • Support automatic layout analysis

Supported output formats

The plugin supports the following output formats:

  • markdown - Markdown format (MIME: text/markdown)

  • html - HTML format (MIME: text/html)

  • text - Plain text format (MIME: text/plain)

  • tag - Tagged text format (MIME: text/plain)

  • xml - XML format (MIME: application/xml)
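The format list above can be read as a lookup table from output_type to MIME type. The following is a minimal sketch of that lookup, not the plugin's actual code: `resolve_output_type` is a hypothetical helper, the fallback to `text` follows the default documented in the usage guide, and the case-insensitive matching is an assumption.

```python
from typing import Mapping, Tuple

# Format names and MIME types exactly as listed in this document.
OUTPUT_MIME_TYPES = {
    "markdown": "text/markdown",
    "html": "text/html",
    "text": "text/plain",
    "tag": "text/plain",
    "xml": "application/xml",
}
DEFAULT_OUTPUT_TYPE = "text"


def resolve_output_type(params: Mapping[str, object]) -> Tuple[str, str]:
    """Return (output_type, mime_type) for a parameter mapping.

    Falls back to the documented default when output_type is missing or empty.
    Lowercasing and trimming the value is an assumption, not documented behavior.
    """
    requested = params.get("output_type") or DEFAULT_OUTPUT_TYPE
    key = str(requested).strip().lower()
    if key not in OUTPUT_MIME_TYPES:
        allowed = ", ".join(OUTPUT_MIME_TYPES)
        raise ValueError(f"unsupported output_type {requested!r}; expected one of: {allowed}")
    return key, OUTPUT_MIME_TYPES[key]
```

Rejecting unknown values early keeps a typo such as `htm` from silently producing plain text.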

Usage Guide

  1. Upload file
    • Upload a single PDF file
    • The file must be a valid PDF
  2. Select output format
    • Specify output_type in the parameters
    • Allowed values: markdown, html, text, tag, xml
    • text is used by default
  3. Processing results
    The plugin returns three kinds of messages:
    • Text message: Processing status description
    • Blob message: Converted content
    • JSON message: Processing result metadata
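To make the three result messages concrete, here is a small sketch that models them as plain dictionaries. The dictionary keys, the metadata fields, and `build_responses` itself are hypothetical; the real plugin returns its messages through the host platform's own message objects, whose exact shape is not described in this document.

```python
from typing import Any, Dict, List


def build_responses(content: str, output_type: str, mime_type: str) -> List[Dict[str, Any]]:
    """Model the text, blob, and JSON messages for one converted PDF.

    All keys below are illustrative assumptions, not the plugin's real schema.
    """
    blob = content.encode("utf-8")
    return [
        # 1. Text message: human-readable processing status.
        {"type": "text", "text": f"Converted PDF to {output_type}."},
        # 2. Blob message: the converted content, tagged with its MIME type.
        {"type": "blob", "blob": blob, "meta": {"mime_type": mime_type}},
        # 3. JSON message: metadata about the result.
        {
            "type": "json",
            "json": {
                "output_type": output_type,
                "mime_type": mime_type,
                "size_bytes": len(blob),
            },
        },
    ]
```

A caller that only needs the converted document would pick the entry whose `type` is `blob` and decode it with the declared MIME type.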

Future Enhancements

  • Support hOCR output

License

This project is licensed under the MIT License.


Category: Tool
Version: 0.0.1
tavan · 2025-04-10 14:05:20
Requirements: Tool invocation, App invocation, Endpoint registration
Maximum memory: 256 MB
Maximum storage: 1 MB