Document Processing

This section covers the SDK methods for uploading, retrieving, listing, and deleting documents in a workflow, along with the document object those methods return. The methods mirror the Nanonets API.

Document Schema

The document object returned by the API contains the following fields:

{
    "document_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "string",  # "success", "pending", "failure"
    "uploaded_at": "string",  # ISO 8601 timestamp
    "metadata": "string | object",  # Optional metadata attached during upload
    "original_document_name": "string",  # Original filename or URL of the document
    "raw_document_url": "string",  # URL to access the original document
    "verification_status": "string",  # "success", "failed"
    "verification_stage": "string",  # stage_id where document is flagged for verification
    "verification_message": "string",  # Optional message explaining verification failure
    "assigned_reviewers": ["string"],  # List of email addresses of assigned reviewers
    "pages": [
        {
            "page_id": "550e8400-e29b-41d4-a716-446655440001",
            "page_number": 1,
            "image_url": "string",  # URL to access the page image
            "data": {
                "fields": {
                    "invoice_number": [
                        {
                            "field_data_id": "f1a2b3c4-d5e6-4f70-8a9b-c0d1e2f3a4b5",
                            "value": "string",
                            "confidence": 0.98,
                            "bbox": [100, 200, 300, 250],
                            "verification_status": "string",
                            "verification_message": "string",
                            "is_moderated": False
                        }
                    ]
                },
                "tables": [
                    {
                        "table_id": "string",
                        "bbox": [100, 300, 800, 600],
                        "cells": [
                            {
                                "cell_id": "string",
                                "row": 0,
                                "col": 0,
                                "header": "item_description",
                                "text": "string",
                                "bbox": [100, 330, 300, 360],
                                "verification_status": "string",
                                "verification_message": "string",
                                "is_moderated": False
                            }
                        ]
                    }
                ]
            }
        }
    ]
}
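
To make the nesting concrete, the sketch below walks a document shaped like the schema above and collects every field prediction whose confidence falls below a threshold. The helper name `low_confidence_fields`, the 0.9 threshold, and the `total_amount` field are illustrative assumptions, not part of the SDK.

```python
from typing import Any, Dict, List


def low_confidence_fields(
    document: Dict[str, Any], threshold: float = 0.9
) -> List[Dict[str, Any]]:
    """Return field predictions whose confidence is below `threshold`.

    Hypothetical helper: assumes `document` follows the schema shown above.
    """
    flagged = []
    for page in document.get("pages", []):
        fields = page.get("data", {}).get("fields", {})
        # Each field name maps to a list of predictions.
        for name, predictions in fields.items():
            for prediction in predictions:
                if prediction.get("confidence", 0.0) < threshold:
                    flagged.append({
                        "page_number": page.get("page_number"),
                        "field": name,
                        "value": prediction.get("value"),
                        "confidence": prediction.get("confidence"),
                    })
    return flagged


# Sample data trimmed to the keys the helper reads ("total_amount" is made up).
sample_document = {
    "pages": [
        {
            "page_number": 1,
            "data": {
                "fields": {
                    "invoice_number": [{"value": "INV-001", "confidence": 0.98}],
                    "total_amount": [{"value": "120.00", "confidence": 0.62}],
                }
            },
        }
    ]
}

flagged = low_confidence_fields(sample_document)
print(flagged)
```

Only the `total_amount` prediction is flagged here, since 0.62 is below the threshold and 0.98 is not.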

Upload Document

Uploads a document for processing in a specific workflow. Supports both file and URL upload, with async and metadata options.

from nanonets import NanonetsClient

client = NanonetsClient(api_key='your_api_key')

# Upload document from file
result = client.workflows.upload_document(
    workflow_id="workflow_123",
    file_path="path/to/document.pdf",
    async_mode=False,  # Set to True for asynchronous processing
    metadata={
        "customer_id": "12345",
        "document_type": "invoice",
        "department": "finance"
    }
)

# Upload document from URL
result = client.workflows.upload_document(
    workflow_id="workflow_123",
    document_url="https://example.com/invoice.pdf",
    async_mode=False,
    metadata={
        "customer_id": "12345",
        "document_type": "invoice",
        "department": "finance"
    }
)

Get Document Status

Retrieves the current processing status and results of a specific document.

document = client.workflows.get_document(
    workflow_id="workflow_123",
    document_id="document_123"
)
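
When a document was uploaded with async_mode=True, its status may still be "pending" on the first call. A minimal polling sketch follows, assuming the call returns a dict with a "status" key as in the document schema. The helper `wait_for_document`, its interval, and its timeout are hypothetical and not part of the SDK; the fetch function is injected so the example runs without network access.

```python
import time
from typing import Any, Callable, Dict


def wait_for_document(
    fetch: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call `fetch` until the document leaves the "pending" status.

    Hypothetical helper: assumes `fetch` returns a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        document = fetch()
        if document.get("status") != "pending":
            return document
        if time.monotonic() >= deadline:
            raise TimeoutError("document is still pending")
        sleep(interval)


# Stand-in for client.workflows.get_document(...): pending twice, then success.
responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "success"},
])
result = wait_for_document(lambda: next(responses), sleep=lambda _: None)
print(result["status"])
```

With a real client, `fetch` could be `lambda: client.workflows.get_document(workflow_id="workflow_123", document_id="document_123")`, provided the return value is dict-like.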

List Documents

Retrieves a paginated list of the documents in a specific workflow.

documents = client.workflows.list_documents(
    workflow_id="workflow_123",
    page=1,  # Page number for pagination
    limit=10  # Number of documents per page
)
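
Because results come back one page at a time, collecting every document means requesting successive pages. The sketch below assumes that each call returns a list and that a page shorter than `limit` marks the end. Neither assumption is confirmed by this section, and `iter_documents` is a hypothetical helper. The page fetcher is injected so the example runs on fake data.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_documents(
    fetch_page: Callable[[int, int], List[Dict[str, Any]]],
    limit: int = 10,
) -> Iterator[Dict[str, Any]]:
    """Yield documents page by page until a short (or empty) page is returned."""
    page = 1
    while True:
        batch = fetch_page(page, limit)
        yield from batch
        if len(batch) < limit:
            return
        page += 1


# Fake backend standing in for client.workflows.list_documents(...).
fake_store = [{"document_id": f"doc_{i}"} for i in range(25)]


def fake_fetch_page(page: int, limit: int) -> List[Dict[str, Any]]:
    start = (page - 1) * limit
    return fake_store[start:start + limit]


all_documents = list(iter_documents(fake_fetch_page, limit=10))
print(len(all_documents))
```

With a real client, `fetch_page` could wrap `client.workflows.list_documents(workflow_id="workflow_123", page=page, limit=limit)`, adjusted to whatever shape that call returns.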

Delete Document

Removes a document from the workflow.

client.workflows.delete_document(
    workflow_id="workflow_123",
    document_id="document_123"
)

Get Document Fields

Retrieves all extracted fields from a specific document.

fields = client.workflows.get_document_fields(
    workflow_id="workflow_123",
    document_id="document_123"
)

Get Document Tables

Retrieves all extracted tables from a specific document.

tables = client.workflows.get_document_tables(
    workflow_id="workflow_123",
    document_id="document_123"
)
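
Tables arrive as a flat list of cells, each carrying its own row, col, header, and text. The sketch below regroups those cells into one dict per row, assuming each table matches the "tables" entry in the document schema. `table_to_rows` and the `quantity` header are illustrative, not part of the SDK.

```python
from collections import defaultdict
from typing import Any, Dict, List


def table_to_rows(table: Dict[str, Any]) -> List[Dict[str, str]]:
    """Group a table's cells by row index, keyed by each cell's header."""
    rows: Dict[int, Dict[str, str]] = defaultdict(dict)
    for cell in table.get("cells", []):
        rows[cell["row"]][cell["header"]] = cell["text"]
    # Emit rows in ascending row-index order.
    return [rows[index] for index in sorted(rows)]


# Sample table trimmed to the keys the helper reads.
sample_table = {
    "cells": [
        {"row": 0, "col": 0, "header": "item_description", "text": "Widget"},
        {"row": 0, "col": 1, "header": "quantity", "text": "2"},
        {"row": 1, "col": 0, "header": "item_description", "text": "Gadget"},
        {"row": 1, "col": 1, "header": "quantity", "text": "5"},
    ]
}

rows = table_to_rows(sample_table)
print(rows)
```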

Get Document Original File

Downloads the original document file.

original_file = client.workflows.get_document_original_file(
    workflow_id="workflow_123",
    document_id="document_123"
)

Error Handling & Common Scenarios

API status codes:

200 OK: Request successful
201 Created: Document uploaded successfully
400 Bad Request: Invalid request parameters or unsupported file type
401 Unauthorized: Invalid/missing API key
404 Not Found: Workflow or document not found
413 Payload Too Large: File size exceeds limit
500 Internal Server Error: Server-side error

Common error scenarios:

File upload issues (unsupported type, too large, corrupted)
Processing errors (timeout, unreadable content, failure)
Field/table header issues (invalid/duplicate names)

from nanonets.exceptions import NanonetsError, AuthenticationError, ValidationError

try:
    result = client.workflows.upload_document(...)
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except ValidationError as e:
    print(f"Invalid input: {e}")
except NanonetsError as e:
    print(f"An error occurred: {e}")
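
Transient failures such as a 500 response are often worth retrying, while authentication and validation errors are not. The sketch below retries a callable with exponential backoff. `with_retries` and `TransientError` are hypothetical names, and this section does not say which SDK exceptions are safe to retry, so the caller supplies the retryable types.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on `retry_on` with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


class TransientError(Exception):
    """Stand-in for a retryable error such as a server-side failure."""


calls = {"count": 0}


def flaky_upload() -> dict:
    # Fails twice, then succeeds, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return {"status": "success"}


delays = []
result = with_retries(flaky_upload, retry_on=(TransientError,), sleep=delays.append)
print(result, delays)
```

Passing `sleep=delays.append` records the backoff delays instead of waiting, which keeps the example fast; in real use the default `time.sleep` applies.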

Best Practices

Set async_mode=True for large files or batch processing
Include relevant metadata for better tracking
Validate file types before upload
Check confidence scores before using extracted data
Handle both sync and async responses appropriately
Implement retry logic for failed processing
Delete processed documents when no longer needed
Monitor storage usage and implement retention policies
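
As one way to validate file types before upload, the sketch below checks a path's extension against an allow-list. The set of extensions is purely an assumption for illustration, since this section does not list the supported types, and `is_supported_file` is a hypothetical helper.

```python
from pathlib import Path
from typing import AbstractSet

# Assumption: illustrative allow-list only; confirm what your workflow accepts.
ALLOWED_EXTENSIONS = frozenset({".pdf", ".png", ".jpg", ".jpeg"})


def is_supported_file(
    file_path: str, allowed: AbstractSet[str] = ALLOWED_EXTENSIONS
) -> bool:
    """Return True if the file's extension (case-insensitive) is allowed."""
    return Path(file_path).suffix.lower() in allowed


for candidate in ["path/to/document.pdf", "SCAN.PDF", "notes.txt"]:
    print(candidate, is_supported_file(candidate))
```

An extension check only catches obvious mistakes; a corrupted or mislabeled file can still be rejected by the API.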

Setup

Minimum Python version required: 3.7

Install the Nanonets Python SDK using pip:

pip install nanonets
