Document Processing

The Nanonets Node.js SDK provides methods for processing documents in a workflow. This guide covers all of them: uploading, retrieving, listing, and deleting documents, and fetching a document's extracted fields, extracted tables, and original file.

Setup

Minimum Node.js version required: 14.0.0

Install the Nanonets Node.js SDK using npm:

npm install nanonets

Document Schema

The document object returned by the API contains the following fields:

{
  "document_id": "string",
  "status": "string", // "success", "pending", "failure"
  "uploaded_at": "string",
  "metadata": "string | object",
  "original_document_name": "string",
  "raw_document_url": "string",
  "verification_status": "string", // "success", "failed"
  "verification_stage": "string",
  "verification_message": "string",
  "assigned_reviewers": ["string"],
  "pages": [
    {
      "page_id": "string",
      "page_number": 1,
      "image_url": "string",
      "data": {
        "fields": {
          "invoice_number": [
            {
              "field_data_id": "string",
              "value": "string",
              "confidence": 0.98,
              "bbox": [100, 200, 300, 250],
              "verification_status": "string",
              "verification_message": "string",
              "is_moderated": false
            }
          ]
        },
        "tables": [
          {
            "table_id": "string",
            "bbox": [100, 300, 800, 600],
            "cells": [
              {
                "cell_id": "string",
                "row": 0,
                "col": 0,
                "header": "item_description",
                "text": "string",
                "bbox": [100, 330, 300, 360],
                "verification_status": "string",
                "verification_message": "string",
                "is_moderated": false
              }
            ]
          }
        ]
      }
    }
  ]
}
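To show how this schema nests, the sketch below flattens the fields of every page into a single list and keeps only values at or above a confidence threshold. `collectConfidentFields` and its default threshold of 0.9 are hypothetical and not part of the SDK. The code relies only on the field names shown in the schema.

```javascript
// Hypothetical helper (not part of the SDK): walks pages[].data.fields and
// returns one entry per prediction whose confidence meets the threshold.
function collectConfidentFields(document, minConfidence = 0.9) {
  const results = [];
  for (const page of document.pages ?? []) {
    const fields = page.data?.fields ?? {};
    for (const [name, predictions] of Object.entries(fields)) {
      for (const prediction of predictions) {
        if (prediction.confidence >= minConfidence) {
          results.push({
            page: page.page_number,
            field: name,
            value: prediction.value,
            confidence: prediction.confidence,
          });
        }
      }
    }
  }
  return results;
}
```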

Upload Document

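Uploads a document for processing, either from a local file path (file) or from a URL (url). Set async to true to process the document asynchronously, and use metadata to attach your own key-value data to the document for tracking.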

// Upload from file path
const result = await client.documents.upload("workflow_123", {
  file: "/path/to/document.pdf",
  async: false,
  metadata: {
    customer_id: "12345",
    document_type: "invoice",
    department: "finance"
  }
});

// Upload from URL
const result2 = await client.documents.upload("workflow_123", {
  url: "https://example.com/invoice.pdf",
  async: false,
  metadata: {
    customer_id: "12345",
    document_type: "invoice",
    department: "finance"
  }
});
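The SDK's exact validation rules are not documented here. A minimal sketch, assuming an upload should carry exactly one of `file` or `url`, is a small hypothetical helper, `buildUploadOptions`, that builds the options object and fails early on obviously bad input:

```javascript
// Hypothetical helper (not part of the SDK). Assumes exactly one of `file`
// or `url` must be set; adjust if the SDK accepts other combinations.
function buildUploadOptions({ file, url, async = false, metadata } = {}) {
  const hasFile = typeof file === "string" && file.length > 0;
  const hasUrl = typeof url === "string" && url.length > 0;
  if (hasFile === hasUrl) {
    throw new Error("Provide exactly one of `file` or `url`.");
  }
  if (hasUrl) {
    // new URL() throws a TypeError if the string is not an absolute URL.
    const parsed = new URL(url);
    if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
      throw new Error(`Unsupported URL protocol: ${parsed.protocol}`);
    }
  }
  const options = hasFile ? { file } : { url };
  options.async = async;
  if (metadata !== undefined) options.metadata = metadata;
  return options;
}
```

The result would be passed straight through, for example `client.documents.upload("workflow_123", buildUploadOptions({ url: "https://example.com/invoice.pdf" }))`.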

Get Document

Retrieves the processing results for a specific document.

const document = await client.documents.get("workflow_123", "document_123");
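When a document is uploaded with `async: true`, its `status` may still be `"pending"` the first time you fetch it. One way to handle this is to poll `client.documents.get` until the status changes. The sketch below assumes `get` resolves to an object shaped like the Document Schema; `waitForDocument`, `intervalMs`, and `timeoutMs` are hypothetical names, and the default values are arbitrary.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls until status is no longer "pending"
// (i.e. "success" or "failure"), or rejects once timeoutMs has passed.
async function waitForDocument(client, workflowId, documentId, { intervalMs = 2000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const document = await client.documents.get(workflowId, documentId);
    if (document.status !== "pending") {
      return document;
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Document ${documentId} still pending after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```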

List Documents

Retrieves a paginated list of the documents in a workflow. The page option selects which page to return, and limit sets the number of documents per page.

const documents = await client.documents.list("workflow_123", { page: 1, limit: 10 });
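To walk every document rather than a single page, you can keep incrementing `page`. This sketch assumes, without confirmation from the SDK, that `list` resolves to a plain array of documents and that a page with fewer than `limit` entries is the last one; adjust it if the SDK wraps results in an envelope object. `listAllDocuments` is a hypothetical helper.

```javascript
// Hypothetical async generator: requests page 1, 2, 3, ... and yields each
// document, stopping after the first short (or empty) page.
async function* listAllDocuments(client, workflowId, limit = 10) {
  for (let page = 1; ; page++) {
    const batch = await client.documents.list(workflowId, { page, limit });
    yield* batch;
    if (batch.length < limit) return;
  }
}
```

You would consume it with `for await (const doc of listAllDocuments(client, "workflow_123")) { ... }`. Because the generator fetches one page at a time, a large workflow is never held in memory all at once.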

Delete Document

Removes a document from the workflow.

await client.documents.delete("workflow_123", "document_123");
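Deleting documents pairs naturally with a retention policy. A minimal sketch, assuming `uploaded_at` is an ISO 8601 timestamp that `Date.parse` understands: the hypothetical `deleteExpiredDocuments` helper deletes every document older than `maxAgeDays` and returns the deleted IDs. Documents whose date cannot be parsed are skipped rather than deleted.

```javascript
// Hypothetical retention helper: deletes documents uploaded more than
// maxAgeDays before `now`, one request at a time.
async function deleteExpiredDocuments(client, workflowId, documents, maxAgeDays, now = new Date()) {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  const deleted = [];
  for (const doc of documents) {
    const uploaded = Date.parse(doc.uploaded_at);
    // Skip unparsable dates and documents newer than the cutoff.
    if (Number.isNaN(uploaded) || uploaded >= cutoff) continue;
    await client.documents.delete(workflowId, doc.document_id);
    deleted.push(doc.document_id);
  }
  return deleted;
}
```

Deletes run sequentially to keep request volume low; the `documents` array could come from `client.documents.list`.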

Get Document Fields

Retrieves the extracted fields from a document.

const fields = await client.documents.getFields("workflow_123", "document_123");
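The return shape of `getFields` is not spelled out here. Assuming it matches `pages[].data.fields` in the schema (field names mapped to arrays of predictions), a hypothetical `bestFieldValues` helper can reduce each field to its highest-confidence prediction:

```javascript
// Hypothetical helper: keeps the single highest-confidence prediction per
// field. Fields with no predictions are omitted from the result.
function bestFieldValues(fields) {
  const best = {};
  for (const [name, predictions] of Object.entries(fields)) {
    let top = null;
    for (const p of predictions) {
      if (top === null || p.confidence > top.confidence) top = p;
    }
    if (top !== null) best[name] = { value: top.value, confidence: top.confidence };
  }
  return best;
}
```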

Get Document Tables

Retrieves the extracted tables from a document.

const tables = await client.documents.getTables("workflow_123", "document_123");
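In the schema, a table is a flat list of cells, each with a `row`, a `col`, a `header`, and its `text`. A sketch that regroups those cells into one object per row, keyed by header, might look like this. `tableToRows` is hypothetical, and it assumes each table has the schema's shape.

```javascript
// Hypothetical helper: converts a table's flat cell list into an array of
// row objects, e.g. [{ item_description: "...", quantity: "..." }, ...].
function tableToRows(table) {
  const rows = [];
  for (const cell of table.cells) {
    if (!rows[cell.row]) rows[cell.row] = {};
    // Fall back to the column index when a cell has no header.
    const key = cell.header || `col_${cell.col}`;
    rows[cell.row][key] = cell.text;
  }
  // filter() skips holes left by missing row indices.
  return rows.filter(Boolean);
}
```

If `getTables` resolves to an array of such tables, `tables.map(tableToRows)` would give one array of row objects per table.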

Get Document Original File

Downloads the original document file.

const file = await client.documents.getOriginalFile("workflow_123", "document_123");
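The type that `getOriginalFile` resolves to is not documented here. Assuming it is binary data (a `Buffer`, `Uint8Array`, or `ArrayBuffer`), the hypothetical helper below writes it to disk with Node's `fs.promises.writeFile` and throws on anything else instead of guessing.

```javascript
const fs = require("fs");

// Hypothetical helper: downloads the original file and saves it to destPath.
// Returns the number of bytes written.
async function saveOriginalFile(client, workflowId, documentId, destPath) {
  const file = await client.documents.getOriginalFile(workflowId, documentId);
  let bytes;
  if (Buffer.isBuffer(file)) {
    bytes = file;
  } else if (file instanceof Uint8Array) {
    bytes = Buffer.from(file.buffer, file.byteOffset, file.byteLength);
  } else if (file instanceof ArrayBuffer) {
    bytes = Buffer.from(file);
  } else {
    // Assumed return types did not match; adapt this check to your SDK version.
    throw new TypeError("Unexpected getOriginalFile result type");
  }
  await fs.promises.writeFile(destPath, bytes);
  return bytes.length;
}
```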

Error Handling & Common Scenarios

HTTP status codes returned by the API:

200 OK: Request successful
201 Created: Document uploaded successfully
400 Bad Request: Invalid request parameters or unsupported file type
401 Unauthorized: Invalid/missing API key
404 Not Found: Workflow or document not found
413 Payload Too Large: File size exceeds limit
500 Internal Server Error: Server-side error

Common error scenarios:

File upload issues (unsupported type, too large, corrupted)
Processing errors (timeout, unreadable content, failure)
Field/table header issues (invalid/duplicate names)

const { NanonetsError, AuthenticationError, ValidationError } = require('nanonets');

try {
  const result = await client.documents.upload("workflow_123", {
    file: "/path/to/document.pdf"
  });
} catch (error) {
  if (error instanceof AuthenticationError) {
    // The API key was rejected
    console.error("Authentication failed:", error.message);
  } else if (error instanceof ValidationError) {
    // The request parameters or file were rejected as invalid
    console.error("Invalid input:", error.message);
  } else if (error instanceof NanonetsError) {
    // Any other error raised by the SDK
    console.error("An error occurred:", error.message);
  } else {
    // Not an SDK error: rethrow so it is not silently swallowed
    throw error;
  }
}

Best Practices

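Set async: true when uploading large files or processing documents in batches
Include relevant metadata (for example, customer or department IDs) for better tracking
Validate file types before upload
Check the confidence score of each extracted value before using it
Handle both sync and async responses: an async upload may return before processing finishes, so check the document's status ("pending", "success", or "failure") before reading extracted data
Implement retry logic for failed processing
Delete processed documents when they are no longer needed
Monitor storage usage and implement retention policies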
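For the retry recommendation, a generic sketch with exponential backoff follows. `withRetry` and its options are hypothetical. Errors such as `AuthenticationError` or `ValidationError` will usually fail the same way on every attempt, so pass an `isRetryable` predicate that excludes them.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls fn() up to retries + 1 times, waiting
// baseDelayMs * 2^attempt between attempts (500 ms, 1 s, 2 s by default).
async function withRetry(fn, { retries = 3, baseDelayMs = 500, isRetryable = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= retries || !isRetryable(error)) throw error;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Example (inside an async function):
// const result = await withRetry(
//   () => client.documents.upload("workflow_123", { file: "/path/to/document.pdf" }),
//   { isRetryable: (e) => !(e instanceof AuthenticationError || e instanceof ValidationError) }
// );
```

Be careful when retrying uploads: if a request fails after the server has already accepted the file, a retry may create a duplicate document.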
