
Document Processing

The Nanonets Node.js SDK lets you upload documents to a workflow and retrieve the extracted results. This guide covers the available document-processing methods, which mirror the Nanonets API.

Setup

Minimum Node.js version required: 14.0.0

Install the Nanonets Node.js SDK using npm:

npm install nanonets

Document Schema

The document object returned by the API contains the following fields:

{
  "document_id": "string",
  "status": "string", // "success", "pending", "failure"
  "uploaded_at": "string",
  "metadata": "string | object",
  "original_document_name": "string",
  "raw_document_url": "string",
  "verification_status": "string", // "success", "failed"
  "verification_stage": "string",
  "verification_message": "string",
  "assigned_reviewers": ["string"],
  "pages": [
    {
      "page_id": "string",
      "page_number": 1,
      "image_url": "string",
      "data": {
        "fields": {
          "invoice_number": [
            {
              "field_data_id": "string",
              "value": "string",
              "confidence": 0.98,
              "bbox": [100, 200, 300, 250],
              "verification_status": "string",
              "verification_message": "string",
              "is_moderated": false
            }
          ]
        },
        "tables": [
          {
            "table_id": "string",
            "bbox": [100, 300, 800, 600],
            "cells": [
              {
                "cell_id": "string",
                "row": 0,
                "col": 0,
                "header": "item_description",
                "text": "string",
                "bbox": [100, 330, 300, 360],
                "verification_status": "string",
                "verification_message": "string",
                "is_moderated": false
              }
            ]
          }
        ]
      }
    }
  ]
}
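
Because fields are nested per page and each field name maps to a list of values, it can help to flatten them before use. The following is a minimal sketch that works on any plain object shaped like the schema above. The helper names flattenFields and lowConfidence are illustrative and are not part of the SDK.

```javascript
// Sketch: flatten the per-page "fields" maps of a document object
// (shaped like the schema above) into one list of rows.
// flattenFields is an illustrative helper, not an SDK method.
function flattenFields(document) {
  const rows = [];
  for (const page of document.pages ?? []) {
    const fields = page.data?.fields ?? {};
    for (const [name, entries] of Object.entries(fields)) {
      for (const entry of entries) {
        rows.push({
          page: page.page_number,
          name,
          value: entry.value,
          confidence: entry.confidence,
        });
      }
    }
  }
  return rows;
}

// Keep only rows whose confidence is below a caller-chosen threshold,
// for example to route them to manual review.
function lowConfidence(rows, threshold) {
  return rows.filter((row) => row.confidence < threshold);
}
```

A caller might pass the result of client.documents.get(...) to flattenFields and then review whatever lowConfidence returns for a threshold that suits the workflow.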

Upload Document

Uploads a document to a workflow for processing. The source is either a local file path (file) or a URL (url); both forms accept the async and metadata options.

// Upload from file path
const result = await client.documents.upload("workflow_123", {
  file: "/path/to/document.pdf",
  async: false,
  metadata: {
    customer_id: "12345",
    document_type: "invoice",
    department: "finance"
  }
});

// Upload from URL
const result2 = await client.documents.upload("workflow_123", {
  url: "https://example.com/invoice.pdf",
  async: false,
  metadata: {
    customer_id: "12345",
    document_type: "invoice",
    department: "finance"
  }
});
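
Since the two upload forms differ only in their source, the options object can be built in one place. The sketch below assumes that exactly one of file or url is intended per call, which the guide does not state explicitly. buildUploadOptions is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper: build the options object passed to
// client.documents.upload(). Assumes one source (file or url) per call.
function buildUploadOptions({ file, url, async: isAsync = false, metadata = {} }) {
  // Reject the call when both or neither source is given.
  if (Boolean(file) === Boolean(url)) {
    throw new Error("Provide exactly one of `file` or `url`");
  }
  const source = file ? { file } : { url };
  return { ...source, async: isAsync, metadata };
}

console.log(buildUploadOptions({ url: "https://example.com/invoice.pdf" }));
```

The returned object has the same keys as the examples above, so it could be passed as the second argument to client.documents.upload.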

Get Document

Retrieves the processing results for a specific document.

const document = await client.documents.get("workflow_123", "document_123");
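
When a document was uploaded with async: true, its status may still be "pending" when first fetched. The sketch below polls until the status changes, assuming that the fetched object carries the status field from the schema. waitForDocument and its options are illustrative names; the fetch function is injected so the helper does not depend on any particular client.

```javascript
// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call getDocument() repeatedly until the returned
// document's status is no longer "pending", or give up after maxAttempts.
async function waitForDocument(getDocument, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = await getDocument();
    if (doc.status !== "pending") {
      return doc; // "success" or "failure"; the caller decides what to do
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Document still pending after ${maxAttempts} attempts`);
}
```

A possible call would be waitForDocument(() => client.documents.get("workflow_123", "document_123")). The interval and attempt limit are placeholders to tune against real processing times.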

List Documents

Retrieves the documents in a workflow one page at a time; the page and limit options control pagination.

const documents = await client.documents.list("workflow_123", { page: 1, limit: 10 });
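
To collect every document rather than a single page, the call can be repeated with an increasing page number. This sketch assumes that each call resolves to an array and that a page shorter than limit marks the end; the guide does not document the response shape, so verify both against a real response. listAllDocuments is a hypothetical helper.

```javascript
// Hypothetical helper: fetch pages until a short (or empty) page is returned.
// listPage is any async function taking { page, limit } and resolving to an array.
async function listAllDocuments(listPage, limit = 10) {
  if (limit <= 0) {
    throw new Error("limit must be positive");
  }
  const all = [];
  for (let page = 1; ; page++) {
    const batch = await listPage({ page, limit });
    all.push(...batch);
    if (batch.length < limit) {
      break; // last page reached
    }
  }
  return all;
}
```

With the SDK, the first argument could be (opts) => client.documents.list("workflow_123", opts).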

Delete Document

Removes a document from the workflow.

await client.documents.delete("workflow_123", "document_123");

Get Document Fields

Retrieves the extracted fields from a document.

const fields = await client.documents.getFields("workflow_123", "document_123");

Get Document Tables

Retrieves the extracted tables from a document.

const tables = await client.documents.getTables("workflow_123", "document_123");
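
Tables arrive as a flat list of cells, each with its own row and col index. The sketch below rebuilds a two-dimensional array of cell text from one table object shaped like the entries in the schema's tables array. cellsToRows is an illustrative helper, and the exact return shape of getTables is not documented here, so adapt the input accordingly.

```javascript
// Hypothetical helper: place each cell's text at [row][col].
// Cells may arrive in any order; missing cells leave gaps in the row.
function cellsToRows(table) {
  const rows = [];
  for (const cell of table.cells ?? []) {
    if (!rows[cell.row]) {
      rows[cell.row] = [];
    }
    rows[cell.row][cell.col] = cell.text;
  }
  return rows;
}

// Illustrative data only, not real API output.
const exampleTable = {
  cells: [
    { row: 1, col: 0, text: "Widget" },
    { row: 0, col: 0, text: "Item" },
    { row: 0, col: 1, text: "Qty" },
    { row: 1, col: 1, text: "2" },
  ],
};
console.log(cellsToRows(exampleTable));
```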

Get Document Original File

Downloads the original document file.

const file = await client.documents.getOriginalFile("workflow_123", "document_123");

Error Handling & Common Scenarios

API status codes:

  • 200 OK: Request successful
  • 201 Created: Document uploaded successfully
  • 400 Bad Request: Invalid request parameters or unsupported file type
  • 401 Unauthorized: Invalid/missing API key
  • 404 Not Found: Workflow or document not found
  • 413 Payload Too Large: File size exceeds limit
  • 500 Internal Server Error: Server-side error
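
One way to act on these codes is to group them by the reaction they call for. The grouping below is an assumption made for illustration, not documented SDK behavior, and classifyStatus is a hypothetical helper that takes a numeric HTTP status.

```javascript
// Hypothetical helper: map the status codes listed above to a suggested
// reaction. The grouping is an assumption for illustration only.
function classifyStatus(status) {
  if (status === 200 || status === 201) return "ok";
  if (status === 401) return "check-api-key";
  if (status === 404) return "check-ids";
  if (status === 400 || status === 413) return "fix-request";
  if (status >= 500) return "retry-later";
  return "unknown";
}

console.log(classifyStatus(413));
```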

Common error scenarios:

  • File upload issues (unsupported type, too large, corrupted)
  • Processing errors (timeout, unreadable content, failure)
  • Field/table header issues (invalid/duplicate names)

const { NanonetsError, AuthenticationError, ValidationError } = require('nanonets');

try {
  const result = await client.documents.upload("workflow_123", { /* upload options, as in Upload Document */ });
} catch (error) {
  if (error instanceof AuthenticationError) {
    // Invalid or missing API key
    console.error("Authentication failed:", error.message);
  } else if (error instanceof ValidationError) {
    // Invalid request parameters or unsupported input
    console.error("Invalid input:", error.message);
  } else if (error instanceof NanonetsError) {
    // Any other error raised by the SDK
    console.error("An error occurred:", error.message);
  } else {
    // Not an SDK error: rethrow instead of silently swallowing it
    throw error;
  }
}

Best Practices

  • Use async for large files or batch processing
  • Include relevant metadata for better tracking
  • Validate file types before upload
  • Check confidence scores before using extracted data
  • Handle both sync and async responses appropriately
  • Implement retry logic for failed processing
  • Delete processed documents when no longer needed
  • Monitor storage usage and implement retention policies
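
The retry recommendation above can be sketched as a small wrapper with exponential backoff. withRetry and its options are hypothetical names, not SDK features. The operation is any async function, and the shouldRetry predicate lets the caller skip retries for errors that will not succeed on a second attempt, such as validation failures.

```javascript
// Resolve after the given number of milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run operation(), retrying up to `retries` more times.
// The wait doubles after each failure: baseDelayMs, 2x, 4x, and so on.
async function withRetry(
  operation,
  { retries = 3, baseDelayMs = 500, shouldRetry = () => true } = {}
) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (!shouldRetry(error)) {
        throw error; // caller marked this error as not worth retrying
      }
      if (attempt < retries) {
        await delay(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

A possible call would be withRetry(() => client.documents.get("workflow_123", "document_123"), { shouldRetry: (e) => !(e instanceof ValidationError) }). The default retry count and delay are placeholders, not recommended values.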
