Document Processing

This section covers all APIs related to document processing, including uploading documents, retrieving results, and managing processed documents.

Document Schema

Before diving into the APIs, here's the common schema for a document and its processing results:

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "string", // "success", "pending", "failure"
  "uploaded_at": "string", // ISO 8601 timestamp
  "metadata": "string | object", // Optional metadata attached during upload
  "original_document_name": "string", // Original filename or URL of the document
  "raw_document_url": "string", // URL to access the original document
  "verification_status": "string", // "success", "failed"
  "verification_stage": "string", // stage_id where document is flagged for verification
  "verification_message": "string", // Optional message explaining verification failure
  "assigned_reviewers": ["string"], // List of email addresses of assigned reviewers
  "pages": [
    {
      "page_id": "550e8400-e29b-41d4-a716-446655440001",
      "page_number": 1,
      "image_url": "string", // URL to access the page image
      "data": {
        "fields": {
          "invoice_number": [  // Field names must be alphanumeric with underscores only
            {
              "field_data_id": "f1a2b3c4-d5e6-4f7g-8h9i-j0k1l2m3n4o5", // Unique identifier for the field value
              "value": "string",
              "confidence": 0.98,
              "bbox": [100, 200, 300, 250], // [x1, y1, x2, y2]
              "verification_status": "string", // "success", "failed"
              "verification_message": "string", // Optional message explaining verification failure
              "is_moderated": false // Indicates if the field value was manually corrected by a reviewer
            }
          ]
        },
        "tables": [
          {
            "table_id": "string",
            "bbox": [100, 300, 800, 600],
            "cells": [
              {
                "cell_id": "string",
                "row": 0,
                "col": 0,
                "header": "item_description",  // Table header names must be alphanumeric with underscores only
                "text": "string",
                "bbox": [100, 330, 300, 360],
                "verification_status": "string", // "success", "failed"
                "verification_message": "string", // Optional message explaining verification failure
                "is_moderated": false // Indicates if the cell value was manually corrected by a reviewer
              }
            ]
          }
        ]
      }
    }
  ]
}

Schema Components

Document Information
- document_id: Unique identifier for the document
- status: Current processing status
- uploaded_at: Timestamp of document upload
- metadata: Optional data attached during upload
- original_document_name: Original filename or URL of the document
- raw_document_url: URL to access the original document
- verification_status: Overall document verification status
- verification_stage: stage_id where document is flagged for verification
- verification_message: Optional message explaining verification failure
- assigned_reviewers: List of email addresses of assigned reviewers
Page Information
- page_id: Unique identifier for the page
- page_number: Sequential page number
- image_url: URL to access the page image
Extracted Data
- fields: Extracted field values, keyed by field name; each name maps to a list of values
  - Field names must be alphanumeric with underscores only
  - Field names must be unique within a workflow
  - Each field can have multiple values with confidence scores
  - Each field value has a unique field_data_id for tracking moderation
  - Includes bounding box coordinates for each value
  - Includes verification status and moderation flag
  - is_moderated: Indicates if the field value was manually corrected by a reviewer
  - Bounding box format: [x1, y1, x2, y2] where:
    - x1, y1: Top-left corner coordinates
    - x2, y2: Bottom-right corner coordinates
    - Coordinates are in pixels from the top-left of the page
    - Example: [100, 200, 300, 250] represents a box starting at (100,200) and ending at (300,250)
- tables: Extracted tabular data
  - Each table has a unique ID and bounding box
  - Table header names must be alphanumeric with underscores only
  - Table header names must be unique within a workflow
  - Cells include row/column position and header information
  - Each cell has a unique cell_id for tracking moderation
  - Each cell has its own bounding box coordinates
  - Includes verification status and moderation flag
  - is_moderated: Indicates if the cell value was manually corrected by a reviewer
  - Bounding boxes follow the same format as field values
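The helper below is one way to consume this schema. It is a sketch, and the helper and its name are hypothetical. It walks every page of a document response and collects the items a reviewer may want to check:

- field values below a confidence threshold (the 0.9 default is an arbitrary illustration, not an API default)
- any field value or table cell whose verification_status is "failed"

It reads only keys shown in the schema, and it tolerates fields and tables being null.

```python
from typing import Any

def values_needing_review(document: dict[str, Any], min_confidence: float = 0.9) -> list[dict[str, Any]]:
    """Collect field values and table cells that look worth a manual check.

    min_confidence is a hypothetical threshold chosen for illustration.
    """
    flagged = []
    for page in document.get("pages") or []:
        data = page.get("data") or {}
        # "fields" maps each field name to a list of extracted values (may be null).
        for name, values in (data.get("fields") or {}).items():
            for item in values or []:
                reasons = []
                if item.get("confidence", 1.0) < min_confidence:
                    reasons.append("low_confidence")
                if item.get("verification_status") == "failed":
                    reasons.append("verification_failed")
                if reasons:
                    flagged.append({
                        "page_number": page.get("page_number"),
                        "field": name,
                        "value": item.get("value"),
                        "reasons": reasons,
                    })
        # Table cells carry a verification status but no confidence score.
        for table in data.get("tables") or []:
            for cell in table.get("cells") or []:
                if cell.get("verification_status") == "failed":
                    flagged.append({
                        "page_number": page.get("page_number"),
                        "field": cell.get("header"),
                        "value": cell.get("text"),
                        "reasons": ["verification_failed"],
                    })
    return flagged
```

Table cells have no confidence field in the schema, so only their verification status is checked.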

Upload Document for Processing

Upload a document for processing. The response structure depends on the workflow's processing_type setting and whether the upload is synchronous or asynchronous.

POST /api/v4/workflows/{workflow_id}/documents

Overview

- Uploads a document to a workflow for processing
- Accepts either a file upload or a document URL
- Processes documents synchronously or asynchronously
- Synchronous requests return the extracted results immediately
- Asynchronous requests return a document ID that you use to fetch the results later
- Accepts optional metadata to help you track documents
- Results include extracted fields, tables, and page information

Supported File Types

- Images: JPG, JPEG, PNG, TIFF
- Documents: PDF
- Spreadsheets: XLS, XLSX
- Maximum file size: 20MB
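Checking these limits on the client before uploading saves a round trip that would end in a 400 or 413. The sketch below makes two assumptions:

- It trusts the file extension rather than inspecting the file's content.
- It reads "20MB" as 20 × 1024 × 1024 bytes, which may not match how the server counts.

The function names are hypothetical.

```python
import os

# Extensions listed under Supported File Types (compared case-insensitively).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tiff", ".pdf", ".xls", ".xlsx"}
# Assumption: "20MB" is read as 20 MiB; the server's exact threshold may differ.
MAX_FILE_BYTES = 20 * 1024 * 1024

def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes > {MAX_FILE_BYTES}")
    return problems

def check_local_file(path: str) -> list[str]:
    """Convenience wrapper that reads the file size from disk."""
    return check_upload(os.path.basename(path), os.path.getsize(path))
```

A file that passes these checks can still be rejected, for example if it is corrupted or has more than 500 pages.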

Processing Limits

Rate Limits

- Maximum processing rate: 75 pages per minute
- Applies to both synchronous and asynchronous processing
- Exceeding the rate limit results in a 429 (Too Many Requests) response
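One way to stay under the 75-pages-per-minute limit is to track the pages submitted over a sliding 60-second window and wait before sending a request that would exceed it. The class below is a hypothetical client-side sketch. It assumes the server counts pages over a rolling minute, which the limit description does not specify, so a 429 can still occur and should be handled.

```python
import time
from collections import deque
from typing import Optional

class PageRateLimiter:
    """Client-side sliding-window budget: at most max_pages pages per window_seconds."""

    def __init__(self, max_pages: int = 75, window_seconds: float = 60.0):
        self.max_pages = max_pages
        self.window = window_seconds
        self._events = deque()  # (timestamp, pages) per recorded submission, oldest first

    def _prune(self, now: float) -> None:
        # Forget submissions that are a full window old.
        while self._events and now - self._events[0][0] >= self.window:
            self._events.popleft()

    def delay_for(self, pages: int, now: Optional[float] = None) -> float:
        """Seconds to wait before submitting `pages` more pages (0.0 means send now)."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        used = sum(p for _, p in self._events)
        if used + pages <= self.max_pages or not self._events:
            # Within budget, or nothing recent: a single oversized document is let through.
            return 0.0
        freed = 0
        for ts, p in self._events:
            freed += p
            if used - freed + pages <= self.max_pages:
                return ts + self.window - now
        # Oversized request: wait until every earlier submission has expired.
        return self._events[-1][0] + self.window - now

    def record(self, pages: int, now: Optional[float] = None) -> None:
        """Register that `pages` pages were just submitted."""
        now = time.monotonic() if now is None else now
        self._events.append((now, pages))
```

To use it, call delay_for with the document's page count and sleep for the returned number of seconds. Then call record once the request has been sent.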

Document Size Limits

- Maximum pages per document: 500
- Documents exceeding 500 pages are rejected with a 400 (Bad Request) response

Processing Mode Rules

- Documents with 3 or fewer pages: processed synchronously or asynchronously, according to the async parameter
- Documents with more than 3 pages: always processed asynchronously, even if async is false
- When a request is switched to asynchronous processing, the response takes the asynchronous form: it includes a document_id and status "pending"
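The size and mode rules above reduce to a small decision that a client can make from the page count before uploading. This hypothetical helper mirrors them:

- It rejects documents over 500 pages, which the API answers with a 400.
- It reports whether the request will effectively run asynchronously. Anything over 3 pages is switched to async regardless of the async flag.

```python
MAX_PAGES = 500      # larger documents are rejected with 400 Bad Request
MAX_SYNC_PAGES = 3   # larger documents are always processed asynchronously

def effective_async(page_count: int, requested_async: bool) -> bool:
    """Return True if the upload will be processed asynchronously.

    Raises ValueError for documents the API would reject outright.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count > MAX_PAGES:
        raise ValueError(f"{page_count} pages exceeds the {MAX_PAGES}-page limit")
    return requested_async or page_count > MAX_SYNC_PAGES
```

When it returns True, expect the asynchronous response and poll for results rather than reading pages from the upload response.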

Request

The request can be sent in two ways:

Using multipart/form-data with the following fields:
- file: The document file to process (required if document_url is not provided)
- async: Boolean value indicating whether to process asynchronously (optional, default: false)
- metadata: Any string or JSON that will be attached to the document (optional)
Using application/json with the following fields:
- document_url: URL of the document to process (required if file is not provided)
- async: Boolean value indicating whether to process asynchronously (optional, default: false)
- metadata: Any string or JSON that will be attached to the document (optional)

Example

The examples below are given in Python, cURL, Node.js, and Go, in that order.

import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents"

# Method 1: Upload a file with metadata as a JSON string
# The with-block closes the file once the request has been sent.
with open('invoice.pdf', 'rb') as f:
    files = {
        'file': ('invoice.pdf', f, 'application/pdf')
    }

    data = {
        'async': 'false',  # multipart form fields are sent as strings
        'metadata': '{"customer_id": "12345", "document_type": "invoice", "department": "finance"}'
    }

    response = requests.post(
        url,
        files=files,
        data=data,
        auth=HTTPBasicAuth(API_KEY, '')
    )
print(response.json())

# Method 2: Process document from URL
json_data = {
    'document_url': 'https://example.com/invoice.pdf',
    'async': False,  # JSON body: send a boolean, not the string 'false'
    'metadata': '{"customer_id": "12345", "document_type": "invoice", "department": "finance"}'
}

response = requests.post(
    url,
    json=json_data,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())

# Method 1: Upload document with JSON metadata
curl -X POST \
  -u "YOUR_API_KEY:" \
  -F "file=@invoice.pdf" \
  -F "async=false" \
  -F 'metadata={"customer_id": "12345", "document_type": "invoice", "department": "finance"}' \
  https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents

# Method 2: Process document from URL
curl -X POST \
  -u "YOUR_API_KEY:" \
  -H "Content-Type: application/json" \
  -d '{
    "document_url": "https://example.com/invoice.pdf",
    "async": false,
    "metadata": {"customer_id": "12345", "document_type": "invoice", "department": "finance"}
  }' \
  https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents

const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000';
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents`;

// Method 1: Upload document with JSON metadata
const formData = new FormData();
formData.append('file', fs.createReadStream('invoice.pdf'));
formData.append('async', 'false');
formData.append('metadata', JSON.stringify({
  customer_id: '12345',
  document_type: 'invoice',
  department: 'finance'
}));

axios.post(url, formData, {
  auth: {
    username: API_KEY,
    password: ''
  },
  headers: {
    ...formData.getHeaders()
  }
})
.then(response => {
  console.log(response.data);
})
.catch(error => {
  console.error(error);
});

// Method 2: Process document from URL
const jsonData = {
  document_url: 'https://example.com/invoice.pdf',
  async: false,
  metadata: {
    customer_id: '12345',
    document_type: 'invoice',
    department: 'finance'
  }
};

axios.post(url, jsonData, {
  auth: {
    username: API_KEY,
    password: ''
  },
  headers: {
    'Content-Type': 'application/json'
  }
})
.then(response => {
  console.log(response.data);
})
.catch(error => {
  console.error(error);
});

package main

import (
    "bytes"
    "fmt"
    "io"
    "mime/multipart"
    "net/http"
    "os"
    "encoding/base64"
    "encoding/json"
)

func uploadDocument(metadata string, useURL bool) error {
    API_KEY := "YOUR_API_KEY"
    WORKFLOW_ID := "550e8400-e29b-41d4-a716-446655440000"
    url := fmt.Sprintf("https://app.nanonets.com/api/v4/workflows/%s/documents", WORKFLOW_ID)

    if useURL {
        // Method 2: Process document from URL
        jsonData := map[string]interface{}{
            "document_url": "https://example.com/invoice.pdf",
            "async":       false,
            "metadata":    metadata,
        }

        jsonBytes, err := json.Marshal(jsonData)
        if err != nil {
            return err
        }

        req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonBytes))
        if err != nil {
            return err
        }

        auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
        req.Header.Add("Authorization", "Basic "+auth)
        req.Header.Set("Content-Type", "application/json")

        client := &http.Client{}
        resp, err := client.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        respBody, err := io.ReadAll(resp.Body)
        if err != nil {
            return err
        }
        fmt.Println(string(respBody))
        return nil
    }

    // Method 1: Upload file
    body := &bytes.Buffer{}
    writer := multipart.NewWriter(body)

    file, err := os.Open("invoice.pdf")
    if err != nil {
        return err
    }
    defer file.Close()

    part, err := writer.CreateFormFile("file", "invoice.pdf")
    if err != nil {
        return err
    }
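    if _, err = io.Copy(part, file); err != nil {
        return err
    }

    if err := writer.WriteField("async", "false"); err != nil {
        return err
    }
    if err := writer.WriteField("metadata", metadata); err != nil {
        return err
    }
    // Close writes the trailing multipart boundary; the body is incomplete without it.
    if err := writer.Close(); err != nil {
        return err
    }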

    req, err := http.NewRequest("POST", url, body)
    if err != nil {
        return err
    }

    auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
    req.Header.Add("Authorization", "Basic "+auth)
    req.Header.Set("Content-Type", writer.FormDataContentType())

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    respBody, err := io.ReadAll(resp.Body)
    if err != nil {
        return err
    }
    fmt.Println(string(respBody))
    return nil
}

func main() {
    metadata := `{"customer_id": "12345", "document_type": "invoice", "department": "finance"}`
    
    // Upload file
    if err := uploadDocument(metadata, false); err != nil {
        fmt.Println("Error uploading file:", err)
    }

    // Process from URL
    if err := uploadDocument(metadata, true); err != nil {
        fmt.Println("Error processing URL:", err)
    }
}

Synchronous Response (async: false)

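- Returns the complete results in the response, organized by page
- Each page contains its own fields and tables, so results can be processed and verified page by page
- Only available for documents with 3 or fewer pages; longer documents are processed asynchronously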

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "success",
  "uploaded_at": "2024-03-14T12:00:00Z",
  "metadata": {
    "customer_id": "12345",
    "document_type": "invoice"
  },
  "original_document_name": "invoice_2024_001.pdf",
  "raw_document_url": "https://storage.nanonets.com/documents/550e8400-e29b-41d4-a716-446655440000.pdf",
  "verification_status": "success",
  "verification_stage": "stage_123",
  "verification_message": "",
  "assigned_reviewers": ["john.doe@example.com", "jane.smith@example.com"],
  "pages": [
    {
      "page_id": "550e8400-e29b-41d4-a716-446655440001",
      "page_number": 1,
      "image_url": "https://storage.nanonets.com/pages/550e8400-e29b-41d4-a716-446655440001.jpg",
      "data": {
        "fields": {
          "invoice_number": [
            {
              "field_data_id": "f1a2b3c4-d5e6-4f7g-8h9i-j0k1l2m3n4o5",
              "value": "INV-2024-001",
              "confidence": 0.98,
              "bbox": [100, 200, 300, 250],
              "verification_status": "success",
              "verification_message": "",
              "is_moderated": false
            }
          ]
        },
        "tables": [
          {
            "table_id": "d8e5c1d2-4e71-4d0e-babc-a845f2de4f1b",
            "bbox": [100, 300, 800, 600],
            "cells": [
              {
                "cell_id": "1b5d3df7-3df7-420a-a82b-29dbdfd3e1b1",
                "row": 0,
                "col": 0,
                "header": "item_description",
                "text": "Product A",
                "bbox": [100, 330, 300, 360],
                "verification_status": "success",
                "verification_message": "",
                "is_moderated": false
              }
            ]
          }
        ]
      }
    }
  ]
}

Asynchronous Response (async: true)

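- Returns immediately with the document ID and a "pending" status
- Use the document ID with the Get Document Data endpoint to check the status and retrieve the results
- Best suited to large documents and batch processing
- Avoids request timeouts on documents that take a long time to process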

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "uploaded_at": "2024-03-14T12:00:00Z",
  "metadata": {
    "customer_id": "12345",
    "document_type": "invoice"
  },
  "original_document_name": "invoice_2024_001.pdf",
  "verification_status": "success",
  "verification_stage": "stage_123",
  "verification_message": "",
  "assigned_reviewers": ["john.doe@example.com", "jane.smith@example.com"]
}
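An asynchronous upload returns before processing finishes, so the client has to poll the Get Document Data endpoint until the document reaches a final status. The sketch below keeps the polling loop separate from the HTTP call so that the loop can be tested on its own.

It makes these assumptions:

- "success" and "failure" (the schema's non-pending status values) are the only final states.
- The interval and timeout defaults are arbitrary choices, not documented recommendations.
- fetch_document uses only the Python standard library and the Basic-auth scheme from the examples above (API key as username, empty password).

Both function names are hypothetical.

```python
import base64
import json
import time
import urllib.request
from typing import Any, Callable

BASE_URL = "https://app.nanonets.com/api/v4"
# Assumption: "success" and "failure" (the schema's non-pending values) are final.
TERMINAL_STATUSES = {"success", "failure"}

def fetch_document(api_key: str, workflow_id: str, document_id: str) -> dict[str, Any]:
    """GET /workflows/{workflow_id}/documents/{document_id} using Basic auth (API key, empty password)."""
    url = f"{BASE_URL}/workflows/{workflow_id}/documents/{document_id}"
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def wait_for_document(
    fetch: Callable[[], dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch() until the document reaches a terminal status or the timeout passes."""
    waited = 0.0  # counts sleep time only, not request latency
    while True:
        doc = fetch()
        if doc.get("status") in TERMINAL_STATUSES:
            return doc
        if waited >= timeout:
            raise TimeoutError(f"document not finished after ~{waited:.0f}s of polling")
        sleep(interval)
        waited += interval
```

A typical call is `wait_for_document(lambda: fetch_document("YOUR_API_KEY", WORKFLOW_ID, document_id))`, where document_id comes from the asynchronous upload response.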

List Documents

Retrieve a paginated list of all documents processed by a workflow. Results are always sorted by upload time in descending order (newest first).

GET /api/v4/workflows/{workflow_id}/documents

Overview

- Lists all documents processed by a workflow, newest first
- Supports pagination for large document sets
- Includes each document's status, metadata, and basic page information
- Useful for monitoring a workflow and tracking processing status
- Note: This endpoint returns only basic document information; page fields and tables are returned as null. To get the extracted data, use the Get Document Data endpoint.

Query Parameters

- page: Page number to return (default: 1)
- limit: Number of documents per page (default: 10, max: 100)

Response

{
  "documents": [
    {
      "document_id": "db237bf3-f3c1-4441-905e-a6b7538db269",
      "status": "pending",
      "uploaded_at": "2025-05-23T09:38:46.583415Z",
      "metadata": "{\"customer_id\": \"12345\", \"document_type\": \"invoice\", \"department\": \"finance\"}",
      "original_document_name": "invoice.pdf",
      "raw_document_url": "uploadedfiles/37b9d483-d43f-4d0c-80ef-e0ac463055ba/RawPredictions/db237bf3-f3c1-4441-905e-a6b7538db269.pdf",
      "verification_status": "success",
      "verification_stage": "00000000-0000-0000-0000-000000000000",
      "assigned_reviewers": [],
      "pages": [
        {
          "page_id": "b94d9ca6-37b9-11f0-a23c-367dda7a627d",
          "page_number": 0,
          "image_url": "",
          "data": {
            "fields": null,
            "tables": null
          }
        }
      ]
    },
    {
      "document_id": "00000000-0000-0000-0000-000000000000",
      "status": "success",
      "uploaded_at": "2025-05-23T09:14:53.330939Z",
      "original_document_name": "invoice.pdf",
      "raw_document_url": "uploadedfiles/37b9d483-d43f-4d0c-80ef-e0ac463055ba/RawPredictions/00000000-0000-0000-0000-000000000000.pdf",
      "verification_status": "success",
      "verification_stage": "ffffffff-ffff-ffff-ffff-ffffffffffff",
      "assigned_reviewers": [],
      "pages": [
        {
          "page_id": "6304a3ce-37b6-11f0-a24d-367dda7a627d",
          "page_number": 0,
          "image_url": "",
          "data": {
            "fields": null,
            "tables": null
          }
        }
      ]
    }
  ],
  "total_count": 2,
  "page_no": 1,
  "page_size": 50
}
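To walk every document in a workflow, request successive pages until the accumulated count reaches total_count or a page comes back empty. The generator below is a sketch with these properties:

- It takes a hypothetical fetch_page(page, limit) callable that returns the parsed list response. For example, this could wrap the requests.get call from the examples with params={"page": page, "limit": limit}.
- It relies only on the documents and total_count keys.
- It does not assume that page_size matches the requested limit.

```python
from typing import Any, Callable, Iterator

def iter_documents(
    fetch_page: Callable[[int, int], dict[str, Any]],
    limit: int = 100,
) -> Iterator[dict[str, Any]]:
    """Yield every document from the List Documents endpoint, page by page."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    page, seen = 1, 0
    while True:
        body = fetch_page(page, limit)
        docs = body.get("documents") or []
        if not docs:
            return  # an empty page means there is nothing left
        yield from docs
        seen += len(docs)
        total = body.get("total_count")
        if total is not None and seen >= total:
            return
        page += 1
```

The list is sorted newest first, so uploads that arrive during iteration shift later pages and can make the same document appear twice. Deduplicating on document_id is a cheap safeguard.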

Example

The examples below are given in Python, cURL, Node.js, and Go, in that order.

import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents"

# Get first page with default limit (10)
response = requests.get(
    url,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())

# Get specific page with custom limit
params = {
    'page': 2,
    'limit': 20
}
response = requests.get(
    url,
    params=params,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())

# Get first page with default limit
curl -X GET \
  -u "YOUR_API_KEY:" \
  https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents

# Get specific page with custom limit
curl -X GET \
  -u "YOUR_API_KEY:" \
  "https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents?page=2&limit=20"

const axios = require('axios');

const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000';
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents`;

// Get first page with default limit
axios.get(url, {
  auth: {
    username: API_KEY,
    password: ''
  }
})
.then(response => {
  console.log(response.data);
})
.catch(error => {
  console.error(error);
});

// Get specific page with custom limit
axios.get(url, {
  auth: {
    username: API_KEY,
    password: ''
  },
  params: {
    page: 2,
    limit: 20
  }
})
.then(response => {
  console.log(response.data);
})
.catch(error => {
  console.error(error);
});

package main

import (
    "fmt"
    "net/http"
    "encoding/base64"
    "encoding/json"
)

func listDocuments(page, limit int) error {
    API_KEY := "YOUR_API_KEY"
    WORKFLOW_ID := "550e8400-e29b-41d4-a716-446655440000"
    url := fmt.Sprintf("https://app.nanonets.com/api/v4/workflows/%s/documents", WORKFLOW_ID)

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return err
    }

    // Add query parameters
    q := req.URL.Query()
    q.Add("page", fmt.Sprintf("%d", page))
    q.Add("limit", fmt.Sprintf("%d", limit))
    req.URL.RawQuery = q.Encode()

    // Add auth header
    auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
    req.Header.Add("Authorization", "Basic "+auth)

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        return err
    }

    fmt.Printf("%+v\n", result)
    return nil
}

func main() {
    // Get first page with default limit
    if err := listDocuments(1, 10); err != nil {
        fmt.Println("Error getting first page:", err)
    }

    // Get specific page with custom limit
    if err := listDocuments(2, 20); err != nil {
        fmt.Println("Error getting specific page:", err)
    }
}

Get Document Data

Retrieve the processing results for a specific document.

GET /api/v4/workflows/{workflow_id}/documents/{document_id}

Overview

- Retrieves the complete processing results for a document
- Returns the same structure as the synchronous upload response, including all extracted fields and tables and each page's image URL
- Use it to retrieve the results of asynchronous processing or to verify processing results

Response

Same structure as the synchronous response from the upload endpoint.
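Fields are reported per page, so a caller that wants one view of the whole document has to merge them. The hypothetical helper below gathers every value of every field across all pages of this response. It keeps page numbers, confidence scores, and is_moderated, and it sorts each field's values by descending confidence. Treating the top entry as "the" value is a simplifying assumption. Callers may prefer reviewer-corrected values (is_moderated), and documents that legitimately repeat a field on several pages may need different handling.

```python
from collections import defaultdict
from typing import Any

def merge_fields(document: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Map each field name to all of its values across pages, highest confidence first."""
    merged: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for page in document.get("pages") or []:
        fields = (page.get("data") or {}).get("fields") or {}
        for name, values in fields.items():
            for item in values or []:
                merged[name].append({
                    "value": item.get("value"),
                    "confidence": item.get("confidence", 0.0),
                    "page_number": page.get("page_number"),
                    "is_moderated": item.get("is_moderated", False),
                })
    for values in merged.values():
        values.sort(key=lambda v: v["confidence"], reverse=True)
    return dict(merged)
```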

Example

The examples below are given in Python, cURL, Node.js, and Go, in that order.

import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents/{DOCUMENT_ID}"

response = requests.get(
    url,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())

curl -X GET \
  -u "YOUR_API_KEY:" \
  https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents/550e8400-e29b-41d4-a716-446655440001

const axios = require('axios');

const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000';
const DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001';
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents/${DOCUMENT_ID}`;

axios.get(url, {
  auth: {
    username: API_KEY,
    password: ''
  }
})
.then(response => {
  console.log(response.data);
})
.catch(error => {
  console.error(error);
});

package main

import (
    "fmt"
    "net/http"
    "encoding/base64"
    "encoding/json"
)

func main() {
    API_KEY := "YOUR_API_KEY"
    WORKFLOW_ID := "550e8400-e29b-41d4-a716-446655440000"
    DOCUMENT_ID := "550e8400-e29b-41d4-a716-446655440001"
    url := fmt.Sprintf("https://app.nanonets.com/api/v4/workflows/%s/documents/%s", 
        WORKFLOW_ID, DOCUMENT_ID)

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        fmt.Println(err)
        return
    }

    auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
    req.Header.Add("Authorization", "Basic "+auth)

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        fmt.Println(err)
        return
    }

    fmt.Printf("%+v\n", result)
}

Get Page Data

Retrieve the processing results for a specific page of a document.

GET /api/v4/workflows/{workflow_id}/documents/{document_id}/pages/{page_id}

Overview

- Retrieves the extracted data for a single page of a document
- Returns the page's fields and tables along with its image URL
- Useful for page-level verification of multi-page documents
- Lets you fetch one page at a time instead of the whole document, which helps with large documents

Response

{
  "page_id": "550e8400-e29b-41d4-a716-446655440001",
  "page_number": 1,
  "image_url": "https://storage.nanonets.com/pages/550e8400-e29b-41d4-a716-446655440001.jpg",
  "data": {
    "fields": {
      "invoice_number": [
        {
          "field_data_id": "f1a2b3c4-d5e6-4f7g-8h9i-j0k1l2m3n4o5",
          "value": "INV-2024-001",
          "confidence": 0.98,
          "bbox": [100, 200, 300, 250],
          "verification_status": "success",
          "verification_message": "",
          "is_moderated": false
        }
      ],
      "total_amount": [
        {
          "field_data_id": "b2c3d4e5-f6a7-4b8c-9d0e-1f2a3b4c5d6e",
          "value": "1500.00",
          "confidence": 0.95,
          "bbox": [400, 200, 500, 250],
          "verification_status": "success",
          "verification_message": "",
          "is_moderated": false
        }
      ]
    },
    "tables": [
      {
        "table_id": "d8e5c1d2-4e71-4d0e-babc-a845f2de4f1b",
        "bbox": [100, 300, 800, 600],
        "cells": [
          {
            "cell_id": "1b5d3df7-3df7-420a-a82b-29dbdfd3e1b1",
            "row": 0,
            "col": 0,
            "header": "item_description",
            "text": "Product A",
            "bbox": [100, 330, 300, 360],
            "verification_status": "success",
            "verification_message": "",
            "is_moderated": false
          },
          {
            "cell_id": "43bd4a61-0131-47b9-9015-4df4b62d4531",
            "row": 0,
            "col": 1,
            "header": "quantity",
            "text": "2",
            "bbox": [350, 330, 450, 360],
            "verification_status": "success",
            "verification_message": "",
            "is_moderated": false
          }
        ]
      }
    ]
  }
}
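Table data arrives as a flat list of cells, each tagged with its row, col, and header. To work with it row by row, group the cells by row and key them by header. The sketch below does this for one table object.

It assumes the following:

- Header names are unique within a table. The schema requires them to be unique within a workflow.
- A row may be missing some cells. In that case the header is simply absent from that row's dict.

The function name is hypothetical.

```python
from collections import defaultdict
from typing import Any

def table_to_rows(table: dict[str, Any]) -> list[dict[str, str]]:
    """Turn a table's cells into a list of {header: text} dicts ordered by row."""
    rows: dict[int, list[dict[str, Any]]] = defaultdict(list)
    for cell in table.get("cells") or []:
        rows[cell["row"]].append(cell)
    result = []
    for row_index in sorted(rows):
        # Order cells left to right so the dict keys follow column order.
        cells = sorted(rows[row_index], key=lambda c: c["col"])
        result.append({c["header"]: c["text"] for c in cells})
    return result
```

Applied to the table in the response above, it yields `[{"item_description": "Product A", "quantity": "2"}]`.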

Example

The examples below are given in Python, cURL, Node.js, and Go, in that order.

import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001'
PAGE_ID = '550e8400-e29b-41d4-a716-446655440002'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents/{DOCUMENT_ID}/pages/{PAGE_ID}"

response = requests.get(
    url,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())

curl -X GET \
  -u "YOUR_API_KEY:" \
  https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents/550e8400-e29b-41d4-a716-446655440001/pages/550e8400-e29b-41d4-a716-446655440002

const axios = require('axios');

const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000';
const DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001';
const PAGE_ID = '550e8400-e29b-41d4-a716-446655440002';
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents/${DOCUMENT_ID}/pages/${PAGE_ID}`;

axios.get(url, {
  auth: {
    username: API_KEY,
    password: ''
  }
})
.then(response => {
  console.log(response.data);
})
.catch(error => {
  console.error(error);
});

package main

import (
    "fmt"
    "net/http"
    "encoding/base64"
    "encoding/json"
)

func main() {
    API_KEY := "YOUR_API_KEY"
    WORKFLOW_ID := "550e8400-e29b-41d4-a716-446655440000"
    DOCUMENT_ID := "550e8400-e29b-41d4-a716-446655440001"
    PAGE_ID := "550e8400-e29b-41d4-a716-446655440002"
    url := fmt.Sprintf("https://app.nanonets.com/api/v4/workflows/%s/documents/%s/pages/%s", 
        WORKFLOW_ID, DOCUMENT_ID, PAGE_ID)

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        fmt.Println(err)
        return
    }

    auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
    req.Header.Add("Authorization", "Basic "+auth)

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        fmt.Println(err)
        return
    }

    fmt.Printf("%+v\n", result)
}

Delete Document

Delete a processed document and its associated data.

DELETE /api/v4/workflows/{workflow_id}/documents/{document_id}

Overview

- Permanently removes a document and all of its extracted data; this cannot be undone
- Frees up storage space and is useful for data cleanup
- Use with caution, and consider implementing a retention policy
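A retention policy can be built from the List Documents response: pick the documents whose uploaded_at is older than a cutoff, then delete each one with this endpoint. The selection step below is a sketch, and its function names are hypothetical.

- The 30-day default is an arbitrary example.
- Timestamps are parsed as the ISO 8601 UTC strings shown in the examples.
- A trailing Z is rewritten to +00:00, because datetime.fromisoformat only accepts Z from Python 3.11. Older versions also expect 0, 3, or 6 fractional-second digits.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

def parse_uploaded_at(value: str) -> datetime:
    """Parse timestamps such as '2025-05-23T09:38:46.583415Z' as aware UTC datetimes."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)

def expired_document_ids(
    documents: list[dict[str, Any]],
    max_age: timedelta = timedelta(days=30),
    now: Optional[datetime] = None,
) -> list[str]:
    """Return IDs of documents uploaded more than max_age before now."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    return [
        doc["document_id"]
        for doc in documents
        if parse_uploaded_at(doc["uploaded_at"]) < cutoff
    ]
```

Each returned ID can then be passed to a DELETE request like the ones in the examples. Deletion cannot be undone, so a dry run that only logs the IDs is a sensible first step.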

Response

{
  "message": "Document deleted successfully"
}

Example

The examples below are given in Python, cURL, Node.js, and Go, in that order.

import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents/{DOCUMENT_ID}"

response = requests.delete(
    url,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())

curl -X DELETE \
  -u "YOUR_API_KEY:" \
  https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents/550e8400-e29b-41d4-a716-446655440001

const axios = require('axios');

const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000';
const DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001';
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents/${DOCUMENT_ID}`;

axios.delete(url, {
  auth: {
    username: API_KEY,
    password: ''
  }
})
.then(response => {
  console.log(response.data);
})
.catch(error => {
  console.error(error);
});

package main

import (
    "fmt"
    "net/http"
    "encoding/base64"
    "encoding/json"
)

func main() {
    API_KEY := "YOUR_API_KEY"
    WORKFLOW_ID := "550e8400-e29b-41d4-a716-446655440000"
    DOCUMENT_ID := "550e8400-e29b-41d4-a716-446655440001"
    url := fmt.Sprintf("https://app.nanonets.com/api/v4/workflows/%s/documents/%s", 
        WORKFLOW_ID, DOCUMENT_ID)

    req, err := http.NewRequest("DELETE", url, nil)
    if err != nil {
        fmt.Println(err)
        return
    }

    auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
    req.Header.Add("Authorization", "Basic "+auth)

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        fmt.Println(err)
        return
    }

    fmt.Printf("%+v\n", result)
}

Error Handling

All document processing APIs return standard HTTP status codes:

- 200 OK: Request successful
- 201 Created: Document uploaded successfully
- 400 Bad Request: Invalid request parameters, unsupported file type, or a document with more than 500 pages
- 401 Unauthorized: Invalid or missing API key
- 404 Not Found: Workflow or document not found
- 413 Payload Too Large: File size exceeds the 20MB limit
- 429 Too Many Requests: Processing rate limit (75 pages per minute) exceeded
- 500 Internal Server Error: Server-side error
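These status codes split into errors worth retrying and errors that will fail the same way again. The sketch below treats 429 and any 5xx as retryable and all other errors as permanent. That split is a judgment call based on the descriptions of these status codes, not documented server behavior.

The delays come from a capped exponential backoff. Random jitter is omitted to keep the delays predictable, and if a response carries the standard HTTP Retry-After header (not documented for this API), honoring it would be preferable. All names are hypothetical.

```python
import time
from typing import Any, Callable, Tuple

def is_retryable(status_code: int) -> bool:
    """True for rate limiting (429) and server-side errors (5xx); assumed worth retrying."""
    return status_code == 429 or 500 <= status_code <= 599

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after failed attempt `attempt` (1-based): base * 2**(attempt - 1), capped."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return min(cap, base * 2 ** (attempt - 1))

def call_with_retries(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() -> (status_code, body) until it succeeds, fails permanently, or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts:
            return status, body
        sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```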

Common Error Scenarios

File Upload Issues
- Unsupported file type
- File too large (>20MB)
- Corrupted file
Processing Errors
- Document processing timeout
- Unreadable content
- Processing failure
Field and Table Header Issues
- Invalid field or table header names (non-alphanumeric characters)
- Duplicate field names within a workflow
- Duplicate table header names within a workflow

For detailed error handling, refer to the Error Handling Guide.

Best Practices

File Upload
- Use async mode for large files or batch processing
- Include relevant metadata for better tracking
- Validate file types before upload
Result Processing
- Check confidence scores before using extracted data
- Handle both sync and async responses appropriately
- Implement retry logic for failed processing
Resource Management
- Delete processed documents when no longer needed
- Monitor storage usage
- Implement document retention policies

For more best practices, refer to the Best Practices Guide.
