Document Processing
This section covers all APIs related to document processing, including uploading documents, retrieving results, and managing processed documents.
Document Schema
Before diving into the APIs, here's the common schema for a document and its processing results:
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "string", // "success", "pending", "failure"
  "uploaded_at": "string", // ISO 8601 timestamp
  "metadata": "string | object", // Optional metadata attached during upload
  "original_document_name": "string", // Original filename or URL of the document
  "raw_document_url": "string", // URL to access the original document
  "verification_status": "string", // "success", "failed"
  "verification_stage": "string", // stage_id where document is flagged for verification
  "verification_message": "string", // Optional message explaining verification failure
  "assigned_reviewers": ["string"], // List of email addresses of assigned reviewers
  "pages": [
    {
      "page_id": "550e8400-e29b-41d4-a716-446655440001",
      "page_number": 1,
      "image_url": "string", // URL to access the page image
      "data": {
        "fields": {
          "invoice_number": [ // Field names must be alphanumeric with underscores only
            {
              "field_data_id": "f1a2b3c4-d5e6-4f7g-8h9i-j0k1l2m3n4o5", // Unique identifier for the field value
              "value": "string",
              "confidence": 0.98,
              "bbox": [100, 200, 300, 250], // [x1, y1, x2, y2]
              "verification_status": "string", // "success", "failed"
              "verification_message": "string", // Optional message explaining verification failure
              "is_moderated": false // Indicates if the field value was manually corrected by a reviewer
            }
          ]
        },
        "tables": [
          {
            "table_id": "string",
            "bbox": [100, 300, 800, 600],
            "cells": [
              {
                "cell_id": "string",
                "row": 0,
                "col": 0,
                "header": "item_description", // Table header names must be alphanumeric with underscores only
                "text": "string",
                "bbox": [100, 330, 300, 360],
                "verification_status": "string", // "success", "failed"
                "verification_message": "string", // Optional message explaining verification failure
                "is_moderated": false // Indicates if the cell value was manually corrected by a reviewer
              }
            ]
          }
        ]
      }
    }
  ]
}
Schema Components
Document Information
- document_id: Unique identifier for the document
- status: Current processing status
- uploaded_at: Timestamp of document upload
- metadata: Optional data attached during upload
- original_document_name: Original filename or URL of the document
- raw_document_url: URL to access the original document
- verification_status: Overall document verification status
- verification_stage: stage_id where document is flagged for verification
- verification_message: Optional message explaining verification failure
- assigned_reviewers: List of email addresses of assigned reviewers
Page Information
- page_id: Unique identifier for the page
- page_number: Sequential page number
- image_url: URL to access the page image
Extracted Data
- fields: Key-value pairs of extracted information
  - Field names must be alphanumeric with underscores only
  - Field names must be unique within a workflow
  - Each field can have multiple values with confidence scores
  - Each field value has a unique field_data_id for tracking moderation
  - Includes bounding box coordinates for each value
  - Includes verification status and moderation flag
  - is_moderated: Indicates if the field value was manually corrected by a reviewer
  - Bounding box format: [x1, y1, x2, y2] where:
    - x1, y1: Top-left corner coordinates
    - x2, y2: Bottom-right corner coordinates
  - Coordinates are in pixels from the top-left of the page
  - Example: [100, 200, 300, 250] represents a box starting at (100, 200) and ending at (300, 250)
- tables: Extracted tabular data
  - Each table has a unique ID and bounding box
  - Table header names must be alphanumeric with underscores only
  - Table header names must be unique within a workflow
  - Cells include row/column position and header information
  - Each cell has a unique cell_id for tracking moderation
  - Each cell has its own bounding box coordinates
  - Includes verification status and moderation flag
  - is_moderated: Indicates if the cell value was manually corrected by a reviewer
  - Bounding boxes follow the same format as field values
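To make the schema concrete, here is a minimal Python sketch of how a document response in this shape is typically consumed: it collects field values above a confidence threshold and groups table cells into rows. The helper names and the 0.9 threshold are illustrative choices, not part of the API.

# Minimal sketch: walk a document response that follows the schema above.
# The 0.9 confidence threshold and helper names are illustrative choices, not part of the API.

def collect_fields(document, min_confidence=0.9):
    """Return {field_name: [values]} across all pages, skipping low-confidence values."""
    results = {}
    for page in document.get("pages", []):
        fields = (page.get("data") or {}).get("fields") or {}
        for name, values in fields.items():
            for value in values:
                if value.get("confidence", 0) >= min_confidence:
                    results.setdefault(name, []).append(value["value"])
    return results

def collect_table_rows(document):
    """Group table cells by (table_id, row) so each row becomes a {header: text} dict."""
    rows = {}
    for page in document.get("pages", []):
        tables = (page.get("data") or {}).get("tables") or []
        for table in tables:
            for cell in table.get("cells", []):
                key = (table["table_id"], cell["row"])
                rows.setdefault(key, {})[cell["header"]] = cell["text"]
    return rows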
Upload Document for Processing
Upload a document for processing. The response structure depends on the workflow's processing_type setting and whether the upload is synchronous or asynchronous.
POST /api/v4/workflows/{workflow_id}/documents
Overview
- Upload documents for processing in a workflow
- Supports both file upload and URL-based processing
- Can process documents synchronously or asynchronously
- Returns immediate results for sync processing
- Returns document ID for async processing
- Supports metadata attachment for better document tracking
- Results include extracted fields, tables, and page information
Supported File Types
- Images: JPG, JPEG, PNG, TIFF
- Documents: PDF
- Spreadsheets: XLS, XLSX
- Maximum file size: 20MB
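Since uploads outside these limits are rejected, it can help to validate files locally before calling the API. The sketch below is one way to do that in Python; the allow-list mirrors the file types and 20MB limit above, while the function name is an illustrative assumption.

import os

# File types and the 20MB size limit documented above.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tiff", ".pdf", ".xls", ".xlsx"}
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024

def validate_upload(path):
    """Raise ValueError if the file would be rejected by the upload endpoint."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext}")
    if os.path.getsize(path) > MAX_FILE_SIZE_BYTES:
        raise ValueError("File exceeds the 20MB upload limit")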
Processing Limits
Rate Limits
- Maximum processing rate: 75 pages per minute
- Applies to both synchronous and asynchronous processing
- Exceeding the rate limit will result in a 429 (Too Many Requests) response
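A common way to stay within this limit is to back off and retry when a 429 is returned. The sketch below shows that pattern with the requests library; the retry count and delays are illustrative defaults, not values prescribed by the API.

import time

import requests
from requests.auth import HTTPBasicAuth

def post_with_backoff(url, max_retries=5, **kwargs):
    """POST and retry on 429 (Too Many Requests) with exponential backoff."""
    response = None
    for attempt in range(max_retries):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            break
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... between attempts
    return response

# Example usage against the upload endpoint described below:
# response = post_with_backoff(
#     "https://app.nanonets.com/api/v4/workflows/WORKFLOW_ID/documents",
#     files={"file": open("invoice.pdf", "rb")},
#     auth=HTTPBasicAuth("YOUR_API_KEY", ""),
# )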
Document Size Limits
- Maximum pages per document: 500 pages
- Documents exceeding 500 pages will be rejected with a 400 (Bad Request) response
Processing Mode Rules
- Documents with 3 or fewer pages: Can be processed synchronously or asynchronously
- Documents with more than 3 pages: Will be automatically converted to asynchronous processing
- When converted to async, the response will include status: "pending" and a document_id
Request
The request can be sent in two ways:

Using multipart/form-data with the following fields:
- file: The document file to process (required if document_url is not provided)
- async: Boolean value indicating whether to process asynchronously (optional, default: false)
- metadata: Any string or JSON that will be attached to the document (optional)

Using application/json with the following fields:
- document_url: URL of the document to process (required if file is not provided)
- async: Boolean value indicating whether to process asynchronously (optional, default: false)
- metadata: Any string or JSON that will be attached to the document (optional)
Example
- Python
- cURL
- Node.js
- Go
import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents"

# Method 1: Upload document with metadata as JSON
files = {
    'file': ('invoice.pdf', open('invoice.pdf', 'rb'), 'application/pdf')
}
data = {
    'async': 'false',
    'metadata': '{"customer_id": "12345", "document_type": "invoice", "department": "finance"}'
}
response = requests.post(
    url,
    files=files,
    data=data,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())

# Method 2: Process document from URL
json_data = {
    'document_url': 'https://example.com/invoice.pdf',
    'async': False,
    'metadata': '{"customer_id": "12345", "document_type": "invoice", "department": "finance"}'
}
response = requests.post(
    url,
    json=json_data,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())
# Method 1: Upload document with JSON metadata
curl -X POST \
-u "YOUR_API_KEY:" \
-F "file=@invoice.pdf" \
-F "async=false" \
-F 'metadata={"customer_id": "12345", "document_type": "invoice", "department": "finance"}' \
https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents
# Method 2: Process document from URL
curl -X POST \
-u "YOUR_API_KEY:" \
-H "Content-Type: application/json" \
-d '{
"document_url": "https://example.com/invoice.pdf",
"async": false,
"metadata": {"customer_id": "12345", "document_type": "invoice", "department": "finance"}
}' \
https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000';
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents`;

// Method 1: Upload document with JSON metadata
const formData = new FormData();
formData.append('file', fs.createReadStream('invoice.pdf'));
formData.append('async', 'false');
formData.append('metadata', JSON.stringify({
  customer_id: '12345',
  document_type: 'invoice',
  department: 'finance'
}));

axios.post(url, formData, {
  auth: {
    username: API_KEY,
    password: ''
  },
  headers: {
    ...formData.getHeaders()
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });

// Method 2: Process document from URL
const jsonData = {
  document_url: 'https://example.com/invoice.pdf',
  async: false,
  metadata: {
    customer_id: '12345',
    document_type: 'invoice',
    department: 'finance'
  }
};

axios.post(url, jsonData, {
  auth: {
    username: API_KEY,
    password: ''
  },
  headers: {
    'Content-Type': 'application/json'
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });
package main

import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "fmt"
    "io"
    "mime/multipart"
    "net/http"
    "os"
)

func uploadDocument(metadata string, useURL bool) error {
    API_KEY := "YOUR_API_KEY"
    WORKFLOW_ID := "550e8400-e29b-41d4-a716-446655440000"
    url := fmt.Sprintf("https://app.nanonets.com/api/v4/workflows/%s/documents", WORKFLOW_ID)

    if useURL {
        // Method 2: Process document from URL
        jsonData := map[string]interface{}{
            "document_url": "https://example.com/invoice.pdf",
            "async":        false,
            "metadata":     metadata,
        }
        jsonBytes, err := json.Marshal(jsonData)
        if err != nil {
            return err
        }
        req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonBytes))
        if err != nil {
            return err
        }
        auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
        req.Header.Add("Authorization", "Basic "+auth)
        req.Header.Set("Content-Type", "application/json")
        client := &http.Client{}
        resp, err := client.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        respBody, err := io.ReadAll(resp.Body)
        if err != nil {
            return err
        }
        fmt.Println(string(respBody))
        return nil
    }

    // Method 1: Upload file
    body := &bytes.Buffer{}
    writer := multipart.NewWriter(body)
    file, err := os.Open("invoice.pdf")
    if err != nil {
        return err
    }
    defer file.Close()
    part, err := writer.CreateFormFile("file", "invoice.pdf")
    if err != nil {
        return err
    }
    if _, err = io.Copy(part, file); err != nil {
        return err
    }
    writer.WriteField("async", "false")
    writer.WriteField("metadata", metadata)
    writer.Close()
    req, err := http.NewRequest("POST", url, body)
    if err != nil {
        return err
    }
    auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
    req.Header.Add("Authorization", "Basic "+auth)
    req.Header.Set("Content-Type", writer.FormDataContentType())
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    respBody, err := io.ReadAll(resp.Body)
    if err != nil {
        return err
    }
    fmt.Println(string(respBody))
    return nil
}

func main() {
    metadata := `{"customer_id": "12345", "document_type": "invoice", "department": "finance"}`

    // Upload file
    if err := uploadDocument(metadata, false); err != nil {
        fmt.Println("Error uploading file:", err)
    }

    // Process from URL
    if err := uploadDocument(metadata, true); err != nil {
        fmt.Println("Error processing URL:", err)
    }
}
Synchronous Response (async: false)
- Returns data organized by page
- Each page contains its own fields and tables
- Better for multi-page documents
- Allows page-by-page processing and verification
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "success",
  "uploaded_at": "2024-03-14T12:00:00Z",
  "metadata": {
    "customer_id": "12345",
    "document_type": "invoice"
  },
  "original_document_name": "invoice_2024_001.pdf",
  "raw_document_url": "https://storage.nanonets.com/documents/550e8400-e29b-41d4-a716-446655440000.pdf",
  "verification_status": "success",
  "verification_stage": "stage_123",
  "verification_message": "",
  "assigned_reviewers": ["john.doe@example.com", "jane.smith@example.com"],
  "pages": [
    {
      "page_id": "550e8400-e29b-41d4-a716-446655440001",
      "page_number": 1,
      "image_url": "https://storage.nanonets.com/pages/550e8400-e29b-41d4-a716-446655440001.jpg",
      "data": {
        "fields": {
          "invoice_number": [
            {
              "field_data_id": "f1a2b3c4-d5e6-4f7g-8h9i-j0k1l2m3n4o5",
              "value": "INV-2024-001",
              "confidence": 0.98,
              "bbox": [100, 200, 300, 250],
              "verification_status": "success",
              "verification_message": "",
              "is_moderated": false
            }
          ]
        },
        "tables": [
          {
            "table_id": "d8e5c1d2-4e71-4d0e-babc-a845f2de4f1b",
            "bbox": [100, 300, 800, 600],
            "cells": [
              {
                "cell_id": "1b5d3df7-3df7-420a-a82b-29dbdfd3e1b1",
                "row": 0,
                "col": 0,
                "header": "item_description",
                "text": "Product A",
                "bbox": [100, 330, 300, 360],
                "verification_status": "success",
                "verification_message": "",
                "is_moderated": false
              }
            ]
          }
        ]
      }
    }
  ]
}
Asynchronous Response (async: true)
- Returns immediately with document ID and status
- Use the document ID to check processing status
- Best for large documents or batch processing
- Reduces timeout issues for complex documents
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "uploaded_at": "2024-03-14T12:00:00Z",
  "metadata": {
    "customer_id": "12345",
    "document_type": "invoice"
  },
  "original_document_name": "invoice_2024_001.pdf",
  "verification_status": "success",
  "verification_stage": "stage_123",
  "verification_message": "",
  "assigned_reviewers": ["john.doe@example.com", "jane.smith@example.com"]
}
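Because the asynchronous response only confirms that processing has started, a typical pattern is to poll the Get Document Data endpoint (documented below) with the returned document_id until the status is no longer pending. A minimal Python sketch of that loop follows; the polling interval and timeout are illustrative assumptions.

import time

import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'

def wait_for_document(document_id, poll_interval=10, timeout=600):
    """Poll the document endpoint until processing finishes or the timeout expires.

    The 10-second interval and 10-minute timeout are illustrative defaults.
    """
    url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents/{document_id}"
    deadline = time.time() + timeout
    while time.time() < deadline:
        document = requests.get(url, auth=HTTPBasicAuth(API_KEY, '')).json()
        if document.get("status") != "pending":
            return document  # "success" or "failure"
        time.sleep(poll_interval)
    raise TimeoutError(f"Document {document_id} still pending after {timeout} seconds")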
List Documents
Retrieve a paginated list of all documents processed by a workflow. Results are always sorted by upload time in descending order (newest first).
GET /api/v4/workflows/{workflow_id}/documents
Overview
- Lists all documents processed by a workflow
- Supports pagination for large document sets
- Includes document status and metadata
- Returns basic page information
- Useful for monitoring and management
- Can be used to track processing status
- Note: This endpoint only returns basic document information. To get detailed extracted data (fields and tables), use the Get Document by ID API
Query Parameters
- page: Page number for pagination (default: 1)
- limit: Number of documents per page (default: 10, max: 100)
Response
{
  "documents": [
    {
      "document_id": "db237bf3-f3c1-4441-905e-a6b7538db269",
      "status": "pending",
      "uploaded_at": "2025-05-23T09:38:46.583415Z",
      "metadata": "{\"customer_id\": \"12345\", \"document_type\": \"invoice\", \"department\": \"finance\"}",
      "original_document_name": "invoice.pdf",
      "raw_document_url": "uploadedfiles/37b9d483-d43f-4d0c-80ef-e0ac463055ba/RawPredictions/db237bf3-f3c1-4441-905e-a6b7538db269.pdf",
      "verification_status": "success",
      "verification_stage": "00000000-0000-0000-0000-000000000000",
      "assigned_reviewers": [],
      "pages": [
        {
          "page_id": "b94d9ca6-37b9-11f0-a23c-367dda7a627d",
          "page_number": 0,
          "image_url": "",
          "data": {
            "fields": null,
            "tables": null
          }
        }
      ]
    },
    {
      "document_id": "00000000-0000-0000-0000-000000000000",
      "status": "success",
      "uploaded_at": "2025-05-23T09:14:53.330939Z",
      "original_document_name": "invoice.pdf",
      "raw_document_url": "uploadedfiles/37b9d483-d43f-4d0c-80ef-e0ac463055ba/RawPredictions/00000000-0000-0000-0000-000000000000.pdf",
      "verification_status": "success",
      "verification_stage": "ffffffff-ffff-ffff-ffff-ffffffffffff",
      "assigned_reviewers": [],
      "pages": [
        {
          "page_id": "6304a3ce-37b6-11f0-a24d-367dda7a627d",
          "page_number": 0,
          "image_url": "",
          "data": {
            "fields": null,
            "tables": null
          }
        }
      ]
    }
  ],
  "total_count": 2,
  "page_no": 1,
  "page_size": 50
}
Example
- Python
- cURL
- Node.js
- Go
import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents"

# Get first page with default limit (10)
response = requests.get(
    url,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())

# Get specific page with custom limit
params = {
    'page': 2,
    'limit': 20
}
response = requests.get(
    url,
    params=params,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())
# Get first page with default limit
curl -X GET \
-u "YOUR_API_KEY:" \
https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents
# Get specific page with custom limit
curl -X GET \
-u "YOUR_API_KEY:" \
"https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents?page=2&limit=20"
const axios = require('axios');

const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000';
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents`;

// Get first page with default limit
axios.get(url, {
  auth: {
    username: API_KEY,
    password: ''
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });

// Get specific page with custom limit
axios.get(url, {
  auth: {
    username: API_KEY,
    password: ''
  },
  params: {
    page: 2,
    limit: 20
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });
package main

import (
    "encoding/base64"
    "encoding/json"
    "fmt"
    "net/http"
)

func listDocuments(page, limit int) error {
    API_KEY := "YOUR_API_KEY"
    WORKFLOW_ID := "550e8400-e29b-41d4-a716-446655440000"
    url := fmt.Sprintf("https://app.nanonets.com/api/v4/workflows/%s/documents", WORKFLOW_ID)

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return err
    }

    // Add query parameters
    q := req.URL.Query()
    q.Add("page", fmt.Sprintf("%d", page))
    q.Add("limit", fmt.Sprintf("%d", limit))
    req.URL.RawQuery = q.Encode()

    // Add auth header
    auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
    req.Header.Add("Authorization", "Basic "+auth)

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        return err
    }
    fmt.Printf("%+v\n", result)
    return nil
}

func main() {
    // Get first page with default limit
    if err := listDocuments(1, 10); err != nil {
        fmt.Println("Error getting first page:", err)
    }

    // Get specific page with custom limit
    if err := listDocuments(2, 20); err != nil {
        fmt.Println("Error getting specific page:", err)
    }
}
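To retrieve every document in a workflow rather than a single page, you can keep requesting pages until total_count entries have been collected. A minimal Python sketch of that loop follows; the helper name is illustrative, and the page size of 100 is simply the documented maximum.

import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents"

def list_all_documents(page_size=100):
    """Page through the list endpoint until total_count documents have been fetched."""
    documents = []
    page = 1
    while True:
        response = requests.get(
            url,
            params={'page': page, 'limit': page_size},
            auth=HTTPBasicAuth(API_KEY, ''),
        )
        body = response.json()
        batch = body.get('documents', [])
        documents.extend(batch)
        if not batch or len(documents) >= body.get('total_count', 0):
            return documents
        page += 1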
Get Document Data
Retrieve the processing results for a specific document.
GET /api/v4/workflows/{workflow_id}/documents/{document_id}
Overview
- Retrieves complete processing results for a document
- Returns the same structure as synchronous upload response
- Includes all extracted fields and tables
- Contains page information and image URLs
- Useful for retrieving results of async processing
- Can be used to verify processing results
Response
Same structure as the synchronous response from the upload endpoint.
Example
- Python
- cURL
- Node.js
- Go
import requests
from requests.auth import HTTPBasicAuth
API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents/{DOCUMENT_ID}"
response = requests.get(
    url,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())
curl -X GET \
-u "YOUR_API_KEY:" \
https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents/550e8400-e29b-41d4-a716-446655440001
const axios = require('axios');
const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000';
const DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001';
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents/${DOCUMENT_ID}`;
axios.get(url, {
  auth: {
    username: API_KEY,
    password: ''
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });
package main
import (
    "encoding/base64"
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    API_KEY := "YOUR_API_KEY"
    WORKFLOW_ID := "550e8400-e29b-41d4-a716-446655440000"
    DOCUMENT_ID := "550e8400-e29b-41d4-a716-446655440001"
    url := fmt.Sprintf("https://app.nanonets.com/api/v4/workflows/%s/documents/%s",
        WORKFLOW_ID, DOCUMENT_ID)

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        fmt.Println(err)
        return
    }

    auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
    req.Header.Add("Authorization", "Basic "+auth)

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("%+v\n", result)
}
Get Page Data
Retrieve the processing results for a specific page of a document.
GET /api/v4/workflows/{workflow_id}/documents/{document_id}/pages/{page_id}
Overview
- Retrieves data for a specific page
- Useful for multi-page documents
- Returns page-specific fields and tables
- Includes page image URL and dimensions
- Can be used for page-level verification
- Helps in handling large documents efficiently
Response
{
  "page_id": "550e8400-e29b-41d4-a716-446655440001",
  "page_number": 1,
  "image_url": "https://storage.nanonets.com/pages/550e8400-e29b-41d4-a716-446655440001.jpg",
  "data": {
    "fields": {
      "invoice_number": [
        {
          "field_data_id": "f1a2b3c4-d5e6-4f7g-8h9i-j0k1l2m3n4o5",
          "value": "INV-2024-001",
          "confidence": 0.98,
          "bbox": [100, 200, 300, 250],
          "verification_status": "success",
          "verification_message": "",
          "is_moderated": false
        }
      ],
      "total_amount": [
        {
          "field_data_id": "b2c3d4e5-f6a7-4b8c-9d0e-1f2a3b4c5d6e",
          "value": "1500.00",
          "confidence": 0.95,
          "bbox": [400, 200, 500, 250],
          "verification_status": "success",
          "verification_message": "",
          "is_moderated": false
        }
      ]
    },
    "tables": [
      {
        "table_id": "d8e5c1d2-4e71-4d0e-babc-a845f2de4f1b",
        "bbox": [100, 300, 800, 600],
        "cells": [
          {
            "cell_id": "1b5d3df7-3df7-420a-a82b-29dbdfd3e1b1",
            "row": 0,
            "col": 0,
            "header": "item_description",
            "text": "Product A",
            "bbox": [100, 330, 300, 360],
            "verification_status": "success",
            "verification_message": "",
            "is_moderated": false
          },
          {
            "cell_id": "43bd4a61-0131-47b9-9015-4df4b62d4531",
            "row": 0,
            "col": 1,
            "header": "quantity",
            "text": "2",
            "bbox": [350, 330, 450, 360],
            "verification_status": "success",
            "verification_message": "",
            "is_moderated": false
          }
        ]
      }
    ]
  }
}
Example
- Python
- cURL
- Node.js
- Go
import requests
from requests.auth import HTTPBasicAuth
API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001'
PAGE_ID = '550e8400-e29b-41d4-a716-446655440002'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents/{DOCUMENT_ID}/pages/{PAGE_ID}"
response = requests.get(
    url,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())
curl -X GET \
-u "YOUR_API_KEY:" \
https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents/550e8400-e29b-41d4-a716-446655440001/pages/550e8400-e29b-41d4-a716-446655440002
const axios = require('axios');
const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000';
const DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001';
const PAGE_ID = '550e8400-e29b-41d4-a716-446655440002';
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents/${DOCUMENT_ID}/pages/${PAGE_ID}`;
axios.get(url, {
  auth: {
    username: API_KEY,
    password: ''
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });
package main
import (
    "encoding/base64"
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    API_KEY := "YOUR_API_KEY"
    WORKFLOW_ID := "550e8400-e29b-41d4-a716-446655440000"
    DOCUMENT_ID := "550e8400-e29b-41d4-a716-446655440001"
    PAGE_ID := "550e8400-e29b-41d4-a716-446655440002"
    url := fmt.Sprintf("https://app.nanonets.com/api/v4/workflows/%s/documents/%s/pages/%s",
        WORKFLOW_ID, DOCUMENT_ID, PAGE_ID)

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        fmt.Println(err)
        return
    }

    auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
    req.Header.Add("Authorization", "Basic "+auth)

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("%+v\n", result)
}
Delete Document
Delete a processed document and its associated data.
DELETE /api/v4/workflows/{workflow_id}/documents/{document_id}
Overview
- Permanently removes a document and its data
- Cannot be undone
- Frees up storage space
- Useful for data cleanup
- Should be used with caution
- Consider implementing a retention policy
Response
{
  "message": "Document deleted successfully"
}
Example
- Python
- cURL
- Node.js
- Go
import requests
from requests.auth import HTTPBasicAuth
API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000'
DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001'
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents/{DOCUMENT_ID}"
response = requests.delete(
    url,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())
curl -X DELETE \
-u "YOUR_API_KEY:" \
https://app.nanonets.com/api/v4/workflows/550e8400-e29b-41d4-a716-446655440000/documents/550e8400-e29b-41d4-a716-446655440001
const axios = require('axios');
const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = '550e8400-e29b-41d4-a716-446655440000';
const DOCUMENT_ID = '550e8400-e29b-41d4-a716-446655440001';
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents/${DOCUMENT_ID}`;
axios.delete(url, {
  auth: {
    username: API_KEY,
    password: ''
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });
package main
import (
    "encoding/base64"
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    API_KEY := "YOUR_API_KEY"
    WORKFLOW_ID := "550e8400-e29b-41d4-a716-446655440000"
    DOCUMENT_ID := "550e8400-e29b-41d4-a716-446655440001"
    url := fmt.Sprintf("https://app.nanonets.com/api/v4/workflows/%s/documents/%s",
        WORKFLOW_ID, DOCUMENT_ID)

    req, err := http.NewRequest("DELETE", url, nil)
    if err != nil {
        fmt.Println(err)
        return
    }

    auth := base64.StdEncoding.EncodeToString([]byte(API_KEY + ":"))
    req.Header.Add("Authorization", "Basic "+auth)

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("%+v\n", result)
}
Error Handling
All document processing APIs return standard HTTP status codes:
- 200 OK: Request successful
- 201 Created: Document uploaded successfully
- 400 Bad Request: Invalid request parameters or unsupported file type
- 401 Unauthorized: Invalid or missing API key
- 404 Not Found: Workflow or document not found
- 413 Payload Too Large: File size exceeds limit
- 429 Too Many Requests: Processing rate limit exceeded
- 500 Internal Server Error: Server-side error
Common Error Scenarios
File Upload Issues
- Unsupported file type
- File too large (>20MB)
- Corrupted file
Processing Errors
- Document processing timeout
- Unreadable content
- Processing failure
Field and Table Header Issues
- Invalid field or table header names (non-alphanumeric characters)
- Duplicate field names within a workflow
- Duplicate table header names within a workflow
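As a starting point for turning these codes into application errors, the sketch below wraps a response check in Python; the exception type and the choice of which codes to treat as retryable are illustrative, not mandated by the API.

class DocumentProcessingError(Exception):
    """Illustrative exception type; not part of the API."""

# Codes from the list above that are usually worth retrying.
RETRYABLE_STATUS_CODES = {429, 500}

def check_response(response):
    """Return the parsed JSON body on success; raise on documented error codes."""
    if response.status_code in (200, 201):
        return response.json()
    if response.status_code in RETRYABLE_STATUS_CODES:
        raise DocumentProcessingError(
            f"Transient error ({response.status_code}); consider retrying with backoff"
        )
    raise DocumentProcessingError(
        f"Request failed with {response.status_code}: {response.text}"
    )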
For detailed error handling, refer to the Error Handling Guide.
Best Practices
File Upload
- Use async mode for large files or batch processing
- Include relevant metadata for better tracking
- Validate file types before upload
Result Processing
- Check confidence scores before using extracted data
- Handle both sync and async responses appropriately
- Implement retry logic for failed processing
Resource Management
- Delete processed documents when no longer needed
- Monitor storage usage
- Implement document retention policies
For more best practices, refer to the Best Practices Guide.