Introduction

Welcome to the Nanonets API documentation. This guide will help you integrate Nanonets into your applications.

What is Nanonets?

Nanonets provides an AI-driven Intelligent Document Processing API that transforms unstructured documents into structured data. Our advanced OCR and document data extraction capabilities enable you to:

  • Extract structured data from various document types (invoices, receipts, forms, etc.)
  • Convert unstructured text into organized, machine-readable formats
  • Process and analyze document content with high accuracy
  • Automate document workflows and data entry tasks

Key Features

  • Advanced OCR & Data Extraction: Extract text, fields, and tables from documents with high accuracy
  • Unstructured to Structured Data: Transform raw document content into organized, structured formats
  • Workflow Automation: Approve or reject extracted results and assign files for review
  • External Integrations: Seamlessly import documents from various sources and export data to business applications

Getting Started

The quick starts below create a workflow, configure the fields and tables to extract, and process a document, first with the Nanonets SDK and then with the REST API.

Prerequisites

  1. A Nanonets account
  2. An API key (get it from http://app.nanonets.com/#/keys)
  3. Basic knowledge of REST APIs
  4. Your preferred programming language (Python, JavaScript, etc.)

Quick Start with the Nanonets SDK

1. Create Instant Learning Workflow

from nanonets import Nanonets

# Initialize client
client = Nanonets(api_key='your_api_key')

# Create instant learning workflow
workflow = client.workflows.create(
    name="Custom Document Workflow",
    description="Extract data from custom documents",
    workflow_type=""  # Empty string for instant learning workflow
)

2. Configure Fields and Tables to Extract

# Configure fields to extract
workflow.configure_fields([
    {
        "name": "Invoice Number",
        "type": "text"
    },
    {
        "name": "Total Amount",
        "type": "number"
    },
    {
        "name": "Invoice Date",
        "type": "date"
    }
])

# Configure table headers
workflow.configure_table_headers([
    {
        "name": "Item Description",
        "type": "text"
    },
    {
        "name": "Quantity",
        "type": "number"
    },
    {
        "name": "Unit Price",
        "type": "number"
    }
])

3. Process Document

# Process a document
result = workflow.process_document(
    file_path="invoice.pdf",
    async_mode=True
)

# Get results
if result.status == "completed":
    # Access extracted fields with error handling
    try:
        invoice_number = result.data['fields']['Invoice Number'][0]['value']
    except (KeyError, IndexError):
        invoice_number = None
        print("Invoice Number not found in the document")

    try:
        total_amount = result.data['fields']['Total Amount'][0]['value']
    except (KeyError, IndexError):
        total_amount = None
        print("Total Amount not found in the document")

    try:
        invoice_date = result.data['fields']['Invoice Date'][0]['value']
    except (KeyError, IndexError):
        invoice_date = None
        print("Invoice Date not found in the document")

    # Access extracted tables with error handling
    try:
        for table in result.data['tables']:
            for cell in table['cells']:
                print(f"Row {cell['row']}, Col {cell['col']}: {cell['text']}")
    except (KeyError, AttributeError):
        print("No tables found in the document")
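
The three try/except blocks above repeat the same lookup. A minimal sketch of a helper that folds them into one function, assuming the result shape used in this guide (`data['fields'][name][0]['value']`). `first_field_value` is a hypothetical name, not part of the Nanonets SDK, and the sample data is written by hand rather than taken from a real response:

```python
from typing import Any, Optional


def first_field_value(data: dict, name: str) -> Optional[Any]:
    """Return the first extracted value for `name`, or None if it is absent.

    Hypothetical helper (not part of the Nanonets SDK). Assumes the shape
    used in this guide: data['fields'][name] is a list of {'value': ...} dicts.
    """
    try:
        return data['fields'][name][0]['value']
    except (KeyError, IndexError, TypeError):
        # Missing key, empty list, or a None where a dict/list was expected
        return None


# Hand-written sample mirroring the assumed shape, not real API output
sample = {'fields': {'Invoice Number': [{'value': 'INV-001'}], 'Total Amount': []}}

for field in ('Invoice Number', 'Total Amount', 'Invoice Date'):
    value = first_field_value(sample, field)
    print(f"{field}: {value if value is not None else 'not found'}")
```

With the SDK result from the previous step, the call would be `first_field_value(result.data, 'Invoice Number')`.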

Quick Start with the REST API

1. Create Instant Learning Workflow

import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
url = "https://app.nanonets.com/api/v4/workflows"

# Create instant learning workflow
payload = {
    "name": "Custom Document Workflow",
    "description": "Extract data from custom documents",
    "workflow_type": ""  # Empty string for instant learning workflow
}

response = requests.post(url, json=payload, auth=HTTPBasicAuth(API_KEY, ''))
response.raise_for_status()  # Stop here on a 4xx/5xx response
workflow = response.json()
print(f"Created workflow with ID: {workflow['id']}")

2. Configure Fields and Tables to Extract

import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = workflow['id'] # From previous step
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/fields"

# Configure fields to extract
fields_payload = {
    "fields": [
        {
            "name": "Invoice Number",
            "type": "text"
        },
        {
            "name": "Total Amount",
            "type": "number"
        },
        {
            "name": "Invoice Date",
            "type": "date"
        }
    ],
    "tables": [
        {
            "name": "Line Items",
            "headers": [
                {
                    "name": "Item Description",
                    "type": "text"
                },
                {
                    "name": "Quantity",
                    "type": "number"
                },
                {
                    "name": "Unit Price",
                    "type": "number"
                }
            ]
        }
    ]
}

response = requests.put(url, json=fields_payload, auth=HTTPBasicAuth(API_KEY, ''))
response.raise_for_status()  # Only report success if the request succeeded
print("Fields and tables configured successfully")

3. Process Document

import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = workflow['id'] # From previous step
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents"

# Process a document
with open('invoice.pdf', 'rb') as f:  # Close the file once the upload finishes
    response = requests.post(url, files={'file': f}, auth=HTTPBasicAuth(API_KEY, ''))
response.raise_for_status()  # Stop here on a 4xx/5xx response
result = response.json()

# Get results
if result['status'] == 'completed':
    # Access extracted fields
    invoice_number = result['data']['fields']['Invoice Number'][0]['value']
    total_amount = result['data']['fields']['Total Amount'][0]['value']
    invoice_date = result['data']['fields']['Invoice Date'][0]['value']
    print(f"Invoice Number: {invoice_number}")
    print(f"Total Amount: {total_amount}")
    print(f"Invoice Date: {invoice_date}")

    # Access extracted tables
    for table in result['data']['tables']:
        print(f"\nTable: {table['name']}")
        for cell in table['cells']:
            print(f"Row {cell['row']}, Col {cell['col']}: {cell['text']}")
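
The loop above prints one cell per line. To work with a table row by row, the cells can be regrouped by their `row` and `col` indices. A sketch, assuming each cell carries the `row`, `col`, and `text` keys used above. `cells_to_rows` is a hypothetical helper, and the sample cells are made up for illustration:

```python
from collections import defaultdict


def cells_to_rows(cells):
    """Group flat cells into a list of rows, each ordered by column index.

    Hypothetical helper; assumes every cell is a dict with 'row', 'col',
    and 'text' keys. Missing cells are skipped, not padded.
    """
    by_row = defaultdict(dict)
    for cell in cells:
        by_row[cell['row']][cell['col']] = cell['text']
    return [
        [by_row[row][col] for col in sorted(by_row[row])]
        for row in sorted(by_row)
    ]


# Made-up cells for illustration, deliberately out of order
sample_cells = [
    {'row': 1, 'col': 2, 'text': '2'},
    {'row': 0, 'col': 1, 'text': 'Item Description'},
    {'row': 1, 'col': 1, 'text': 'Widget'},
    {'row': 0, 'col': 2, 'text': 'Quantity'},
]

for row in cells_to_rows(sample_cells):
    print(" | ".join(row))
```

With the response from the previous step, each table would be converted with `cells_to_rows(table['cells'])`.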

Best Practices

  1. Error Handling

    • Always check response status codes
    • Implement retry logic for rate limits
  2. Security

    • Store API keys securely and keep them out of source control
    • Load keys from environment variables instead of hard-coding them
  3. Performance

    • Use async processing for large files
    • Monitor API usage
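
The error-handling and security points above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Nanonets-provided code: the variable name `NANONETS_API_KEY`, the helper names, and the choice of HTTP 429 as the rate-limit status are all assumptions.

```python
import os
import time


def load_api_key(env=os.environ, name="NANONETS_API_KEY"):
    """Read the API key from the environment instead of hard-coding it.

    NANONETS_API_KEY is an assumed variable name, not one Nanonets mandates.
    """
    key = env.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable")
    return key


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute. Treating 429 as the rate-limit status is an
    assumption; adjust it to whatever the API actually returns.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return response  # Still rate limited; the caller decides what to do
```

With `requests`, a call could be wrapped as `call_with_retry(lambda: requests.post(url, json=payload, auth=HTTPBasicAuth(load_api_key(), '')))`, followed by `response.raise_for_status()` to surface any remaining error status.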

Next Steps