Blog

Build an OCR System Using Runpod Serverless

Learn how to build an OCR pipeline using Runpod Serverless and Hugging Face models. Great for processing receipts, invoices, and scanned documents at scale.

Build an OCR System Using Runpod Serverless

Learn how to build an Optical Character Recognition (OCR) system using Runpod Serverless and pre-trained models from Hugging Face to automate the processing of receipts and invoices.

Introduction

Processing receipts and invoices manually is both time-consuming and prone to errors. Optical Character Recognition (OCR) systems can automate this task by extracting text from images and converting it into structured data. In this tutorial, you will learn how to build your own OCR system using Runpod Serverless and pre-trained models from Hugging Face. This system will enable you to efficiently convert images of receipts into digital invoices, streamlining your workflow and reducing manual data entry.

Prerequisites

To complete this tutorial, you will need:

A Runpod account with access to Runpod Serverless.
Basic knowledge of Python programming.
Familiarity with RESTful APIs.
Python 3 installed on your local machine.
The following Python libraries installed:
pip install requests pillow pdf2image pillow_heif argparse

Step 1 — Setting Up the Runpod Serverless Environment

First, you'll set up a serverless endpoint on Runpod. Runpod Serverless allows you to deploy and run machine learning models without managing the underlying infrastructure.

Deploying the OCR Model

After deployment, you'll receive an Endpoint ID, which you'll use to interact with the model.

OPENAI BASE URL https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/openai/v1
RUNSYNC https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/runsync
RUN https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/run
STATUS https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/status/:id
CANCEL https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/cancel/:id
HEALTH https://api.runpod.ai/v2/vllm-xxxxxxxxxxx/health

Writing the `InvoiceProcessor` Class

Create a file named invoice_processor.py and add the following code snippets.

Importing Required Libraries`‍`

Converting Images to Base64

Takes in images and turns them into base64 encoded schemes that the model is able to ingeest and run inference on.

Processing the Invoice Image

Batch Processing of Invoices

Option to batch process multiple receipts into a single invoice.‍

Creating a Runpod API Key

Processing a Single Image

Run the following command to process a single image:

python run_processor.py --api-key "your-runpod-api-key" --endpoint-id "your-endpoint-id" --input path/to/invoice.jpg

Replace:

"your-runpod-api-key" with your actual Runpod API key.
"your-endpoint-id" with your endpoint ID from Runpod.
path/to/invoice.jpg with the path to your invoice image.

Processing a Batch of Images

To process all supported images in a directory:

python run_processor.py --api-key "your-runpod-api-key" --endpoint-id "your-endpoint-id" --input path/to/invoice_directory

Step 5 — Examining the Output

After running the script, you'll find JSON files in the ./output directory.

Example Output (invoice_processed.json):‍

Changing Serverless Template to New Model

If you want to change the model from hugging-faces you can update the model URL.

Step 6 — Generating Invoices from Extracted Data

Now, you can use the extracted JSON data to generate formatted invoices.

Generating a PDF Invoice

Use the ReportLab library to create a PDF invoice.

Installing ReportLab

pip install reportlab

Writing the PDF Generation Script

Create a file named generate_invoice.py and add the following code:‍

Running the PDF Generation Script

Run the following command:

python generate_invoice.py

This script will generate a PDF invoice based on the data extracted from your image.

Conclusion

In this tutorial, you built an OCR system using Runpod Serverless and a pre-trained model from Hugging Face. By automating the extraction of text from images and converting it into structured data, you've streamlined the process of generating invoices from receipts. This solution saves time and reduces the potential for errors associated with manual data entry.

Next Steps

Consider enhancing your OCR system by:

Improving Error Handling: Add more robust exception handling to manage edge cases.
Customizing Invoice Templates: Use advanced PDF generation libraries to create professional invoice layouts.
Integrating with Accounting Software: Connect your system to accounting platforms like QuickBooks or Xero for seamless workflow integration.

Additional Resources

By following this tutorial, you've gained hands-on experience in building an OCR system, processing images, and generating digital invoices using Runpod. This foundation opens up opportunities to explore more complex data extraction and document processing tasks.

‍

Deploy When Available is now GA

Queue for any GPU spec, even one that's fully rented out, and we'll deploy it the moment capacity opens up. No more refreshing the console or running a sniping tool.

The Chips Got Faster. The Stack Didn't.

Explore why faster chips have shifted the bottleneck to AI infrastructure, and what that means for teams running production workloads.

Multi-Instance GPUs on Runpod: Stop Paying for Compute You Don't Need

With MIG, we can partition RTX 6000 Pro cards into isolated 24 GB instances. Here's when it makes sense for your workloads.

Build what’s next.

Build, train, and scale AI workloads on Runpod with cloud GPUs, Serverless, and Clusters.

Get started

Build an OCR System Using Runpod Serverless

Introduction

Prerequisites