Building a Document Scanner with Python and OpenCV


Optical Character Recognition (OCR) is one of the most practical applications of computer vision. With Python, OpenCV, and the Tesseract OCR engine, you can build a document scanner that captures a document, corrects its perspective (deskews it), and extracts its text. No expensive hardware is required.

What You’ll Need

Python 3.8+
OpenCV (pip install opencv-python)
NumPy
Tesseract OCR engine (installed separately, e.g. via your system package manager) and pytesseract (pip install pytesseract)

Step 1: Capture and Preprocess the Image

Load the image with cv2.imread(), convert it to grayscale with cv2.cvtColor(), and apply a Gaussian blur (cv2.GaussianBlur()) to reduce noise. Then run Canny edge detection (cv2.Canny()) to produce an edge map in which the document's borders stand out.

Step 2: Find the Document Contour

Run cv2.findContours() on the edge map, then sort the contours by area (cv2.contourArea()) and simplify each with cv2.approxPolyDP(). The largest contour that reduces to four corner points is typically the document.

Step 3: Perspective Transform (Deskew)

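Once you have the four corners, put them in a consistent order (top-left, top-right, bottom-right, bottom-left), compute a perspective transform with cv2.getPerspectiveTransform(), and apply it with cv2.warpPerspective(). This “flattens” the document into a top-down view, correcting the camera angle and perspective distortion.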

Step 4: Extract Text with Tesseract

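Pass the warped image to pytesseract: text = pytesseract.image_to_string(warped_image). Thresholding the image first (for example with cv2.adaptiveThreshold()) usually gives Tesseract cleaner input. For Australian business documents, you can tune Tesseract through the lang and config arguments (for example the page segmentation mode, --psm) and supply custom word lists to improve accuracy on common terms and formats.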

Real-World Applications

Automated invoice processing for accounting systems
Digitising paper forms and contracts
Extracting data from ID documents (with appropriate privacy controls)
Batch scanning of printed reports into searchable databases

At Ozlin Info, we build production-grade Intelligent Document Processing (IDP) pipelines that take this further — integrating OCR with AI classification, data validation, and direct export to your business software. Get in touch to learn how we can automate your document workflows.