
Optical Character Recognition (OCR) is one of the most practical applications of computer vision. Using Python and OpenCV, you can build a document scanner that captures, deskews, and extracts text from physical documents — no expensive hardware required.
What You’ll Need
- Python 3.8+
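- OpenCV (pip install opencv-python)
- NumPy (pip install numpy)
- Tesseract OCR + pytesseract (pip install pytesseract). pytesseract is only a wrapper, so the Tesseract engine itself must be installed separately, for example through your operating system's package manager.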
Step 1: Capture and Preprocess the Image
Load the image with cv2.imread(), convert it to grayscale with cv2.cvtColor(), and apply a Gaussian blur (cv2.GaussianBlur()) to reduce noise. Then use Canny edge detection (cv2.Canny()) to find the document’s borders.
Step 2: Find the Document Contour
Run cv2.findContours() on the edge map, sort the contours by area, and keep the largest one that forms a quadrilateral; this is typically the document. Use cv2.approxPolyDP() to simplify each candidate contour, and accept it when the result has exactly four corner points.
Step 3: Perspective Transform (Deskew)
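Once you have the four corners, put them in a consistent order (for example top-left, top-right, bottom-right, bottom-left) and apply a perspective warp using cv2.getPerspectiveTransform() and cv2.warpPerspective(). This “flattens” the document, correcting the skew and keystone distortion caused by photographing it at an angle.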
Step 4: Extract Text with Tesseract
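Pass the cleaned image to pytesseract: text = pytesseract.image_to_string(warped_image). In practice, “cleaned” usually means the warped page converted to grayscale and thresholded so the text is dark on a light background. For Australian business documents, you can tune Tesseract’s configuration (for example the language pack passed via the lang argument, the page segmentation mode set with --psm, or a custom word list) to improve accuracy on common terms and formats.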
Real-World Applications
- Automated invoice processing for accounting systems
- Digitising paper forms and contracts
- Extracting data from ID documents (with appropriate privacy controls)
- Batch scanning of printed reports into searchable databases
At Ozlin Info, we build production-grade Intelligent Document Processing (IDP) pipelines that take this further — integrating OCR with AI classification, data validation, and direct export to your business software. Get in touch to learn how we can automate your document workflows.