Project 1: Smart Document Scanner with OCR

A mobile-scanner-style app that digitizes documents using automatic edge detection, perspective correction, and OCR text extraction.

What You’ll Learn

Edge Detection - Using the Canny algorithm to find document edges
Contour Detection - Finding and filtering contours to locate the document
Perspective Transformation - Warping a tilted document into a flat, top-down view
Image Enhancement - Adaptive thresholding for clean, high-contrast output
OCR Integration - Extracting text from the scanned document

Key OpenCV Functions

| Function | Purpose |
| --- | --- |
| `cv2.Canny()` | Detect edges in image |
| `cv2.findContours()` | Find contours from edges |
| `cv2.approxPolyDP()` | Approximate contour to polygon |
| `cv2.getPerspectiveTransform()` | Compute transformation matrix |
| `cv2.warpPerspective()` | Apply perspective transform |
| `cv2.adaptiveThreshold()` | Enhance document for reading |

Usage

# Run demo with generated sample image
python main.py --demo

# Scan a specific image
python main.py --image /path/to/document.jpg

# Use webcam for real-time scanning
python main.py --camera

# Show debug visualization
python main.py --demo --debug
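
For readers who want to reproduce this command-line interface, the flags above could be parsed roughly as follows. This is a hypothetical sketch, not the actual contents of `main.py`; in particular, treating `--demo`, `--image`, and `--camera` as mutually exclusive is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Smart document scanner")
    # Assumption of this sketch: exactly one input source must be chosen.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--demo", action="store_true",
                        help="run on a generated sample image")
    source.add_argument("--image", metavar="PATH",
                        help="scan a specific image file")
    source.add_argument("--camera", action="store_true",
                        help="scan from the webcam in real time")
    # --debug can be combined with any input source.
    parser.add_argument("--debug", action="store_true",
                        help="show debug visualization")
    return parser


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(["--demo", "--debug"])
print(args.demo, args.image, args.camera, args.debug)
```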

Algorithm Steps

1. Load Image
     |
2. Convert to Grayscale
     |
3. Apply Gaussian Blur
     |
4. Canny Edge Detection
     |
5. Find Contours
     |
6. Filter for 4-point contours
     |
7. Order corner points
     |
8. Apply Perspective Transform
     |
9. Enhance with Adaptive Threshold
     |
10. Extract Text (OCR)

Real-World Applications

Mobile scanning apps (CamScanner, Adobe Scan)
Document digitization systems
Receipt scanning for expense tracking
ID/passport scanning
Whiteboard capture

Code Highlights

Finding Document Corners

# Approximate contour to polygon
peri = cv2.arcLength(contour, True)
approx = cv2.approxPolyDP(contour, 0.02 * peri, True)

# If 4 points, it's likely a document
if len(approx) == 4:
    doc_contour = approx
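
`cv2.approxPolyDP()` returns the four corners in contour order, not in a fixed top-left-first order, so they need to be sorted before the perspective transform (step 7 of the algorithm). One common heuristic, sketched here with a hypothetical `order_points` helper, uses the sum and difference of each point's coordinates; the project's own ordering logic may differ.

```python
import numpy as np


def order_points(pts: np.ndarray) -> np.ndarray:
    """Order 4 points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype=np.float32).reshape(4, 2)
    s = pts.sum(axis=1)          # x + y for each point
    d = pts[:, 1] - pts[:, 0]    # y - x for each point
    return np.array(
        [
            pts[np.argmin(s)],   # top-left: smallest x + y
            pts[np.argmin(d)],   # top-right: smallest y - x
            pts[np.argmax(s)],   # bottom-right: largest x + y
            pts[np.argmax(d)],   # bottom-left: largest y - x
        ],
        dtype=np.float32,
    )


# Corners in arbitrary order, shaped like approxPolyDP output: (4, 1, 2).
approx = np.array([[[250, 340]], [[50, 60]], [[60, 330]], [[240, 70]]])
src_pts = order_points(approx)
print(src_pts)
```

This heuristic can misorder the corners when the document is rotated close to 45 degrees, because two corners may then share nearly the same sum or difference.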

Perspective Transform

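# Define destination points (top-down view), ordered top-left, top-right,
# bottom-right, bottom-left to match the order of src_pts
dst = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=np.float32)

# Get transformation matrix and apply
# (cv2.getPerspectiveTransform requires both point sets as float32)
M = cv2.getPerspectiveTransform(src_pts, dst)
warped = cv2.warpPerspective(image, M, (width, height))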
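
The snippet above takes `width` and `height` as given. One way to derive them, sketched here under the assumption that the source corners are already ordered top-left, top-right, bottom-right, bottom-left, is to take the longer of each pair of opposite edges. The helper name `output_size` is hypothetical.

```python
import numpy as np


def output_size(ordered: np.ndarray) -> tuple:
    """Return (width, height) of the flattened document in pixels."""
    tl, tr, br, bl = np.asarray(ordered, dtype=np.float32)
    # Width: the longer of the bottom and top edges.
    width = max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl))
    # Height: the longer of the right and left edges.
    height = max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl))
    return int(round(float(width))), int(round(float(height)))


# A slightly tilted page: the top edge looks shorter than the bottom edge.
src_pts = np.array([[10, 0], [290, 0], [300, 400], [0, 400]], dtype=np.float32)
width, height = output_size(src_pts)
print(width, height)
```

Using the longer edge avoids shrinking the document; the true aspect ratio is still only approximated, since perspective distorts edge lengths.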
