Project 1: Smart Document Scanner with OCR
A mobile-scanner-style app that digitizes documents using automatic edge detection and OCR text extraction.
What You’ll Learn
- Edge Detection - Using the Canny algorithm to find document edges
- Contour Detection - Finding and filtering contours to locate the document
- Perspective Transformation - Converting a tilted document to a flat, top-down view
- Image Enhancement - Applying adaptive thresholding for clean output
- OCR Integration - Extracting text from the scanned document
Key OpenCV Functions
| Function | Purpose |
|---|---|
| `cv2.Canny()` | Detect edges in image |
| `cv2.findContours()` | Find contours from edges |
| `cv2.approxPolyDP()` | Approximate contour to polygon |
| `cv2.getPerspectiveTransform()` | Compute transformation matrix |
| `cv2.warpPerspective()` | Apply perspective transform |
| `cv2.adaptiveThreshold()` | Enhance document for reading |
Usage
# Run demo with generated sample image
python main.py --demo
# Scan a specific image
python main.py --image /path/to/document.jpg
# Use webcam for real-time scanning
python main.py --camera
# Show debug visualization
python main.py --demo --debug
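The argument handling inside `main.py` is not shown in this README. As a rough sketch of how the four flags above could be wired up with `argparse`, assuming (hypothetically) that `--demo`, `--image`, and `--camera` are mutually exclusive input sources and `--debug` is an independent switch:

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags listed under Usage."""
    parser = argparse.ArgumentParser(description="Smart document scanner with OCR")

    # Assumption: exactly one input source is chosen per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--demo", action="store_true",
                        help="run on a generated sample image")
    source.add_argument("--image", metavar="PATH",
                        help="path to the image to scan")
    source.add_argument("--camera", action="store_true",
                        help="scan from the webcam in real time")

    parser.add_argument("--debug", action="store_true",
                        help="show debug visualization")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--demo", "--debug"])
```

The real script may accept combinations this sketch rejects; treat the grouping as an illustration only.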
Algorithm Steps
1. Load Image
|
2. Convert to Grayscale
|
3. Apply Gaussian Blur
|
4. Canny Edge Detection
|
5. Find Contours
|
6. Filter for 4-point contours
|
7. Order corner points
|
8. Apply Perspective Transform
|
9. Enhance with Adaptive Threshold
|
10. Extract Text (OCR)
Real-World Applications
- Mobile scanning apps (CamScanner, Adobe Scan)
- Document digitization systems
- Receipt scanning for expense tracking
- ID/passport scanning
- Whiteboard capture
Code Highlights
Finding Document Corners
# Approximate the contour with a polygon; the tolerance is 2% of the perimeter
peri = cv2.arcLength(contour, True)
approx = cv2.approxPolyDP(contour, 0.02 * peri, True)
# If the polygon has 4 vertices, it is likely a document
if len(approx) == 4:
    doc_contour = approx
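`cv2.approxPolyDP()` does not return the four corners in any guaranteed order, so they need to be sorted before a perspective transform (step 7 in the algorithm). One common heuristic, shown here as a sketch with a hypothetical helper name `order_points`, uses the sum and difference of each point's coordinates:

```python
import numpy as np


def order_points(pts):
    """Order 4 points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype=np.float32).reshape(4, 2)
    s = pts.sum(axis=1)                 # x + y for each point
    d = np.diff(pts, axis=1).ravel()    # y - x for each point

    ordered = np.zeros((4, 2), dtype=np.float32)
    ordered[0] = pts[np.argmin(s)]      # top-left: smallest x + y
    ordered[2] = pts[np.argmax(s)]      # bottom-right: largest x + y
    ordered[1] = pts[np.argmin(d)]      # top-right: smallest y - x
    ordered[3] = pts[np.argmax(d)]      # bottom-left: largest y - x
    return ordered


# Corners given in arbitrary order
ordered = order_points([[250, 340], [50, 60], [50, 340], [250, 60]])
```

This heuristic assumes the document is roughly upright in the frame; for a page rotated close to 45 degrees the sums or differences can tie and the ordering may be wrong.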
Perspective Transform
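# Define destination points (top-down view), in the same order as src_pts:
# top-left, top-right, bottom-right, bottom-left
dst = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=np.float32)
# Get transformation matrix and apply
# (src_pts must also hold 4 points as float32)
M = cv2.getPerspectiveTransform(src_pts, dst)
warped = cv2.warpPerspective(image, M, (width, height))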