How to Build an iOS Document Scanner
Dynamsoft Document Normalizer is an SDK that detects the boundaries of documents and runs perspective transformation to get a normalized document image.
In this article, we are going to build an iOS document scanner using Dynamsoft Document Normalizer and AVFoundation.
A demo video of the final result:
The scanning process:
- Start the camera using AVFoundation and analyse the frames to detect the boundary of documents. When the IoUs of three consecutive detected polygons are over 90%, take a photo.
- After the photo is taken, the user is directed to a cropping page, where they can drag the vertices to adjust the detected polygon.
- If the user confirms that the polygon is correct, the app then runs perspective correction and cropping to get a normalized document image.
Build an iOS Document Scanner
Let’s do this in steps.
New Project
Open Xcode to create a new UIKit project in Swift.
Since we are going to design the UI programmatically, we can delete Main.storyboard and SceneDelegate.swift, and update the project settings and Info.plist to remove the references to them.
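On a default Xcode template, this means deleting the Application Scene Manifest and Main storyboard file base name entries from Info.plist, i.e. the following keys (key names from Apple's template; adjust to what your project actually contains):

<key>UIApplicationSceneManifest</key>
<key>UIMainStoryboardFile</key>

Also clear the Main Interface field in the target's General settings.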
Add Dependencies
Here, we use CocoaPods to manage dependencies.
- Initialize a Podfile in the project folder.

pod init
- Add the following lines to the Podfile.

pod 'DynamsoftCaptureVisionRouter','2.0.21'
pod 'DynamsoftDocumentNormalizer','2.0.20'
pod 'DynamsoftCore','3.0.20'
pod 'DynamsoftLicense','3.0.30'
pod 'DynamsoftImageProcessing','2.0.21'
- Run pod install.
Add Permissions
We have to add the following entries to Info.plist for the permissions to access the camera and save photos to the photo library.
<key>NSCameraUsageDescription</key>
<string>For camera usage</string>
<key>NSPhotoLibraryAddUsageDescription</key>
<string>For saving photos</string>
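Besides the usage descriptions, iOS also asks the user for camera access at runtime. Here is a minimal sketch of requesting the permission before starting the camera (the helper name and call site are our own):

import AVFoundation

func requestCameraAccess(_ completion: @escaping (Bool) -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        completion(true)
    case .notDetermined:
        // This shows the system prompt with the NSCameraUsageDescription text.
        AVCaptureDevice.requestAccess(for: .video) { granted in
            completion(granted)
        }
    default:
        // Denied or restricted.
        completion(false)
    }
}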
Initialize the License of Dynamsoft Document Normalizer
In AppDelegate.swift, add the following code to initialize the license of Dynamsoft Document Normalizer. You can apply for a 30-day trial license here.
import UIKit
// LicenseManager and LicenseVerificationListener are provided by the DynamsoftLicense module.
import DynamsoftLicense

@main
class AppDelegate: UIResponder, UIApplicationDelegate, LicenseVerificationListener {

    func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Override point for customization after application launch.
        let trialLicense = "LICENSE-KEY"
        LicenseManager.initLicense(trialLicense, verificationDelegate: self)
        return true
    }

    func onLicenseVerified(_ isSuccess: Bool, error: Error?) {
        print(error?.localizedDescription ?? "license error")
    }
}
Next, we are going to implement the pages in steps. There are four pages: home page, camera page, cropper page and result viewer page.
Home Page
Add a Scan Document button to navigate to the camera page.
class ViewController: UIViewController {
    var button: UIButton!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
        self.button = UIButton(frame: .zero)
        self.button.setTitle("Scan Document", for: .normal)
        self.button.setTitleColor(.systemBlue, for: .normal)
        self.button.setTitleColor(.lightGray, for: .highlighted)
        self.button.addTarget(self,
                              action: #selector(buttonAction),
                              for: .touchUpInside)
        self.navigationItem.title = "Home"
        self.view.backgroundColor = UIColor.white
        self.view.addSubview(self.button)
    }

    override func viewDidLayoutSubviews() {
        super.viewDidLayoutSubviews()
        if let button = self.button {
            let width: CGFloat = 300
            let height: CGFloat = 50
            let x = view.frame.width/2 - width/2
            let y = view.frame.height - 100
            button.frame = CGRect.init(x: x, y: y, width: width, height: height)
        }
    }

    @objc
    func buttonAction() {
        self.navigationController?.pushViewController(CameraController(), animated: true)
    }
}
Since we are not using storyboard, we have to display the home page programmatically in AppDelegate.swift
with the following code.
@main
class AppDelegate: UIResponder, UIApplicationDelegate, LicenseVerificationListener {
    var window: UIWindow?

    func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Override point for customization after application launch.
        window = UIWindow(frame: UIScreen.main.bounds)
        let vc = ViewController()
        let navController = UINavigationController(rootViewController: vc)
        window?.rootViewController = navController
        window?.makeKeyAndVisible()
        //...
        return true
    }
}
Camera Page
Create a new view controller named CameraController.swift. Then, we are going to start the camera and detect documents on this page.
Start the Camera using AVFoundation
- Create a new view named PreviewView to display the camera preview.

import UIKit
import AVFoundation

//https://developer.apple.com/documentation/avfoundation/capture_setup/setting_up_a_capture_session?language=objc#2958852
class PreviewView: UIView {
    override class var layerClass: AnyClass {
        return AVCaptureVideoPreviewLayer.self
    }

    /// Convenience wrapper to get layer as its statically known type.
    var videoPreviewLayer: AVCaptureVideoPreviewLayer {
        return layer as! AVCaptureVideoPreviewLayer
    }
}
- Add the preview view to the camera controller.

var previewView: PreviewView!

override func viewDidLoad() {
    super.viewDidLoad()
    self.previewView = PreviewView()
    self.view.addSubview(self.previewView)
}

override func viewDidLayoutSubviews() {
    super.viewDidLayoutSubviews()
    if let previewView = self.previewView {
        let width: CGFloat = view.frame.width
        let height: CGFloat = view.frame.height
        let x: CGFloat = 0.0
        let y: CGFloat = 0.0
        previewView.frame = CGRect.init(x: x, y: y, width: width, height: height)
    }
}
- Open the camera and start the capture session.

var captureSession: AVCaptureSession!

func startCamera(){
    // Create the capture session.
    self.captureSession = AVCaptureSession()
    // Find the default video device.
    guard let videoDevice = AVCaptureDevice.default(for: .video) else { return }
    do {
        // Wrap the video device in a capture device input.
        let videoInput = try AVCaptureDeviceInput(device: videoDevice)
        // If the input can be added, add it to the session.
        if self.captureSession.canAddInput(videoInput) {
            self.captureSession.addInput(videoInput)
            self.previewView.videoPreviewLayer.session = self.captureSession
            //set the camera preview's resolution
            self.captureSession.sessionPreset = AVCaptureSession.Preset.hd1920x1080
            self.captureSession.startRunning()
        }
    } catch {
        // Configuration failed. Handle error.
    }
}
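The session is created but not yet triggered anywhere. A minimal way to wire it up is to call startCamera when the page loads (the call site is our choice; startRunning blocks the calling thread, so Apple recommends starting the session off the main queue):

override func viewDidLoad() {
    super.viewDidLoad()
    // ...preview view setup from the previous step...
    // startRunning() blocks until the session starts, so run it off the main thread.
    DispatchQueue.global(qos: .userInitiated).async {
        self.startCamera()
    }
}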
Detect Documents from the Camera Preview
- Import libraries.

import DynamsoftCore
import DynamsoftCaptureVisionRouter
import DynamsoftDocumentNormalizer
- Create a Capture Vision Router instance, which is used to call Document Normalizer.

var cvr: CaptureVisionRouter = CaptureVisionRouter()
- Update the settings of the Capture Vision Router to add several templates for document detection and normalization.

func loadTemplate(){
    try? cvr.initSettings("{\"CaptureVisionTemplates\": [{\"Name\": \"Default\"},{\"Name\": \"DetectDocumentBoundaries_Default\",\"ImageROIProcessingNameArray\": [\"roi-detect-document-boundaries\"]},{\"Name\": \"DetectAndNormalizeDocument_Default\",\"ImageROIProcessingNameArray\": [\"roi-detect-and-normalize-document\"]},{\"Name\": \"NormalizeDocument_Binary\",\"ImageROIProcessingNameArray\": [\"roi-normalize-document-binary\"]}, {\"Name\": \"NormalizeDocument_Gray\",\"ImageROIProcessingNameArray\": [\"roi-normalize-document-gray\"]}, {\"Name\": \"NormalizeDocument_Color\",\"ImageROIProcessingNameArray\": [\"roi-normalize-document-color\"]}],\"TargetROIDefOptions\": [{\"Name\": \"roi-detect-document-boundaries\",\"TaskSettingNameArray\": [\"task-detect-document-boundaries\"]},{\"Name\": \"roi-detect-and-normalize-document\",\"TaskSettingNameArray\": [\"task-detect-and-normalize-document\"]},{\"Name\": \"roi-normalize-document-binary\",\"TaskSettingNameArray\": [\"task-normalize-document-binary\"]}, {\"Name\": \"roi-normalize-document-gray\",\"TaskSettingNameArray\": [\"task-normalize-document-gray\"]}, {\"Name\": \"roi-normalize-document-color\",\"TaskSettingNameArray\": [\"task-normalize-document-color\"]}],\"DocumentNormalizerTaskSettingOptions\": [{\"Name\": \"task-detect-and-normalize-document\",\"SectionImageParameterArray\": [{\"Section\": \"ST_REGION_PREDETECTION\",\"ImageParameterName\": \"ip-detect-and-normalize\"},{\"Section\": \"ST_DOCUMENT_DETECTION\",\"ImageParameterName\": \"ip-detect-and-normalize\"},{\"Section\": \"ST_DOCUMENT_NORMALIZATION\",\"ImageParameterName\": \"ip-detect-and-normalize\"}]},{\"Name\": \"task-detect-document-boundaries\",\"TerminateSetting\": {\"Section\": \"ST_DOCUMENT_DETECTION\"},\"SectionImageParameterArray\": [{\"Section\": \"ST_REGION_PREDETECTION\",\"ImageParameterName\": \"ip-detect\"},{\"Section\": \"ST_DOCUMENT_DETECTION\",\"ImageParameterName\": \"ip-detect\"},{\"Section\": \"ST_DOCUMENT_NORMALIZATION\",\"ImageParameterName\": \"ip-detect\"}]},{\"Name\": \"task-normalize-document-binary\",\"StartSection\": \"ST_DOCUMENT_NORMALIZATION\", \"ColourMode\": \"ICM_BINARY\",\"SectionImageParameterArray\": [{\"Section\": \"ST_REGION_PREDETECTION\",\"ImageParameterName\": \"ip-normalize\"},{\"Section\": \"ST_DOCUMENT_DETECTION\",\"ImageParameterName\": \"ip-normalize\"},{\"Section\": \"ST_DOCUMENT_NORMALIZATION\",\"ImageParameterName\": \"ip-normalize\"}]}, {\"Name\": \"task-normalize-document-gray\", \"ColourMode\": \"ICM_GRAYSCALE\",\"StartSection\": \"ST_DOCUMENT_NORMALIZATION\",\"SectionImageParameterArray\": [{\"Section\": \"ST_REGION_PREDETECTION\",\"ImageParameterName\": \"ip-normalize\"},{\"Section\": \"ST_DOCUMENT_DETECTION\",\"ImageParameterName\": \"ip-normalize\"},{\"Section\": \"ST_DOCUMENT_NORMALIZATION\",\"ImageParameterName\": \"ip-normalize\"}]}, {\"Name\": \"task-normalize-document-color\", \"ColourMode\": \"ICM_COLOUR\",\"StartSection\": \"ST_DOCUMENT_NORMALIZATION\",\"SectionImageParameterArray\": [{\"Section\": \"ST_REGION_PREDETECTION\",\"ImageParameterName\": \"ip-normalize\"},{\"Section\": \"ST_DOCUMENT_DETECTION\",\"ImageParameterName\": \"ip-normalize\"},{\"Section\": \"ST_DOCUMENT_NORMALIZATION\",\"ImageParameterName\": \"ip-normalize\"}]}],\"ImageParameterOptions\": [{\"Name\": \"ip-detect-and-normalize\",\"BinarizationModes\": [{\"Mode\": \"BM_LOCAL_BLOCK\",\"BlockSizeX\": 0,\"BlockSizeY\": 0,\"EnableFillBinaryVacancy\": 0}],\"TextDetectionMode\": {\"Mode\": \"TTDM_WORD\",\"Direction\": \"HORIZONTAL\",\"Sensitivity\": 7}},{\"Name\": \"ip-detect\",\"BinarizationModes\": [{\"Mode\": \"BM_LOCAL_BLOCK\",\"BlockSizeX\": 0,\"BlockSizeY\": 0,\"EnableFillBinaryVacancy\": 0,\"ThresholdCompensation\" : 7}],\"TextDetectionMode\": {\"Mode\": \"TTDM_WORD\",\"Direction\": \"HORIZONTAL\",\"Sensitivity\": 7},\"ScaleDownThreshold\" : 512},{\"Name\": \"ip-normalize\",\"BinarizationModes\": [{\"Mode\": \"BM_LOCAL_BLOCK\",\"BlockSizeX\": 0,\"BlockSizeY\": 0,\"EnableFillBinaryVacancy\": 0}],\"TextDetectionMode\": {\"Mode\": \"TTDM_WORD\",\"Direction\": \"HORIZONTAL\",\"Sensitivity\": 7}}]}")
}
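The templates only take effect after loadTemplate has run, so call it before the first capture. Calling it once when the page loads is enough (the call site is our choice, not an SDK requirement):

override func viewDidLoad() {
    super.viewDidLoad()
    // Load the custom Document Normalizer templates once before any capture call.
    loadTemplate()
}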
- Add a video output so that we can get the frame data of the camera preview. The camera controller needs a videoOutput property and has to conform to AVCaptureVideoDataOutputSampleBufferDelegate to receive the frames in the captureOutput callback.

+var videoOutput: AVCaptureVideoDataOutput!

func startCamera(){
    // Create the capture session.
    self.captureSession = AVCaptureSession()
    // Find the default video device.
    guard let videoDevice = AVCaptureDevice.default(for: .video) else { return }
    do {
        // Wrap the video device in a capture device input.
        let videoInput = try AVCaptureDeviceInput(device: videoDevice)
        // If the input can be added, add it to the session.
        if self.captureSession.canAddInput(videoInput) {
            self.captureSession.addInput(videoInput)
            self.previewView.videoPreviewLayer.session = self.captureSession
            //set the camera preview's resolution
            self.captureSession.sessionPreset = AVCaptureSession.Preset.hd1920x1080
+           self.videoOutput = AVCaptureVideoDataOutput.init()
+           if self.captureSession.canAddOutput(self.videoOutput) {
+               self.captureSession.addOutput(videoOutput)
+           }
+
+           var queue:DispatchQueue
+           queue = DispatchQueue(label: "queue")
+           self.videoOutput.setSampleBufferDelegate(self as AVCaptureVideoDataOutputSampleBufferDelegate, queue: queue)
+           self.videoOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey : kCVPixelFormatType_32BGRA] as [String : Any]
            self.captureSession.startRunning()
        }
    } catch {
        // Configuration failed. Handle error.
    }
}

+func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection){
+    print("capture output")
+}
- In the captureOutput callback, convert the CMSampleBuffer to ImageData and then use Dynamsoft Document Normalizer to detect documents from the data.

//convert CMSampleBuffer to ImageData
let imageBuffer:CVImageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(imageBuffer, .readOnly)
let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer)
let bufferSize = CVPixelBufferGetDataSize(imageBuffer)
let width = CVPixelBufferGetWidth(imageBuffer)
let height = CVPixelBufferGetHeight(imageBuffer)
let bpr = CVPixelBufferGetBytesPerRow(imageBuffer)
CVPixelBufferUnlockBaseAddress(imageBuffer, .readOnly)
let buffer = Data(bytes: baseAddress!, count: bufferSize)
let imageData = ImageData.init()
imageData.bytes = buffer
imageData.width = UInt(width)
imageData.height = UInt(height)
imageData.stride = UInt(bpr)
imageData.format = .ABGR8888
//detect documents from the data
let capturedResult = cvr.captureFromBuffer(imageData, templateName: "DetectDocumentBoundaries_Default")
Draw the Detected Document Polygon
- Create an Overlay view to draw the polygon.

import UIKit

class Overlay: UIView {
    var points:[CGPoint] = []

    override func draw(_ rect: CGRect) {
        if points.count == 4 {
            // Connect the four corners and close the path.
            let aPath = UIBezierPath()
            aPath.move(to: points[0])
            aPath.addLine(to: points[1])
            aPath.move(to: points[1])
            aPath.addLine(to: points[2])
            aPath.move(to: points[2])
            aPath.addLine(to: points[3])
            aPath.move(to: points[3])
            aPath.addLine(to: points[0])
            aPath.close()
            // Stroke the polygon with a red color.
            UIColor.red.set()
            aPath.lineWidth = 3
            aPath.stroke()
        }
    }
}
- Add the overlay view and set its position and size just like the preview view.

var overlay: Overlay!

override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view.
    self.overlay = Overlay()
    self.view.addSubview(self.overlay)
}

override func viewDidLayoutSubviews() {
    super.viewDidLayoutSubviews()
    if let overlay = self.overlay {
        let width: CGFloat = view.frame.width
        let height: CGFloat = view.frame.height
        let x: CGFloat = 0.0
        let y: CGFloat = 0.0
        overlay.backgroundColor = UIColor.init(red: 0, green: 0, blue: 0, alpha: 0)
        overlay.frame = CGRect.init(x: x, y: y, width: width, height: height)
    }
}
- Pass the points data of the detected document to the overlay view.

let capturedResult = cvr.captureFromBuffer(imageData, templateName: "DetectDocumentBoundaries_Default")
let results = capturedResult.items
if results != nil {
    print(results?.count ?? 0)
    if results?.count ?? 0 > 0 {
        let result = results?[0] as! DetectedQuadResultItem
        DispatchQueue.main.async {
            var points = result.location.points as! [CGPoint]
            points = Utils.scaleAndRotatePoints(points, frameWidth: Double(width), frameHeight: Double(height), viewWidth: self.view.frame.width, viewHeight: self.view.frame.height)
            self.overlay.points = points
            self.overlay.setNeedsDisplay() //trigger the redrawing
        }
    }
}
We have to modify the points to draw the polygon correctly. First, we need to scale the points based on the view's size and the camera preview image's size. Second, if the phone is in portrait while the camera preview is landscape, we need to rotate the coordinates of the points.
Here is the code:
import UIKit

class Utils {
    static func scaleAndRotatePoints(_ points:[CGPoint], frameWidth:Double, frameHeight:Double, viewWidth:Double, viewHeight:Double) -> [CGPoint] {
        var newPoints:[CGPoint] = []
        for point in points {
            var x = point.x
            var y = point.y
            let orientation = UIDevice.current.orientation
            if orientation == .portrait || orientation == .unknown || orientation == .faceUp {
                // Rotate 90 degrees: the frame is landscape while the view is portrait.
                x = frameHeight - point.y
                y = point.x
            } else if orientation == .landscapeRight {
                x = frameWidth - point.x
                y = frameHeight - point.y
            }
            x = x * xPercent(frameWidth:frameWidth, frameHeight:frameHeight, viewWidth:viewWidth, viewHeight:viewHeight)
            y = y * yPercent(frameWidth:frameWidth, frameHeight:frameHeight, viewWidth:viewWidth, viewHeight:viewHeight)
            let newPoint = CGPoint(x: x, y: y)
            newPoints.append(newPoint)
        }
        return newPoints
    }

    static func xPercent(frameWidth:Double, frameHeight:Double, viewWidth:Double, viewHeight:Double) -> Double {
        if (frameWidth > frameHeight && viewWidth > viewHeight) {
            return viewWidth/frameWidth
        }else{
            return viewWidth/frameHeight
        }
    }

    static func yPercent(frameWidth:Double, frameHeight:Double, viewWidth:Double, viewHeight:Double) -> Double {
        if (frameWidth > frameHeight && viewWidth > viewHeight) {
            return viewHeight/frameHeight
        }else{
            return viewHeight/frameWidth
        }
    }
}
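To make the mapping concrete, here is a quick sanity check with hypothetical sizes: a 1920×1080 landscape frame displayed in a 390×844 portrait view (assuming the device reports portrait or unknown orientation):

// The point (100, 200) is first rotated to (1080 - 200, 100) = (880, 100),
// then scaled by 390/1080 horizontally and 844/1920 vertically,
// giving roughly (317.8, 44.0).
let mapped = Utils.scaleAndRotatePoints(
    [CGPoint(x: 100, y: 200)],
    frameWidth: 1920, frameHeight: 1080,
    viewWidth: 390, viewHeight: 844)
print(mapped)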
Check if the Detected Document is Steady
We can check the IoUs (intersection over union) of three consecutive polygons to infer whether the detected document is steady.
- In the Utils class, add helper functions to calculate the IoU value.

static func intersectionOverUnion(pts1:[CGPoint], pts2:[CGPoint]) -> Double {
    let rect1 = getRectFromPoints(points:pts1)
    let rect2 = getRectFromPoints(points:pts2)
    return rectIntersectionOverUnion(rect1:rect1, rect2:rect2)
}

static func rectIntersectionOverUnion(rect1:CGRect, rect2:CGRect) -> Double {
    let leftColumnMax = max(rect1.minX, rect2.minX)
    let rightColumnMin = min(rect1.maxX, rect2.maxX)
    let upRowMax = max(rect1.minY, rect2.minY)
    let downRowMin = min(rect1.maxY, rect2.maxY)
    // The rectangles do not overlap.
    if (leftColumnMax >= rightColumnMin || downRowMin <= upRowMax){
        return 0
    }
    let s1 = rect1.width*rect1.height
    let s2 = rect2.width*rect2.height
    let sCross = (downRowMin-upRowMax)*(rightColumnMin-leftColumnMax)
    return sCross/(s1+s2-sCross)
}

static func getRectFromPoints(points:[CGPoint]) -> CGRect {
    var minX,minY,maxX,maxY:CGFloat
    minX = points[0].x
    minY = points[0].y
    maxX = 0
    maxY = 0
    for point in points {
        minX = min(point.x,minX)
        minY = min(point.y,minY)
        maxX = max(point.x,maxX)
        maxY = max(point.y,maxY)
    }
    let r = CGRect(x: minX, y: minY, width: maxX-minX, height: maxY-minY)
    return r
}
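As a quick check of the formula: two 100×100 rectangles offset by 50 pixels horizontally overlap in a 50×100 area, so the IoU is 5000 / (10000 + 10000 − 5000) = 1/3:

let rect1 = CGRect(x: 0, y: 0, width: 100, height: 100)
let rect2 = CGRect(x: 50, y: 0, width: 100, height: 100)
// Intersection: 50 x 100 = 5000; union: 10000 + 10000 - 5000 = 15000.
print(Utils.rectIntersectionOverUnion(rect1: rect1, rect2: rect2)) // ~0.333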
- Store the detection results in an array. If the count of the array reaches 3, check if the detected document is steady.

// Stores the latest detection results (kept to at most three items).
var previousResults: [DetectedQuadResultItem] = []

let capturedResult = cvr.captureFromBuffer(imageData, templateName: "DetectDocumentBoundaries_Default")
let results = capturedResult.items
if results != nil {
    print(results?.count ?? 0)
    if results?.count ?? 0 > 0 {
        let result = results?[0] as! DetectedQuadResultItem
        if self.previousResults.count == 2 {
            self.previousResults.append(result)
            if steady() {
                print("steady")
                //do action
            }else{
                self.previousResults.remove(at: 0)
            }
        }else{
            self.previousResults.append(result)
        }
    }
}
The steady function which checks the IoUs:

func steady() -> Bool {
    let points1,points2,points3:[CGPoint]
    points1 = self.previousResults[0].location.points as! [CGPoint]
    points2 = self.previousResults[1].location.points as! [CGPoint]
    points3 = self.previousResults[2].location.points as! [CGPoint]
    let iou1 = Utils.intersectionOverUnion(pts1: points1, pts2: points2)
    let iou2 = Utils.intersectionOverUnion(pts1: points1, pts2: points3)
    let iou3 = Utils.intersectionOverUnion(pts1: points2, pts2: points3)
    if iou1 > 0.9 && iou2 > 0.9 && iou3 > 0.9 {
        return true
    }else{
        return false
    }
}
Take a Photo if the Detected Document is Steady
After the detected document is steady, we can take a high-resolution photo.
- Add a photo output to the capture session. Remember to enable high resolution capture so that we can take a photo with a resolution higher than the camera preview's.

var photoOutput: AVCapturePhotoOutput!

self.photoOutput = AVCapturePhotoOutput()
self.photoOutput.isHighResolutionCaptureEnabled = true
if self.captureSession.canAddOutput(self.photoOutput) {
    self.captureSession.addOutput(photoOutput)
}
- Take a photo if the detected document is steady and then navigate to the cropper page. The camera controller has to conform to AVCapturePhotoCaptureDelegate to receive the captured photo.

func takePhoto(){
    let photoSettings: AVCapturePhotoSettings
    photoSettings = AVCapturePhotoSettings()
    photoSettings.isHighResolutionPhotoEnabled = true
    self.photoOutput.capturePhoto(with: photoSettings, delegate: self)
}

func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
    if let error = error {
        print("Error:", error)
    } else {
        self.captureSession.stopRunning() //stop the camera after the photo is taken.
        if let imageData = photo.fileDataRepresentation() {
            let image = UIImage(data: imageData)
            navigateToCropper(image!)
        }
    }
}

func navigateToCropper(_ image:UIImage){
    let controller = CroppingController()
    //pass the photo taken and the instance of the Capture Vision Router for further use
    controller.image = image
    controller.cvr = self.cvr
    navigationController?.pushViewController(controller, animated: true)
}
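One caveat: pushing a view controller is UI work and must happen on the main thread, while delegate callbacks are not guaranteed to arrive there. A defensive hop (our addition, not part of the original flow) is cheap:

if let imageData = photo.fileDataRepresentation() {
    let image = UIImage(data: imageData)
    // UIKit navigation must run on the main thread.
    DispatchQueue.main.async {
        self.navigateToCropper(image!)
    }
}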
Cropper Page
On the cropper page, the user can adjust the detected document polygon.
- Rectify the orientation of the image taken.

A photo taken in portrait carries an orientation flag so that it can be displayed upright, e.g., in a UIImageView, while its underlying pixel data is still in landscape. We need to rectify the orientation with the following code.

static func normalizedImage(_ image:UIImage) -> UIImage {
    if image.imageOrientation == UIImage.Orientation.up {
        return image
    }
    // Redraw the image so that the orientation is baked into the pixel data.
    UIGraphicsBeginImageContextWithOptions(image.size, false, image.scale)
    image.draw(in: CGRect(x:0, y:0, width:image.size.width, height:image.size.height))
    let normalized = UIGraphicsGetImageFromCurrentImageContext()!
    UIGraphicsEndImageContext()
    return normalized
}
- Display the image taken in a fullscreen UIImageView.

import UIKit
import DynamsoftCore
import DynamsoftCaptureVisionRouter
import DynamsoftDocumentNormalizer

class CroppingController: UIViewController {
    var image: UIImage!
    var cvr: CaptureVisionRouter!
    var imageView: UIImageView!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
        self.image = Utils.normalizedImage(self.image)
        self.imageView = UIImageView(frame: .zero)
        self.imageView.image = self.image
        self.view.addSubview(self.imageView)
    }

    override func viewDidLayoutSubviews() {
        super.viewDidLayoutSubviews()
        if let imageView = self.imageView {
            let width: CGFloat = view.frame.width
            let height: CGFloat = view.frame.height
            let x: CGFloat = 0.0
            let y: CGFloat = 0.0
            imageView.frame = CGRect.init(x: x, y: y, width: width, height: height)
        }
    }
}
- Detect the document in the image taken and draw the overlay.

Similar to what we've done on the camera page, detect the document and draw the overlay. Since we are using the photo with the orientation rectified, we only need to scale the points for the overlay view (a sketch of the scalePoints helper follows the code).

func detect(){
    if let cvr = self.cvr {
        let capturedResult = cvr.captureFromImage(self.image, templateName: "DetectDocumentBoundaries_Default")
        let results = capturedResult.items
        if results?.count ?? 0 > 0 {
            let result = results?[0] as! DetectedQuadResultItem
            self.points = result.location.points as? [CGPoint]
            let CGPoints = Utils.scalePoints(self.points, xPercent: self.view.frame.width/self.image.size.width, yPercent: self.view.frame.height/self.image.size.height)
            showVertices(CGPoints)
            self.overlay.points = CGPoints
            self.overlay.setNeedsDisplay()
        }
    }
}
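The scalePoints helper is not shown in the article. A minimal sketch that is consistent with how it is called above might look like this (the implementation is an assumption):

static func scalePoints(_ points: [CGPoint], xPercent: CGFloat, yPercent: CGFloat) -> [CGPoint] {
    // Scale each point from image coordinates to view coordinates.
    var newPoints: [CGPoint] = []
    for point in points {
        newPoints.append(CGPoint(x: point.x * xPercent, y: point.y * yPercent))
    }
    return newPoints
}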
- Define a Vertice class for adjusting the polygon.

import UIKit

class Vertice: UIView {
    var lineWidth = 3.0

    override func draw(_ rect: CGRect) {
        let h = rect.height
        let w = rect.width
        let color:UIColor = UIColor.red
        let drect = CGRect(x: 0, y: 0, width: w, height: h)
        let bpath:UIBezierPath = UIBezierPath(rect: drect)
        color.set()
        bpath.lineWidth = lineWidth
        bpath.stroke()
    }
}
- Add four vertices.

var vertices:[Vertice] = []

func showVertices(_ CGPoints:[CGPoint]){
    let verticeSize = 24.0
    var index = 0
    for point in CGPoints {
        let vertice = Vertice()
        self.view.addSubview(vertice)
        let tapGesture = UITapGestureRecognizer(target: self, action: #selector (self.tapActionForVertice (_:)))
        vertice.addGestureRecognizer(tapGesture)
        vertice.backgroundColor = UIColor.init(red: 255, green: 0, blue: 0, alpha: 0.5)
        let x = point.x + getOffsetX(index: index, size: verticeSize)
        let y = point.y + getOffsetY(index: index, size: verticeSize)
        vertice.frame = CGRect.init(x: x, y: y, width: verticeSize, height: verticeSize)
        vertices.append(vertice)
        index = index + 1
    }
}

func getOffsetX(index:Int, size:Double) -> Double {
    if index == 0 {
        return -size
    }else if index == 1 {
        return 0
    }else if index == 2 {
        return 0
    }else {
        return -size
    }
}

func getOffsetY(index:Int, size:Double) -> Double {
    if index == 0 {
        return -size
    }else if index == 1 {
        return -size
    }else if index == 2 {
        return 0
    }else {
        return 0
    }
}
The vertices will be displayed around the four corners of the polygon.
- A UITapGestureRecognizer is added to each vertex. If one of the vertices is tapped, set it as the selected one.

let tapGesture = UITapGestureRecognizer(target: self, action: #selector (self.tapActionForVertice (_:)))
vertice.addGestureRecognizer(tapGesture)
The tap action:

@objc func tapActionForVertice(_ sender:UITapGestureRecognizer){
    self.selectedVertice = sender.view as! Vertice
    // Highlight the selected vertex with a thicker border.
    for vertice in vertices {
        if self.selectedVertice == vertice {
            vertice.lineWidth = 5
        }else{
            vertice.lineWidth = 3
        }
        vertice.setNeedsDisplay()
    }
}
- Add a UIPanGestureRecognizer to the view.

let panGesture = UIPanGestureRecognizer(target: self, action: #selector (self.panAction (_:)))
self.view.addGestureRecognizer(panGesture)
- Move the selected vertex and update the points data when the user pans the view.

var points:[CGPoint]!
var vertices:[Vertice] = []
var selectedVertice:Vertice!
var touchedX = -1.0
var touchedY = -1.0
var initialVerticeX = -1.0
var initialVerticeY = -1.0

@objc func panAction(_ sender:UIPanGestureRecognizer){
    if selectedVertice != nil {
        let point = sender.location(in: self.view)
        let translation = sender.translation(in: self.view)
        let pTouchedX = point.x - translation.x
        let pTouchedY = point.y - translation.y
        // A new pan started: remember where the vertex was when it began.
        if pTouchedX != self.touchedX || pTouchedY != self.touchedY {
            self.touchedX = pTouchedX
            self.touchedY = pTouchedY
            self.initialVerticeX = selectedVertice.frame.minX
            self.initialVerticeY = selectedVertice.frame.minY
        }
        var x = self.initialVerticeX + translation.x
        var y = self.initialVerticeY + translation.y
        let width = selectedVertice.frame.width
        let height = selectedVertice.frame.height
        selectedVertice.frame = CGRect.init(x: x, y: y, width: width, height: height)
        let selectedIndex = vertices.firstIndex(of: selectedVertice)!
        x = x - getOffsetX(index: selectedIndex, size: 24)
        y = y - getOffsetY(index: selectedIndex, size: 24)
        updatePoints(newX:x, newY:y)
    }
}

func updatePoints(newX:Double, newY:Double) {
    if selectedVertice != nil {
        let selectedIndex = vertices.firstIndex(of: selectedVertice)!
        // Update the point in image coordinates.
        var point = self.points[selectedIndex]
        let xPercent = self.view.frame.width/self.image.size.width
        let yPercent = self.view.frame.height/self.image.size.height
        point.x = newX/xPercent
        point.y = newY/yPercent
        self.points[selectedIndex] = point
        // Update the point in view coordinates for the overlay.
        var pointForView = self.overlay.points[selectedIndex]
        pointForView.x = newX
        pointForView.y = newY
        self.overlay.points[selectedIndex] = pointForView
        self.overlay.setNeedsDisplay()
    }
}
- Add a toolbar with two buttons: a Retake button to return to the camera page and an Okay button to navigate to the result viewer page.

var toolbar:UIToolbar!

override func viewDidLoad() {
    super.viewDidLoad()
    self.toolbar = UIToolbar.init()
    let retakeButton = UIBarButtonItem.init(title: "Retake", style: .plain, target: self, action: #selector(retakeAction))
    let okayButton = UIBarButtonItem.init(title: "Okay", style: .plain, target: self, action: #selector(okayAction))
    let flexibleSpace = UIBarButtonItem.flexibleSpace()
    self.toolbar.items = [retakeButton,flexibleSpace,okayButton]
    self.view.addSubview(self.toolbar)
}

override func viewDidLayoutSubviews() {
    super.viewDidLayoutSubviews()
    if let toolbar = self.toolbar {
        let width: CGFloat = view.frame.width
        let height: CGFloat = 32
        let x: CGFloat = 0.0
        let y: CGFloat = view.frame.height - 32
        toolbar.frame = CGRect.init(x: x, y: y, width: width, height: height)
    }
}

@objc
func retakeAction() {
    self.navigationController?.popViewController(animated: true)
}

@objc
func okayAction() {
    let vc = ResultViewerController()
    //pass the adjusted points, the router and the image to the result viewer page
    vc.points = self.points
    vc.cvr = self.cvr
    vc.image = self.image
    self.navigationController?.pushViewController(vc, animated:true)
}
Result Viewer Page
On the result viewer page, we get the normalized image, display it in a UIImageView, and add a Save button to save the image to the photo library.
import UIKit
import DynamsoftCore
import DynamsoftCaptureVisionRouter
import DynamsoftDocumentNormalizer

class ResultViewerController: UIViewController {
    var imageView: UIImageView!
    var image: UIImage!
    var cvr: CaptureVisionRouter!
    var points: [CGPoint]!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
        self.view.backgroundColor = UIColor.white
        self.imageView = UIImageView(frame: .zero)
        self.imageView.contentMode = UIView.ContentMode.scaleAspectFit
        self.view.addSubview(self.imageView)
        self.navigationItem.rightBarButtonItem = UIBarButtonItem(title: "Save",
                                                                 style: .plain,
                                                                 target: self,
                                                                 action: #selector(saveAction))
        normalize()
    }

    func normalize(){
        // Use the adjusted polygon as the region of interest.
        let quad = Quadrilateral.init(pointArray: points)
        let settings = try? cvr.getSimplifiedSettings("NormalizeDocument_Binary")
        settings?.roi = quad
        settings?.roiMeasuredInPercentage = false
        try? cvr.updateSettings("NormalizeDocument_Binary", settings: settings!)
        let capturedResult = cvr.captureFromImage(self.image, templateName: "NormalizeDocument_Binary")
        let results = capturedResult.items
        if results != nil {
            if results?.count ?? 0 > 0 {
                let normalizedResult = results?[0] as! NormalizedImageResultItem
                let normalizedImage = try? normalizedResult.imageData?.toUIImage()
                self.imageView.image = normalizedImage
            }
        }
    }

    override func viewDidLayoutSubviews() {
        super.viewDidLayoutSubviews()
        if let imageView = self.imageView {
            let width: CGFloat = self.view.frame.width
            let height: CGFloat = self.view.frame.height
            let x = 0.0
            let y = 0.0
            imageView.frame = CGRect.init(x: x, y: y, width: width, height: height)
        }
    }

    @objc
    func saveAction(){
        print("save")
        UIImageWriteToSavedPhotosAlbum(self.imageView.image!, self, #selector(saved(_:didFinishSavingWithError:contextInfo:)), nil)
    }

    @objc func saved(_ image: UIImage, didFinishSavingWithError error: NSError?, contextInfo: UnsafeRawPointer) {
        if let error = error {
            // We got back an error!
            let ac = UIAlertController(title: "Save error", message: error.localizedDescription, preferredStyle: .alert)
            ac.addAction(UIAlertAction(title: "OK", style: .default))
            present(ac, animated: true)
        } else {
            let ac = UIAlertController(title: "Saved!", message: "The image has been saved to your photos.", preferredStyle: .alert)
            ac.addAction(UIAlertAction(title: "OK", style: .default))
            present(ac, animated: true)
        }
    }
}
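The template JSON we loaded earlier also defines NormalizeDocument_Gray and NormalizeDocument_Color, so producing a grayscale or color result only requires a different template name. A sketch following the same settings-update pattern as normalize above:

// Produce a color result instead of a binary one.
let settings = try? cvr.getSimplifiedSettings("NormalizeDocument_Color")
settings?.roi = quad
settings?.roiMeasuredInPercentage = false
try? cvr.updateSettings("NormalizeDocument_Color", settings: settings!)
let capturedResult = cvr.captureFromImage(self.image, templateName: "NormalizeDocument_Color")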
Source Code
Get the source code of the demo to have a try: https://github.com/tony-xlh/iOS-Document-Scanner