To streamline the duplicate-detection review process for Ontellus, The Bauen Group implemented a multi-step workflow. When a document is received, the system first checks for obvious duplicate markers, such as identical filenames, file types, sizes, and digital fingerprints. The file itself is uploaded to Azure Blob Storage, which is far less costly than storing it directly in Dataverse.
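The first-pass check can be pictured with the minimal Python sketch below. It assumes the digital fingerprint is a SHA-256 hash of the file bytes, and it uses a hypothetical in-memory `ExactDuplicateIndex` in place of the stored file metadata. It illustrates the idea only and is not The Bauen Group's actual implementation.

```python
import hashlib
import os


def duplicate_markers(filename: str, content: bytes) -> tuple[str, str, int, str]:
    """Obvious duplicate markers: filename, file type, size, fingerprint.

    Assumption: the fingerprint is a SHA-256 digest of the file bytes.
    """
    name = os.path.basename(filename)
    file_type = os.path.splitext(name)[1].lower()  # e.g. ".pdf"
    return (name, file_type, len(content), hashlib.sha256(content).hexdigest())


class ExactDuplicateIndex:
    """Hypothetical in-memory stand-in for the stored file metadata."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str, int, str]] = set()

    def is_exact_duplicate(self, filename: str, content: bytes) -> bool:
        """Return True if every marker matches a file already received."""
        return duplicate_markers(filename, content) in self._seen

    def record(self, filename: str, content: bytes) -> None:
        """Remember a new file's markers once it has been accepted."""
        self._seen.add(duplicate_markers(filename, content))
```

For example, after `index.record("statement.pdf", data)`, calling `index.is_exact_duplicate("statement.pdf", data)` returns `True`, while the same name with different bytes does not match. Comparing a fixed tuple of cheap markers keeps this step fast, so only files that pass it go on to coding.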
Once the file information confirms there is no exact match, the document is coded: it is classified by record type (invoice, statement, requested record, and so on), relevant account, and other attributes. The system then checks for existing documents that match the coded criteria.
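The coded-criteria lookup might be sketched as follows, assuming for illustration that only record type and account are compared. `CodedDocument` and `CodedIndex` are hypothetical names, and a real system would likely match on more of the coded fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedDocument:
    doc_id: str
    record_type: str  # e.g. "invoice", "statement", "requested record"
    account: str      # relevant account identifier


class CodedIndex:
    """Hypothetical index of already-coded documents, keyed by coded criteria."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], list[CodedDocument]] = defaultdict(list)

    @staticmethod
    def _key(doc: CodedDocument) -> tuple[str, str]:
        # Normalize so "Invoice" and "invoice " count as the same record type.
        return (doc.record_type.strip().lower(), doc.account.strip())

    def add(self, doc: CodedDocument) -> None:
        self._by_key[self._key(doc)].append(doc)

    def potential_duplicates(self, doc: CodedDocument) -> list[CodedDocument]:
        """Existing documents whose coded criteria match the new one."""
        return [d for d in self._by_key.get(self._key(doc), []) if d.doc_id != doc.doc_id]
```

Usage:

```python
index = CodedIndex()
index.add(CodedDocument("D-1", "invoice", "ACCT-42"))
index.add(CodedDocument("D-2", "statement", "ACCT-42"))
print([d.doc_id for d in index.potential_duplicates(CodedDocument("D-3", "Invoice", "ACCT-42"))])
# ['D-1']
```

Because the match is on coded attributes rather than file bytes, this step can surface near-duplicates that the exact check misses, which is why its results go to a person rather than being resolved automatically.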
For each potential duplicate found, a review request is created so that a person can verify it before the document is processed further.
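The gating rule can be expressed in a short sketch. It assumes one request per candidate and that the new document proceeds only once every request has been resolved as "not a duplicate"; `ReviewRequest`, `ReviewStatus`, and both functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewStatus(Enum):
    PENDING = "pending"
    DUPLICATE = "duplicate"
    NOT_DUPLICATE = "not_duplicate"


@dataclass
class ReviewRequest:
    new_doc_id: str
    candidate_doc_id: str
    status: ReviewStatus = ReviewStatus.PENDING


def create_review_requests(new_doc_id: str, candidate_ids: list[str]) -> list[ReviewRequest]:
    """Create one human-review request per potential duplicate."""
    return [ReviewRequest(new_doc_id, cid) for cid in candidate_ids]


def can_process(requests: list[ReviewRequest]) -> bool:
    """The new document moves on only when every request is resolved as not a duplicate.

    With no candidates the list is empty and the document proceeds immediately.
    """
    return all(r.status is ReviewStatus.NOT_DUPLICATE for r in requests)
```

Usage:

```python
reqs = create_review_requests("D-3", ["D-1", "D-7"])
print(can_process(reqs))  # False: both requests are still pending
for r in reqs:
    r.status = ReviewStatus.NOT_DUPLICATE
print(can_process(reqs))  # True
```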
Reviews go quickly because the records are displayed side by side. For multi-page documents, both documents scroll together so that corresponding pages can be compared.
In this example, the user determines within seconds that the records are not duplicates: the second is a confirmation that the invoice was paid. This near-duplicate was likely misclassified, either through human error or because OCR could not correctly read a poor-quality image.