Editor's Note: This feature originally appeared in the May issue of MReport, out now.
Technological advances are providing an alternative to monotonous data entry in day-to-day processing. One such advance that is changing the industry is Optical Character Recognition (OCR), best known for its basic function: converting a scanned or printed document into searchable text. OCR solutions vary in sophistication and technological intelligence, yet unfortunately, searchable text alone isn’t very useful in the mortgage market, since quickly finding a data point is only part of the challenge faced when looking at a PDF mortgage file of 500 or more pages.
Structure must be brought to the documents before we can glean data points. This is done by ‘document classification,’ a term we use for identifying the individual documents in the file. More sophisticated OCR solutions incorporate an artificial intelligence (AI) rules engine that can read the OCR output and make decisions much as a human would. In this way, the text produced by an OCR process can be leveraged to automate the document classification effort and effectively bring order to the chaos. Once a working structure has been brought to the file, we can extract individual data elements for the relevant document types, e.g., the date from the note, the disbursement date from the closing disclosure, or the original loan amount from the deed of trust. OCR on its own can produce the text that contains these data elements, but without a robust AI element that can understand what that text means, the data remains locked in the documents.
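To make the two steps concrete, here is a minimal sketch of classification followed by extraction. It assumes simple keyword rules and a regular expression stand in for the AI rules engine; the rule table, function names, and sample page text are all hypothetical, and a production engine would be far richer than this.

```python
import re

# Hypothetical keyword rules: document type -> phrases expected in its OCR text.
RULES = {
    "note": ["promissory note", "promise to pay"],
    "closing_disclosure": ["closing disclosure", "disbursement date"],
    "deed_of_trust": ["deed of trust", "trustee"],
}

def classify_page(text):
    """Return the document type with the most matching phrases, or 'unknown'."""
    lowered = text.lower()
    scores = {
        doc_type: sum(phrase in lowered for phrase in phrases)
        for doc_type, phrases in RULES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def extract_disbursement_date(text):
    """Pull an MM/DD/YYYY date that follows the label 'Disbursement Date'."""
    match = re.search(
        r"disbursement date\D{0,20}(\d{1,2}/\d{1,2}/\d{4})", text, re.IGNORECASE
    )
    return match.group(1) if match else None

# Made-up OCR text for a single page.
page = "CLOSING DISCLOSURE  Closing Date 04/12/2018  Disbursement Date 04/17/2018"
doc_type = classify_page(page)
if doc_type == "closing_disclosure":
    print(doc_type, extract_disbursement_date(page))
```

Note that extraction runs only after classification: the same date pattern on a different document type would mean something else, which is why structure has to come first.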
Cutting Cost, Creating Convenience
With the cost to process each mortgage continuing to rise, lenders must leverage automation to improve profitability and consistency in their business processes. With advanced mortgage OCR solutions, companies have been able to reduce manual document indexing and data entry, enabling them to process more loans per day at a lower cost per loan, yielding a leaner process and increased profit margins. Today’s most advanced mortgage OCR solutions do more than just convert document images to text; the resulting text is then processed by an AI rules engine in much the same way a human being would process the content.
Based on these rules, documents are automatically indexed, and relevant data points are extracted. Downstream applications then receive this information for appropriate routing, decisioning, and archival. The process begins with a full-page OCR scan of each image, typically completed in less than one second per page. This high-speed performance allows every word on the page to be included in the scope of the AI rules engine analysis, just as a human eye would consider each line of data. Only the most advanced OCR solutions combine this speed with the ability to include all page content in the evaluation scope. This makes the process extremely flexible with documents of varying layout, such as bank statements.
Unloading the Onboarding Burden
Traditionally, loan onboarding after a transfer of servicing rights to a new servicer has been labor intensive and expensive. A data file and scanned images of the loan are typically provided to the new servicer for ingestion into its business system, but the data file does not always agree with the documents of record, so an auditing process is needed to validate it. This process is cumbersome for two reasons: it can be difficult for the new servicer to identify the final version of key documents, and the sheer volume of data needed for comparison is onerous.
The Loan Onboarding Audit solution automatically reads every word of each document in the incoming portfolio and identifies the final version, which is then passed through AI data extraction logic to pull out the elements relevant for a servicing operation’s due diligence process and loan data archival needs. It is common to have loan ingestion rates of thousands per day with this approach.
A Non-‘TRID’itional Asset
The most advanced OCR solution provides a rigorous tool for a comprehensive review of each TILA-RESPA Integrated Disclosure (TRID) transaction. Typically, during the origination process, there are several iterations of both a loan estimate and a closing disclosure. Advanced TRID audit solutions extract every data element from all initial and re-disclosed loan estimates and closing disclosures, and the system can be configured to output either all of the data from each document iteration or just the differences found from the prior document. Output formats include MISMO v3.3 or custom XML schemas.
In the case where a loan origination system (LOS) is generating the TRID disclosures, this differential reporting may be something produced by the LOS itself. However, in the correspondent lending channel—or in the case of a split, “borrower-only” and “seller-only” closing disclosure transaction—this solution closes a gap that the LOS is unable to address. In these cases where the lender’s LOS does not generate all iterations of the closing disclosure and loan estimate, lenders need a technology that can natively read PDF or scanned TIFF versions of these documents. This TRID audit solution can support any layout of these documents from any source.
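The differential output described above can be sketched as a field-by-field comparison of two disclosure iterations. This is an illustration only: the function name and the field names are hypothetical, and the extracted values are assumed to already sit in plain dictionaries rather than in a MISMO structure.

```python
def diff_iterations(previous, current):
    """Return {field: (old, new)} for fields that changed, appeared, or vanished."""
    changes = {}
    for field in sorted(set(previous) | set(current)):
        old, new = previous.get(field), current.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes

# Made-up values from an initial and a re-disclosed loan estimate.
initial_le = {"interest_rate": "4.250", "origination_charges": "1800.00"}
revised_le = {
    "interest_rate": "4.375",
    "origination_charges": "1800.00",
    "rate_lock_expiration": "2018-05-30",
}

for field, (old, new) in diff_iterations(initial_le, revised_le).items():
    print(f"{field}: {old} -> {new}")
```

Outputting "all of the data" instead of "just the differences" would simply mean emitting each iteration's dictionary unchanged.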
The Final Call
The Uniform Closing Dataset (UCD) provides a common industry dataset to support the Consumer Financial Protection Bureau’s closing disclosure. Loans closed on or after September 25, 2017, and acquired by the GSEs are required to have both a UCD XML file and, after April 2018, an embedded PDF of the associated borrower closing disclosure. Advanced OCR provides the tools to determine whether the data on the embedded closing disclosure matches the corresponding data in the UCD XML file.
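As a rough sketch of that comparison, the snippet below checks values read from the closing disclosure against the values in an XML file. The XML here is a simplified, made-up stand-in: the real UCD file follows the MISMO schema, whose element names and nesting are not reproduced, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up XML. A real UCD file uses the MISMO schema instead.
SAMPLE_XML = """
<Loan>
  <LoanAmount>250000.00</LoanAmount>
  <DisbursementDate>2018-04-17</DisbursementDate>
</Loan>
"""

def xml_mismatches(xml_text, disclosure_values):
    """Return {tag: (xml_value, disclosure_value)} wherever the two disagree."""
    root = ET.fromstring(xml_text.strip())
    mismatches = {}
    for tag, disclosure_value in disclosure_values.items():
        xml_value = root.findtext(tag)  # None if the tag is absent
        if xml_value is None or xml_value.strip() != disclosure_value:
            mismatches[tag] = (xml_value, disclosure_value)
    return mismatches

# Values assumed to have been extracted from the embedded closing disclosure PDF.
from_pdf = {"LoanAmount": "250000.00", "DisbursementDate": "2018-04-18"}
print(xml_mismatches(SAMPLE_XML, from_pdf))
```

A real audit would also normalize formats (dates, currency, rounding) before comparing, so that cosmetic differences are not reported as exceptions.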
While this capability is certainly valuable to the GSEs, it is also possible to use this audit for other loan transfers. As part of a due diligence process, investors may use this capability to verify that a set of loans to be purchased is as advertised and that all critical metadata provided is accurate.
To promote compliance with federal consumer protection laws, lenders are required to submit specific borrower demographic data to the federal government. Home Mortgage Disclosure Act (HMDA) disclosures provide the public with information on the home mortgage activities of most lenders. One of the challenges for a lender in reporting HMDA data is to ensure that the documents from which it pulls data are, in fact, the final versions.
Many times, errors in HMDA reporting are due to reporting data based on a non-final source document. The HMDA audit solution searches through an image archive for every version of each document relevant to the HMDA reporting process and automatically determines the final versions.
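How a given solution decides which version is final is not spelled out here, so the following is only one plausible heuristic, stated as an assumption: prefer a signed copy over an unsigned one, and among those, the most recent issue date. The `DocVersion` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DocVersion:
    doc_type: str   # e.g. "loan_application"
    issued: date    # date read from the document
    signed: bool    # whether a signature was detected

def final_versions(versions):
    """Pick one version per document type: signed beats unsigned, then latest date."""
    finals = {}
    for version in versions:
        best = finals.get(version.doc_type)
        if best is None or (version.signed, version.issued) > (best.signed, best.issued):
            finals[version.doc_type] = version
    return finals

# Made-up archive contents: three versions of the same document.
archive = [
    DocVersion("loan_application", date(2018, 1, 5), False),
    DocVersion("loan_application", date(2018, 2, 20), True),
    DocVersion("loan_application", date(2018, 3, 1), False),
]
print(final_versions(archive)["loan_application"].issued)
```

Under this assumed rule, the later unsigned copy loses to the earlier signed one, which is the kind of case that trips up manual reviews.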
Data is then automatically captured from these final documents via AI data extraction rules and coalesced into an XML file or spreadsheet for reporting. This process provides lenders with a highly automated method to ensure the accuracy of required Loan Application Register (LAR) reporting data and to maintain database-of-record quality for future reporting needs.
These technological adaptations are providing effective solutions to the monotonous hurdles that plague an average workday filled with paperwork and data entry. With technology like OCR behind the wheel of progress, industry leaders can spend less time plugging in data points and dedicate more focus to the minute-by-minute needs of their customers.