Do you have one of those multi-purpose photocopier/printer/scanners? I’m sure you will say that you have several, but how are they used? And do you understand the cyber risks they create? For many organizations, it has become standard practice to scan any hard copy that arrives in the business and store an electronic version of it. Documents can be scanned, converted to a PDF and emailed automatically within seconds. When a document is scanned, a picture is created for each page and those pictures are stored in the PDF. From a Data Loss Prevention (DLP) perspective, this document can be shared inside and outside the organization with relative freedom, because traditional DLP cannot discover the actual information: the text inside the images inside the PDF.
Optical Character Recognition (OCR) has been around for a long time. It inspects images for text and then decodes that text into characters. While the human eye is fantastic at recognizing text, be it upside down or at an angle, it is computationally intensive for a machine to do this routinely. However, newer algorithms can deal with skew (angled text) and handle multiple languages.
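To make the gap concrete, here is a minimal Python sketch of a content scan that only sees a document's text layer, and of how an OCR step changes the result. Everything in it is an illustrative assumption rather than any product's implementation: the `Page` model, the card-number pattern, and in particular `stub_ocr`, which simply returns a fixed string where a real OCR engine would recognize text from the pixels.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical policy pattern: a 16-digit card-like number, optionally grouped in fours.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


@dataclass
class Page:
    text: str = ""                                      # text layer; empty for a pure scan
    images: List[bytes] = field(default_factory=list)   # raw image data for the page


def find_matches(pages: List[Page],
                 ocr: Optional[Callable[[bytes], str]] = None) -> List[str]:
    """Return policy matches from the text layer and, if ocr is given, from images."""
    hits: List[str] = []
    for page in pages:
        # A traditional scan stops here: it only inspects existing text.
        hits.extend(CARD_PATTERN.findall(page.text))
        if ocr is not None:
            # With OCR, each image is first turned into text, then scanned.
            for image in page.images:
                hits.extend(CARD_PATTERN.findall(ocr(image)))
    return hits


def stub_ocr(image: bytes) -> str:
    # Placeholder (assumption): stands in for a real OCR engine.
    return "Card number 4111 1111 1111 1111"


# A scanned page carries pictures only, so there is no text layer to inspect.
scanned = [Page(images=[b"<pixel data>"])]
print(find_matches(scanned))            # []
print(find_matches(scanned, stub_ocr))  # ['4111 1111 1111 1111']
```

Without the OCR callable the scanned page passes untouched; with it, the same policy pattern fires on the recognized text.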
The latest versions of Clearswift’s email products, the Secure Email Gateway (SEG), Secure Exchange Gateway (SXG) and ARgon for Email, offer OCR as a new extra-cost option to mitigate this risk. The option supports multiple languages, so it can easily be used by global organizations that operate in more than one language. Language-specific dictionaries reduce the number of false positives and increase the recognition rate. The architecture behind Clearswift’s email security products has always been scalable, with multiple instances able to be peered together for both scalability and availability. The OCR option takes advantage of this: more instances can be added so that, even though more processing is happening, the overall throughput of the system is maintained. This is much more cost-effective than a ‘rip and replace’ of hardware, in which lower-specification machines are swapped for more powerful ones.
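The scaling argument can be turned into rough capacity arithmetic. The sketch below is a back-of-the-envelope estimate, not a sizing formula from Clearswift: it assumes throughput scales linearly with the number of peered instances, and the parameters `image_fraction` (share of messages that contain images) and `ocr_overhead` (extra processing cost of an image-bearing message, relative to a normal one) are hypothetical inputs you would have to measure yourself.

```python
import math


def instances_needed(current_instances: int,
                     image_fraction: float,
                     ocr_overhead: float) -> int:
    """Estimate peers required to keep throughput constant once OCR is enabled.

    Assumes linear scaling across peers (an idealization).
    """
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if not 0.0 <= image_fraction <= 1.0 or ocr_overhead < 0.0:
        raise ValueError("image_fraction must be in [0, 1] and ocr_overhead >= 0")
    # Average per-message cost relative to today's cost of 1.0.
    cost_factor = 1.0 + image_fraction * ocr_overhead
    return math.ceil(current_instances * cost_factor)


# Example: 4 peers today, a quarter of messages carry images,
# and OCR triples the cost of those messages (overhead of 2.0).
print(instances_needed(4, 0.25, 2.0))  # 6
```

Under those assumed numbers, two additional peers would hold throughput steady, which is the kind of incremental growth the peered architecture allows.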
Of course, OCR doesn’t just apply to images found in PDFs. It can process all types of image formats, including those used for screenshots, as well as images embedded in other files such as Microsoft Office documents.
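As an illustration of how embedded images might be located before being passed to OCR, the following Python sketch identifies common image formats by their file signatures and lists the images inside a modern Office file. It relies on the fact that .docx, .xlsx and .pptx files are ZIP containers; the function names are hypothetical, legacy binary formats such as .doc are not covered, and this is not a description of how Clearswift's products do it.

```python
import io
import zipfile
from typing import List, Optional, Tuple

# Well-known file signatures (magic bytes) for common raster formats.
# Note: the BMP signature is only two bytes, so it can produce false positives.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
    b"II*\x00": "tiff",
    b"MM\x00*": "tiff",
}


def image_type(data: bytes) -> Optional[str]:
    """Return the image format suggested by the leading bytes, or None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def embedded_images(office_bytes: bytes) -> List[Tuple[str, str]]:
    """List (member name, image type) for images inside a ZIP-based Office file."""
    found = []
    with zipfile.ZipFile(io.BytesIO(office_bytes)) as archive:
        for name in archive.namelist():
            # Only the first few bytes are needed to check the signature.
            with archive.open(name) as member:
                kind = image_type(member.read(8))
            if kind:
                found.append((name, kind))
    return found


# Build a tiny stand-in for a .docx in memory: one XML part and one PNG.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as docx:
    docx.writestr("word/document.xml", "<?xml version='1.0'?><doc/>")
    docx.writestr("word/media/image1.png", b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)

print(embedded_images(buffer.getvalue()))  # [('word/media/image1.png', 'png')]
```

Checking signatures rather than file extensions means a renamed screenshot is still recognized as an image and can be routed to the OCR step.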
The introduction of new data protection regulations such as GDPR means organizations need to become more vigilant in protecting critical information. This includes looking at new threat vectors and putting measures in place to mitigate them. The increased use of multi-function printers as scanners means the risk is growing, so solutions need to be examined. The new OCR option from Clearswift addresses the problem in a cost-effective manner.