AI Data Classification | The solution to the Personal Information Regulation

March 28, 2024


About Kriptos

Founded in 2018, Kriptos is a B2B SaaS cybersecurity startup that uses AI and machine learning to automatically classify an organization’s unstructured data and prevent its loss or exfiltration. Trained on more than 30,000 documents reviewed for ISO 27001 compliance, Kriptos’ artificial intelligence has a strong baseline supervised learning dataset, which can be further enhanced with company-specific dictionaries. The first Latin American startup incubated by the Barclays accelerator, Kriptos currently has more than 50 proofs of concept (POCs) running with companies such as AVIS and Banco Falabella, has established a strong presence in Latin America, and aims to break into the US market. By harnessing new developments in AI and machine learning, Kriptos is well positioned to help organizations proactively prevent data breaches ahead of regulatory compliance initiatives such as CCPA and GDPR.

Cybersecurity Risks

Unclassified data and documents pose significant financial risk to companies, including lost business and regulatory fines. According to the 2019 Cost of a Data Breach Report, conducted by the Ponemon Institute and sponsored by IBM, the average cost of a data breach is $3.9 million, with an average size of about 25,000 records. Most surprisingly, the average time to identify and contain a breach is 279 days. This shows how difficult it is to determine where a breach originated or which employee access levels were involved, even once the organization knows the breach has occurred. With data loss prevention (DLP) tools, an organization can proactively protect against phishing and exfiltration attacks through customizable data access controls and DLP detection. However, companies often quickly find that they lack a solid foundation on which to operate their DLP platforms, because their current classification process may be incomplete, out of date, or lacking the rigor and resources to classify their data consistently and fully.

Historical Data Classification Solutions

Historically, data classification has been done manually by employees or consultants. On average, an organization with 5,000 employees has more than 10 million files, which at 60 seconds per document would take about 19 years of nonstop work to classify manually. Not only does this require significant budget and time, but manual classification also exposes organizations to a significant risk of human error in the form of misclassified and misplaced information. In contrast, Kriptos’ proprietary AI program integrates with an organization’s document repository, understands the content and context of each file, and classifies the files in less than a week. According to a report by Grand View Research, the data classification market will grow to $2.4 billion by 2025. As firms increase their budgets in response to growing cybersecurity threats and regulation, automated data classification becomes the foundation of data security.
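The manual-classification estimate in this section can be checked with a few lines of arithmetic. The sketch below uses only the two figures given in the text (10 million files, 60 seconds each); the assumption that a single reviewer works around the clock, and the function name itself, are illustrative.

```python
# Back-of-the-envelope check of the manual classification estimate.
FILES = 10_000_000                     # files in a 5,000-employee organization (from the text)
SECONDS_PER_FILE = 60                  # review time per document (from the text)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: reviewers work nonstop, all year


def years_to_classify(files: int, seconds_per_file: int, reviewers: int = 1) -> float:
    """Return the number of years needed to review every file once."""
    return files * seconds_per_file / (SECONDS_PER_YEAR * reviewers)


years = years_to_classify(FILES, SECONDS_PER_FILE)
print(f"{years:.1f} years")  # prints "19.0 years"
```

Under a more realistic schedule (for example, 8-hour working days), the same workload for one reviewer would take several times longer.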

Regulatory Initiatives and Changing Landscape

New regulatory initiatives will require a deeper contextual understanding of documents across industries. The California Consumer Privacy Act (CCPA), which took effect in 2020, aims to enhance privacy rights and consumer protection. The CCPA applies to any company that has annual gross revenues of more than $25 million, receives the personal information of 50,000 or more consumers, or earns more than half of its revenue from the sale of California residents' personal information. Under this law, companies that become victims of “data theft or other data security breaches can be ordered in civil class action lawsuits to pay statutory damages between $100 to $750 per California resident and incident”. In addition to these per-consumer damages, there is “a fine up to $7,500 for each intentional violation and $2,500 for each unintentional violation”.
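To make the thresholds and damages above concrete, here is a rough sketch that encodes only the figures quoted in this section. The class, function names, and the 10,000-resident example are hypothetical, and the sketch simplifies the law considerably.

```python
from dataclasses import dataclass


@dataclass
class Business:
    annual_gross_revenue_usd: float
    consumers_with_personal_info: int
    revenue_share_from_selling_pi: float  # fraction between 0.0 and 1.0


def ccpa_applies(b: Business) -> bool:
    """Apply the three thresholds as stated in the text; any one is enough."""
    return (
        b.annual_gross_revenue_usd > 25_000_000
        or b.consumers_with_personal_info >= 50_000
        or b.revenue_share_from_selling_pi > 0.5
    )


def statutory_damages_range(affected_residents: int, incidents: int = 1) -> tuple[int, int]:
    """Return (low, high) using the quoted $100 to $750 per resident and incident."""
    n = affected_residents * incidents
    return 100 * n, 750 * n


# Hypothetical company: modest revenue, but a large consumer base.
shop = Business(8_000_000, 120_000, 0.0)
print(ccpa_applies(shop))                # True
print(statutory_damages_range(10_000))   # (1000000, 7500000)
```

Even a breach affecting 10,000 California residents would, on these figures, carry statutory damages of between $1 million and $7.5 million before any additional fines.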

Most interesting is the CCPA's definition of personal information. The CCPA defines personal information as information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household. Open to broad legal interpretation, this definition goes beyond the traditional notion of direct personal information, which covers data such as social security numbers, birth dates, and credit card numbers. It could also include information such as customer browsing or purchasing patterns. Because this indirect personal information does not follow a set format, regular expression filtering cannot locate and categorize it. Machine learning is a viable answer to this evolving definition of personal information, and Kriptos’ proprietary AI technology addresses it by distinguishing direct from indirect personal information through context-level understanding.
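The limitation of regular expressions described above can be illustrated with a small sketch. The patterns and sample sentences below are simplified assumptions made for this example; they are not Kriptos' rules, and no machine learning classifier is shown.

```python
import re

# Simplified, illustrative patterns for fixed-format ("direct") personal information.
DIRECT_PI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def find_direct_pi(text: str) -> dict[str, list[str]]:
    """Return every pattern label that matched, with the matched strings."""
    hits = {}
    for label, pattern in DIRECT_PI_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


direct = "SSN 123-45-6789, card 4111 1111 1111 1111, born 04/12/1985"
indirect = "Customer visits the maternity section every Friday and recently bought prenatal vitamins."

print(find_direct_pi(direct))    # all three patterns match
print(find_direct_pi(indirect))  # {} -- nothing matches
```

The second sentence reveals something about a particular consumer, yet it contains no fixed-format token for a pattern to catch. Recognizing it requires understanding the content and context of the text.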

“Kriptos excels at classifying millions of cloud-based documents in an incredibly short period of time, speeding up the time to adoption for DLP technology” - Dave Rogers, Netskope’s VP of Alliances & Global Channel Sales

To find out more about Kriptos Data Classification solutions, please contact us to schedule a demo.
