Data Classification – What It Is, Why You Should Care, and How to Perform It
Organizations have limited resources to invest in safeguarding their data. Knowing exactly what data needs protection will help you set priorities and develop a sound plan so you can allocate your budget and other resources wisely, minimizing security and compliance costs. But where’s the best place to start? Data classification provides a solid foundation for a data security strategy because it helps identify risky areas in the IT network, both on premises and in the cloud.
Data classification definition
Data classification is the process of organizing data by agreed-on categories. Thoroughly planned classification enables more efficient use and protection of critical data across the organization and contributes to risk management, legal discovery and compliance processes.
For years, data classification was purely a user-driven process, but today organizations have options for automating classification. For new data, organizations can establish processes that enable users to classify the documents they create, send, modify or otherwise touch. For older data, they have two choices: leave it unclassified and let it gradually be retired, or classify the backlog using data discovery.
Data discovery is the process of scanning data repositories and reporting on the findings. Data discovery can serve many purposes, such as enterprise content search, data governance, and data analysis and visualization. But when combined with data classification, it becomes the process of identifying resources that might contain sensitive information, so you can make informed decisions about how to properly protect that data.
Benefits of data classification
Data classification helps you improve both data security and regulatory compliance:
- Security of critical data — To safeguard sensitive corporate and customer data adequately, first of all, you must know and understand your data. Specifically, you should be able to answer the following questions:
- What sensitive data do you have (IP, PHI, PII, card data, etc.)?
- Where does this sensitive data reside?
- Who can access, modify and delete it?
- How will it affect your business if this data is leaked, destroyed or improperly altered?
Having answers to these questions, along with information about the threat landscape, enables you to protect sensitive data by assessing risk levels, prioritizing your efforts, and planning and implementing appropriate data protection and threat detection measures.
- Compliance with regulatory mandates — Compliance standards require organizations to protect specific data, such as cardholder information (PCI DSS), health records (HIPAA), financial data (SOX) or EU residents’ personal data (GDPR). Data discovery and classification help you determine where these types of data are located, and make sure that appropriate security controls are in place and that the data is trackable and searchable, as required by regulations. By focusing your compliance efforts on data that falls under the regulations you’re subject to, you increase your chances of passing audits and maintaining day-to-day compliance.
Guidelines for data classification
There is no one-size-fits-all approach to data classification. However, the classification process can be broken down into five key steps, which you can tailor to meet your organization’s unique needs as you develop your data protection strategy.
Step #1. Establish a data classification policy. First, you should define a data classification policy and communicate it to all employees who work with sensitive data. The policy should be short and simple and include the following basic elements:
- Objectives – The reasons data classification has been put into place and the goals the company expects to achieve from it
- Workflows – How the data classification process will be organized and how it will impact employees who use different categories of sensitive data
- Data classification scheme – The categories that the data will be classified into
- Data owners – The roles and responsibilities of the business units, including how they should classify sensitive data and grant access to it
- Handling instructions – Security standards that specify appropriate handling practices for each category of data, such as how it must be stored, what access rights should be assigned, how it can be shared, when it must be encrypted, and retention terms and processes. Since these guidelines may change, it is best to maintain them as a separate document.
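Handling instructions are easier to enforce consistently when they are also kept in a machine-readable form. The following is a minimal sketch of that idea, not a prescribed format: the category names are borrowed from the three-level example later in this article, and the `HandlingRule` fields and all values are hypothetical placeholders that your own policy would replace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingRule:
    """Handling requirements for one data category (illustrative fields only)."""
    encrypt_at_rest: bool
    external_sharing: bool
    retention_years: int


# Hypothetical values for illustration; the real ones come from your policy.
HANDLING_RULES = {
    "public": HandlingRule(encrypt_at_rest=False, external_sharing=True, retention_years=1),
    "internal": HandlingRule(encrypt_at_rest=False, external_sharing=False, retention_years=3),
    "restricted": HandlingRule(encrypt_at_rest=True, external_sharing=False, retention_years=7),
}


def may_share_externally(category: str) -> bool:
    """Look up whether data in this category may leave the organization."""
    return HANDLING_RULES[category].external_sharing
```

Keeping the rules in one table means that a change to the handling instructions is made in a single place, which fits the advice to maintain them separately from the policy itself.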
Step #2. Discover sensitive data. Once the policy is established, it’s time to decide whether you need data discovery. If you choose to classify only new data, some business-critical or sensitive data that you already have might be left insufficiently protected. If that risk is unacceptable, you need to invest money, time and effort to run data discovery and apply your classification policies to your existing data.
You can automate the data discovery using applications designed to identify systems and resources, such as databases or file shares, that might contain sensitive information. Some tools even report both the volume and potential category of the data.
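To make the idea concrete, here is a minimal sketch of automated discovery over a file share. It is an illustration under simplifying assumptions, not how any particular product works: the two regular expressions in `PATTERNS` are deliberately naive stand-ins for real detection logic, and the function names are made up for this example. The result reports, per file, how many matches of each potential category were found.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately simple patterns; real tools use far more robust detection.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> Counter:
    """Count matches of each pattern in a piece of text."""
    return Counter({name: len(rx.findall(text)) for name, rx in PATTERNS.items()})


def scan_tree(root: Path) -> dict[Path, Counter]:
    """Scan every readable file under root and keep those with at least one hit."""
    findings = {}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        hits = scan_text(text)
        if sum(hits.values()) > 0:
            findings[path] = hits
    return findings
```

Calling `scan_tree(Path("/some/share"))` returns a mapping from each flagged file to its per-category match counts, which gives both the volume and the potential category of the data in each location. Pattern matching of this kind produces false positives, so treat the output as candidates for review rather than a final classification.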
Step #3. Apply labels. As an optional step, you can give each sensitive data asset a label in order to improve data classification policy enforcement. Labeling can be automated in accordance with your data classification scheme or done manually by data owners.
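One simple way to picture labeling is a small record stored alongside each asset. The sketch below assumes a sidecar JSON file and the three category names from the example scheme in this article; the file naming, the `assigned_by` field and both function names are hypothetical. Real deployments more often rely on the metadata features of the file system or document platform.

```python
import json
from pathlib import Path
from typing import Optional

# Assumed label set, taken from the three-level example in this article.
ALLOWED_LABELS = {"public", "internal", "restricted"}


def apply_label(asset: Path, label: str, assigned_by: str) -> Path:
    """Write the label to a sidecar file next to the asset and return its path."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    sidecar = asset.with_name(asset.name + ".label.json")
    sidecar.write_text(json.dumps({"label": label, "assigned_by": assigned_by}))
    return sidecar


def read_label(asset: Path) -> Optional[str]:
    """Return the recorded label, or None if the asset has not been labeled."""
    sidecar = asset.with_name(asset.name + ".label.json")
    if not sidecar.exists():
        return None
    return json.loads(sidecar.read_text())["label"]
```

The `assigned_by` field records whether a label came from an automated rule or from a data owner, which is useful when the two disagree and someone has to decide which one stands.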
Step #4. Use the results to improve security and compliance. Once you know what sensitive data you have and its storage locations, you can review your security policies and procedures to assess whether all data is protected by risk-appropriate measures. By categorizing all your sensitive data, you can prioritize your efforts, control costs and improve data management processes.
Step #5. Repeat. Data is dynamic: Files are created, copied, moved and deleted every day. Therefore, data classification should be an ongoing process in the organization. Proper administration of the data classification process will help ensure that all sensitive data is protected.
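Because repositories change constantly, repeating the process is cheaper if each run looks only at what is new or modified. The following sketch shows one possible approach, assuming that content hashes from the previous run are kept in a dictionary; the function names and the in-memory `known` store are hypothetical, and a real system would persist that state between runs.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_rescan(root: Path, known: dict[str, str]) -> list[Path]:
    """Return files that are new or changed since the last run.

    `known` maps file paths to their last seen digest and is updated in place.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = fingerprint(path)
        if known.get(str(path)) != digest:
            changed.append(path)
            known[str(path)] = digest
    return changed
```

Only the files returned by `files_to_rescan` need to go through discovery and labeling again. Note that this sketch does not handle deleted files, whose entries would simply remain in `known`.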
Examples of data classification categories
There is no one “right” way to design your data classification model and define your data categories. For instance, the U.S. government classifies national security information at three levels: Confidential, Secret and Top Secret. NATO uses five markings, from NATO Unclassified up to Cosmic Top Secret. One option is to begin with a simple three-level data classification:
- Public data – May be freely disclosed to the public (e.g., customer service contacts)
- Internal data – Has low security requirements but is not meant for public disclosure (e.g., organizational charts)
- Restricted data – Highly sensitive internal data whose disclosure could negatively affect operations and put the organization at financial or legal risk (e.g., customer, patient, and employee personal information; authentication data such as logins and passwords)
Your organization can use these three categories to define an initial data classification model and later on add more granular levels based on data content (PII, PHI, etc.), relevance to compliance standards or business specifics, and other criteria.
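The three levels and the more granular content criteria can be modeled together. Here is a minimal sketch, assuming that each asset carries a set of content tags and that the most restrictive matching level wins; the tag names, the mapping in `TAG_LEVEL` and the default level are illustrative assumptions rather than a recommended scheme.

```python
from enum import IntEnum


class Level(IntEnum):
    """Three-level scheme; a higher value means more restrictive handling."""
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


# Hypothetical content tags layered on top of the three levels.
TAG_LEVEL = {
    "PII": Level.RESTRICTED,
    "PHI": Level.RESTRICTED,
    "org_chart": Level.INTERNAL,
    "support_contacts": Level.PUBLIC,
}


def classify(tags: set[str], default: Level = Level.INTERNAL) -> Level:
    """Return the most restrictive level implied by the tags.

    Unknown tags and untagged assets fall back to `default`.
    """
    return max((TAG_LEVEL.get(tag, default) for tag in tags), default=default)
```

For example, `classify({"PII", "org_chart"})` yields `Level.RESTRICTED`, because a single highly sensitive element is enough to raise the level of the whole asset. Adding a finer-grained category later only requires a new entry in `TAG_LEVEL`.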
As you can see, data classification is not a magic wand that secures data or ensures compliance with regulatory requirements by itself. Rather, it helps organizations improve their security posture by focusing their attention, workforce and financial resources on the data most critical to the business. Once you have prioritized your risks, you better understand how to ensure appropriate data protection and ongoing compliance with security policies and regulations.