
A guide for companies that want to leverage AI without compromising data security 

Introduction 

Artificial intelligence (AI) tools can help companies improve efficiency, accuracy, innovation, and customer satisfaction. However, using AI also comes with challenges and risks, especially when it involves sensitive or personal data, also referred to as personally identifiable information (PII). Data breaches, cyberattacks, privacy violations, and ethical issues are some of the threats that companies should be aware of when they adopt AI tools.

This document aims to provide some guidance and best practices for companies that want to protect their data when they use AI tools. It will cover the following topics: 

  • Why data protection is important for AI 
  • What are the main data protection challenges and risks for AI 
  • What are the key data protection principles and standards for AI 
  • What are some practical data protection strategies and solutions for AI 

Why Data Protection is Important for AI 

Data is the fuel for AI. Without data, AI tools cannot learn, train, or perform their tasks. Data is also the output of AI. AI tools can generate, analyze, or process data to provide insights, recommendations, or decisions. Therefore, data protection is crucial for AI, both as an input and an output. 

Data protection is important for AI for several reasons: 

  • It ensures the quality and reliability of the data and the AI tools. Data protection can prevent data corruption, manipulation, or loss, which can affect the accuracy, validity, or performance of the AI tools. Data protection can also ensure the integrity, consistency, and completeness of the data and the AI tools. 
  • It safeguards the rights and interests of the data subjects and the data owners. Data protection can prevent unauthorized access, use, or disclosure of the data, which can violate the privacy, confidentiality, or consent of the data subjects or the data owners. Data protection can also protect the intellectual property, trade secrets, or competitive advantage of the data owners. 
  • It meets the legal and ethical obligations and expectations of the data users and the data regulators. Data protection can ensure that the data and the AI tools follow the relevant laws, regulations, standards, or guidelines that govern data collection, processing, storage, or sharing. Data protection can also align with the ethical principles, values, or norms that guide data governance, accountability, or transparency.

What are the Main Data Protection Challenges and Risks for AI 

Because of its characteristics, capabilities, and applications, AI poses some distinct and complex data protection challenges and risks. Some of the main data protection challenges and risks for AI are:

  • Data volume and variety. AI tools often require large and diverse datasets to learn, train, or perform their tasks. This can increase the complexity and difficulty of data protection, as the data may come from various sources, formats, or domains, and may contain distinct types of information, such as personal, sensitive, or confidential data. 
  • Data processing and sharing. AI tools often involve complex and dynamic data processing and sharing activities, such as data extraction, transformation, integration, analysis, or dissemination. This can increase the exposure and vulnerability of the data, as the data may be transferred, stored, or accessed by different parties, platforms, or systems, and may be subject to different policies, protocols, or standards. 
  • Data interpretation and application. AI tools often generate, analyze, or process data to provide insights, recommendations, or decisions, which may have significant impacts or consequences for the data subjects, the data owners, or the data users. This can increase the responsibility and accountability of the data protection, as the data may affect the rights, interests, or obligations of the data subjects, the data owners, or the data users, and may raise ethical, legal, or social issues. 

What are the Key Data Protection Principles and Standards for AI 

Data protection for AI should follow some key principles and standards, which can provide a framework and a benchmark for data protection practices and policies. Some of the key data protection principles and standards for AI are: 

  • Data security. Data protection for AI should implement measures to protect the data from unauthorized or unlawful access, use, disclosure, alteration, or destruction. Data protection for AI should also monitor and report any data breaches or incidents and take remedial actions as soon as possible.  
  • Data minimization. Data protection for AI should collect, process, store, or share only the minimum amount and type of data that is necessary, relevant, and adequate for the purpose and scope of the AI tools. Data protection for AI should also delete or anonymize data that is no longer needed, outdated, or inaccurate (see the sketch after this list).
  • Data privacy. Data protection for AI should respect and uphold the privacy rights and preferences of the data subjects, and obtain their informed and explicit consent before collecting, processing, storing, or sharing their data. Data protection for AI should also give the data subjects the option to access, correct, or delete their data, or to withdraw their consent, at any time.
  • Data transparency. Data protection for AI should disclose and explain the sources, methods, purposes, and outcomes of the data and the AI tools, and provide clear and accurate information and communication to the data subjects, the data owners, and the data users. Data protection for AI should also enable and facilitate the oversight, audit, or review of the data and the AI tools, and address any questions, concerns, or complaints. 
  • Data accountability. Data protection for AI should assign and assume the roles, responsibilities, and liabilities of the data and the AI tools, and ensure that the data and the AI tools comply with the relevant laws, regulations, standards, or guidelines. Data protection for AI should also evaluate and assess the impacts and risks of the data and the AI tools, and implement measures to prevent, mitigate, or remedy any harm or damage. 
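To make the minimization principle concrete, here is a minimal Python sketch that passes an AI tool only the fields it needs and drops everything else. The record layout and the ALLOWED_FIELDS whitelist are hypothetical examples, not a prescribed schema.

```python
# Minimal sketch of data minimization before records reach an AI tool.
# The field names and the ALLOWED_FIELDS whitelist are hypothetical.

ALLOWED_FIELDS = {"ticket_id", "product", "issue_summary"}  # only what the task needs

def minimize(record: dict) -> dict:
    """Return a copy of the record containing only whitelisted fields."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

support_ticket = {
    "ticket_id": "T-1042",
    "product": "RouterX",
    "issue_summary": "Device drops Wi-Fi every few hours",
    "customer_name": "Jane Doe",           # PII: excluded by the whitelist
    "customer_email": "jane@example.com",  # PII: excluded by the whitelist
}

print(minimize(support_ticket))
# {'ticket_id': 'T-1042', 'product': 'RouterX', 'issue_summary': 'Device drops Wi-Fi every few hours'}
```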

What are Some Practical Data Protection Strategies and Solutions for AI

Data protection for AI requires some practical strategies and solutions, which can help implement and operationalize the data protection principles and standards. Some of the practical data protection strategies and solutions for AI are: 

  • Data encryption. Data encryption is a technique that converts data into a code that can only be accessed or decrypted by authorized parties. Data encryption can enhance data security and privacy and help prevent unauthorized or unlawful access, use, disclosure, alteration, or destruction of the data (sketched below).
  • Data anonymization. Data anonymization is a technique that removes or modifies data that can identify or link to a specific data subject, such as names, addresses, or phone numbers. Data anonymization can enhance data privacy and support data minimization by reducing the amount of identifying information that is collected, processed, stored, or shared (sketched below).
  • Data federation. Data federation is a technique that allows the data to remain in its original location and format and only provides a virtual view of, or access to, the data when the AI tools need or request it. Data federation can improve data minimization and transparency by collecting, processing, storing, or sharing only the data that is necessary, relevant, and adequate for the purpose and scope of the AI tools (sketched below).
  • Data auditing. Data auditing is a technique that records and tracks the activities of the data and the AI tools, such as data collection, processing, storage, or sharing, and data generation, analysis, or processing. Data auditing can support data accountability and oversight by providing evidence and documentation of the compliance, impacts, and risks of the data and the AI tools (sketched below).
  • Data ethics. Data ethics is a practice that applies ethical principles, values, or norms to the data and the AI tools, such as fairness, justice, or respect. Data ethics can address the ethical, legal, or social issues that may arise from the interpretation and application of the data and the AI tools, and ensure that the data and the AI tools respect and uphold the rights and interests of the data subjects, the data owners, and the data users.
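As a concrete illustration of the encryption strategy, here is a minimal Python sketch built on the third-party cryptography package (an assumed choice; any vetted encryption library serves the same purpose). It encrypts a record so that only holders of the key can read it; real deployments also need key management (storage, rotation, access control), which the sketch omits.

```python
# Minimal sketch of symmetric encryption at rest, assuming the third-party
# "cryptography" package (pip install cryptography). Key management is the
# hard part in practice and is deliberately omitted here.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in production, load this from a key vault
fernet = Fernet(key)

record = b"customer_id=1042; notes=prefers email contact"
token = fernet.encrypt(record)    # ciphertext, safe to store or transmit
restored = fernet.decrypt(token)  # recoverable only with the key

assert restored == record
```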
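The anonymization strategy can be sketched just as briefly. The example below redacts e-mail addresses and phone numbers with regular expressions and replaces names with short salted hashes, so records stay linkable without exposing identities. The patterns are illustrative stand-ins, not production-grade PII detection; strictly speaking, salted hashing is pseudonymization rather than full anonymization, because anyone holding the salt can recreate the mapping.

```python
# Minimal sketch of scrubbing free text before it reaches an AI tool.
# The regex patterns and salt below are illustrative assumptions.
import hashlib
import re

SALT = b"rotate-me"  # hypothetical salt; store and rotate it securely

def pseudonym(value: str) -> str:
    """Replace an identifier with a short, stable, salted hash."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:10]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def anonymize(text: str, names: list[str]) -> str:
    """Redact direct identifiers and pseudonymize known names."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    for name in names:
        text = text.replace(name, pseudonym(name))
    return text

note = "Jane Doe (jane@example.com, 555-123-4567) reported the outage."
print(anonymize(note, names=["Jane Doe"]))
```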
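Data federation is easiest to see with two toy source systems. In the sketch below, records stay where they live, and the AI tool receives only an on-demand view of the fields it asks for; the in-memory dictionaries are hypothetical stand-ins for real databases or APIs that would be queried in place.

```python
# Minimal sketch of data federation: data stays in its source systems, and
# callers get a one-off virtual view instead of a bulk export. The "systems"
# here are hypothetical in-memory stand-ins for real databases or APIs.

CRM_SYSTEM = {"C-1042": {"segment": "enterprise", "region": "EMEA"}}
BILLING_SYSTEM = {"C-1042": {"plan": "annual", "status": "active"}}

SOURCES = {"crm": CRM_SYSTEM, "billing": BILLING_SYSTEM}

def federated_view(customer_id: str, fields: dict[str, list[str]]) -> dict:
    """Assemble a one-off view across sources without copying whole records."""
    view = {}
    for source_name, wanted in fields.items():
        record = SOURCES[source_name].get(customer_id, {})
        view.update({f: record[f] for f in wanted if f in record})
    return view

# The AI tool sees only the fields it requested; nothing is bulk-exported.
print(federated_view("C-1042", {"crm": ["segment"], "billing": ["status"]}))
```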
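Finally, a minimal auditing sketch: a Python decorator that writes a structured log entry for every data access, recording who called which function and with what arguments. The logger configuration and field names are illustrative assumptions.

```python
# Minimal sketch of data auditing via a logging decorator.
# The log format and field names are illustrative assumptions.
import functools
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s AUDIT %(message)s")
audit_log = logging.getLogger("audit")

def audited(action: str):
    """Wrap a data-access function so each call lands in the audit log."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user: str, *args, **kwargs):
            audit_log.info("user=%s action=%s args=%s", user, action, args)
            return func(user, *args, **kwargs)
        return wrapper
    return decorator

@audited("read_customer_record")
def read_customer_record(user: str, customer_id: str) -> dict:
    """Stand-in for a lookup against a real data store."""
    return {"customer_id": customer_id}

read_customer_record("analyst_17", "C-1042")
```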

I. Purpose     

Artificial intelligence (AI) is a branch of computer science concerned with developing software that allows computer systems to perform tasks that imitate human cognitive intelligence. AI includes Generative AI, Natural Language Processing (NLP), and large language models (LLMs).

This AI policy establishes guidelines and best practices for the responsible and ethical use of AI at [Company Name]. It ensures that [Company Name] employees use publicly available AI systems in a manner that aligns with [Company Name]’s values and complies with legal and regulatory standards.

This policy applies to all [Company Name] employees, contractors, and third-party individuals who have access to AI technologies or are involved in using AI systems on behalf of [Company Name].

II. Generative AI

Generative AI uses generative models to create new data—such as text, images, videos, or other content. These models learn patterns and structures from their input training data and then generate novel data with similar characteristics. In recent years, improvements in transformer-based deep neural networks, especially LLMs, have led to a boom in Generative AI systems. Some notable examples include:

  1. Chatbots: ChatGPT, Copilot, Perplexity, Gemini, and LLaMA. 
  2. Text-to-image AI: Systems like Stable Diffusion, Midjourney, and DALL-E. 
  3. Text-to-video AI: Sora. 

As part of our commitment to responsible AI use, [Company Name] is experimenting with Microsoft Copilot, an AI-powered productivity tool. It integrates with Microsoft 365 apps such as Word, Excel, PowerPoint, Outlook, and Teams, and combines the power of LLMs with files provided by [Company Name], much like a private cloud.

[Company Name] is still reviewing the impact of publicly available Generative AI systems on its business operations and will update this policy as necessary as it evaluates new Generative AI systems.

III. Use of Generative AI

Before any employee utilizes a publicly available Generative AI system, the employee must verify that such use conforms to this policy. [Company Name] employees may not use Generative AI for any client work unless the employee confirms that the client has consented.

Generative AI may be used for R&D, testing, and non-firm-specific information (such as developing generic HR, accounting, or IT procedures). 

[Company Name] employees using a publicly available Generative AI system are required to:

  • Carefully review AI-generated content for accuracy. Generative AI systems are known to “hallucinate” false answers or information, or to provide outdated information.
  • Treat every bit of information inputted into a Generative AI system as if it will go viral on the internet and be attributed to you or [Company Name], regardless of the settings you have selected within the system (or the assurances made by its creators).
  • Inform your manager or other appropriate [Company Name] employee when work is created using Generative AI. Do not represent work that is AI-generated as being your own original work.

IV. Prohibition on Entering Client Information into Generative AI

The following prohibitions apply to the use of any Generative AI system: 

  • Do not upload or input [Company Name]’s name, or any information that could identify [Company Name], into any Generative AI system.
  • Do not upload or input confidential, proprietary, or sensitive [Company Name] information into any Generative AI system. Examples include passwords and other credentials, personnel material, information from documents marked Confidential, Sensitive, or Proprietary, or any other nonpublic [Company Name] information that might be useful to competitors or harmful to [Company Name] if disclosed. Doing so may breach [Company Name]’s obligations to keep certain information confidential and secure, risks widespread disclosure, and may cause [Company Name]’s rights to that information to be challenged.
  • Do not upload or input a client’s name, any information that could identify the client, or any confidential or proprietary client information into any Generative AI system.
  • Do not upload or input information from a client or third party protected by a confidentiality agreement or court order into any Generative AI system. 
  • Do not upload or input personally identifiable information (“PII”) that directly identifies any individual, such as names, addresses, Social Security numbers, telephone numbers, e-mail addresses, likenesses, etc.

V. Violations

Violating this policy may result in disciplinary action, up to and including immediate termination, and could result in legal action. If you are concerned that someone has violated this policy, report this behavior to any manager, executive officer, or any member of Human Resources.

VI. Disclaimer

Nothing in this policy is designed or intended to interfere with, restrain, or prevent employee communications regarding wages, hours, or other terms and conditions of employment or any other rights protected by the National Labor Relations Act.