One year of ChatGPT: Taking into account data protection regulations when using large language models

  • 12/08/2023
  • Reading time: 7 minutes

One year has passed since the introduction of ChatGPT. A European regulation of AI has yet to be enacted. Nevertheless, AI is already being used in many companies, for example in the form of large language models (LLMs), the best-known of which is ChatGPT. Quite often, LLMs are used in companies without any guidelines for their use, although such guidelines are crucial for using LLMs in a legally compliant manner. Data protection issues should not be forgotten.

When LLMs are deployed, the use of personal data is relevant in the following phases:

  • When collecting and using training data,
  • when providing and using AI, and
  • when using AI results.

Entries in a cloud-based LLM are usually also used for the system’s training. Training an AI is a process during which an untrained model is fed with training data in order to teach it to recognize patterns and correlations in the data and make predictions. This may involve the transfer and use of personal data, which can jeopardize the confidentiality of that data and create the risk of its unauthorized processing.

What specific rules must companies observe?

Some of the Data Protection and Information Security Commissioners of the German federal states (“LfDI”) have developed discussion papers and checklists in order to provide companies with an overview of the requirements the company itself and its employees should observe when using AI.

For example, in November 2023 the Hamburg LfDI published a checklist of 15 aspects on the use of LLM-based chatbots. These are not binding but may provide assistance. The checklist includes, among other things, the following points:

 

Specification of compliance regulations

Companies should provide their employees with specific tools and programs for a specific purpose and should only allow the use of these approved tools.

This prevents unauthorized and uncontrolled use and thus enables the employer to avoid possible liability for its employees’ actions. It is also advisable to train employees in the correct use of these tools and programs.

 

Provision of a functional account

Employees should not be allowed to independently create accounts for business-related use with their private data (such as a private email address or telephone number). Otherwise, there is a risk that the LLM provider will create profiles of the respective employees based on their private data. If use in a professional context is desired, professional chatbot accounts should be made available.

 

Involving data protection officers and avoiding the entry of personal data

For a comprehensive check of the LLM, companies should involve their data protection officer(s) before using the tool and when developing guidelines. As a first step, the data protection officer should check whether personal data is being processed at all and whether data protection law applies. To this end, the departments and management must clarify for what purposes the LLM is to be used. Once these purposes have been defined and it is clear that personal data can or will be processed when using the LLM, companies must always examine whether the use of personal data in the LLM is justified. Even if there is no targeted processing of personal data, the risk of disclosing personal data must always be examined and taken into account accordingly.

The use of personal data is regularly not justified where the chatbot provider, in its terms and conditions, requires users to consent to the use of entered data for the provider’s own purposes. In such cases, personal data must not be transferred to the LLM.

Furthermore, companies should prohibit entries that could possibly be related to specific persons. It may also be possible to draw conclusions about persons within the company or third parties from the context, for example if an employee enters a prompt referring to employees with distinctive characteristics or from specific departments of the company. This risk is particularly high for LLM applications that are designed to create cross-references even from unstructured data.
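Where prompts are prepared or pre-processed by internal tooling, an automated redaction step can reduce the risk of personal data slipping into an entry. The following is a minimal, hypothetical sketch in Python; the patterns, names and placeholders are purely illustrative assumptions and no substitute for a review by the data protection officer:

```python
import re

# Hypothetical sketch: redact obvious personal identifiers (email
# addresses, phone numbers) from a prompt before it is sent to an
# external LLM. Real deployments would need far more robust detection
# (e.g., named-entity recognition) and a human review step.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_PATTERN = re.compile(r"\+?\d[\d /()-]{7,}\d")

def redact_prompt(prompt: str) -> str:
    """Replace email addresses and phone numbers with placeholders."""
    prompt = EMAIL_PATTERN.sub("[EMAIL REDACTED]", prompt)
    prompt = PHONE_PATTERN.sub("[PHONE REDACTED]", prompt)
    return prompt

if __name__ == "__main__":
    raw = "Please summarize the complaint from jane.doe@example.com, tel. +49 40 1234567."
    print(redact_prompt(raw))
    # -> Please summarize the complaint from [EMAIL REDACTED], tel. [PHONE REDACTED].
```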

Finally, it should be noted that a data protection impact assessment may need to be prepared when using LLMs.

 

Checking the results for correctness

An important point when using LLMs is checking the correctness of the produced responses. LLMs generate texts that approximate the desired result on the basis of statistical probabilities. However, this does not mean that the result always corresponds to true facts. On the contrary, it regularly happens that the underlying model “invents” information and facts.

If the prompt’s result contains inaccurate personal data that is subsequently used by the company, this would constitute unauthorized processing of personal data, which may result in the sanctions provided for in the GDPR, including fines.

 

No automated final decision

Exclusively automated decisions relating to individuals are generally not permitted. An automated decision exists if it is made “without any human intervention”. An example of an automated decision is when a contract is concluded or rejected solely on the basis of the assigned scoring value. 

In light of the above, decisions having a legal effect on the data subject should generally only be made by humans; otherwise, the requirements of Art. 22 GDPR must be observed. If an LLM-based chatbot generates suggestions for employees, companies must ensure that the employees always retain actual decision-making scope for their final decision. Employees must not be bound by the LLM’s suggestions. The application and the decision-making process should be structured and documented in a transparent manner.
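One way to support this is to record the LLM’s suggestion, the employee’s final decision and the reasoning together. The following is a minimal, hypothetical sketch in Python; the record structure, field names and example values are assumptions for illustration, not a prescribed procedure:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical sketch of a human-in-the-loop step: the LLM output is only
# a proposal; the final decision, the deciding employee and the rationale
# are recorded so that the process remains transparent and documented.
@dataclass
class DecisionRecord:
    llm_suggestion: str
    final_decision: str
    decided_by: str
    rationale: str
    decided_at: str

def record_final_decision(llm_suggestion: str, final_decision: str,
                          decided_by: str, rationale: str) -> DecisionRecord:
    """Store the human decision alongside the LLM's suggestion."""
    return DecisionRecord(
        llm_suggestion=llm_suggestion,
        final_decision=final_decision,
        decided_by=decided_by,
        rationale=rationale,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )

# Example: the employee deviates from the suggestion and documents why.
record = record_final_decision(
    llm_suggestion="Reject application",
    final_decision="Approve application",
    decided_by="j.mustermann",
    rationale="Missing documents were submitted after the suggestion was generated.",
)
print(record)
```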

 

Opt-out of AI training & the chat history

As far as possible, companies should make use of the option offered by the LLM provider to object to the use of the entered data for training purposes. This minimizes the risk that other companies, employees or other third parties could retrieve this information, resulting in the unwanted disclosure of personal and company data.

At the same time, it is advisable to deactivate the saving of previous entries. Deactivating the chat history is particularly recommended for shared use by several employees, as the content is otherwise accessible to all colleagues.

With ChatGPT, the opt-out is currently possible via Settings → Data Controls → Chat history and training.

 

Authentication

Companies should also place a particular focus on the authentication of employee accounts. LLM accounts used for business purposes offer considerable potential for abuse. If attackers gain unauthorized access to the application interface, they may be able to view previous activities. Attackers can also use their own queries to obtain personal information about employees or third parties. Against this background, strong passwords and the integration of additional authentication factors are an integral part of protecting accounts and company data.

  

In its discussion paper, also from November 2023, the LfDI Baden-Württemberg summarized the key questions companies should ask themselves when using AI as follows:

  • Which phase of data processing in connection with AI is subject to legal assessment?
  • Is personal data processed within the scope of the GDPR? Or is anonymous data processed that may become personal data?
  • Who is the data controller?
  • Is there a legal basis in data protection law? Are special categories of personal data processed that require a legal basis pursuant to Art. 9 (2) GDPR?
  • Are the other data protection-related obligations observed, for example the GDPR’s principles (Art. 5 GDPR), compliance with data subjects’ rights (Art. 12 et seq. GDPR), the implementation of technical and organizational measures and safeguards (Art. 24 et seq. and Art. 89 (1) GDPR) and, if applicable, the preparation of a data protection impact assessment (Art. 35 GDPR)?

In light of the various questions, companies should consider in detail in advance which LLM tools their employees are allowed to use for which purposes and check whether and which personal data may be affected. Only on this basis can companies perform a data protection review of the LLM’s use and define measures to protect personal data, such as opting out of AI training and creating a policy on the use of LLMs.

The data protection supervisory authorities are currently reviewing the legality of the language models on the market. Against this backdrop, current regulatory developments should always be kept in mind.


Author of this article

Sarah Lohmeier

Senior Manager

Attorney-at-Law (Rechtsanwältin)
