Chatbots like ChatGPT have become everyday tools for many people. But users should take care that what they share does not end up as AI training material or get exposed in a data leak.
According to a March 31 report, users have learned a great deal from OpenAI’s ChatGPT over the past few years, and the chatbot has in turn recorded a great deal of personal information about them.
It has collected a large amount of personal data through countless user interactions—for example, who likes eating eggs, which users have babies who need to breastfeed to fall asleep, and how some users need to adjust their workout routines due to back pain.
It has even remembered more private and sensitive details that are not convenient to disclose.
Whichever chatbot you choose, the more you share, the more useful it becomes. Patients upload blood test reports for analysis; engineers paste unpublished code for debugging. But AI experts warn that we should stay cautious with such human-like tools, especially around sensitive information like Social Security numbers or corporate confidential data.
Tech companies are eager to use data to optimize their models, but they would rather not hold users’ sensitive private data. OpenAI warns plainly: “Do not disclose any sensitive information in conversations.” Google likewise tells Gemini users: “Avoid inputting confidential information or anything you wouldn’t want a reviewer to see.”
AI researchers believe that chat records about a strange rash or a financial misstep could become training material for future model versions, or could be exposed in a data breach. Below are the types of content users should avoid entering, along with some tips for protecting your privacy in conversations:
The realistic, human-like communication style of chatbots often causes people to lower their guard. Jennifer King, a researcher at Stanford’s Institute for Human-Centered AI, warns that once you enter information into a chatbot, “you lose control over it.”
In March 2023, a ChatGPT bug briefly let some users see the opening lines of other users’ conversations; the company quickly fixed the issue. OpenAI had also previously sent subscription confirmation emails to the wrong recipients, exposing user names, email addresses, and payment information.
If there’s a data breach or a legal subpoena, chat records could be included in the exposed data. It’s recommended to set strong passwords and enable multi-factor authentication, and avoid inputting the following types of information:
Identity information: Including Social Security numbers, driver’s license numbers, passport numbers, dates of birth, addresses, and phone numbers. Some chatbots have automatic data-masking features that can block sensitive fields. An OpenAI spokesperson stated: “We aim to train AI models on the world, not individuals’ private data, and actively reduce personal data collection.”
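As a rough illustration of the kind of masking such features perform, here is a minimal sketch in Python. The patterns and the `redact` function are illustrative assumptions, not any vendor’s actual implementation; real redaction systems use far more robust detection than a few regular expressions.

```python
import re

# Illustrative patterns only; real tools detect many more formats and edge cases.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace each matched sensitive field with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call 555-123-4567 or email jane.doe@example.com, SSN 123-45-6789."))
# → Call [PHONE] or email [EMAIL], SSN [SSN].
```

Running text through a filter like this before pasting it into a chatbot keeps the structure of a message intact while stripping the identifiers themselves.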
Medical test reports: Medical confidentiality is meant to prevent discrimination and embarrassment, but chatbots are generally not protected under special health data regulations. If AI assistance is needed to interpret test results, Stanford’s King recommends: “First crop or edit the images or documents to only retain test results, and mask all other information.”
Financial account information: Bank and investment account numbers could be exploited for monitoring or theft and must be strictly protected.
Corporate confidential information: Users who rely on general-purpose chatbots for work, even just to draft a routine email, can unintentionally leak client data or trade secrets. Samsung banned ChatGPT outright after engineers leaked internal source code. If AI genuinely helps at work, companies should use commercial versions or deploy their own customized AI systems.
Login credentials: As intelligent agents capable of performing real-world tasks become more common, the need to provide account credentials to chatbots increases. But these services are not built to digital vault standards—passwords, PIN codes, and security questions should be stored in professional password managers.
When users rate a chatbot’s responses positively or negatively, that feedback may be treated as permission to use the question and the AI’s reply for evaluation and model training. If a conversation touches on sensitive topics such as violence and gets flagged, it may even be reviewed manually by company staff.
Anthropic, the developer of Claude, does not use user conversations to train its AI by default and deletes data after two years. OpenAI’s ChatGPT, Microsoft’s Copilot, and Google’s Gemini all use conversation data but offer options in settings to turn this off. For privacy-focused users, here are some suggestions:
Regularly delete records: Jason Clinton, Chief Information Security Officer at Anthropic, recommends that cautious users clear their conversations regularly. AI companies usually delete data marked as “deleted” after 30 days.
Enable temporary conversations: ChatGPT’s “temporary chat” feature is similar to incognito mode in browsers. When enabled, it prevents data from being saved to the user’s profile. These chats are not stored in history nor used to train models.
Ask questions anonymously: Privacy-focused search engines support anonymous access to mainstream AI models like Claude and GPT, and promise that such data won’t be used for training. Though advanced features like file analysis may not be available, basic Q&A functions work well.
Remember, chatbots are always happy to keep the conversation going, but deciding when to end it—or hit the delete button—is always up to the user.