What Happened?
On February 27, 2025, security researchers at Truffle Security revealed that large language models (LLMs), including DeepSeek, were trained on datasets containing approximately 12,000 live API keys and passwords. The researchers scanned Common Crawl, a publicly available web archive widely used to train AI coding assistants, and found these secrets hardcoded across millions of web pages.
Why This Issue Matters
AI models trained on insecure data risk inadvertently suggesting unsafe coding practices, such as embedding sensitive credentials directly in source code. Because a model cannot distinguish a live key from a placeholder during training, the repeated exposure of live secrets in widely used training datasets significantly increases the risk that working API keys and passwords resurface and are compromised.
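A minimal sketch of the anti-pattern and the safer habit it displaces (the key value and variable names here are hypothetical):

```python
import os

# Anti-pattern: a live credential hardcoded in source. Anything that
# publishes this file (a public repo, a rendered page, a crawled snapshot)
# publishes the secret, and a model trained on such code can learn to
# suggest the same habit.
API_KEY = "sk-live-EXAMPLE0000"  # hypothetical placeholder, never a real key

# Safer: keep the secret out of the codebase and read it at runtime.
api_key = os.environ.get("PAYMENTS_API_KEY")  # hypothetical variable name
if api_key is None:
    raise RuntimeError("PAYMENTS_API_KEY is not set")
```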
How the Secrets Were Exposed
- Websites inadvertently published live API keys, passwords, and other sensitive credentials in front-end HTML and JavaScript (a minimal sketch follows this list).
- The Common Crawl dataset captured snapshots of these insecure web pages.
- LLMs such as DeepSeek were subsequently trained on this publicly available dataset.
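To make the leak path concrete, here is a minimal sketch, assuming a Flask app (the route, variable names, and key value are hypothetical), of how a live key ends up in the HTML that every visitor and every crawler receives:

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical: a live key kept in server code "for convenience".
MAIL_API_KEY = "0123abcd-EXAMPLE"

@app.route("/")
def signup_page():
    # The key is interpolated server-side, so the page source delivered
    # to browsers (and preserved in Common Crawl's snapshot of this URL)
    # contains it verbatim.
    return render_template_string(
        '<script>initSignup("{{ key }}");</script>', key=MAIL_API_KEY
    )
```

Client-side JavaScript has no secure place to hold a secret, so any key shipped to the browser this way has to be treated as public.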
Implications
- Increased risk of credential misuse in phishing campaigns, data breaches, and brand impersonation.
- Higher likelihood of insecure code recommendations from AI coding assistants.
Recommended Actions
- Review API Key and Password Management: Immediately audit and rotate any exposed API keys and passwords.
- Enhance Secret Scanning: Extend scanning to cover public internet datasets such as Common Crawl and archive.org (see the sketch after this list).
- Educate Developers: Incorporate secure coding guidelines explicitly into AI coding assistant instructions.
- Engage AI Providers: Advocate for stricter training-data filtering and additional alignment safeguards in AI model training processes.
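For the secret scanning item above, here is a minimal sketch of a regex-based scanner over a directory of saved pages (the patterns, paths, and names are illustrative; production tools such as Truffle Security's TruffleHog maintain far larger detector sets and verify matches against the issuing provider):

```python
import re
import sys
from pathlib import Path

# Illustrative patterns for a few well-known key formats. A real scanner
# maintains hundreds of detectors and checks whether each match is live.
PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "mailchimp_api_key": re.compile(r"[0-9a-f]{32}-us[0-9]{1,2}"),
    "slack_token": re.compile(r"xox[baprs]-[0-9A-Za-z-]{10,}"),
}

def scan_file(path: Path) -> list[tuple[str, str]]:
    """Return (detector_name, match) pairs found in one crawled page."""
    text = path.read_text(errors="ignore")
    hits: list[tuple[str, str]] = []
    for name, pattern in PATTERNS.items():
        hits.extend((name, m) for m in pattern.findall(text))
    return hits

if __name__ == "__main__":
    # Usage: python scan_pages.py <directory of saved .html pages>
    for page in Path(sys.argv[1]).rglob("*.html"):
        for name, match in scan_file(page):
            # Print only a prefix of each match to avoid re-exposing it.
            print(f"{page}: {name}: {match[:8]}...")
```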