If you missed the earlier posts in this series, you can find part 1 here and part 2 here!
Data exfiltration is the final opportunity for defenders to stop an attack before damage becomes permanent. Traditional data loss prevention (DLP) operates on a simple premise: block large transfers, monitor unusual protocols, alert on after-hours activity. Sophisticated adversaries understand these thresholds and design their exfiltration strategies specifically to stay underneath them.
When we implemented modern data staging techniques in our test environment, traditional DLP rules detected zero of the 90 exfiltration flows. This mirrors what sophisticated ransomware and APT groups actually do in production environments: stage data in small transfers, use business hours, blend with normal file server activity, and leverage legitimate protocols.
The evolution from obvious to sophisticated exfiltration
Unsophisticated data theft is easy to detect: connect to a file server, download everything, transfer gigabytes in compressed archives during off-hours. Traditional DLP catches this immediately.
Sophisticated operators evolved beyond this. BlackCat (ALPHV) uses ExMatter, a custom tool that exfiltrates data using SFTP (port 22) and WebDAV (port 80). BlackCat also uses FreeFileSync, a legitimate file synchronization tool, to exfiltrate data weeks before ransomware deployment.
Huntress analysts observed operators installing WinRAR and FileZilla, using FileZilla's SFTP module for staged archives. Other methods include Rclone for cloud sync, MEGAsync for uploads, and backup utilities like Restic. The pattern: use legitimate tools, transfer incrementally, blend with normal operations.
Staging strategies: Small transfers over time
The threshold dilemma creates a fundamental problem. Organizations set transfer limits: flag any outbound transfer exceeding 100MB, or 1GB, or 10GB. Attackers stay below these limits. Transfers split into 50MB chunks hourly generate no alerts despite exfiltrating gigabytes over weeks.
Our test demonstrated this. We staged 21GB from a file server over 8 days. Each transfer: 200-500MB. Total: 90 flows to external infrastructure, all during business hours (7am-7pm), all using HTTPS/443. Traditional DLP detected zero because individual transfers fell below the 1GB threshold, occurred during normal business hours, used standard protocols, and went to AWS storage that cannot be blocked without breaking legitimate cloud use.
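A minimal sketch of that failure mode, using invented flow records and the hypothetical 1GB per-transfer threshold described above:

```python
# Hypothetical per-transfer DLP rule vs. chunked exfiltration.
# Flow records are invented for illustration: (destination, bytes).
from collections import defaultdict

THRESHOLD = 1 * 1000**3  # 1GB per-transfer alert threshold
CHUNK = 50 * 1000**2     # 50MB chunks, one per business hour

flows = [("external-bucket.example.com", CHUNK) for _ in range(90)]

# Rule-based DLP evaluates each transfer in isolation: no alerts fire.
alerts = [f for f in flows if f[1] > THRESHOLD]
assert not alerts

# Aggregating per destination across the whole window shows what left.
totals = defaultdict(int)
for dest, size in flows:
    totals[dest] += size
print(totals["external-bucket.example.com"] / 1000**3)  # 4.5 (GB)
```

Even this naive per-destination sum only works if you already know which time window and endpoint to aggregate over, which is why threshold tuning alone does not close the gap.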
This mirrors documented incidents. BlackCat operators exfiltrated multiple terabytes gradually over weeks in a 2023 attack. APT groups like Evasive Panda use CloudScout to extract data from Gmail, Outlook, and Google Drive through stolen session cookies. Each extraction appears as normal cloud service access.
The threshold dilemma
DLP operates by setting thresholds: if transfer exceeds X, if rate exceeds Y, if volume per hour exceeds Z, then alert. Set thresholds too low and false positives overwhelm teams. Set too high and attackers slip underneath.
Organizations typically set outbound alerts at 1GB or higher to avoid false positives from legitimate file sharing, backups, and updates. Sophisticated attackers know this. They configure tools like Rclone with rate limits and chunk sizes designed to stay under detection thresholds. A 50MB/hour limit allows 1.2GB per day and 8.4GB per week while generating zero threshold alerts.
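The arithmetic behind that rate-limit evasion is easy to verify (the 50MB/hour figure is the hypothetical limit from the paragraph above):

```python
# Back-of-envelope check: a steady 50MB/hour trickle never trips a
# per-transfer threshold, yet moves gigabytes within days.
rate_mb_per_hour = 50
gb_per_day = rate_mb_per_hour * 24 / 1000        # 1.2 GB/day
gb_per_week = rate_mb_per_hour * 24 * 7 / 1000   # 8.4 GB/week
print(gb_per_day, gb_per_week)  # 1.2 8.4
```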
Ransomware operators maintain access for weeks, gradually exfiltrating data in increments that evade DLP. This dwell time serves dual purposes: reconnaissance and gradual data theft. By ransomware deployment, sensitive data has already been stolen.
Using normal protocols and business hours
When we implemented our test exfiltration, all transfers used HTTPS on port 443 to AWS S3 storage. This is indistinguishable from normal business operations. Organizations use cloud storage constantly: OneDrive, Dropbox, Google Drive, Box, AWS, Azure. Blocking port 443 or cloud provider IP ranges would break core business functions.
Attackers exploit this. Kaspersky researchers documented CloudSorcerer APT, which uses Microsoft, Yandex, and Dropbox cloud infrastructures for C2 and exfiltration. Traffic uses legitimate APIs with authentication tokens. Network monitoring sees authorized access to approved cloud services, not data theft.
The timing strategy compounds detection difficulty. Our test transferred data exclusively during business hours (7am-7pm local time). File servers experience peak activity during business hours. Users access documents, share files, sync to cloud storage. Exfiltration flows blend completely with this normal activity pattern.
APT28 (Fancy Bear) used SlimAgent malware to exfiltrate sensitive data from Ukrainian entities. The exfiltration occurred during normal working hours using encrypted channels that appeared as routine network traffic. Iranian APT groups executed widespread attacks with encrypted data exfiltration that evaded time-based and protocol-based detection rules.
File server access as perfect cover
File servers present ideal exfiltration conditions. They contain concentrated sensitive data. Authorized users access them constantly. Activity volumes fluctuate. Establishing baseline "normal" behavior is difficult because usage varies by project, department, and business cycle.
When attackers compromise credentials with file server access, staging activity blends with legitimate operations. An employee accessing 50 files from finance looks identical to a compromised account doing the same maliciously. Both generate identical access logs, network flows, and file operations.
Our test exploited this. We used compromised credentials to access a file server, selecting files across multiple directories (finance, HR, engineering, customer data). The pattern appeared as normal work: document reviews, project research, report compilation. Transfers mimicked an employee backing up work files.
DLP struggles here. Blocking all external transfers prevents legitimate work. Restricting cloud storage disables approved tools. Detecting malicious transfers requires understanding intent, not just observing technical actions.
Why analyzing activity over time reveals staging patterns
DeepTempo's LogLM detected all 90 exfiltration flows, not by measuring transfer volumes, monitoring protocols, or tracking timing windows, but because the foundation model learned the structural signatures of data staging patterns.
Consider what data exfiltration looks like when observed over extended periods. A legitimate employee accessing files generates a specific pattern: sporadic access distributed across working hours, varied file types matching work responsibilities, and access patterns that correlate with project timelines and business needs. An attacker staging data for exfiltration generates a different pattern: systematic file enumeration, broad access across sensitive directories, consistent transfer volumes to a single destination, and temporal structure optimized for comprehensive data theft rather than work tasks.
Individual flows appear normal when evaluated in isolation. The access happens during business hours. Protocols are standard HTTPS. Transfer sizes are reasonable. Destination is approved cloud storage. But the long-horizon behavioral pattern reveals the underlying intent. The pattern of systematic access across multiple sensitive directories, combined with consistent outbound transfers to the same external endpoint, creates a structural signature distinct from legitimate file server usage.
Attackers can configure transfer sizes to stay under thresholds. They can use business hours to blend with normal activity. They can leverage approved protocols and destinations. But they cannot make their activity sequences appear normal while accomplishing systematic data theft. The structural signature of staging and exfiltration remains detectable regardless of volume limits, timing windows, or protocol choices.
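As a deliberately crude illustration (this is a toy heuristic with invented flow tuples, not LogLM's actual model), sequence-level features can separate patterns that per-flow inspection cannot:

```python
# Toy illustration only: features computed over a sequence of flows
# expose structure that evaluating each flow in isolation misses.
from collections import Counter

def sequence_features(flows):
    """flows: (source_directory, destination) pairs over a long window."""
    dest_counts = Counter(dest for _, dest in flows)
    return {
        "distinct_dirs": len({d for d, _ in flows}),
        # Share of flows going to the single most common destination.
        "top_dest_share": max(dest_counts.values()) / len(flows),
    }

# Invented examples. Legitimate work: a few directories, mixed destinations.
employee = [("finance", "onedrive"), ("finance", "sharepoint"),
            ("projects", "onedrive"), ("projects", "email")]

# Staging: systematic sweep of sensitive directories, one external endpoint.
staging = [(d, "external-s3") for d in
           ("finance", "hr", "engineering", "customers")] * 10

print(sequence_features(employee))
print(sequence_features(staging))  # broad directory sweep, one destination
```

Every individual flow in both lists would pass a per-flow rule; only the sequence view shows the staging pattern's breadth and destination concentration.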
LogLM detection happens at the structural level, independent of the specific technical parameters attackers configure. Attackers can rotate infrastructure, encrypt traffic, and keep volumes low, yet the structural patterns remain detectable.
What this means for defenders
Sophisticated adversaries understand DLP thresholds and design exfiltration to evade them. Ransomware operators maintain access for weeks, gradually exfiltrating data in increments that generate no alerts. APT groups use legitimate tools like cloud sync utilities to blend malicious transfers with approved operations.
Rule-based DLP creates an arms race. Defenders set thresholds. Attackers stay underneath. Defenders lower thresholds. False positives increase. Teams cannot manually investigate every 200MB transfer during business hours without overwhelming capacity.
The 2024 Verizon DBIR reports that roughly a third of breaches involved ransomware or extortion, with data exfiltration preceding encryption. Sophos found 56% of ransomware victims now pay ransoms, driven in part by stolen data being threatened for release. Traditional DLP has not stopped this because sophisticated adversaries evade rule-based detection.
Foundation models that learn behavioral patterns provide an alternative. Instead of thresholds for individual parameters, the model learns what staging and exfiltration activity structurally looks like. This works against techniques defenders have not anticipated because it evaluates temporal patterns rather than matching predefined rules.
Why deep learning beats traditional ML anomaly detection becomes clear in scenarios like this. Anomaly detection measures deviation from baseline, but attackers deliberately operate within normal parameters. Deep learning foundation models understand structural signatures of malicious intent independent of baseline thresholds.
Our test demonstrates this. Traditional DLP detected nothing when exfiltration used small transfers, business hours, and standard protocols. LogLM detected everything because the structural pattern revealed systematic staging intent.
MITRE ATT&CK tactics: Collection, Exfiltration
Related reading:
- From packets to patterns: How foundation models detect network threats
- From packets to patterns: Why your network sees attacks as normal traffic
- The cybersecurity transformation: Building security from first principles with foundation models
- Rules, rules everywhere: Why signature-based detection falls short against AI threats
- Why deep learning beats traditional ML anomaly detection for today's cyber defense
Get in touch to run a 30-day, risk-free assessment in your environment. DeepTempo will analyze your existing data to identify active threats and catch what your existing NDRs and SIEMs might be missing!