Logging and Telemetry: What to Keep and Why
Logging programs fail in two directions: too little to investigate anything, or too much to afford. The middle path is intentional.

The way out of both failure modes is a deliberate approach to what is collected, how long it is kept, and where it is stored. This article outlines a framework for managing logging and telemetry: the data that must always be retained, the data that can be sampled or dropped, and how to tier retention to control cost.
The Non-Negotiables for Incident Response
Regardless of an organization's size, industry, or cybersecurity budget, certain data types are indispensable for incident response and forensic analysis. These non-negotiable logs must be retained for a minimum of 90 days in an easily accessible format. If they are missing during an incident, the investigation is severely hampered, and costs rise sharply when external forensic specialists have to reconstruct events from incomplete data.
- Identity Provider (IdP) Sign-ins and Administrator Actions: This category includes all authentication attempts, successful logins, failed login attempts, password resets, multi-factor authentication (MFA) events, and any administrative changes made within the IdP. These logs are paramount for tracking compromised credentials, detecting lateral movement, and understanding an adversary's initial access vectors.
- Cloud Control-Plane Activity: For organizations operating in cloud environments, logs from services such as AWS CloudTrail, the Azure Activity Log, and Google Cloud Audit Logs are fundamental. These record API calls, resource modifications, and configuration changes within the cloud infrastructure, providing an audit trail of actions taken by users, roles, or services. They are crucial for identifying unauthorized resource provisioning, privilege escalation attempts, and data exfiltration from cloud assets.
- Endpoint Detection and Response (EDR) Telemetry: EDR solutions collect a rich array of data from endpoints, including process execution, network connections, file system activity, and registry modifications. Retaining this telemetry offers deep visibility into endpoint behavior, which is vital for identifying malware, detecting post-exploitation activities, and understanding the scope of a breach on individual devices.
- Email Gateway Transaction Logs: Email remains a primary vector for initial access attempts, phishing, and malware delivery. Transaction logs from email gateways document sender/recipient information, subject lines, attachments, mail flow rules, and verdicts (e.g., spam, clean, quarantined). These logs are essential for tracing spear-phishing campaigns, identifying compromised accounts, and understanding communication patterns used by threat actors.
- VPN, Zero Trust Network Access (ZTNA), and SaaS Application Access Logs: As organizations increasingly rely on remote access and SaaS applications, logs from these services provide critical insights into user access patterns, unauthorized access attempts, and potential insider threats. VPN logs record connection times and IP addresses, while ZTNA and SaaS application logs detail specific resource access and user actions within those applications, forming a comprehensive picture of user activity across the distributed enterprise.
Without these foundational data sets, a forensic investigation is reduced to guesswork, which increases response time, resolution costs, and exposure to regulatory fines.
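The list above can be turned into a simple coverage check. The following is a minimal sketch, assuming a pipeline that can report, per log source, how many days of data it keeps in an easily searchable tier. The source identifiers and the `find_gaps` function are hypothetical names chosen for illustration; only the five categories and the 90-day minimum come from the text.

```python
# Hypothetical identifiers for the five non-negotiable categories listed above.
# Map them to whatever names your own pipeline uses.
REQUIRED_SOURCES = {
    "idp_signin_admin",
    "cloud_control_plane",
    "edr_telemetry",
    "email_gateway",
    "remote_and_saas_access",
}
MIN_HOT_DAYS = 90  # minimum accessible retention stated in the text


def find_gaps(configured: dict[str, int]) -> dict[str, str]:
    """Describe each required source that is missing or retained too briefly.

    `configured` maps a source identifier to its accessible retention in days.
    """
    gaps = {}
    for source in sorted(REQUIRED_SOURCES):
        days = configured.get(source)
        if days is None:
            gaps[source] = "not collected"
        elif days < MIN_HOT_DAYS:
            gaps[source] = f"retained {days} days, need {MIN_HOT_DAYS}"
    return gaps


if __name__ == "__main__":
    current = {
        "idp_signin_admin": 90,
        "cloud_control_plane": 30,
        "edr_telemetry": 180,
        "email_gateway": 90,
    }
    for source, problem in find_gaps(current).items():
        print(f"{source}: {problem}")
```

Running a check like this on a schedule, rather than once, helps catch a source that silently stops reporting or a retention setting that gets shortened during a cost review.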
What You Can Sample or Drop Without Significant Risk
While comprehensive logging feels intuitive, not all data holds equal evidentiary or analytical value. Organizations can achieve significant cost savings and improve the signal-to-noise ratio in their security operations by strategically sampling or even dropping certain types of verbose or redundant logs.
- Verbose Application Logs in Production: Application logs can be exceptionally chatty, especially in debug mode. In production environments, retain all error and warning logs, as these directly indicate system issues or potential malicious activity. Informational logs can often be sampled, perhaps keeping 1 in 10 or 1 in 100, to provide context without overwhelming storage. Debug-level logs should generally be dropped in production unless someone is actively troubleshooting a specific issue: their detail is rarely needed for security investigations, and their volume carries significant storage overhead.
- Health-Check Noise at the Load Balancer: Load balancers constantly perform health checks on backend services, generating a high volume of repetitive logs indicating successful service availability. While important for operational monitoring, these logs rarely contain security-relevant information during an incident. Filtering or dropping successful health check logs can substantially reduce ingestion volumes without compromising security posture.
- Successful DNS Queries for Low-Value Domains: DNS logs are invaluable for identifying command and control (C2) communication and malicious domain resolution. However, successful queries for well-known, high-reputation domains (e.g., common CDNs, operating system update servers, widely used SaaS platforms) generate immense volume and have little investigative value. Organizations can filter out successful queries to allowlisted, low-risk domains and focus retention on failed DNS queries, queries to newly observed domains, and queries to domains flagged by threat intelligence.
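The three reduction rules above can be expressed as a single pre-ingestion filter. This is a minimal sketch, not a production pipeline: the event fields (`kind`, `level`, `id`, `is_health_check`, `status`, `rcode`, `qname`), the sample rate, and the allowlist entries are all assumptions made for illustration. Sampling hashes the event ID instead of using a random number, so the same event always gets the same decision.

```python
import zlib

INFO_SAMPLE_RATE = 10  # keep roughly 1 in 10 informational events (assumed rate)
# Illustrative allowlist only; build yours from domains you have actually reviewed.
ALLOWLISTED_DNS_SUFFIXES = ("windowsupdate.com", "cloudfront.net")


def keep_event(event: dict) -> bool:
    """Return True if the event should be ingested, False if it can be dropped."""
    kind = event.get("kind")

    if kind == "app":
        level = event.get("level", "").upper()
        if level in ("ERROR", "WARNING"):
            return True  # always keep errors and warnings
        if level == "DEBUG":
            return False  # drop debug output in production
        if level == "INFO":
            # Deterministic sampling keyed on the event ID.
            digest = zlib.crc32(event["id"].encode("utf-8"))
            return digest % INFO_SAMPLE_RATE == 0
        return True  # unknown level: keep rather than guess

    if kind == "lb":
        # Drop only health checks that succeeded; failures stay.
        return not (event.get("is_health_check") and event.get("status") == 200)

    if kind == "dns":
        if event.get("rcode") != "NOERROR":
            return True  # failed lookups are always kept
        name = event.get("qname", "").lower().rstrip(".")
        allowlisted = any(
            name == suffix or name.endswith("." + suffix)
            for suffix in ALLOWLISTED_DNS_SUFFIXES
        )
        return not allowlisted

    return True  # anything this filter does not recognize is kept


if __name__ == "__main__":
    events = [
        {"kind": "app", "level": "DEBUG", "id": "a1"},
        {"kind": "lb", "is_health_check": True, "status": 200},
        {"kind": "dns", "rcode": "NXDOMAIN", "qname": "x.cloudfront.net"},
    ]
    print([keep_event(e) for e in events])
```

The filter defaults to keeping anything it does not recognize. That choice is deliberate: a rule that drops by default would discard a new log source before anyone has decided whether it matters.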
Implementing Tiered Retention for Cost-Efficiency
An intelligent logging strategy extends beyond what to collect; it also dictates how long to keep it and in what storage tier. Most Security Information and Event Management (SIEM) systems and cloud logging services charge a premium for "hot" storage, meaning data that is instantly available for searching and analysis. Implementing a tiered retention strategy is critical for balancing investigative needs with budgetary constraints.
- Hot Storage (0-90 Days): This tier is for immediate access and real-time analysis. All non-negotiable logs, along with strategically chosen verbose logs, should reside here. This period aligns with the typical timeframe for active incident response, threat hunting, and operational security monitoring. Optimizing this tier by carefully selecting what gets ingested and for how long directly impacts recurring costs.
- Cold Storage (90 Days - 12 Months): After the initial hot retention period, logs can be transitioned to a less expensive "cold" storage tier. This data is still searchable and retrievable, but with a longer delay (minutes to hours) and at a significantly reduced cost. This tier is suitable for periodic compliance audits, longer-term forensic investigations, and historical trend analysis where immediate access is not paramount.
- Archival Storage (Beyond 12 Months to Regulatory Requirement): For logs that must be retained for extended periods due to compliance mandates (e.g., HIPAA, PCI DSS, GDPR, SOX), ultra-low-cost archival storage is the appropriate solution. Accessing data from this tier might take hours or even days, but the cost per gigabyte is minimal. Organizations must understand their specific regulatory obligations to define the exact retention periods for this tier.
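The tier boundaries above reduce to a lookup on a log's age. The sketch below is illustrative only: it approximates 12 months as 365 days, and `storage_tier` and `regulatory_days` are hypothetical names. The regulatory period has to come from your own compliance review, not from this code.

```python
from datetime import date, timedelta

HOT_DAYS = 90    # upper bound of the hot tier, from the text
COLD_DAYS = 365  # 12 months, approximated in days


def storage_tier(event_date: date, today: date, regulatory_days: int) -> str:
    """Return the tier a log of the given date belongs in: hot, cold, archive, or expire."""
    age = (today - event_date).days
    if age < 0:
        raise ValueError("event_date is in the future")
    if age <= HOT_DAYS:
        return "hot"
    if age <= COLD_DAYS:
        return "cold"
    if age <= regulatory_days:
        return "archive"
    return "expire"  # past every retention requirement


if __name__ == "__main__":
    today = date(2024, 6, 1)
    for age in (30, 200, 800):
        tier = storage_tier(today - timedelta(days=age), today, regulatory_days=2555)
        print(f"{age} days old: {tier}")
```

In practice most platforms implement this through lifecycle policies, not application code, but writing the rule down in one place makes it easier to confirm that the configured policies match the intended boundaries.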
Right-sizing the hot storage tier is perhaps the most impactful action an organization can take to manage SIEM and logging costs. By deciding deliberately what to keep, what to sample, and how to tier retention, organizations can build a logging infrastructure that supports a strong security posture without financial strain.
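To see why the hot tier dominates the bill, it helps to run the arithmetic. The sketch below estimates steady-state monthly storage cost, assuming a constant daily ingest volume. The prices in the example are placeholders chosen to make the arithmetic easy to follow, not any vendor's actual rates, and `monthly_storage_cost` is a hypothetical helper.

```python
def monthly_storage_cost(
    daily_gb: float,
    days_in_tier: dict[str, int],
    price_per_gb_month: dict[str, float],
) -> dict[str, float]:
    """Estimate steady-state monthly storage cost per tier.

    At steady state, a tier that holds N days of data stores daily_gb * N gigabytes.
    """
    return {
        tier: daily_gb * days * price_per_gb_month[tier]
        for tier, days in days_in_tier.items()
    }


if __name__ == "__main__":
    # Placeholder prices per GB-month; substitute your own contract rates.
    prices = {"hot": 1.00, "cold": 0.10}

    # Scenario A: a full year kept hot. Scenario B: 90 days hot, the rest cold.
    all_hot = monthly_storage_cost(100, {"hot": 365}, prices)
    tiered = monthly_storage_cost(100, {"hot": 90, "cold": 275}, prices)

    print("all hot:", round(sum(all_hot.values()), 2))
    print("tiered: ", round(sum(tiered.values()), 2))
```

The model ignores ingestion, query, and retrieval charges, which many platforms bill separately, so treat the result as a rough comparison between retention plans, not as a budget figure.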