Aggregation of data sources
Aggregation is a threat hunting technique that involves collecting and analyzing data from multiple sources to identify potential security threats.
The technique correlates data from different systems, applications and endpoints to identify patterns and anomalies that may indicate a security breach.
Organizations with mature logging, detection and monitoring may already have this covered.
Aggregation can be applied to a variety of data sources, including network traffic logs, system logs and user behavior data.
It is particularly useful for identifying advanced persistent threats (APTs) and other sophisticated attacks that may be spread across multiple systems and networks.
Overall, aggregation is a valuable threat hunting technique that can help organizations detect and respond to security threats more effectively.
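As a minimal illustration of cross-source correlation, the sketch below groups failed logins from two hypothetical log sources by source IP address; all source names, field names and values are assumptions, not a real schema.

    from collections import defaultdict

    # Hypothetical parsed events from two different sources.
    events = [
        {"source": "vpn", "action": "login_failed", "ip": "203.0.113.7"},
        {"source": "webapp", "action": "login_failed", "ip": "203.0.113.7"},
        {"source": "webapp", "action": "login_ok", "ip": "198.51.100.2"},
    ]

    # Group failed logins by source IP across all log sources.
    failures = defaultdict(set)
    for e in events:
        if e["action"] == "login_failed":
            failures[e["ip"]].add(e["source"])

    # An IP failing logins against several independent systems is the kind
    # of cross-source pattern aggregation is meant to surface.
    for ip, sources in failures.items():
        if len(sources) > 1:
            print(f"Suspicious: {ip} failed logins across {sorted(sources)}")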
Identifying log sources
Modern distributed enterprise applications have many moving pieces, so you need to identify all the components you want to aggregate logs from.
To keep log volume manageable, you can choose to capture only certain types of events (such as failed login attempts or queries that exceed a set time threshold) or only certain severity levels.
For example, you can collect all failed connection attempts from your Network Intrusion Detection System (NIDS) while only collecting critical error messages about crashing pods from your Kubernetes cluster.
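A minimal sketch of such per-source capture rules, mirroring the example above; the source names and field names are illustrative assumptions.

    # Each rule decides whether an event from that source is worth keeping.
    CAPTURE_RULES = {
        "nids": lambda e: e.get("event") == "connection_failed",
        "kubernetes": lambda e: e.get("level") == "CRITICAL",
    }

    def should_capture(event: dict) -> bool:
        rule = CAPTURE_RULES.get(event.get("source"))
        return bool(rule and rule(event))

    print(should_capture({"source": "nids", "event": "connection_failed"}))  # True
    print(should_capture({"source": "kubernetes", "level": "INFO"}))         # False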
Collecting logs
The next step after identifying log sources is to collect those logs. Log collection should be automated.
There are multiple ways to collect logs, which include the following:
Applications can use standard message logging protocols like Syslog to stream their logs continuously to a centralized system (see the sketch after this list)
You can install custom integrations and collectors (also known as agents) on servers that read logs from the local machine and send them to the logging platform
Code instrumentation captures messages from specific parts of a program, often depending on the error conditions encountered
Log management systems can directly access source systems and copy log files over the network
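As an example of the first approach, Python's standard library can stream log records to a syslog collector. The collector address below is a placeholder assumption; point it at your own endpoint.

    import logging
    import logging.handlers

    # Stream application logs over syslog (UDP by default) to a central
    # collector; "logs.example.com" is a placeholder, not a real endpoint.
    handler = logging.handlers.SysLogHandler(address=("logs.example.com", 514))
    handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))

    logger = logging.getLogger("myapp")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    logger.warning("Failed login attempt for user alice from 203.0.113.7")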
Parsing logs
Parsing is the process of extracting key pieces of information from each logged event and putting them into a common format.
These values are then stored for later analysis. Logs can be quite large and contain a lot of irrelevant data, so parsing extracts only the relevant pieces while discarding the rest.
One example of parsing is normalizing original timestamps to a single time zone. Timestamps are critical metadata for an event, and their formats and time zones can differ depending on your log sources.
A parser can extract other important pieces of information, such as usernames, source and destination IP addresses, the network protocol used and the log message.
Parsing can also filter events by severity, for example keeping only ERROR and WARNING events while excluding anything less severe.
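A minimal parsing sketch, assuming a simple space-delimited line format (the format itself is an assumption): extract a few fields and normalize the timestamp to UTC.

    from datetime import datetime, timezone

    line = "2024-05-01T14:03:22+02:00 sshd Failed password for admin from 198.51.100.23"

    ts_raw, service, message = line.split(" ", 2)
    event = {
        # fromisoformat understands the +02:00 offset; astimezone converts to UTC.
        "timestamp": datetime.fromisoformat(ts_raw).astimezone(timezone.utc).isoformat(),
        "service": service,
        "message": message,
    }
    print(event["timestamp"])  # 2024-05-01T12:03:22+00:00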
Processing logs
Indexing builds a map of the parsed and stored data based on a column, similar to a database index.
Indexing makes querying logs easier and faster. Unique indexes also eliminate duplicate log data.
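A toy sketch of both ideas, assuming each parsed event carries a unique identifier (all field names are assumptions): index events by source IP and drop duplicates.

    from collections import defaultdict

    events = [
        {"id": "e1", "src_ip": "203.0.113.7", "msg": "failed login"},
        {"id": "e2", "src_ip": "198.51.100.2", "msg": "connection reset"},
        {"id": "e1", "src_ip": "203.0.113.7", "msg": "failed login"},  # duplicate
    ]

    index = defaultdict(list)
    seen = set()
    for e in events:
        if e["id"] in seen:
            continue  # unique index: duplicate events are discarded
        seen.add(e["id"])
        index[e["src_ip"]].append(e)

    # Lookups on the indexed field no longer scan every stored event.
    print(len(index["203.0.113.7"]))  # 1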
Data enrichment can also be very helpful for gaining further insight from your logs. Some examples of data enrichment include:
Adding geolocation to your log data from IP addresses
Replacing HTTP status codes with their actual messages (sketched below)
Including operating system and web browser details
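A minimal sketch of the second item, using Python's built-in table of HTTP reason phrases; geolocation enrichment would typically rely on an external GeoIP database instead.

    from http.client import responses

    # Enrich a parsed web server event with the human-readable status text.
    event = {"path": "/login", "status": 403}
    event["status_text"] = responses.get(event["status"], "Unknown")
    print(event)  # {'path': '/login', 'status': 403, 'status_text': 'Forbidden'}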
Masking redacts sensitive data, such as encryption keys, personal information, authentication tokens and credentials, from logged messages.
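A minimal masking sketch; the regular expressions below are illustrative, not an exhaustive set of sensitive-data patterns.

    import re

    # Redact secret-like values before the message is stored.
    PATTERNS = [
        re.compile(r"(password=)\S+", re.IGNORECASE),
        re.compile(r"(authorization: bearer )\S+", re.IGNORECASE),
    ]

    def mask(message: str) -> str:
        for pattern in PATTERNS:
            message = pattern.sub(r"\1[REDACTED]", message)
        return message

    print(mask("POST /login password=hunter2 Authorization: Bearer eyJhbGci..."))
    # POST /login password=[REDACTED] Authorization: Bearer [REDACTED]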
Storing logs
Most log management platforms compress the parsed, indexed and enriched logs before storing them.
Compression reduces the network bandwidth and storage cost for logs. Typically, compression uses a proprietary format.
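For illustration only, since real platforms often use their own formats, here is a sketch that compresses log lines with gzip before writing them to storage:

    import gzip

    # Repetitive log data compresses very well, cutting bandwidth and storage.
    lines = [b"2024-05-01T12:03:22Z sshd Failed password for admin\n"] * 1000

    with gzip.open("events.log.gz", "wb") as f:
        f.writelines(lines)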
When aggregating logs, you also need to set their retention policies.
Retention policies dictate how long logs should be stored. This can depend on multiple factors, such as available storage space, industry requirements or organizational policies.
Additionally, different types of logs can have different retention requirements. After the specified time, old logs can be removed from the system or archived to less expensive storage with higher latency.
Log removal and archival improve query performance by reducing the amount of data to search, and archived logs remain useful for auditing purposes.
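A minimal retention sketch, assuming compressed logs live under a hot-storage directory and an archive mount exists (both paths and the 90-day window are assumptions): archive anything older than the retention period.

    import shutil
    import time
    from pathlib import Path

    RETENTION_SECONDS = 90 * 24 * 3600  # 90-day hot retention, as an example
    hot = Path("/var/log/aggregated")
    archive = Path("/mnt/archive/logs")

    cutoff = time.time() - RETENTION_SECONDS
    for log_file in hot.glob("*.log.gz"):
        if log_file.stat().st_mtime < cutoff:
            # Move old logs to cheaper, higher-latency storage.
            shutil.move(str(log_file), str(archive / log_file.name))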
Log types
Here are some recommendations for the types of logs that should be captured:
System logs generated by Syslog or an operating system event log service
Web server logs
Application logs, including those from microservices
Network flow logs
Firewall, antivirus and intrusion detection system logs
Database logs
API gateway logs
Load balancer logs
DNS logs
Authentication service logs
Proxy server logs
Configuration change logs