Aggregation of data sources

Aggregation is a threat hunting technique that involves collecting and analyzing data from multiple sources to identify potential security threats

This technique is used to correlate data from different systems, applications and endpoints to identify patterns and anomalies that may indicate a security breach

Organizations with mature logging, detection and monitoring may already have this covered

Data aggregation techniques can be applied to a variety of data sources, including network traffic logs, system logs and user behavior data

Aggregation techniques can be particularly useful in identifying advanced persistent threats (APTs) and other sophisticated attacks that may be spread across multiple systems and networks

Overall, aggregation is a valuable threat hunting technique that can help organizations detect and respond to security threats more effectively

Identifying log sources

Modern distributed enterprise applications have many moving pieces, so you need to identify all the components you want to aggregate logs from

To keep log volume manageable, you could choose to capture only certain types of events (such as failed login attempts or queries taking longer than a set threshold) or only events above a certain severity level

For example, you can choose to collect all failed connection attempts from your Network Intrusion Detection System (NIDS) while only collecting critical error messages about crashing pods from your Kubernetes cluster
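
A minimal sketch of this kind of selective capture in Python; the source names, event fields and severity levels here are hypothetical:

```python
# Hedged sketch of selective log capture; the source names,
# event fields and severity levels are illustrative assumptions.
def should_capture(source: str, event: dict) -> bool:
    if source == "nids":
        # Keep every failed connection attempt from the NIDS
        return event.get("type") == "failed_connection"
    if source == "kubernetes":
        # Keep only critical errors, such as crashing pods
        return event.get("level") == "CRITICAL"
    # Default: keep warnings and above from all other sources
    return event.get("level") in ("WARNING", "ERROR", "CRITICAL")
```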

Collecting logs

The next step after identifying log sources is to collect those logs. Log collection should be automated

There are multiple ways to collect logs, which include the following:

  • Applications can use standard message logging protocols like Syslog to stream their logs continuously to a centralized system (a minimal sketch follows this list)

  • You can install custom integrations and collectors (also known as agents) on servers that read logs from the local machine and send them to the logging platform

  • Code instrumentation captures messages from specific parts of a program, often depending on the error conditions encountered

  • Log management systems can directly access source systems and copy log files over the network
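
As an illustration of the first approach, Python's standard library can stream application logs to a Syslog collector; the collector address and application name below are assumptions:

```python
import logging
import logging.handlers

# Stream application logs to a central Syslog collector.
# The address, port and app name are assumptions for this sketch.
handler = logging.handlers.SysLogHandler(address=("logs.example.com", 514))
handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))

logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.warning("Failed login attempt for user %s", "alice")
```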

Parsing logs

Parsing is the process of extracting key pieces of information from each logged event and putting them into a common format

  • These values are then stored for later analysis. Logs can be quite large and contain lots of useless data. Parsing extracts only the relevant pieces of data while discarding the rest

  • One example of parsing is normalizing original timestamps to a single time zone. Timestamps are critical metadata for an event, and different log sources may record them in different time zones and formats

A parser can extract other important pieces of information, such as usernames, source and destination IP addresses, network protocols used and the log message itself

  • Parsing can also filter events, for example keeping only ERROR and WARNING events while discarding anything less severe
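
Here is a minimal Python sketch of such a parser, assuming a simple space-delimited log format; production parsers usually rely on grok patterns or structured (e.g., JSON) logs:

```python
import re
from datetime import datetime, timezone

# Assumed input format: "<ISO timestamp> <LEVEL> <user> <src_ip> <message>"
LINE = re.compile(
    r"(?P<ts>\S+) (?P<level>\w+) (?P<user>\S+) (?P<src_ip>\S+) (?P<msg>.*)"
)

def parse(line: str) -> dict | None:
    m = LINE.match(line)
    if m is None:
        return None
    event = m.groupdict()
    # Normalize timestamps from any source time zone to UTC
    ts = datetime.fromisoformat(event["ts"])
    event["ts"] = ts.astimezone(timezone.utc).isoformat()
    # Keep only ERROR and WARNING events, discard the rest
    if event["level"] not in ("ERROR", "WARNING"):
        return None
    return event

print(parse("2024-05-01T12:00:00+02:00 ERROR alice 10.0.0.5 login failed"))
```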

Processing logs

Indexing builds a map of the parsed and stored data based on a column, similar to a database index

  • Indexing makes querying logs easier and faster. Unique indexes also eliminate duplicate log data
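
A toy illustration of indexing parsed events on a single column; real log platforms use purpose-built index structures, but the idea is the same:

```python
from collections import defaultdict

# Build an index over one column of the parsed events,
# so queries on that column avoid a full scan.
def build_index(events: list[dict], column: str) -> dict:
    index = defaultdict(list)
    for position, event in enumerate(events):
        index[event[column]].append(position)
    return index

events = [
    {"src_ip": "10.0.0.5", "level": "ERROR"},
    {"src_ip": "10.0.0.9", "level": "WARNING"},
    {"src_ip": "10.0.0.5", "level": "WARNING"},
]
by_ip = build_index(events, "src_ip")
print(by_ip["10.0.0.5"])  # -> [0, 2]
```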

Data enrichment can also be very helpful for gaining further insight from your logs. Some examples of data enrichment, sketched in code after this list, include:

  • Adding geolocation to your log data from IP addresses

  • Replacing HTTP status codes with their actual messages

  • Including operating system and web browser details
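
A minimal sketch of the first two enrichments; the geolocation lookup here is a hardcoded placeholder standing in for a real GeoIP database, while the status messages come from Python's standard library:

```python
from http import HTTPStatus

def lookup_geo(ip: str) -> str:
    # Placeholder: a real deployment would query a GeoIP database;
    # this hardcoded mapping is purely illustrative.
    return {"10.0.0.5": "internal"}.get(ip, "unknown")

def enrich(event: dict) -> dict:
    # Add geolocation derived from the source IP address
    event["geo"] = lookup_geo(event.get("src_ip", ""))
    # Replace the numeric HTTP status code with its reason phrase
    if "status" in event:
        event["status_text"] = HTTPStatus(event["status"]).phrase
    return event

print(enrich({"src_ip": "10.0.0.5", "status": 404}))
# -> {'src_ip': '10.0.0.5', 'status': 404, 'geo': 'internal',
#     'status_text': 'Not Found'}
```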

Masking redacts sensitive data, such as encryption keys, personal information, authentication tokens and credentials, from logged messages
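
A hedged sketch of masking via regular expressions; the patterns shown are illustrative, and real deployments maintain a broader, vetted pattern set:

```python
import re

# Illustrative patterns for common secrets; real deployments
# maintain a broader, vetted set of redaction rules.
PATTERNS = [
    re.compile(r"(password=)\S+", re.IGNORECASE),
    re.compile(r"(authorization: bearer )\S+", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like values
]

def mask(message: str) -> str:
    for pattern in PATTERNS:
        # Keep any captured prefix (e.g., "password=") and redact the value
        message = pattern.sub(
            lambda m: (m.group(1) if m.groups() else "") + "[REDACTED]",
            message,
        )
    return message

print(mask("login password=hunter2 from 10.0.0.5"))
# -> "login password=[REDACTED] from 10.0.0.5"
```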

Storing logs

Most log management platforms compress the parsed, indexed and enriched logs before storing them

  • Compression reduces the network bandwidth and storage cost for logs. Typically, compression uses a proprietary format
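
A small illustration of the savings, using gzip from Python's standard library as a stand-in for a platform's (often proprietary) codec:

```python
import gzip

# Compress a batch of log lines before storage; gzip stands in
# for whatever codec the log management platform actually uses.
batch = "\n".join(f"2024-05-01T12:00:{i:02d}Z INFO event {i}" for i in range(60))
compressed = gzip.compress(batch.encode("utf-8"))
print(f"{len(batch)} bytes -> {len(compressed)} bytes")
```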

When aggregating logs, you also need to set their retention policies

  • Retention policies dictate how long logs should be stored. This can depend on multiple factors such as storage space available, industry requirements or organizational policies

Additionally, different types of logs can have different retention requirements. After the specified time, old logs can be removed from the system or archived to less expensive storage with higher latency

  • Log removal and archival improve query performance by reducing the volume of searchable data, and archived logs remain available for auditing purposes
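
A minimal sketch of such a retention sweep, assuming aggregated logs are stored as compressed files; the directories and 90-day window are illustrative:

```python
import shutil
import time
from pathlib import Path

# Sketch of a retention sweep; the paths and the 90-day
# retention window are illustrative assumptions.
LOG_DIR = Path("/var/log/aggregated")
ARCHIVE_DIR = Path("/mnt/cold-storage/logs")
RETENTION_SECONDS = 90 * 24 * 60 * 60

def sweep() -> None:
    cutoff = time.time() - RETENTION_SECONDS
    for log_file in LOG_DIR.glob("*.log.gz"):
        if log_file.stat().st_mtime < cutoff:
            # Move expired logs to cheaper, higher-latency storage
            # instead of deleting them outright
            shutil.move(str(log_file), str(ARCHIVE_DIR / log_file.name))
```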

Log types

Here are some recommendations for types of logs that should be captured

  • System logs generated by Syslog or an event log service

  • Web server logs

  • Application logs, including those from microservices

  • Network flow logs

  • Firewall, antivirus and intrusion detection system logs

  • Database logs

  • API gateway logs

  • Load balancer logs

  • DNS logs

  • Authentication service logs

  • Proxy server logs

  • Configuration changelogs
