# Normalization

[![[Pasted image 20251104230308.png]]](https://soc-expert.de/From+Logging+to+Alerting+and+Beyond)

Different systems produce logs in different formats. During ingest, the data may be **normalized**: converted into a consistent format that is easier to analyze.

## Common Log Formats

In **SIEM (Security Information and Event Management)** systems, log formats are crucial because they determine how data is parsed, normalized, and analyzed. These are the most common log formats in SIEM environments:

### Syslog

- **Standardized format** (RFC 5424; the older BSD format is described in RFC 3164) used by Unix/Linux systems and many network devices.
- Transmits logs over UDP (traditionally port 514) or TCP, optionally secured with TLS.
- Format (BSD-style): `<PRI>timestamp hostname application: message`
- Widely supported by SIEM tools such as Splunk, QRadar, and ArcSight.

### CEF (Common Event Format)

- Developed by **ArcSight**.
- Structured and extensible format for security events.
- Format: `CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension`
- Easy to parse and normalize: a fixed, pipe-delimited header is followed by an extension of space-separated `key=value` pairs.

### LEEF (Log Event Extended Format)

- Developed by **IBM QRadar**.
- Similar to CEF but tailored for QRadar.
- Format: `LEEF:Version|Vendor|Product|Version|EventID|Attributes`, where the attributes are `key=value` pairs (tab-separated by default).

### JSON (JavaScript Object Notation)

- Increasingly popular because it is both **structured and readable**.
- Used by modern applications, cloud services, and APIs.
- Easily parsed and indexed by SIEMs such as Splunk, Elastic Stack, and Microsoft Sentinel.

### XML (eXtensible Markup Language)

- Used by some legacy systems and applications.
- Verbose but highly structured.
- Can be harder to parse without proper configuration.

### Windows Event Logs (EVTX)

- Native format for Windows systems.
- Includes the System, Security, and Application logs, among others.
- SIEMs ingest and parse these logs through agents such as Winlogbeat, or through Windows Event Forwarding to a Windows Event Collector (WEC).

### Plain Text / CSV

- Simple formats used by custom applications or legacy systems.
- Require custom parsing rules in SIEMs.
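The Syslog and CEF formats above can be made concrete with a small parser. The following is a minimal Python sketch, not a production parser. It assumes a CEF record that may sit behind a BSD-style syslog prefix. It decodes the syslog `<PRI>` value, which encodes `facility * 8 + severity`. It then splits the CEF header on unescaped pipes and reads the extension as `key=value` pairs. The function names, the output field names, and the sample vendor/product values are hypothetical. Escape handling is simplified; for example, an escaped backslash directly before a pipe separator is not handled.

```python
import re

# CEF header fields, in the order given by the CEF format line.
CEF_HEADER = ["version", "device_vendor", "device_product", "device_version",
              "signature_id", "name", "severity"]


def decode_pri(line: str) -> tuple[int, int]:
    """Return (facility, severity) from a leading syslog <PRI> (PRI = facility * 8 + severity)."""
    match = re.match(r"<(\d{1,3})>", line)
    if match is None:
        raise ValueError("no syslog PRI prefix")
    pri = int(match.group(1))
    return pri // 8, pri % 8


def _unescape(value: str) -> str:
    # Simplified unescaping of \| (header), \= (extension) and \\ sequences.
    return value.replace("\\|", "|").replace("\\=", "=").replace("\\\\", "\\")


def parse_cef(line: str) -> dict:
    """Parse a CEF record (optionally behind a syslog prefix) into a dict."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("not a CEF record")
    # Split on pipes not preceded by a backslash; only the first seven pipes end header fields.
    parts = re.split(r"(?<!\\)\|", line[start + len("CEF:"):], maxsplit=7)
    if len(parts) != 8:
        raise ValueError("incomplete CEF header")
    event = {name: _unescape(value) for name, value in zip(CEF_HEADER, parts)}
    # Extension: each value runs until the next " key=" or the end of the line.
    event["extension"] = {
        key: _unescape(value)
        for key, value in re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", parts[7])
    }
    return event


if __name__ == "__main__":
    raw = ("<134>Nov  4 20:01:23 fw01 CEF:0|ExampleVendor|ExampleFW|1.0|100|"
           "Connection blocked|5|src=192.168.1.10 dst=10.0.0.5 msg=Blocked by policy")
    print(decode_pri(raw))  # (16, 6): facility local0, severity informational
    event = parse_cef(raw)
    print(event["name"], event["extension"]["src"])  # Connection blocked 192.168.1.10
```

Keeping the extension in its own nested dict avoids collisions between header names and extension keys. A later normalization step can then decide which extension keys map onto schema fields.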
## Schema: What a SIEM Schema Typically Includes

During normalization, log entries are broken down into structured fields (e.g., timestamp, IP address, event type) so they can be queried and visualized effectively. In a SIEM, a **schema** is the **structured definition of how log data is organized, parsed, and stored**. It defines the **fields**, **data types**, and **relationships** used to represent events and logs in a consistent, searchable format.

### Field Names

- Common fields: `timestamp`, `source_ip`, `destination_ip`, `username`, `event_type`, `severity`, `device_vendor`, etc.
- These fields are used to **normalize** data from different sources.

### Data Types

- Specifies whether a field is a string, integer, boolean, datetime, etc.
- Ensures proper indexing and querying.

### Field Mappings

- Maps fields in raw log entries to standardized schema fields.
- Example: a firewall log might use `src` and `dst`, while a Windows log might label the same values `Source IP` and `Destination IP`. The schema maps both to `source_ip` and `destination_ip`.

### Event Categories

- Groups events into categories such as:
  - Authentication
  - Network activity
  - File access
  - Malware detection
- Helps with correlation and rule creation.

### Log Source Types

- Defines how each type of log source is handled.
- Example: Windows Event Logs vs. Syslog vs. AWS CloudTrail.

### Normalization Rules

- Converts vendor-specific formats into a unified schema.
- Enables cross-platform analysis and correlation.

> [!note] Why Is a Schema Important in SIEM?
>
> - **Consistency**: Makes data from different sources comparable.
> - **Searchability**: Enables efficient querying and filtering.
> - **Correlation**: Allows linking related events across systems.
> - **Alerting**: Supports rule-based detection using standardized fields.
> - **Dashboards & Reports**: Facilitates visualization and analytics.
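The schema components above (field mappings, data types, event categories, and log source types) can work together in a single normalization step. The following is a minimal Python sketch of that step, under stated assumptions. The per-source mapping tables, the `TimeCreated`/`Action`/`time`/`action` field names, and the category table are hypothetical and chosen for illustration. Only the `src`/`dst` and `Source IP`/`Destination IP` pairs come from the field-mapping example. Each schema field has a converter that enforces its data type.

```python
import ipaddress
from datetime import datetime

# Hypothetical per-source mappings: vendor field name -> schema field name.
FIELD_MAPPINGS = {
    "firewall": {"src": "source_ip", "dst": "destination_ip",
                 "time": "timestamp", "action": "event_type"},
    "windows": {"Source IP": "source_ip", "Destination IP": "destination_ip",
                "TimeCreated": "timestamp", "Action": "event_type"},
}

# Data types: each schema field has a converter that validates and coerces the raw string.
FIELD_TYPES = {
    "timestamp": lambda v: datetime.fromisoformat(v.replace("Z", "+00:00")),
    "source_ip": ipaddress.ip_address,
    "destination_ip": ipaddress.ip_address,
    "event_type": str,
}

# Hypothetical event-type -> category table used for correlation and rule creation.
EVENT_CATEGORIES = {
    "login_success": "Authentication",
    "login_failure": "Authentication",
    "connection_blocked": "Network activity",
}


def normalize(log_source_type: str, raw: dict[str, str]) -> dict:
    """Map one raw event onto the schema; fields without a mapping are dropped."""
    mapping = FIELD_MAPPINGS[log_source_type]
    event = {"log_source_type": log_source_type}
    for vendor_field, value in raw.items():
        schema_field = mapping.get(vendor_field)
        if schema_field is not None:
            # Raises ValueError on malformed values (e.g. an invalid IP address).
            event[schema_field] = FIELD_TYPES[schema_field](value)
    event["event_category"] = EVENT_CATEGORIES.get(event.get("event_type"), "Uncategorized")
    return event


if __name__ == "__main__":
    fw = normalize("firewall", {"src": "192.168.1.10", "dst": "10.0.0.5",
                                "time": "2025-11-04T20:01:23Z", "action": "connection_blocked"})
    win = normalize("windows", {"Source IP": "192.168.1.10", "Destination IP": "10.0.0.5",
                                "TimeCreated": "2025-11-04T20:01:23Z", "Action": "connection_blocked"})
    print(fw["source_ip"] == win["source_ip"], fw["event_category"])
```

In this sketch, a malformed value fails loudly rather than being stored as-is. A real pipeline might instead tag such events or route them to a separate index for review.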
## Example: Simplified SIEM Schema for a Login Event

| Field         | Value                  |
| ------------- | ---------------------- |
| `timestamp`   | `2025-11-04T20:01:23Z` |
| `event_type`  | `login_success`        |
| `username`    | `jdoe`                 |
| `source_ip`   | `192.168.1.10`         |
| `device_type` | `Windows Server`       |
| `severity`    | `low`                  |
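One way to enforce the data types behind the table above is to give the login event a typed representation. The following is a minimal Python sketch using a frozen dataclass. The class and method names are hypothetical, and the severity vocabulary (`low`/`medium`/`high`/`critical`) is an assumption, since the table only shows `low`. String values are coerced into a timezone-aware `datetime` and an `ipaddress` object, then serialized back to the table's string form.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from datetime import datetime

# Assumed severity vocabulary; the example table only shows "low".
SEVERITIES = ("low", "medium", "high", "critical")


@dataclass(frozen=True)
class LoginEvent:
    """Typed form of the simplified login-event schema (hypothetical class)."""
    timestamp: datetime
    event_type: str
    username: str
    source_ip: ipaddress.IPv4Address | ipaddress.IPv6Address
    device_type: str
    severity: str

    def __post_init__(self) -> None:
        # Reject values that would break time-based correlation or severity filters.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must include a UTC offset")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")

    @classmethod
    def from_strings(cls, fields: dict[str, str]) -> LoginEvent:
        """Build an event from string values, coercing each field to its declared type."""
        return cls(
            timestamp=datetime.fromisoformat(fields["timestamp"].replace("Z", "+00:00")),
            event_type=fields["event_type"],
            username=fields["username"],
            source_ip=ipaddress.ip_address(fields["source_ip"]),  # ValueError if invalid
            device_type=fields["device_type"],
            severity=fields["severity"],
        )

    def to_record(self) -> dict[str, str]:
        """Serialize back to the string form shown in the table (UTC written as 'Z')."""
        return {
            "timestamp": self.timestamp.isoformat().replace("+00:00", "Z"),
            "event_type": self.event_type,
            "username": self.username,
            "source_ip": str(self.source_ip),
            "device_type": self.device_type,
            "severity": self.severity,
        }
```

Converting the timestamp and IP address into real types at ingest is what lets a SIEM run range queries on time and subnet matches on addresses, rather than treating both as plain strings.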