# Normalization
[![[Pasted image 20251104230308.png]]](https://soc-expert.de/From+Logging+to+Alerting+and+Beyond)
Different systems produce logs in different formats. During ingestion, the data is often **normalized**: converted into a consistent format that is easier to search and analyze.
## Common Log Formats
In **SIEM (Security Information and Event Management)** systems, log formats are crucial because they determine how data is parsed, normalized, and analyzed. Here are the most common log formats used in SIEM environments:
### Syslog
- **Standardized format** used by Unix/Linux systems and many network devices.
- Transmits logs over UDP or TCP.
- Format (classic BSD style, RFC 3164): `<PRI>timestamp hostname application: message`, where `PRI` encodes `facility * 8 + severity`.
- Widely supported by SIEM tools like Splunk, QRadar, and ArcSight.
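
As a rough illustration of how such a line can be split into fields, here is a minimal sketch. It assumes the classic `Mmm dd hh:mm:ss` timestamp and the simplified layout shown above; `parse_syslog` is a hypothetical helper, not part of any SIEM product, and real parsers handle many more variants.

```python
import re

# Matches the simplified layout: <PRI>timestamp hostname application: message
# Assumption: the timestamp uses the classic "Mmm dd hh:mm:ss" form.
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<hostname>\S+)\s"
    r"(?P<application>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?:\s?"
    r"(?P<message>.*)$"
)

def parse_syslog(line: str) -> dict:
    """Split one syslog line into named fields (hypothetical helper)."""
    match = SYSLOG_RE.match(line)
    if match is None:
        raise ValueError(f"not a recognized syslog line: {line!r}")
    fields = match.groupdict()
    pri = int(fields.pop("pri"))
    # PRI packs two numbers into one: facility * 8 + severity.
    fields["facility"], fields["severity"] = divmod(pri, 8)
    return fields

# Sample line in the style of the RFC 3164 examples.
sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
print(parse_syslog(sample))
```

Decoding `PRI` early is useful because severity is one of the fields most schemas normalize.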
### CEF (Common Event Format)
- Developed by **ArcSight**.
- Structured and extensible format for security events.
- Format: `CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension`
- Easy to parse and normalize.
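
To show why the format is easy to parse, here is a minimal sketch that splits the seven pipe-delimited header fields and then reads the extension as `key=value` pairs. The function name, the output field names, and the sample event are illustrative assumptions; escape handling is deliberately simplified.

```python
import re

# Output names for the seven header fields (our own naming, not mandated by CEF).
CEF_HEADER_FIELDS = [
    "version", "device_vendor", "device_product", "device_version",
    "signature_id", "name", "severity",
]

def parse_cef(line: str) -> dict:
    """Parse one CEF record into header fields plus an extension dict (sketch)."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("no CEF header found")
    body = line[start + len("CEF:"):]
    # Split on pipes not preceded by a backslash: 7 header fields + extension.
    parts = re.split(r"(?<!\\)\|", body, maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    event = {
        key: value.replace("\\|", "|").replace("\\\\", "\\")
        for key, value in zip(CEF_HEADER_FIELDS, parts)
    }
    extension = parts[7] if len(parts) > 7 else ""
    # Values may contain spaces, so split at each " key=" boundary
    # instead of on every space.
    event["extension"] = {
        m.group(1): m.group(2)
        for m in re.finditer(r"(\w+)=(.*?)(?=\s+\w+=|$)", extension)
    }
    return event

# Made-up sample event for illustration.
sample = ("CEF:0|Security|threatmanager|1.0|100|worm successfully stopped|10|"
          "src=10.0.0.1 dst=2.1.2.2 msg=blocked by policy")
print(parse_cef(sample))
```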
### LEEF (Log Event Extended Format)
- Developed by **IBM** for QRadar.
- Similar to CEF but tailored for QRadar.
- Format: `LEEF:Version|Vendor|Product|Version|EventID|Attributes`
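
A LEEF line can be split in much the same way. The sketch below assumes LEEF 1.0 style, where the attributes are tab-separated `key=value` pairs; LEEF 2.0 allows a custom delimiter, which this example does not handle. `parse_leef`, the output field names, and the sample values are hypothetical.

```python
def parse_leef(line: str) -> dict:
    """Parse one LEEF 1.0-style record (sketch; no custom delimiter support)."""
    if not line.startswith("LEEF:"):
        raise ValueError("not a LEEF line")
    # Five pipe-delimited header fields, then the attribute block.
    parts = line[len("LEEF:"):].split("|", 5)
    if len(parts) < 6:
        raise ValueError("incomplete LEEF header")
    version, vendor, product, product_version, event_id, attributes = parts
    event = {
        "version": version,
        "vendor": vendor,
        "product": product,
        "product_version": product_version,
        "event_id": event_id,
    }
    # Assumption: attributes are separated by tab characters.
    attrs = {}
    for pair in attributes.split("\t"):
        key, sep, value = pair.partition("=")
        if sep:
            attrs[key] = value
    event["attributes"] = attrs
    return event

# Made-up sample; "\t" is a literal tab between attributes.
sample = "LEEF:1.0|ExampleVendor|ExampleProduct|1.0|login_failed|src=192.0.2.10\tusrName=jdoe"
print(parse_leef(sample))
```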
### JSON (JavaScript Object Notation)
- Increasingly popular due to its **structured and readable** format.
- Used by modern applications, cloud services, and APIs.
- Easily parsed and indexed by SIEMs like Splunk, Elastic Stack, and Sentinel.
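
JSON events are often nested, and one common preparation step is to flatten them into dotted field names before indexing. The following is a minimal sketch of that idea; the event content and the dotted naming are assumptions for illustration, not the behavior of any specific SIEM.

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Turn nested objects into a flat dict with dotted keys (sketch)."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            # Lists and scalars are kept as-is.
            flat[name] = value
    return flat

# Made-up JSON log entry.
raw = '{"timestamp": "2025-11-04T20:01:23Z", "user": {"name": "jdoe"}, "source": {"ip": "192.168.1.10"}}'
print(flatten(json.loads(raw)))
```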
### XML (eXtensible Markup Language)
- Used by some legacy systems and applications.
- Verbose but highly structured.
- Can be harder to parse without proper configuration.
### Windows Event Logs (EVTX)
- Native format for Windows systems.
- Includes logs for system, security, application, etc.
- SIEMs typically ingest these logs through agents (such as Winlogbeat) or through Windows Event Forwarding to a Windows Event Collector (WEC).
### Plain Text / CSV
- Simple formats used by custom applications or legacy systems.
- Require custom parsing rules in SIEMs.
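
For a headerless CSV export, the "custom parsing rule" can be as small as a list of column names. The sketch below assumes a hypothetical application that writes four columns in a fixed order; the column names and sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical column order for a headerless export; this list is the parsing rule.
COLUMNS = ["timestamp", "username", "source_ip", "event_type"]

def parse_csv_log(text: str) -> list[dict]:
    """Parse headerless CSV log text into one dict per row (sketch)."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    return [dict(row) for row in reader]

raw = (
    "2025-11-04T20:01:23Z,jdoe,192.168.1.10,login_success\n"
    '2025-11-04T20:02:10Z,"smith, anna",192.168.1.11,login_failure\n'
)
for event in parse_csv_log(raw):
    print(event)
```

Using the `csv` module rather than `str.split(",")` keeps quoted values that contain commas intact, as in the second row.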
## Schema: What a SIEM Schema Typically Includes
A schema is the **structured definition of how log data is organized, parsed, and stored** within the SIEM system. It defines the **fields**, **data types**, and **relationships** used to represent events and logs in a consistent, searchable format.
In practice, each log entry is broken down into structured fields (e.g., timestamp, IP address, event type) so it can be queried and visualized effectively.
### Field Names
- Common fields: `timestamp`, `source_ip`, `destination_ip`, `username`, `event_type`, `severity`, `device_vendor`, etc.
- These fields are used to **normalize** data from different sources.
### Data Types
- Specifies whether a field is a string, integer, boolean, datetime, etc.
- Ensures proper indexing and querying.
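
One way to picture typed fields is a table of converters applied to the raw string values. This is a minimal sketch, not how any particular SIEM stores types; `source_port` and `success` are hypothetical fields added to show integer and boolean handling.

```python
from datetime import datetime
from ipaddress import ip_address

def parse_timestamp(value: str) -> datetime:
    # Replace a trailing "Z" (UTC), which fromisoformat() rejects before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def parse_bool(value: str) -> bool:
    return value.strip().lower() in {"true", "1", "yes"}

# Field name -> converter. Fields not listed here stay strings.
FIELD_TYPES = {
    "timestamp": parse_timestamp,
    "source_ip": ip_address,
    "source_port": int,     # hypothetical field
    "success": parse_bool,  # hypothetical field
    "username": str,
}

def apply_types(raw: dict) -> dict:
    """Convert raw string values to the types named in FIELD_TYPES (sketch)."""
    return {name: FIELD_TYPES.get(name, str)(value) for name, value in raw.items()}

typed = apply_types({
    "timestamp": "2025-11-04T20:01:23Z",
    "source_ip": "192.168.1.10",
    "source_port": "443",
    "success": "true",
})
print(typed)
```

With real types in place, range queries such as "ports below 1024" or "events in the last hour" compare numbers and datetimes instead of strings.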
### Field Mappings
- Maps raw log entries to standardized fields.
- Example: A firewall log might use `src` and `dst`, while a Windows log uses `Source IP` and `Destination IP`. The schema maps both to `source_ip` and `destination_ip`.
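
The mapping in this example can be sketched as a per-source rename table. The source type names (`firewall`, `windows`) and the sample values are assumptions for illustration.

```python
# Per-source rename tables: vendor field name -> schema field name.
FIELD_MAPPINGS = {
    "firewall": {"src": "source_ip", "dst": "destination_ip"},
    "windows": {"Source IP": "source_ip", "Destination IP": "destination_ip"},
}

def normalize(source_type: str, raw: dict) -> dict:
    """Rename known fields; keep unknown ones under their original names."""
    mapping = FIELD_MAPPINGS[source_type]
    return {mapping.get(key, key): value for key, value in raw.items()}

firewall_event = normalize("firewall", {"src": "10.0.0.5", "dst": "10.0.0.9"})
windows_event = normalize("windows", {"Source IP": "10.0.0.5", "Destination IP": "10.0.0.9"})
print(firewall_event == windows_event)
```

After normalization, both events carry the same field names, so a single query on `source_ip` covers both sources.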
### Event Categories
- Groups events into categories like:
    - Authentication
    - Network activity
    - File access
    - Malware detection
- Helps in correlation and rule creation.
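
A category can be derived from the event type with a simple lookup, which then lets rules target a whole category instead of individual event types. The event type names and category labels in this sketch are invented for illustration.

```python
from collections import Counter

# Hypothetical event types mapped to the categories listed above.
CATEGORY_BY_EVENT_TYPE = {
    "login_success": "authentication",
    "login_failure": "authentication",
    "connection_allowed": "network_activity",
    "file_read": "file_access",
    "malware_detected": "malware_detection",
}

def categorize(event: dict) -> dict:
    """Return a copy of the event with a 'category' field added."""
    category = CATEGORY_BY_EVENT_TYPE.get(event["event_type"], "uncategorized")
    return {**event, "category": category}

def count_by_category(events: list) -> Counter:
    """Count events per category, e.g. as input to a threshold rule."""
    return Counter(categorize(event)["category"] for event in events)

events = [
    {"event_type": "login_failure"},
    {"event_type": "login_success"},
    {"event_type": "file_read"},
    {"event_type": "dns_query"},
]
print(count_by_category(events))
```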
### Log Source Types
- Defines how different log sources are handled.
- Example: Windows Event Logs vs. Syslog vs. AWS CloudTrail.
### Normalization Rules
- Converts vendor-specific formats into a unified schema.
- Enables cross-platform analysis and correlation.
>[!note] Why Is a Schema Important in SIEM?
>
> - **Consistency**: Makes data from different sources comparable.
> - **Searchability**: Enables efficient querying and filtering.
> - **Correlation**: Allows linking related events across systems.
> - **Alerting**: Supports rule-based detection using standardized fields.
> - **Dashboards & Reports**: Facilitates visualization and analytics.
## Example: A Login Event in a Simplified SIEM Schema
| Field | Value |
| ------------- | ---------------------- |
| `timestamp` | `2025-11-04T20:01:23Z` |
| `event_type` | `login_success` |
| `username` | `jdoe` |
| `source_ip` | `192.168.1.10` |
| `device_type` | `Windows Server` |
| `severity` | `low` |