Log Parsing and Enrichment
Structured extraction from raw event streams. Configure Grok patterns, parse JSON payloads, and enrich records with GeoIP, threat intelligence, and custom lookup tables.
Smart log analytics for Russian business
Grok Patterns
LogTrace ships with 120+ built-in Grok patterns covering nginx, Apache, syslog, Elasticsearch, and common application frameworks. Define custom patterns via the Pipeline Editor or upload a patterns.conf file.
Grok transforms unstructured text into named fields. Each pattern references atomic matchers such as NUMBER, IP, WORD, or composite patterns like HTTPDATE. The engine applies patterns sequentially until a match succeeds; the first match wins.
Built-in Pattern Catalog
Nginx Combined
%{IPORHOST:client_ip} - %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status} %{NUMBER:bytes} "%{DATA:referrer}" "%{DATA:agent}"
Extracts 10 fields from standard nginx access logs. The square brackets around the timestamp are escaped (\[ ... \]), as in the syslog and Spring Boot patterns below; left unescaped, Grok reads [ ... ] as a character class and the pattern never matches. Status codes and byte counts are cast to integers automatically; the remaining fields stay strings.
Syslog (RFC 3164 / BSD format)
%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{DATA:app}\[%{NUMBER:pid}\]: %{GREEDYDATA:message}
Parses the traditional syslog line format that rsyslog and syslog-ng emit by default ("Mar 12 14:03:01 host app[pid]: message"). Handles bracketed PIDs and arbitrary message tails. SYSLOGTIMESTAMP matches the "Mar 12 14:03:01" timestamp of RFC 3164, not the ISO 8601 timestamp and leading version field of RFC 5424; sources that forward RFC 5424 output need a pattern built on TIMESTAMP_ISO8601. Lines without a bracketed PID (kernel messages, for example) do not match this pattern and fall through to the next one in the list.
Spring Boot
%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:level} %{NUMBER:pid} --- \[\s*%{DATA:thread}\] %{JAVACLASS:logger}\s+: %{GREEDYDATA:message}
Captures thread name, logger class, and level from Spring Boot's default logback layout. That layout pads the level to five characters (" INFO"), left-pads the thread name inside the brackets, and right-pads the logger name to 40 characters, so the pattern allows variable whitespace (\s+, \s*) at those three positions; a pattern with single spaces there only matches lines whose padded columns happen to be full width.
Defining Custom Patterns
Open the Pipeline Editor, navigate to Grok Stage, and click Add Pattern. Each custom pattern requires a unique name and a Grok expression. Example for a payment gateway log:
payment_tx %{TIMESTAMP_ISO8601:tx_time} %{DATA:merchant_id} txn=%{DATA:tx_id} amt=%{NUMBER:amount_usd} curr=%{WORD:currency} status=%{WORD:tx_status} card_last4=%{NOTSPACE:card_mask}
After saving, the pattern appears in the dropdown for any Grok stage. Pattern compilation is validated on save; invalid expressions are rejected with a pointer to the failing token. Keep card data in patterns like this one limited to the masked last-four field: extracting a full card number into an indexed, searchable field would make the log store part of the cardholder data environment under PCI DSS.
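The script below is an added example, not code from the LogTrace engine or from this page. It models the Grok mechanics described in the catalog and custom-pattern sections above: expansion of %{NAME:field} references into named captures, definition-order first-match-wins, and validation at the time a pattern is added. The atomic matchers are simplified stand-ins for the standard Grok definitions, the sample log lines are constructed, and the :int/:float suffix is the standard Grok cast syntax used here in place of LogTrace's automatic casting.

```python
import re
# Example code: a minimal Grok engine (pattern composition, named captures, first-match-wins, casts).
# Not LogTrace's engine; the atomic matchers are simplified stand-ins for the standard Grok names.
BASE_PATTERNS = {
    "WORD": r"\b\w+\b",
    "NOTSPACE": r"\S+",
    "DATA": r".*?",
    "GREEDYDATA": r".*",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "IPV4": r"(?:\d{1,3}\.){3}\d{1,3}",
    "HOSTNAME": r"\b[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*\.?",
    "IPORHOST": r"(?:%{IPV4}|%{HOSTNAME})",
    "URIPATHPARAM": r"[^ \"]+",
    "MONTH": r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\b",
    "MONTHDAY": r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])",
    "TIME": r"\d{2}:\d{2}:\d{2}",
    "SYSLOGTIMESTAMP": r"%{MONTH} +%{MONTHDAY} %{TIME}",
    "SYSLOGHOST": r"%{IPORHOST}",
    "HTTPDATE": r"%{MONTHDAY}/%{MONTH}/\d{4}:%{TIME} [+-]\d{4}",
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2}(?:[.,]\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?",
    "LOGLEVEL": r"(?:TRACE|DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL)",
    "JAVACLASS": r"(?:[A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*",
}
REF = re.compile(r"%\{(\w+)(?::(\w+))?(?::(int|float))?\}")  # %{NAME}, %{NAME:field}, %{NAME:field:int}

class GrokStage:
    """Named patterns applied in definition order; the first one that matches wins."""

    def __init__(self, library=BASE_PATTERNS):
        self.library = dict(library)
        self.patterns = []                      # (name, compiled regex, {field: cast})

    def _expand(self, expr, casts, depth=0):
        if depth > 20:
            raise ValueError("pattern nesting too deep (cyclic definition?)")
        def sub(m):
            name, field, cast = m.groups()
            if name not in self.library:
                raise ValueError(f"unknown pattern %{{{name}}}")
            inner = self._expand(self.library[name], casts, depth + 1)
            if cast:
                casts[field] = int if cast == "int" else float
            return f"(?P<{field}>{inner})" if field else f"(?:{inner})"
        return REF.sub(sub, expr)

    def add_pattern(self, name, expr):
        """Compile on add, as the editor validates on save; ValueError/re.error name the bad token."""
        casts = {}
        self.patterns.append((name, re.compile(self._expand(expr, casts)), casts))

    def match(self, line):
        """(pattern_name, fields) for the first matching pattern, or (None, {})."""
        for name, regex, casts in self.patterns:
            m = regex.search(line)
            if m:
                fields = m.groupdict()
                for field, cast in casts.items():
                    if fields.get(field) is not None:
                        fields[field] = cast(fields[field])
                return name, fields
        return None, {}

NGINX = (r'%{IPORHOST:client_ip} - %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} '
         r'HTTP/%{NUMBER:http_version}" %{NUMBER:status:int} %{NUMBER:bytes:int} "%{DATA:referrer}" "%{DATA:agent}"')
SYSLOG = r'%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{DATA:app}\[%{NUMBER:pid:int}\]: %{GREEDYDATA:message}'
SPRING = (r'%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:level} %{NUMBER:pid:int} --- \[\s*%{DATA:thread}\] '
          r'%{JAVACLASS:logger}\s+: %{GREEDYDATA:message}')
PAYMENT = (r'%{TIMESTAMP_ISO8601:tx_time} %{DATA:merchant_id} txn=%{DATA:tx_id} amt=%{NUMBER:amount_usd:float} '
           r'curr=%{WORD:currency} status=%{WORD:tx_status} card_last4=%{NOTSPACE:card_mask}')
SAMPLE_LINES = [   # constructed sample input, not captured traffic
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0300] "GET /api/v1/orders?id=42 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    "Mar 12 14:03:01 web-01 sshd[2210]: Failed password for invalid user admin from 198.51.100.4 port 55213 ssh2",
    "2023-10-10 13:55:36.123  INFO 12345 --- [           main] com.example.OrderService : Order 42 created",
    "2023-10-10T13:55:36Z MRC-10042 txn=9f3c2a amt=129.90 curr=USD status=APPROVED card_last4=****4242",
    "garbage line that matches nothing",
]

def build_stage():
    stage = GrokStage()
    for name, expr in [("nginx_combined", NGINX), ("syslog", SYSLOG), ("spring_boot", SPRING), ("payment_tx", PAYMENT)]:
        stage.add_pattern(name, expr)
    return stage

def parse_all(lines=SAMPLE_LINES):
    stage = build_stage()
    return [stage.match(line) for line in lines]

def first_match_wins():
    """A catch-all pattern placed first shadows every specific pattern listed after it."""
    stage = GrokStage()
    stage.add_pattern("catch_all", r"%{GREEDYDATA:message}")
    stage.add_pattern("syslog", SYSLOG)
    return stage.match(SAMPLE_LINES[1])[0]      # "catch_all", not "syslog"

def validate(expr):
    """Save-time check: None if the expression compiles, otherwise the error text."""
    try:
        GrokStage().add_pattern("probe", expr)
    except (ValueError, re.error) as exc:
        return str(exc)
```

The last sample line matches nothing and comes back as (None, {}); first_match_wins() shows why a broad pattern has to be listed after the specific ones. Python's re backtracks rather than running as an automaton, so a DATA or GREEDYDATA matcher early in a pattern costs more here than the Performance Notes figures would suggest.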
Performance Notes
Grok pattern matching is compiled into finite-state automata on first use. A pipeline with 5 active Grok patterns processes approximately 42,000 events per second on a single 4-core worker node. Use match with an explicit pattern list rather than grok with patterns_dir to avoid unnecessary compilation overhead.
JSON Parser
Many modern services emit structured JSON logs. The JSON parser stage deserializes the message field (or any specified source) and promotes nested keys into the top-level event document. Flattening, type coercion, and field renaming are all configurable.
Basic Configuration
Enable the JSON stage in your pipeline and specify the source field. By default, LogTrace reads from message. If the source field is not valid JSON, the stage raises no parse exception; what happens to the event is governed by the on_failure setting described under Error Modes.
Flattening Nested Objects
Set flatten_keys: true to convert nested objects into dot-separated field names. Example input: {"user": {"id": 4821, "role": "admin"}} becomes user.id = 4821 and user.role = "admin".
Type Coercion
Enable auto_cast: true to convert numeric strings to integers or floats, and true/false strings to booleans. The parser respects JSON number precision up to 15 significant digits.
Field Renaming
Use the rename_map option to remap keys during parsing. Example: {"@timestamp": "ts", "req.id": "request_id"} ensures consistent field naming across heterogeneous log sources.
Array Handling
JSON arrays in log events are preserved as JSON-encoded strings by default. Enable array_expand: true to explode arrays into indexed fields: tags.0, tags.1, etc. Arrays deeper than 3 levels are truncated to prevent memory pressure on the parsing worker.
Error Modes
Set on_failure to one of three values:
- skip — discard malformed JSON events (default)
- pass_through — leave the original message intact and add a _jsonparse_error flag
- drop_and_alert — drop the event and increment the json_parse_failures metric for Prometheus scraping
Because the default silently discards anything that fails to parse, a pipeline carrying security telemetry should normally use pass_through, so that a malformed or deliberately corrupted line still reaches storage and can be alerted on rather than vanishing without trace.
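The script below is an added example, not LogTrace's implementation. It models the JSON parser stage as described above (promotion of parsed keys to the top level, flatten_keys, auto_cast, rename_map, array_expand, and the three on_failure modes) and then exercises the one behaviour the text leaves implicit: a parsed key with the same name as an existing event field overwrites it. The sample event is constructed; the protect option and the way the array depth limit is realised (levels beyond three stay JSON-encoded strings) are assumptions of this model, not documented LogTrace features.

```python
import json
import re
# Example code: a small model of the JSON parser stage (flatten_keys, auto_cast, rename_map,
# array_expand, on_failure). Not LogTrace's code; 'protect' and the depth-limit handling are assumptions.
NUMERIC = re.compile(r"^-?\d+(?:\.\d+)?$")
MAX_ARRAY_DEPTH = 3


def cast_scalar(value):
    """auto_cast: numeric strings -> int/float, 'true'/'false' -> bool; everything else unchanged."""
    if isinstance(value, str):
        if value in ("true", "false"):
            return value == "true"
        if NUMERIC.match(value):
            return float(value) if "." in value else int(value)
    return value


def promote(value, key, out, flatten_keys, array_expand, do_cast, array_depth=0):
    """Copy a parsed JSON value into the flat field dict `out` according to the stage options."""
    if isinstance(value, dict) and (key == "" or flatten_keys):
        for k, v in value.items():                       # nested keys -> dot-separated names
            promote(v, f"{key}.{k}" if key else k, out, flatten_keys, array_expand, do_cast, array_depth)
    elif isinstance(value, list) and array_expand and array_depth < MAX_ARRAY_DEPTH:
        for i, v in enumerate(value):                    # arrays -> tags.0, tags.1, ...
            promote(v, f"{key}.{i}", out, flatten_keys, array_expand, do_cast, array_depth + 1)
    elif isinstance(value, list):
        out[key] = json.dumps(value)                     # default (or beyond the depth limit): JSON string
    else:
        out[key] = cast_scalar(value) if do_cast else value


class JsonStage:
    def __init__(self, source="message", flatten_keys=False, auto_cast=False, rename_map=None,
                 array_expand=False, on_failure="skip", protect=()):
        if on_failure not in ("skip", "pass_through", "drop_and_alert"):
            raise ValueError(f"unknown on_failure mode {on_failure!r}")
        self.source, self.flatten_keys, self.do_cast = source, flatten_keys, auto_cast
        self.rename_map, self.array_expand, self.on_failure = rename_map or {}, array_expand, on_failure
        self.protect = set(protect)          # fields a parsed payload may not overwrite (assumption)
        self.json_parse_failures = 0         # the Prometheus counter named on the page

    def process(self, event):
        """Return the event with parsed fields merged in, or None when the event is discarded."""
        raw = event.get(self.source)
        try:
            doc = json.loads(raw) if isinstance(raw, str) else None
            if not isinstance(doc, dict):
                raise ValueError("top-level JSON value is not an object")
        except ValueError:                   # json.JSONDecodeError is a ValueError
            if self.on_failure == "pass_through":
                return {**event, "_jsonparse_error": True}
            if self.on_failure == "drop_and_alert":
                self.json_parse_failures += 1
            return None                      # skip (default) and drop_and_alert both discard the event
        fields = {}
        promote(doc, "", fields, self.flatten_keys, self.array_expand, self.do_cast)
        for old, new in self.rename_map.items():
            if old in fields:
                fields[new] = fields.pop(old)
        clobbered = [k for k in fields if k in self.protect and k in event]
        for k in clobbered:                  # keep the ingest-assigned value, record the attempt
            fields.pop(k)
        out = {**event, **fields}            # every other parsed key overwrites an existing field
        if clobbered:
            out["_protected_field_conflict"] = clobbered
        return out


# Constructed sample event: client_ip was set by the ingest layer; the message body came from the app.
SAMPLE_EVENT = {
    "client_ip": "198.51.100.4",
    "message": ('{"@timestamp": "2023-10-10T13:55:36Z", "req": {"id": "r-77", "bytes": "512"}, '
                '"user": {"id": 4821, "role": "admin"}, "tags": ["auth", "admin"], "ok": "true", '
                '"client_ip": "127.0.0.1"}'),
}


def parse_sample(protect=()):
    stage = JsonStage(flatten_keys=True, auto_cast=True, array_expand=True,
                      rename_map={"@timestamp": "ts", "req.id": "request_id"},
                      on_failure="pass_through", protect=protect)
    return stage.process(dict(SAMPLE_EVENT))
```

parse_sample() shows the field-clobbering case: the application payload carried its own client_ip and, because parsed keys are promoted to the top level, it replaced the address the ingest layer recorded. Anything keyed on client_ip later in the pipeline (GeoIP, threat lookups, alert rules) then acts on the value the sender chose. LogTrace documents no guard for this; protect in the model is hypothetical, and the mitigation available from the documented options is a rename_map that moves such keys under a dedicated prefix before they can collide with transport-level fields.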
Enrichment
Enrichment stages append contextual data to events after parsing. LogTrace supports GeoIP lookups, DNS reverse resolution, threat-feed matching, and custom CSV-based lookup tables. All enrichment runs asynchronously to avoid blocking the main ingestion pipeline.
GeoIP Database
LogTrace integrates with MaxMind GeoLite2 and commercial GeoIP2 databases. Upload a .mmdb file via the Settings panel or mount it at /etc/logtrace/geoip/. The enrichment stage resolves IP addresses into geographic and network attributes.
Configurable output fields include:
- geoip.country.iso_code — two-letter ISO 3166-1 alpha-2 code (e.g., RU, DE, US)
- geoip.country.name — full country name in English
- geoip.city.name — city name (available in GeoLite2 City and commercial databases)
- geoip.location.latitude and geoip.location.longitude — decimal coordinates
- geoip.asn — Autonomous System number (e.g., AS12389)
- geoip.organization — ISP or hosting provider name
Example configuration for an nginx access log pipeline:
geoip {
source => "client_ip"
database => "/etc/logtrace/geoip/GeoLite2-City.mmdb"
fields => ["country", "city", "location", "asn", "organization"]
}
Threat Intelligence Lookup
Connect LogTrace to AbuseIPDB, AlienVault OTX, or a self-hosted threat feed. The enrichment stage checks source and destination IPs against known malicious indicators and appends a threat.score (0–100) and threat.tags array to matching events. Events scoring above a configurable threshold (default: 75) can trigger real-time alerts via Telegram or email. Point the stage at the address the transport layer observed (the first field of the nginx combined format, client_ip in the pattern above): an address copied from a client-supplied header such as X-Forwarded-For is attacker-controlled unless every proxy in front of the server is trusted, and lets a sender pin a bogus reputation on an arbitrary address.
Custom Lookup Tables
Upload a CSV file to create a key-value enrichment table. Use cases include mapping employee IDs to names, translating internal service codes to human-readable labels, or associating merchant IDs with business unit owners. Lookup tables are loaded into memory at pipeline start and refreshed on a configurable schedule (minimum interval: 5 minutes).
Example CSV for merchant enrichment:
merchant_id,merchant_name,business_unit,region
MRC-10042,AlfaTech Supplies,B2B Wholesale,Central Federal District
MRC-10118,Volga Logistics,B2C Retail,Volga Federal District
MRC-10205,Siberian Cloud Ltd,SaaS Platform,Siberian Federal District
Enrichment Pipeline Order
Enrichment stages execute in the order they appear in the pipeline definition. Place GeoIP before threat intelligence so that threat lookups can reference resolved geographic attributes. The maximum number of enrichment stages per pipeline is 8; exceeding this limit triggers a validation error.
Latency Budget
Each enrichment stage has a configurable timeout (default: 200 ms). If a lookup exceeds the timeout, the event continues without the enrichment data and an _enrichment_timeout marker is added. Database queries are cached with a TTL of 300 seconds to reduce upstream load. Alert on the rate of _enrichment_timeout markers: a threat feed that is down or throttled degrades detection coverage silently, because the events themselves keep flowing.
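The script below is an added example, not LogTrace's code. It models the enrichment chain described in the GeoIP, Threat Intelligence, Custom Lookup Tables, Enrichment Pipeline Order and Latency Budget sections: three stages executed in definition order, a per-stage result cache with the 300-second TTL, the _enrichment_timeout marker when a lookup does not answer in time, and the eight-stage validation limit. The GeoIP table and the threat feed are constructed in-memory stand-ins for the .mmdb file and the external feed; the merchant CSV is the one from the page. The threat.alert field is this model's hook for the alert the page describes, not a documented output.

```python
import csv
import io
import ipaddress
# Example code modelling the enrichment chain (GeoIP, threat feed, CSV lookup, ordering, timeout marker,
# TTL cache, stage limit). Not LogTrace's code; GEO_STAND_IN and THREAT_FEED are constructed stand-ins.
GEO_STAND_IN = {
    "203.0.113.0/24": {"country.iso_code": "DE", "country.name": "Germany", "city.name": "Berlin",
                       "asn": "AS64496", "organization": "Example Hosting GmbH"},
    "198.51.100.0/24": {"country.iso_code": "RU", "country.name": "Russia", "city.name": "Moscow",
                        "asn": "AS64497", "organization": "Example Telecom"},
}
THREAT_FEED = {
    "198.51.100.4": {"score": 92, "tags": ["ssh-bruteforce", "scanner"]},
    "203.0.113.7": {"score": 20, "tags": ["tor-exit"]},
}
MERCHANT_CSV = """merchant_id,merchant_name,business_unit,region
MRC-10042,AlfaTech Supplies,B2B Wholesale,Central Federal District
MRC-10118,Volga Logistics,B2C Retail,Volga Federal District
MRC-10205,Siberian Cloud Ltd,SaaS Platform,Siberian Federal District
"""
MAX_STAGES = 8           # per-pipeline limit stated on the page
ALERT_THRESHOLD = 75     # default threat.score alert threshold stated on the page


def geoip_lookup(ip):
    """Longest-prefix match over the stand-in table; returns geoip.* fields or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for cidr, attrs in GEO_STAND_IN.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, attrs)
    return {f"geoip.{k}": v for k, v in best[1].items()} if best else None


def threat_lookup(ip):
    """threat.score / threat.tags for a listed indicator, plus the alert decision against the threshold."""
    hit = THREAT_FEED.get(ip)
    if hit is None:
        return None
    return {"threat.score": hit["score"], "threat.tags": list(hit["tags"]),
            "threat.alert": hit["score"] > ALERT_THRESHOLD}   # hook for the Telegram/email alert


def make_csv_lookup(csv_text, key_column, prefix):
    """Key-value lookup built from an uploaded CSV; unmatched keys enrich nothing."""
    rows = {row[key_column]: row for row in csv.DictReader(io.StringIO(csv_text))}
    def lookup(key):
        row = rows.get(key)
        return {f"{prefix}.{c}": v for c, v in row.items() if c != key_column} if row else None
    return lookup


class EnrichmentStage:
    def __init__(self, name, source, lookup, cache_ttl_s=300):
        self.name, self.source, self.lookup, self.ttl = name, source, lookup, cache_ttl_s
        self.cache = {}            # key -> (stored_at, result); negative results are cached too
        self.upstream_calls = 0    # lookups that were not served from the cache

    def apply(self, event, now):
        key = event.get(self.source)
        if key is None:
            return
        hit = self.cache.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            result = hit[1]
        else:
            self.upstream_calls += 1
            try:
                result = self.lookup(key)       # TimeoutError models a lookup exceeding the 200 ms budget
            except TimeoutError:
                event.setdefault("_enrichment_timeout", []).append(self.name)
                return                          # the event continues without this stage's fields
            self.cache[key] = (now, result)
        if result:
            event.update(result)


class EnrichmentPipeline:
    def __init__(self, stages):
        if len(stages) > MAX_STAGES:
            raise ValueError(f"at most {MAX_STAGES} enrichment stages per pipeline")
        self.stages = stages

    def run(self, event, now=0.0):
        for stage in self.stages:               # definition order: GeoIP before threat, as the page advises
            stage.apply(event, now)
        return event


def build_pipeline(threat_fn=threat_lookup):
    return EnrichmentPipeline([
        EnrichmentStage("geoip", "client_ip", geoip_lookup),
        EnrichmentStage("threat", "client_ip", threat_fn),
        EnrichmentStage("merchant", "merchant_id", make_csv_lookup(MERCHANT_CSV, "merchant_id", "merchant")),
    ])

def feed_unavailable(ip):
    """Stand-in for a threat feed that does not answer within the latency budget."""
    raise TimeoutError("threat feed did not answer in time")

def validate_stage_count(n):
    """True if a pipeline with n stages passes validation."""
    try:
        EnrichmentPipeline([EnrichmentStage(f"s{i}", "x", lambda k: None) for i in range(n)])
        return True
    except ValueError:
        return False
```

Running build_pipeline(feed_unavailable) over an event shows the degraded path: the GeoIP and merchant fields are still attached, threat.score is absent, and _enrichment_timeout lists the stage that did not answer. The now argument stands in for the wall clock so the cache behaviour is reproducible: a second lookup of the same key within 300 seconds is served from the cache, and one after that goes upstream again.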
Memory Footprint
A GeoLite2 City database consumes approximately 62 MB of RAM when loaded. Each custom lookup table adds 1–5 MB depending on row count. The enrichment worker reserves a 256 MB heap; monitor enrichment.heap_used in the metrics dashboard.
Next Steps
After configuring parsing and enrichment, proceed to pipeline deployment and monitoring.
Review the Pipeline Deployment Guide for instructions on rolling out changes to production workers without dropping events. Use the Pipeline Debugger to replay historical events through a new configuration and verify field extraction accuracy before switching traffic.
Related topics: Alerting Rules, Dashboard Templates, API Reference — Pipeline Endpoints