Logging System Design (Ingestion, Storage & Query at Scale)

Scenario

Services emit structured logs at massive scale. Operators need search, dashboards, and retention that meets compliance requirements for audit trails, without paying petabyte-scale prices for every debug printf. The interview focuses on ingestion pipelines, tiered storage, and cost control, not on “we put logs in Elasticsearch” without cardinality discipline.

Design a centralized logging platform that collects logs from thousands of services, stores them durably, and supports interactive search and alerting.

Constraints

Functional

Ingest structured logs; tag with service, env, trace/span ids; query by time range and filters; saved searches and alerts (high level)

Non-functional

Handle burst traffic; configurable retention per tenant/tier; durability for compliance logs
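
One way to picture "configurable retention per tenant/tier" is a default policy per tier with optional per-tenant overrides. The sketch below assumes three tiers and made-up retention windows; the tier names, tenant ids, and durations are all hypothetical and would come from product and compliance requirements in practice.

```python
from datetime import datetime, timedelta, timezone

# Assumed tiers and retention windows, for illustration only.
DEFAULT_RETENTION = {
    "debug": timedelta(days=3),
    "standard": timedelta(days=30),
    "compliance": timedelta(days=365 * 7),
}

# Hypothetical per-tenant overrides layered on top of the defaults.
TENANT_OVERRIDES = {
    "tenant-a": {"standard": timedelta(days=90)},
}


def retention_for(tenant: str, tier: str) -> timedelta:
    """Return the tenant's override for this tier, else the tier default."""
    overrides = TENANT_OVERRIDES.get(tenant, {})
    return overrides.get(tier, DEFAULT_RETENTION[tier])


def is_expired(tenant: str, tier: str, written_at: datetime, now: datetime) -> bool:
    """True when a record is older than its tenant/tier retention window."""
    return now - written_at > retention_for(tenant, tier)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
sixty_days_ago = now - timedelta(days=60)
```

With these assumed values, a 60-day-old standard-tier record is expired for a tenant on the default policy but still retained for `tenant-a`, whose override extends the window to 90 days.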

Scale

Millions of events per second in aggregate; petabytes stored; globally distributed services
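
To see how "millions of events per second" turns into "PB stored", a back-of-envelope calculation helps. The inputs below (2 million events per second, 500-byte average event, 90-day retention, 8x compression, 3 replicas) are assumptions chosen for illustration; the scenario does not specify them.

```python
SECONDS_PER_DAY = 86_400


def daily_raw_bytes(events_per_second: int, avg_event_bytes: int) -> int:
    """Uncompressed bytes ingested per day."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY


def stored_bytes(events_per_second: int, avg_event_bytes: int,
                 retention_days: int, compression_ratio: float,
                 replicas: int) -> float:
    """Bytes on disk after compression and replication for the full retention window."""
    raw = daily_raw_bytes(events_per_second, avg_event_bytes) * retention_days
    return raw / compression_ratio * replicas


# Assumed inputs; every number here is a placeholder, not a given.
raw_per_day = daily_raw_bytes(2_000_000, 500)
total = stored_bytes(2_000_000, 500, retention_days=90,
                     compression_ratio=8, replicas=3)

print(f"raw ingest per day: {raw_per_day / 1e12:.1f} TB")
print(f"stored footprint:   {total / 1e15:.2f} PB")
```

Under these assumptions, ingest is about 1 GB/s (roughly 86 TB of raw logs per day), and the stored footprint lands at about 2.9 PB. Changing retention or compression moves the total linearly, which is why tiering and retention policy dominate the cost discussion.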

Stages ahead

1. Requirement Analysis
2. API Design
3. High-Level Design
4. HLD Extensions
5. Trade-offs