
   Project Meniscus

A Better Focusing Lens for System and Application Events

The Problem

Software systems produce events, but often in non-uniform ways. A system may log information to a file in a grammar that must be understood before any meaning can be extracted from the output. A system may also send events to other systems in a structured manner, for example over a REST API. Other systems may write events directly into a database for storage or pass them to a queue for distribution to interested consumers.

In highly diverse, clustered environments like those seen in many OpenStack deployments, the system event landscape can become complex and difficult to manage, and over time it can grow so opaque that the events generated no longer provide value. Yet the information in many of these events has definite business value, whether it is used to meter a tenant or to report that part of the cluster has been damaged or degraded. Therefore, despite the complexity of the event ecosystem, it's imperative to capture this information in a standardized and scalable manner.

Design Goals

  • Emphasize the use of standards, of which there are many in this problem domain
  • Adhere to compliance rules related to system and application event logging
  • Support multiple tenants in secure isolation
  • Impact target installations as little as possible
  • Allow for direct application integration via a structured publication endpoint
  • Design for platform efficiency, durability and scalability
  • Provide administrators with full-text search across all events processed and stored for a given tenant
  • Provide a well-formed set of API specifications for configuration and administration
  • Provide common sinks for existing systems such as Ceilometer
  • Make the architecture resilient to multiple failures
  • Support end-to-end runtime mutability

Technologies and Standards

  • Python allows us to better align with efforts in the OpenStack community.
  • Common Event Expression (CEE) unifies event representation and classification by combining support for multiple event syntaxes and log protocols in a single event standard.
  • Syslog is already a widespread logging strategy, so building on the latest proposed standard and its supporting software exposes an existing wealth of system and application events with minimal effort.
  • Rsyslog has many production implementations of the relevant RFCs and appears to be very complete, so it is recommended for this architecture.
  • Liblognorm is written in the C programming language. Where maximum speed is required, it can be used to map unstructured events into structured data quickly.
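Liblognorm performs this mapping with rulebases written in its own syntax. The Python sketch below swaps in the standard `re` module to illustrate the same idea; it is not liblognorm's API. One hypothetical rule turns an sshd-style log line into structured fields, which are then serialized behind the `@cee:` cookie that CEE-aware tools such as rsyslog's mmjsonparse module recognize. The rule, the field names and the sample line are assumptions made for illustration.

```python
import json
import re
from typing import Optional

# Hypothetical rule for an sshd-style failed-login line. In liblognorm this
# would be a rulebase entry; a regex with named groups plays that role here.
FAILED_LOGIN = re.compile(
    r"Failed password for (?P<user>\S+) from (?P<src_ip>[\d.]+) port (?P<src_port>\d+)"
)

# Prefix that marks a syslog message body as CEE JSON.
CEE_COOKIE = "@cee:"


def normalize(line: str) -> Optional[dict]:
    """Map an unstructured log line to a structured event, or None if no rule matches."""
    match = FAILED_LOGIN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        # Hypothetical field names, loosely modeled on CEE-style tags.
        "action": "login",
        "status": "failure",
        "user": fields["user"],
        "src_ip": fields["src_ip"],
        "src_port": int(fields["src_port"]),
    }


def to_cee(event: dict) -> str:
    """Serialize a structured event as a CEE-style '@cee:' JSON payload."""
    return CEE_COOKIE + json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    sample = "sshd[4120]: Failed password for alice from 10.0.0.7 port 52311 ssh2"
    print(to_cee(normalize(sample)))
```

A real deployment would carry many such rules; lines that match none of them can still be stored as raw text so that no event is lost.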

Event Transport

Once configured, logging events may reach the logging system through a relay by one of two methods:

  • Structured:
    Some events may be structured (well formed, with defined data fields). Structured events should require no extra normalization or mapping before being placed into storage. HTTP is recommended for this endpoint because its request/response life cycle provides soft transaction semantics, and a host of other RESTful features may also be employed. This endpoint should allow for publication across a WAN.
  • Unstructured:
    Some events may be unstructured (plain log lines). Unstructured events may require normalization and mapping into a Common Event Expression data structure. Unstructured events should arrive via the Syslog protocol over TCP. Events may also be published to this endpoint across a WAN.
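A minimal Python sketch of both methods follows. It assumes a relay that accepts RFC 5424 syslog messages over TCP using RFC 6587 octet-counting framing, and a hypothetical HTTP endpoint that accepts JSON events. The endpoint URL and all helper names are illustrative, not part of any defined Meniscus API.

```python
import json
import socket
import urllib.request
from datetime import datetime, timezone


def syslog_frame(facility: int, severity: int, hostname: str, app_name: str,
                 msg: str, timestamp: datetime) -> bytes:
    """Build an RFC 5424 message wrapped in RFC 6587 octet-counting framing."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    if timestamp.tzinfo is None:
        raise ValueError("RFC 5424 timestamps must carry a UTC offset")
    pri = facility * 8 + severity
    # PROCID, MSGID and STRUCTURED-DATA are left as the NILVALUE "-".
    message = f"<{pri}>1 {timestamp.isoformat()} {hostname} {app_name} - - - {msg}"
    encoded = message.encode("utf-8")
    # Octet counting: "<byte length> <message>" lets the receiver split the TCP stream.
    return str(len(encoded)).encode("ascii") + b" " + encoded


def send_unstructured(host: str, port: int, frame: bytes) -> None:
    """Send one framed syslog message to a relay over plain TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(frame)


def structured_request(url: str, event: dict) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST carrying one structured JSON event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


if __name__ == "__main__":
    ts = datetime.now(timezone.utc)
    # Facility 1 (user-level), severity 6 (informational).
    print(syslog_frame(1, 6, "web01", "myapp", "service started", ts))
    req = structured_request("http://relay.example:8080/v1/events",  # hypothetical URL
                             {"host": "web01", "msg": "service started"})
    print(req.get_method(), req.full_url)
```

Calling `urllib.request.urlopen(req)` would actually send the structured event; the HTTP status in the response tells the publisher whether the relay accepted it, which is the soft transaction behavior the structured method relies on.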

Stream Encryption

Although encryption is computationally expensive, it closes many of the attack vectors that messages in transit are exposed to. Two standards are recommended for securing messages in transport: the TLS transport mapping for Syslog (RFC 5425) and SSL/TLS via HTTPS.
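As a rough illustration of the Syslog side, the Python sketch below opens a TLS connection to a relay using the standard `ssl` module. It assumes the relay listens on TCP port 6514, the port RFC 5425 registers for syslog over TLS, and that `frame` already holds a complete octet-counted syslog message. The helper names and the TLS 1.2 floor are assumptions, not project requirements.

```python
import socket
import ssl
from typing import Optional

SYSLOG_TLS_PORT = 6514  # port registered for syslog over TLS (RFC 5425)


def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a client TLS context that verifies the relay's certificate."""
    # create_default_context turns on certificate and hostname verification;
    # ca_file optionally points at a private CA that signed the relay's cert.
    context = ssl.create_default_context(cafile=ca_file)
    # Assumed policy: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def send_over_tls(host: str, frame: bytes, port: int = SYSLOG_TLS_PORT,
                  ca_file: Optional[str] = None) -> None:
    """Send one already-framed syslog message to a relay over TLS."""
    context = make_client_context(ca_file)
    with socket.create_connection((host, port), timeout=5) as raw:
        # server_hostname enables SNI and the hostname check against the cert.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(frame)
```

The same context can protect the structured endpoint by passing it to `urllib.request.urlopen(req, context=context)` for an `https://` URL. If a relay requires client certificates, one can be loaded with `context.load_cert_chain()`.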

Stream Compression

Compression may be enabled to make more efficient use of network bandwidth. Rsyslog supports on-the-wire zlib compression, and many HTTP implementations also support compression such as gzip or deflate.
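Rsyslog's zlib compression is configured within rsyslog itself, so this Python sketch covers only the HTTP side: it compresses a batch of structured events with gzip or deflate and sets the matching `Content-Encoding` header. Sending events in batches and the helper names are assumptions for illustration; whether a relay accepts compressed request bodies depends on its implementation.

```python
import gzip
import json
import zlib
from typing import Dict, List, Tuple


def compress_batch(events: List[dict], encoding: str = "gzip") -> Tuple[bytes, Dict[str, str]]:
    """Serialize events as one JSON body, compress it, and return (body, headers)."""
    raw = json.dumps(events).encode("utf-8")
    if encoding == "gzip":
        body = gzip.compress(raw)
    elif encoding == "deflate":
        # HTTP "deflate" means the zlib format (RFC 1950), which zlib.compress emits.
        body = zlib.compress(raw)
    else:
        raise ValueError(f"unsupported encoding: {encoding}")
    headers = {"Content-Type": "application/json", "Content-Encoding": encoding}
    return body, headers


def decompress_batch(body: bytes, encoding: str) -> List[dict]:
    """Reverse compress_batch on the receiving relay."""
    if encoding == "gzip":
        raw = gzip.decompress(body)
    elif encoding == "deflate":
        raw = zlib.decompress(body)
    else:
        raise ValueError(f"unsupported encoding: {encoding}")
    return json.loads(raw)
```

Log events tend to be repetitive, so even modest batches usually shrink considerably; the price is CPU time at both ends, which is one reason to leave compression optional.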

Encryption and compression may be enabled by the tenant or by relays communicating with one another.