Some Good Practices of Logging

Introduction

Logs are structured or unstructured text that a system generates while it runs. A log entry is usually the application's record of an event. Logs often help us discover unexpected behavior, especially in microservice architectures. As one of the pillars of observability, logging plays an irreplaceable role in system development and maintenance.

Pillars of Observability

The importance of logging

To understand why logging plays an important role in products or systems, we must understand its value. Currently, the most common uses of logs are alerting, troubleshooting, and business data visualization.

Alerting

Logs can serve as an important data source for monitoring a business system, and mature products have alerting systems built on them. When a metric exceeds a defined threshold, the log system automatically sends an alert to the notification platform. The on-call engineers can then locate and solve the problem using the information in the alert.

Troubleshooting

Imagine that the system you develop and maintain is reported to be faulty. What is the first thing you should do? Check the system's information to verify that the report is true. The logs printed on the server are the best supporting information. For programmers, logs are the most familiar tool for solving problems.

Business Data Visualization

Many companies store production logs in their own databases and use tools such as Grafana and Sumo Logic to visualize business data from them.

Logging how-to

Templating

We need to settle on a log format and write logs according to that specification.

Logging Format

  • Basic template

    • At minimum, a well-formed log should contain the time, the log level, and the log message.
  • Advanced template: add the thread name, hostname, method name, class name, and the line number within the method.

    • Thread Name: Most applications have more than one user. In a single-instance service, requests from many users to the same interface run in different threads, so the thread name is the best way to tell one user's business flow from another's.
    • Hostname: Most applications today are deployed in the cloud as multiple instances, so logs must also be distinguishable at the instance level, and the hostname is the best way to do that.
    • Method Name: A convenient way to tell apart the sources within the same log.
    • Class Name: A convenient way to quickly locate the business process.
    • Line number: A quick way to find the exact location where the log was written.
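The two templates can be sketched with Python's standard logging module. This is only an illustration under a few assumptions: Python's log records have no class-name field, so the logger name stands in for it; the hostname is not a built-in field either, so a small filter adds it; `OrderService` and `build_logger` are hypothetical names.

```python
import logging
import socket

# Basic template: time, log level, and log message.
BASIC_FORMAT = "%(asctime)s %(levelname)s %(message)s"

# Advanced template: adds hostname, thread name, class name (approximated
# here by the logger name), method name, and line number.
ADVANCED_FORMAT = (
    "%(asctime)s %(levelname)s %(hostname)s %(threadName)s "
    "%(name)s %(funcName)s %(lineno)d %(message)s"
)


class HostnameFilter(logging.Filter):
    """Attach the machine's hostname to every record (not a built-in field)."""

    def __init__(self):
        super().__init__()
        self.hostname = socket.gethostname()

    def filter(self, record):
        record.hostname = self.hostname
        return True


def build_logger(name):
    """Return a logger that writes the advanced template to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(ADVANCED_FORMAT))
    handler.addFilter(HostnameFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


class OrderService:
    # Hypothetical class; one logger per class stands in for "class name".
    log = build_logger("OrderService")

    def create_order(self, order_id):
        self.log.info("order created: %s", order_id)
```

Calling `OrderService().create_order(1001)` would write one line containing all of the advanced fields, separated by spaces.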

Formatting

To improve the readability of the log, we can focus on formatting.

  • Put square brackets around the log level, hostname, and thread name;
  • Put parentheses around the class name and the line number where the method is located, and separate the two with a colon;
  • Add a horizontal line (a dash) between the line number and the log message.

The log message itself can also follow a specific format:

  • For regular requests, responses, or other business logs, separate the custom text from the parameters with an underscore, and separate multiple parameters with commas;
  • For error messages, use Key:Value pairs.
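One possible reading of these conventions, sketched in Python. The exact layout, the `name=value` rendering of parameters, and the helper names `business_message` and `error_message` are assumptions made for illustration, not a layout the article prescribes.

```python
import logging

# One reading of the conventions above; "hostname" is a custom field that
# has to be supplied per record (here through `extra`).
LAYOUT = (
    "%(asctime)s [%(levelname)s] [%(hostname)s] [%(threadName)s] "
    "%(funcName)s(%(name)s:%(lineno)d) - %(message)s"
)


def business_message(info, **params):
    """Custom text, an underscore, then comma-separated parameters."""
    joined = ",".join(f"{key}={value}" for key, value in params.items())
    return f"{info}_{joined}" if joined else info


def error_message(**fields):
    """Error details as Key:Value pairs separated by commas."""
    return ",".join(f"{key}:{value}" for key, value in fields.items())


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(LAYOUT))
    log = logging.getLogger("OrderService")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    host = {"hostname": "host-1"}  # hypothetical hostname
    log.info(business_message("create order", order_id=1001, user="alice"), extra=host)
    log.error(error_message(code="E42", reason="payment declined"), extra=host)
```

Keeping the message helpers separate from the layout string means the punctuation rules live in one place and can be changed without touching call sites.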

Chain-Tracking

If the log is just a line-by-line text description, it is hard to read. In a complex system, or one with frequent business operations, many logs are produced and we have to spend time filtering out the relevant ones. The best way to solve this problem is chain tracking: put one or more unique IDs into the business flow and add them to every log, so that when locating a business problem we can quickly filter the relevant logs by these IDs and other criteria (e.g. time).
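A minimal sketch of chain tracking in Python, assuming the unique ID is kept in a context variable and copied onto each record by a filter. The names `trace_id_var`, `start_trace`, and `handle_request` are hypothetical; in a real service the ID would more likely arrive from the caller and be passed to `start_trace`.

```python
import contextvars
import logging
import uuid

# Holds the ID of the request currently being handled; "-" means none.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every log record."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


def start_trace(trace_id=None):
    """Set the trace ID for this context, generating one if none is given."""
    trace_id = trace_id or uuid.uuid4().hex
    trace_id_var.set(trace_id)
    return trace_id


handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s [%(levelname)s] [%(trace_id)s] %(message)s")
)
handler.addFilter(TraceIdFilter())
log = logging.getLogger("orders")
log.addHandler(handler)
log.setLevel(logging.INFO)


def handle_request(order_id):
    # Every log written while handling this request carries the same ID.
    start_trace()
    log.info("request received_order_id=%s", order_id)
    log.info("request completed_order_id=%s", order_id)
```

Because the ID appears in every line, filtering the logs for one value returns the whole chain for that request.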

Logging on demand

Log level

Log output is divided into levels. Different scenarios call for different log levels.

  • Debug: Record technical details that help explain how the system is operating;
  • Info: Record business information;
  • Warn: Acceptable error messages that are manageable and not urgent;
  • Error: Unexpected errors or system behavior, usually caused by system bugs or environmental problems.

At the same time, not every log needs to be recorded; we should record only what is needed. The following table shows which log level to record in each environment.

Environment    Log Level
Dev            Debug
Test           Debug
UAT            Info
Prod           Info
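The table can be turned into configuration directly. The sketch below assumes the environment name comes from an environment variable called `APP_ENV`, which is a hypothetical name, and falls back to Info for unknown environments.

```python
import logging
import os

# Mirrors the table above; environment names are matched case-insensitively.
LEVEL_BY_ENV = {
    "dev": logging.DEBUG,
    "test": logging.DEBUG,
    "uat": logging.INFO,
    "prod": logging.INFO,
}


def level_for(env):
    """Return the log level for an environment, defaulting to INFO."""
    return LEVEL_BY_ENV.get(env.lower(), logging.INFO)


def configure(env=None):
    """Set the root logger level from `env` or the APP_ENV variable."""
    env = env or os.environ.get("APP_ENV", "prod")
    level = level_for(env)
    logging.getLogger().setLevel(level)
    return level
```

Defaulting to the production level is a deliberate choice here: a missing or misspelled environment name then produces less output, not more.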

Logging position

Where logs are written also needs to be clear.

  • When another system calls our system, write a log once when the request is received and once when the request is completed;
  • When our system calls a third-party system's interface, write a log once before the call and once after the response is received;
  • Write a log wherever an exception or abnormal condition occurs in the system.
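These three positions can be sketched as a Python decorator that logs before a call, after it completes, and when it raises. The decorator `logged_call` and the functions `create_order` and `charge_payment` are hypothetical stand-ins for an inbound request handler and an outbound third-party call.

```python
import functools
import logging

log = logging.getLogger("orders")


def logged_call(description):
    """Log before a call, after it completes, and when it raises."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            log.info("%s started", description)
            try:
                result = func(*args, **kwargs)
            except Exception:
                # log.exception records at ERROR level with the traceback.
                log.exception("%s failed", description)
                raise
            log.info("%s completed", description)
            return result

        return wrapper

    return decorator


@logged_call("inbound request create_order")
def create_order(order_id):
    return charge_payment(order_id)


@logged_call("outbound call payment provider")
def charge_payment(order_id):
    # Hypothetical third-party call; replace with a real client.
    if order_id < 0:
        raise ValueError("invalid order id")
    return {"order_id": order_id, "status": "paid"}
```

The exception is re-raised after logging, so callers still see the failure; the decorator only guarantees that it was recorded.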

Tool recommendation

Different programming languages have different logging tools. One of the best known is Apache Log4j, which is highly configurable and can be configured through external files at runtime. It is organized around logging priority (level) and can direct log output to many destinations, such as a database, a file, the console, or the UNIX syslog. Similar libraries exist for other languages, for example the logging module in Python, log4js in Node.js, and log4rs in Rust.
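As a rough Python counterpart to configuring destinations outside the code, the standard library's `logging.config.dictConfig` accepts a plain dictionary, which could be loaded from a file. The sketch below sends output to both the console and a file; `build_config`, `configure`, and the format string are illustrative choices, not a recommended production setup.

```python
import logging
import logging.config


def build_config(log_path, level="INFO"):
    """Return a dictConfig dictionary with console and file destinations."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {
                "format": "%(asctime)s [%(levelname)s] %(name)s - %(message)s"
            },
        },
        "handlers": {
            "console": {"class": "logging.StreamHandler", "formatter": "plain"},
            "file": {
                "class": "logging.FileHandler",
                "filename": log_path,
                "formatter": "plain",
                "encoding": "utf-8",
            },
        },
        "root": {"level": level, "handlers": ["console", "file"]},
    }


def configure(log_path, level="INFO"):
    """Apply the configuration to the logging system."""
    logging.config.dictConfig(build_config(log_path, level))
```

After `configure("app.log")`, any logger in the process writes to both destinations without further setup.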

Tips

  • Avoid printing or recording any sensitive information, including but not limited to PII and PCI data.
  • Obey local laws and regulations, such as China's PIPL (Personal Information Protection Law) or Europe's GDPR (General Data Protection Regulation).
  • Choose the appropriate log level and log position as needed.
  • ……
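The advice about sensitive information can be partly enforced in code. The sketch below masks email addresses and long digit runs before a record is written. The two patterns are deliberately simple assumptions for illustration; they are far from complete PII or PCI detection, and the safest approach remains not logging such values at all.

```python
import logging
import re

# Illustrative patterns only; real PII/PCI detection needs far more care.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d{13,16}\b")


def redact(text):
    """Mask email addresses and long digit runs that may be card numbers."""
    text = EMAIL.sub("<email>", text)
    return CARD.sub("<card>", text)


class RedactingFilter(logging.Filter):
    """Rewrite each record so the formatted message is redacted."""

    def filter(self, record):
        # Format first, then redact, so values passed as arguments are covered.
        record.msg = redact(record.getMessage())
        record.args = None
        return True
```

Attaching `RedactingFilter()` to a handler with `handler.addFilter(...)` applies the masking to every record that handler writes.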

Summary

A good log not only eases program development and provides the most important supporting information for troubleshooting, it can also yield optimization suggestions and statistics for the business or the infrastructure.

Disclaimer

This article represents only my personal views and has no connection with Thoughtworks.

