centralized logging 2 using promtail, loki

Background

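I previously built a centralized log server with rsyslogd, mainly to collect system and audit logs from the servers. The centralized log server I am building now focuses on collecting and displaying application logs. My servers run two operating systems: Debian 12 (bookworm) and Ubuntu 24.04. More precisely, all application servers run Ubuntu 24.04, and only the two ops machines (including this log server) run Debian 12.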

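Since this is a small shop, I skipped the large, heavyweight Elasticsearch-based stack and went straight to loki, from the Grafana family, on the server side. Log collection on the clients uses promtail, also from the Grafana family. And with that, the stack was happily settled.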

server

install

For installing the server software (mainly loki), see the official documentation: Install Grafana on Debian or Ubuntu

In short:

sudo apt-get install -y \
  apt-transport-https \
  software-properties-common \
  wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | \
  gpg --dearmor | \
  sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo \
  "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" \
  | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install loki grafana-enterprise
# loki and grafana live on the same machine in my setup, so I install both in one go

configuration

Configuring loki as a log server is fairly simple:

sudo vim /etc/loki/config.yml
# edit the /etc/loki/config.yml file

Put in the following content:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: warn
  grpc_server_max_concurrent_streams: 500

common:
  instance_addr: 127.0.0.1
  path_prefix: /var/lib/loki/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/loki/chunks
      rules_directory: /var/lib/loki/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 500

compactor:
  working_directory: /var/lib/loki/data/retention
  compaction_interval: 1h
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 50
  delete_request_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 24h
  max_query_series: 5000
  retention_period: 720h
  ingestion_rate_mb: 20
  ingestion_burst_size_mb: 40
  max_entries_limit_per_query: 5000

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

pattern_ingester:
  enabled: true

ruler:
  alertmanager_client:
    basic_auth_username: cl-am-admin
    basic_auth_password: jFWVXDJX
  alertmanager_url: localhost:9093

frontend:
  encoding: protobuf

analytics:
  reporting_enabled: false

Restart loki

sudo systemctl restart loki
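
Before wiring up any clients, it can help to confirm that loki actually came back up and accepts writes. The sketch below assumes loki listens on localhost:3100 as configured above and that curl is installed. The function names, the `smoke-test` job label, and the test message are illustrative only and not part of the original setup.

```shell
#!/usr/bin/env bash
# Smoke-test helpers for a freshly restarted loki (all names are illustrative).

LOKI_URL="${LOKI_URL:-http://localhost:3100}"   # assumption: same host, port from config.yml

# Build a push-API JSON body with one stream and one log line.
# $1 = value of the job label
# $2 = log line (keep it free of quotes and backslashes, nothing is escaped here)
# $3 = timestamp in nanoseconds since the epoch (defaults to now)
build_push_payload() {
  local job="$1" line="$2" ts="${3:-$(date +%s)000000000}"
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}' \
    "$job" "$ts" "$line"
}

# Succeeds once loki's /ready endpoint answers with HTTP 200.
loki_is_ready() {
  curl -fsS "${LOKI_URL}/ready" > /dev/null
}

# Send a single test line to loki's push endpoint.
push_test_line() {
  build_push_payload "smoke-test" "hello from curl" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "${LOKI_URL}/loki/api/v1/push"
}
```

After sourcing the file, `loki_is_ready && push_test_line` exercises both endpoints. Right after a restart `/ready` may still report not ready for a short while, so a retry a few seconds later is reasonable.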

client

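On the clients we use promtail to collect logs. promtail can be installed as a system package or run in Docker. The logs we collect fall into two main groups:

  1. Application logs, which are written directly to files
  2. Docker container logs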

Assuming promtail runs in Docker, the docker compose file we use (~/promtail.yaml) looks like this:

name: promtail
services:
  promtail:
    image: grafana/promtail:3.3.2
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ~/promtail:/mnt/config
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/log/nginx:/var/log/nginx
    labels:
      SVC_NAME: promtail
    networks:
      - custombridge
    command: ["-config.file=/mnt/config/promtail-config.yaml", "-config.expand-env=true"]
    environment:
      - HOSTNAME=app-0
      - HOST_IP=172.24.125.149

networks:
  custombridge:
    external: true
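
Because `custombridge` is declared with `external: true`, Compose does not create it, and bringing the service up fails if the network is missing. The original does not say how that network was created, so the following is only a minimal sketch that assumes a plain default bridge network is enough. `ensure_network` is a hypothetical helper name.

```shell
# Create the external network referenced by the compose file if it is missing.
# $1 = network name (defaults to the name used in ~/promtail.yaml)
ensure_network() {
  local name="${1:-custombridge}"
  if docker network inspect "$name" > /dev/null 2>&1; then
    echo "network $name already exists"
  else
    # No driver given, so Docker falls back to its default bridge driver.
    docker network create "$name" > /dev/null && echo "network $name created"
  fi
}
```

Running `ensure_network` once per host before the first start is sufficient, and repeating it is harmless.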

Its actual configuration file, ~/promtail/promtail-config.yaml, looks like this:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /mnt/config/var_log_positions.yaml

clients:
  - url: http://loki.xxx.com:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - action: replace
        replacement: '${HOSTNAME}'
        target_label: 'hostname'
      - action: replace
        replacement: '${HOST_IP}'
        target_label: 'host_ip'
      - action: replace
        replacement: 'docker'
        target_label: 'job'
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container_name'
      - source_labels: ['__meta_docker_container_id']
        target_label: 'container_id'
      - source_labels: ['__meta_docker_container_label_REPO_NAME']
        target_label: 'repo_name'
      - source_labels: ['__meta_docker_container_label_SVC_NAME']
        target_label: 'svc_name'

  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          svc_name: nginx
          hostname: ${HOSTNAME}
          host_ip: ${HOST_IP}
          agent: promtail
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - json:
          expressions:
            domain_name: http_host
            return_code: status
      - labels:
          domain_name:
          return_code:

The loki.xxx.com above is the address of the loki server. As the example shows, promtail collects both the logs of a system application (nginx) and the Docker container logs.
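
One detail worth spelling out: the `json` stage in the nginx job can only extract `http_host` and `status` if every line in access.log is a JSON object with those keys. The original does not show the nginx side, so the snippet below is an assumption about what a matching `log_format` could look like. The format name `json_combined`, the extra fields, and the target file path are all hypothetical.

```shell
# Print an nginx log_format that writes one JSON object per request.
# Only http_host and status are required by the promtail pipeline above;
# the other fields are illustrative extras.
nginx_json_log_format() {
  cat <<'EOF'
log_format json_combined escape=json
  '{"time":"$time_iso8601",'
  '"remote_addr":"$remote_addr",'
  '"http_host":"$http_host",'
  '"request":"$request",'
  '"status":"$status"}';
EOF
}
```

A possible way to use it is `nginx_json_log_format | sudo tee /etc/nginx/conf.d/json-log.conf`, then point the existing `access_log /var/log/nginx/access.log` directive at `json_combined`, check with `sudo nginx -t`, and reload nginx.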

Finally, start promtail

docker compose -f ~/promtail.yaml up -d promtail
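
To check that logs are actually arriving, one option is to query loki directly over its HTTP API. This is a rough sketch that assumes curl is available and that loki.xxx.com (the placeholder address used above) is reachable from where you run it. The helper names `logql_selector` and `loki_tail` are made up for this example.

```shell
LOKI_URL="${LOKI_URL:-http://loki.xxx.com:3100}"   # placeholder address from the promtail config

# Build a LogQL stream selector from label/value pairs, for example:
#   logql_selector job nginx hostname app-0
# prints {job="nginx",hostname="app-0"}
logql_selector() {
  local sel="" sep=""
  while [ "$#" -ge 2 ]; do
    sel="${sel}${sep}$1=\"$2\""
    sep=","
    shift 2
  done
  printf '{%s}' "$sel"
}

# Fetch up to 5 recent lines for the given label/value pairs.
# Without explicit start/end, loki looks at the last hour.
loki_tail() {
  curl -fsS -G "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode "query=$(logql_selector "$@")" \
    --data-urlencode "limit=5"
}
```

For the setup above, `loki_tail job nginx` and `loki_tail job docker svc_name promtail` should each return a JSON result once some traffic has been logged. If they come back empty, `docker compose -f ~/promtail.yaml logs promtail` is the first place to look.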

References