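Background
I previously built a centralized log server with rsyslogd, mainly to collect system and audit logs from our servers. The centralized log server I am building now focuses on collecting and displaying application logs. My servers run two operating systems, Debian 12 (bookworm) and Ubuntu 24.04. More precisely, the application servers all run Ubuntu 24.04, and only the two ops-dedicated machines (including this log server) run Debian 12.
Since we are a small shop, I ruled out the large and heavy Elasticsearch-based stacks and went straight for loki, which comes from the same family as grafana, on the server side. On the client side, logs are collected by promtail, also from the grafana family. That settled the technology choice quickly and painlessly.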
server
install
For installing the server software (mainly loki), see the official documentation: Install Grafana on Debian or Ubuntu
In short:
```shell
sudo apt-get install -y \
  apt-transport-https \
  software-properties-common \
  wget

sudo mkdir -p /etc/apt/keyrings/

wget -q -O - https://apt.grafana.com/gpg.key | \
  gpg --dearmor | \
  sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null

echo \
  "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" \
  | sudo tee -a /etc/apt/sources.list.d/grafana.list

sudo apt-get update

sudo apt-get install loki grafana-enterprise
```
configuration
Configuring loki as a log server is fairly simple:
```shell
sudo vim /etc/loki/config.yml
```
Add the following content:
```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: warn
  grpc_server_max_concurrent_streams: 500

common:
  instance_addr: 127.0.0.1
  path_prefix: /var/lib/loki/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/loki/chunks
      rules_directory: /var/lib/loki/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 500

compactor:
  working_directory: /var/lib/loki/data/retention
  compaction_interval: 1h
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 50
  delete_request_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 24h
  max_query_series: 5000
  retention_period: 720h
  ingestion_rate_mb: 20
  ingestion_burst_size_mb: 40
  max_entries_limit_per_query: 5000

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

pattern_ingester:
  enabled: true

ruler:
  alertmanager_client:
    basic_auth_username: cl-am-admin
    basic_auth_password: jFWVXDJX
  alertmanager_url: localhost:9093

frontend:
  encoding: protobuf

analytics:
  reporting_enabled: false
```
Then restart loki.
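After a restart, loki needs a moment before it accepts pushes. The sketch below is one possible way to wait for that, by polling loki's `/ready` endpoint on the `http_listen_port` configured above. The function name `wait_for_loki`, the `LOKI_URL` variable, and the retry numbers are assumptions made for illustration, not part of the original setup.

```shell
# Sketch only: LOKI_URL and wait_for_loki are hypothetical names.
# The port matches http_listen_port in /etc/loki/config.yml.
LOKI_URL="${LOKI_URL:-http://127.0.0.1:3100}"

# Poll /ready until loki answers "ready", or give up after N attempts.
wait_for_loki() {
  local attempts="${1:-30}"   # how many times to poll
  local delay="${2:-2}"       # seconds to sleep between polls
  local i=1
  local body
  while [ "${i}" -le "${attempts}" ]; do
    # -s: no progress output; -f: treat HTTP errors (e.g. 503) as failure
    if body="$(curl -sf "${LOKI_URL}/ready")"; then
      case "${body}" in
        ready*)
          echo "loki is ready after ${i} attempt(s)"
          return 0
          ;;
      esac
    fi
    sleep "${delay}"
    i=$((i + 1))
  done
  echo "loki did not become ready after ${attempts} attempts" >&2
  return 1
}
```

Assuming the Debian package installed a systemd unit named `loki`, a typical use would be `sudo systemctl restart loki && wait_for_loki`.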
client
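On the client side we use promtail to collect logs. promtail can be installed as a system package or run in Docker. The logs to collect fall into two groups:
- Application logs, which are written directly to files
- Docker container logs
Assuming promtail runs in Docker, the docker compose file (`~/promtail.yaml`) looks like this: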
```yaml
name: promtail
services:
  promtail:
    image: grafana/promtail:3.3.2
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ~/promtail:/mnt/config
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/log/nginx:/var/log/nginx
    labels:
      SVC_NAME: promtail
    networks:
      - custombridge
    command: ["-config.file=/mnt/config/promtail-config.yaml", "-config.expand-env=true"]
    environment:
      - HOSTNAME=app-0
      - HOST_IP=172.24.125.149

networks:
  custombridge:
    external: true
```
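Because `custombridge` is declared with `external: true`, docker compose expects that network to exist already and will not create it. A minimal sketch of a pre-flight check follows; the helper name `ensure_network` is hypothetical, and it assumes a plain bridge network with default options is what you want.

```shell
# Sketch only: ensure_network is a hypothetical helper name.
# Creates the external network referenced by the compose file if it is missing.
ensure_network() {
  local name="${1:-custombridge}"
  if docker network inspect "${name}" >/dev/null 2>&1; then
    echo "network ${name} already exists"
  else
    # Default driver (bridge) and default options are assumed here.
    docker network create "${name}" >/dev/null
    echo "network ${name} created"
  fi
}
```

Calling `ensure_network` once on each client host before bringing the stack up should be enough.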
Its actual configuration file, `~/promtail/promtail-config.yaml`, looks like this:
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /mnt/config/var_log_positions.yaml

clients:
  - url: http://loki.xxx.com:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - action: replace
        replacement: '${HOSTNAME}'
        target_label: 'hostname'
      - action: replace
        replacement: '${HOST_IP}'
        target_label: 'host_ip'
      - action: replace
        replacement: 'docker'
        target_label: 'job'
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container_name'
      - source_labels: ['__meta_docker_container_id']
        target_label: 'container_id'
      - source_labels: ['__meta_docker_container_label_REPO_NAME']
        target_label: 'repo_name'
      - source_labels: ['__meta_docker_container_label_SVC_NAME']
        target_label: 'svc_name'

  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          svc_name: nginx
          hostname: ${HOSTNAME}
          host_ip: ${HOST_IP}
          agent: promtail
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - json:
          expressions:
            domain_name: http_host
            return_code: status
      - labels:
          domain_name:
          return_code:
```
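`loki.xxx.com` above is the address of the loki server. As the example shows, promtail collects both the logs of a system application (nginx) and the logs of Docker containers. Note that the `json` stage in the nginx job only extracts `domain_name` and `return_code` if nginx writes its access log as JSON containing the `http_host` and `status` fields.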
Finally, start promtail:
```shell
docker compose -f ~/promtail.yaml up -d promtail
```
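Once promtail is running, it can be worth confirming that logs actually arrive before building dashboards. The sketch below queries loki's HTTP API directly. The helper names `loki_labels` and `loki_query_recent` and the `LOKI_URL` variable are assumptions for illustration; `loki.xxx.com` is the placeholder address used above.

```shell
# Sketch only: LOKI_URL and both helper names are hypothetical.
LOKI_URL="${LOKI_URL:-http://loki.xxx.com:3100}"

# List the label names loki currently knows about
# (expect hostname, job, svc_name, ... once promtail has pushed data).
loki_labels() {
  curl -sf "${LOKI_URL}/loki/api/v1/labels"
}

# Fetch recent log lines matching a LogQL selector.
# With no start/end given, loki defaults to the last hour.
loki_query_recent() {
  local query="$1"
  local limit="${2:-10}"
  # -G sends the url-encoded parameters as a GET query string
  curl -sf -G "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode "query=${query}" \
    --data-urlencode "limit=${limit}"
}
```

For example, `loki_query_recent '{job="nginx"}' 5` should return a JSON document with up to five recent nginx access log lines, and `loki_query_recent '{job="docker", hostname="app-0"}'` does the same for container logs from the host labelled `app-0`.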
References