管理集群日志

方案介绍

项目中,集群日志管理采取的方案为 Elasticsearch、Fluentd Bit、Kibana 组合,它们分工如下:

  • Fluentd Bit 负责从 Kubernetes 搜集日志并发送给 Elasticsearch。
  • Elasticsearch 负责存储日志并提供查询接口。
  • Kibana 用于浏览和搜索存储在 Elasticsearch 中的日志。
note

通过在每台 node 上部署一个以 DaemonSet 方式运行 Fluentd Bit 来收集每台 node 上的日志。Fluentd Bit 将 docker 日志目录 /var/lib/docker/containers 和 /var/log 目录挂载到 Pod 中,然后 Pod 会在 node 节点的 /var/log/pods 目录中创建新的目录,可以区别不同的容器日志输出,该目录下有一个日志文件链接到 /var/lib/docker/contianers 目录下的容器日志输出。

部署方案

这里会提供两种部署方式:

  • 快速部署
  • 分步骤部署
note

温馨提示:

安装过程可能会遇到的问题和解决方案如下:

  • elasticsearch-logging 不断地 restart
    解决方案:
    问题是由内存资源不足导致的,一般出现在虚拟机 linux 环境,服务器暂未遇到该问题,调整节点内存为 4 GB 以上即可解决。

  • elasticsearch-logging 容器 liveness 和 readiness 探针检测失败
    解决方案:
    由于 elasticsearch-logging 初始化时间较长,探针默认 timeout 时间设置过短,可以修改 es-statefulset.yaml 中探针 timeout 时长,或者干脆将探针相关配置注释掉。

快速部署

执行如下命令后等待部署完成。

$ wget http://122.224.169.50:30000/index.php/s/TzzxEq3gSr9p4Kq/download -O dubheDeployScriptfile.zip
$ unzip dubheDeployScriptfile.zip
$ cd dubheDeployScriptfile/efk/
$ kubectl apply -f efk-deployment.yaml

如果你想看更详细的部署过程,可以往下看分步骤部署过程。

分步骤部署

执行如下 yaml 文件

note

温馨提示: 修改 yaml 文件需特别注意缩进格式,避免出现 yaml 文件解析错误

  • elasticsearch.yaml
apiVersion: v1
kind: Service
metadata:
name: elasticsearch-logging
namespace: kube-system
labels:
k8s-app: elasticsearch-logging
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "Elasticsearch"
spec:
type: NodePort
ports:
- name: http
nodePort: 32321
port: 9200
protocol: TCP
targetPort: db
selector:
k8s-app: elasticsearch-logging
---
# RBAC authn and authz
apiVersion: v1
kind: ServiceAccount
metadata:
name: elasticsearch-logging
namespace: kube-system
labels:
k8s-app: elasticsearch-logging
addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: elasticsearch-logging
labels:
k8s-app: elasticsearch-logging
addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
- ""
resources:
- "services"
- "namespaces"
- "endpoints"
verbs:
- "get"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
namespace: kube-system
name: elasticsearch-logging
labels:
k8s-app: elasticsearch-logging
addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
name: elasticsearch-logging
namespace: kube-system
apiGroup: ""
roleRef:
kind: ClusterRole
name: elasticsearch-logging
apiGroup: ""
---
# Elasticsearch deployment itself
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elasticsearch-logging
namespace: kube-system
labels:
k8s-app: elasticsearch-logging
version: v7.4.2
addonmanager.kubernetes.io/mode: Reconcile
spec:
serviceName: elasticsearch-logging
replicas: 3
selector:
matchLabels:
k8s-app: elasticsearch-logging
version: v7.4.2
template:
metadata:
labels:
k8s-app: elasticsearch-logging
version: v7.4.2
spec:
serviceAccountName: elasticsearch-logging
containers:
- args:
- -c
- "sed -i 's#-Xms1g#-Xms4g#g;s#-Xmx1g#-Xmx4g#g' /usr/share/elasticsearch/config/jvm.options && echo -e 'http.cors.enabled: true\nhttp.cors.allow-origin: \"*\"' >> config/elasticsearch.yml && bin/run.sh"
command:
- /bin/bash
image: quay.io/fluentd_elasticsearch/elasticsearch:v7.4.2
name: elasticsearch-logging
imagePullPolicy: IfNotPresent
resources:
# need more cpu upon initialization, therefore burstable class
limits:
cpu: 10000m
memory: 8Gi
requests:
cpu: 1000m
memory: 4Gi
ports:
- containerPort: 9200
name: db
protocol: TCP
- containerPort: 9300
name: transport
protocol: TCP
# livenessProbe:
# tcpSocket:
# port: transport
# initialDelaySeconds: 5
# timeoutSeconds: 10
# readinessProbe:
# tcpSocket:
# port: transport
# initialDelaySeconds: 5
# timeoutSeconds: 10
volumeMounts:
- name: elasticsearch-logging
mountPath: /data
env:
- name: "NAMESPACE"
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumes:
- name: elasticsearch-logging
emptyDir: {}
# Elasticsearch requires vm.max_map_count to be at least 262144.
# If your OS already sets up this number to a higher value, feel free
# to remove this init container.
initContainers:
- image: alpine:3.6
command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
name: elasticsearch-logging-init
securityContext:
privileged: true
  • kibana.yaml
apiVersion: v1
kind: Service
metadata:
name: kibana-logging
namespace: kube-system
labels:
k8s-app: kibana-logging
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "Kibana"
spec:
type: NodePort
ports:
- port: 5601
protocol: TCP
targetPort: ui
selector:
k8s-app: kibana-logging
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kibana-logging
namespace: kube-system
labels:
k8s-app: kibana-logging
addonmanager.kubernetes.io/mode: Reconcile
spec:
replicas: 1
selector:
matchLabels:
k8s-app: kibana-logging
template:
metadata:
labels:
k8s-app: kibana-logging
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
containers:
- name: kibana-logging
image: docker.elastic.co/kibana/kibana-oss:7.2.0
resources:
# need more cpu upon initialization, therefore burstable class
limits:
cpu: 1000m
requests:
cpu: 100m
env:
- name: ELASTICSEARCH_HOSTS
value: http://elasticsearch-logging:9200
- name: SERVER_NAME
value: kibana-logging
# - name: SERVER_BASEPATH
# value: /api/v1/namespaces/kube-system/services/kibana-logging/proxy
- name: SERVER_REWRITEBASEPATH
value: "false"
ports:
- containerPort: 5601
name: ui
protocol: TCP
livenessProbe:
httpGet:
path: /api/status
port: ui
initialDelaySeconds: 5
timeoutSeconds: 10
readinessProbe:
httpGet:
path: /api/status
port: ui
initialDelaySeconds: 5
timeoutSeconds: 10
  • fluent-bit.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: kube-system
labels:
k8s-app: fluent-bit
data:
# Configuration files: server, input, filters and output
# ======================================================
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
@INCLUDE input-kubernetes.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE output-elasticsearch.conf
input-kubernetes.conf: |
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/train*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 1
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/serving*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 1
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/modelopt*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 1
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/batchserving*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 1
filter-kubernetes.conf: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.var.log.containers.
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
output-elasticsearch.conf: |
[OUTPUT]
Name es
Match kube.*
Host ${FLUENT_ELASTICSEARCH_HOST}
Port ${FLUENT_ELASTICSEARCH_PORT}
Index kubelogs
Replace_Dots On
Retry_Limit False
Type doc
parsers.conf: |
[PARSER]
Name apache
Format regex
Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name apache2
Format regex
Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name apache_error
Format regex
Regex ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
[PARSER]
Name nginx
Format regex
Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name json
Format json
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
[PARSER]
Name syslog
Format regex
Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: fluent-bit
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: fluent-bit-read
rules:
- apiGroups: [""]
resources:
- namespaces
- pods
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: fluent-bit-read
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: fluent-bit-read
subjects:
- kind: ServiceAccount
name: fluent-bit
namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluent-bit
namespace: kube-system
labels:
k8s-app: fluent-bit-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
selector:
matchLabels:
k8s-app: fluent-bit-logging
template:
metadata:
labels:
k8s-app: fluent-bit-logging
version: v1
kubernetes.io/cluster-service: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "2020"
prometheus.io/path: /api/v1/metrics/prometheus
spec:
containers:
- name: fluent-bit
image: fluent/fluent-bit:1.3.11
imagePullPolicy: IfNotPresent
ports:
- containerPort: 2020
env:
- name: FLUENT_ELASTICSEARCH_HOST
value: "elasticsearch-logging"
- name: FLUENT_ELASTICSEARCH_PORT
value: "9200"
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: fluent-bit-config
mountPath: /fluent-bit/etc/
terminationGracePeriodSeconds: 10
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: fluent-bit-config
configMap:
name: fluent-bit-config
serviceAccountName: fluent-bit
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
- operator: "Exists"
effect: "NoExecute"
- operator: "Exists"
effect: "NoSchedule"

执行上面 yaml 文件

kubectl appliy -f elasticsearch.yaml
kubectl appliy -f kibana.yaml
kubectl appliy -f fluent-bit.yaml

执行以下命令查看以下 Pod 状态

kubectl get pod -n kube-system -o wide

如果所有的 Pod 都是 Running 状态且就绪,则表示日志管理工具部署成功。

配置 Elasticsearch

执行以下命令查看安装的 Service:

kubectl get svc -n kube-system

可以看到 Elasticsearch 和 Kibana 对外提供的节点端口 port,如下:

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch-logging NodePort 10.110.110.73 <none> 9200:32321/TCP 22d
kibana-logging NodePort 10.111.13.38 <none> 5601:32680/TCP 22d

我们可以通过http://MASTER_IP:PORT 访问相关服务,MASTER_IP即为k8s主节点服务器IP地址。例如通过http://MASTER_IP:32680 访问Kibana。

ES 默认查询结果是 10000 条,需要自行配置来修改默认值,访问 Kibana,在控制台执行以下命令完成配置:

PUT _all/_settings
{
"index":{
"max_result_window": 2000000000
}
}
Last updated on