One-Click Deployment Guide

Using the one-click deployment script

First, answers to the most frequently asked deployment questions.

Q: Can the project run without a GPU?

A: Yes — the frontend pages will come up, but a few services will fail to start (the data-processing algorithms and model-measuring).

Q: Can algorithms run without a GPU?

A: Yes, but you have to write them yourself; apart from the pytorch-demo shared in the community group, Tianshu ships no algorithms.

Q: What is the minimum hardware configuration for deployment?

A: 32 GB of RAM is just about enough to run all the services (four middleware services, 19 Java services, and the default K8s cluster services), but that leaves only a few GB for running algorithms, so a server with at least 64 GB is recommended.

Q: What does Tianshu mainly do?

A: Tianshu is essentially a scheduler on top of k8s: for example, it asks k8s to start a notebook pod, or spins up a dedicated pod for a training job. Think of the platform as carving a large resource pool into tasks with a minimum quota of 1 CPU / 1 GB RAM / 1 GPU, which avoids the conflicts and waste of allocating whole physical servers.

Q: Will it run on operating system X / server X / CPU architecture X?

A: Tianshu is a k8s scheduling platform written in Java. As long as your servers can run k8s (the required plugins must be adapted) and have a working GPU training ecosystem (a compatible GPU driver, a compatible Docker runtime, and the matching k8s device plugins), Tianshu will run.

Deploying the Tianshu project from scratch (single node)

1. Deploy a k8s cluster

https://kubesphere.com.cn/docs/v3.3/quick-start/all-in-one-on-linux/

Follow KubeSphere's All-in-One guide to install a k8s cluster.

The versions below are recommended and can be installed directly:
./kk create cluster --with-kubernetes v1.21.13 --with-kubesphere v3.3.1
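
If the kk binary is not present yet, it can be fetched with KubeSphere's installer script first (a minimal sketch; the VERSION value is an assumption — check the kubekey releases page for the current one):

curl -sfL https://get-kk.kubesphere.io | VERSION=v3.0.7 sh -
chmod +x kk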

2. Deploy the k8s cluster components

nginx-ingress http://docs.tianshu.org.cn/docs/setup/deploy-nginx-ingress-controller

  • ingress.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
---
# Source: ingress-nginx/templates/controller-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx
  namespace: ingress-nginx
---
# Source: ingress-nginx/templates/controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
---
# Source: ingress-nginx/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
  name: ingress-nginx
  namespace: ingress-nginx
rules:
  - apiGroups:
      - ''
    resources:
      - configmaps
      - endpoints
      - nodes
      - pods
      - secrets
    verbs:
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - nodes
    verbs:
      - get
  - apiGroups:
      - ''
    resources:
      - services
    verbs:
      - get
      - list
      - update
      - watch
  - apiGroups:
      - extensions
      - networking.k8s.io # k8s 1.14+
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - extensions
      - networking.k8s.io # k8s 1.14+
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - networking.k8s.io # k8s 1.14+
    resources:
      - ingressclasses
    verbs:
      - get
      - list
      - watch
---
# Source: ingress-nginx/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
  name: ingress-nginx
  namespace: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ingress-nginx
subjects:
  - kind: ServiceAccount
    name: ingress-nginx
    namespace: ingress-nginx
---
# Source: ingress-nginx/templates/controller-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx
  namespace: ingress-nginx
rules:
  - apiGroups:
      - ''
    resources:
      - namespaces
    verbs:
      - get
  - apiGroups:
      - ''
    resources:
      - configmaps
      - pods
      - secrets
      - endpoints
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - services
    verbs:
      - get
      - list
      - update
      - watch
  - apiGroups:
      - extensions
      - networking.k8s.io # k8s 1.14+
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
      - networking.k8s.io # k8s 1.14+
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - networking.k8s.io # k8s 1.14+
    resources:
      - ingressclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - configmaps
    resourceNames:
      - ingress-controller-leader-nginx
    verbs:
      - get
      - update
  - apiGroups:
      - ''
    resources:
      - configmaps
    verbs:
      - create
  - apiGroups:
      - ''
    resources:
      - endpoints
    verbs:
      - create
      - get
      - update
  - apiGroups:
      - ''
    resources:
      - events
    verbs:
      - create
      - patch
---
# Source: ingress-nginx/templates/controller-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx
  namespace: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ingress-nginx
subjects:
  - kind: ServiceAccount
    name: ingress-nginx
    namespace: ingress-nginx
---
# Source: ingress-nginx/templates/controller-service-webhook.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller-admission
  namespace: ingress-nginx
spec:
  type: ClusterIP
  ports:
    - name: https-webhook
      port: 443
      targetPort: webhook
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
---
# Source: ingress-nginx/templates/controller-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: NodePort
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
---
# Source: ingress-nginx/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/component: controller
  revisionHistoryLimit: 10
  minReadySeconds: 0
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/component: controller
    spec:
      dnsPolicy: ClusterFirst
      containers:
        - name: controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.32.0
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                  - /wait-shutdown
          args:
            - /nginx-ingress-controller
            - --election-id=ingress-controller-leader
            - --ingress-class=nginx
            - --configmap=ingress-nginx/ingress-nginx-controller
            - --validating-webhook=:8443
            - --validating-webhook-certificate=/usr/local/certificates/cert
            - --validating-webhook-key=/usr/local/certificates/key
          securityContext:
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
            runAsUser: 101
            allowPrivilegeEscalation: true
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 3
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
            - name: webhook
              containerPort: 8443
              protocol: TCP
          volumeMounts:
            - name: webhook-cert
              mountPath: /usr/local/certificates/
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 90Mi
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 300
      volumes:
        - name: webhook-cert
          secret:
            secretName: ingress-nginx-admission
---
# Source: ingress-nginx/templates/admission-webhooks/validating-webhook.yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
  name: ingress-nginx-admission
  namespace: ingress-nginx
webhooks:
  - name: validate.nginx.ingress.kubernetes.io
    rules:
      - apiGroups:
          - extensions
          - networking.k8s.io
        apiVersions:
          - v1beta1
        operations:
          - CREATE
          - UPDATE
        resources:
          - ingresses
    failurePolicy: Fail
    clientConfig:
      service:
        namespace: ingress-nginx
        name: ingress-nginx-controller-admission
        path: /extensions/v1beta1/ingresses
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ingress-nginx-admission
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
  namespace: ingress-nginx
rules:
  - apiGroups:
      - admissionregistration.k8s.io
    resources:
      - validatingwebhookconfigurations
    verbs:
      - get
      - update
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ingress-nginx-admission
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
  namespace: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ingress-nginx-admission
subjects:
  - kind: ServiceAccount
    name: ingress-nginx-admission
    namespace: ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/job-createSecret.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ingress-nginx-admission-create
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
  namespace: ingress-nginx
spec:
  template:
    metadata:
      name: ingress-nginx-admission-create
      labels:
        helm.sh/chart: ingress-nginx-2.0.3
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/version: 0.32.0
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/component: admission-webhook
    spec:
      containers:
        - name: create
          image: jettech/kube-webhook-certgen:v1.2.0
          imagePullPolicy: IfNotPresent
          args:
            - create
            - --host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.ingress-nginx.svc
            - --namespace=ingress-nginx
            - --secret-name=ingress-nginx-admission
      restartPolicy: OnFailure
      serviceAccountName: ingress-nginx-admission
      securityContext:
        runAsNonRoot: true
        runAsUser: 2000
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/job-patchWebhook.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ingress-nginx-admission-patch
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
  namespace: ingress-nginx
spec:
  template:
    metadata:
      name: ingress-nginx-admission-patch
      labels:
        helm.sh/chart: ingress-nginx-2.0.3
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/version: 0.32.0
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/component: admission-webhook
    spec:
      containers:
        - name: patch
          image: jettech/kube-webhook-certgen:v1.2.0
          imagePullPolicy: IfNotPresent
          args:
            - patch
            - --webhook-name=ingress-nginx-admission
            - --namespace=ingress-nginx
            - --patch-mutating=false
            - --secret-name=ingress-nginx-admission
            - --patch-failure-policy=Fail
      restartPolicy: OnFailure
      serviceAccountName: ingress-nginx-admission
      securityContext:
        runAsNonRoot: true
        runAsUser: 2000
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ingress-nginx-admission
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
  namespace: ingress-nginx
rules:
  - apiGroups:
      - ''
    resources:
      - secrets
    verbs:
      - get
      - create
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ingress-nginx-admission
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
  namespace: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ingress-nginx-admission
subjects:
  - kind: ServiceAccount
    name: ingress-nginx-admission
    namespace: ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ingress-nginx-admission
  annotations:
    helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  labels:
    helm.sh/chart: ingress-nginx-2.0.3
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.32.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: admission-webhook
  namespace: ingress-nginx
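
Apply the manifest and wait for the controller to come up before continuing:

kubectl apply -f ingress.yaml
kubectl get pods -n ingress-nginx
# the ingress-nginx-controller pod should reach Running; the admission jobs should show Completed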

EFK logging stack http://docs.tianshu.org.cn/docs/setup/manage-cluster-logs

  • EFK.yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Elasticsearch"
spec:
  type: NodePort
  ports:
    - name: http
      nodePort: 32321
      port: 9200
      protocol: TCP
      targetPort: db
    - name: http-9300
      nodePort: 32322
      port: 9300
      protocol: TCP
      targetPort: transport
  selector:
    k8s-app: elasticsearch-logging
---
# RBAC authn and authz
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    addonmanager.kubernetes.io/mode: Reconcile
rules:
  - apiGroups:
      - ""
    resources:
      - "services"
      - "namespaces"
      - "endpoints"
    verbs:
      - "get"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: kube-system
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
  - kind: ServiceAccount
    name: elasticsearch-logging
    namespace: kube-system
    apiGroup: ""
roleRef:
  kind: ClusterRole
  name: elasticsearch-logging
  apiGroup: ""
---
# Elasticsearch deployment itself
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: v7.4.2
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  serviceName: elasticsearch-logging
  replicas: 3
  selector:
    matchLabels:
      k8s-app: elasticsearch-logging
      version: v7.4.2
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v7.4.2
    spec:
      serviceAccountName: elasticsearch-logging
      containers:
        - args:
            - -c
            - "sed -i 's#-Xms1g#-Xms4g#g;s#-Xmx1g#-Xmx4g#g' /usr/share/elasticsearch/config/jvm.options && echo -e 'http.cors.enabled: true\nhttp.cors.allow-origin: \"*\"' >> config/elasticsearch.yml && bin/run.sh"
          command:
            - /bin/bash
          image: quay.io/fluentd_elasticsearch/elasticsearch:v7.4.2
          name: elasticsearch-logging
          imagePullPolicy: IfNotPresent
          resources:
            # need more cpu upon initialization, therefore burstable class
            limits:
              cpu: 10000m
              memory: 8Gi
            requests:
              cpu: 1000m
              memory: 4Gi
          ports:
            - containerPort: 9200
              name: db
              protocol: TCP
            - containerPort: 9300
              name: transport
              protocol: TCP
          # livenessProbe:
          #   tcpSocket:
          #     port: transport
          #   initialDelaySeconds: 5
          #   timeoutSeconds: 10
          # readinessProbe:
          #   tcpSocket:
          #     port: transport
          #   initialDelaySeconds: 5
          #   timeoutSeconds: 10
          volumeMounts:
            - name: elasticsearch-logging
              mountPath: /data
          env:
            - name: "NAMESPACE"
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
      volumes:
        - name: elasticsearch-logging
          emptyDir: {}
      # Elasticsearch requires vm.max_map_count to be at least 262144.
      # If your OS already sets up this number to a higher value, feel free
      # to remove this init container.
      initContainers:
        - image: alpine:3.6
          command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
          name: elasticsearch-logging-init
          securityContext:
            privileged: true
---
apiVersion: v1
kind: Service
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Kibana"
spec:
  type: NodePort
  ports:
    - port: 5601
      protocol: TCP
      targetPort: ui
  selector:
    k8s-app: kibana-logging
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kibana-logging
  template:
    metadata:
      labels:
        k8s-app: kibana-logging
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      containers:
        - name: kibana-logging
          image: docker.elastic.co/kibana/kibana-oss:7.2.0
          resources:
            # need more cpu upon initialization, therefore burstable class
            limits:
              cpu: 1000m
            requests:
              cpu: 100m
          env:
            - name: ELASTICSEARCH_HOSTS
              value: http://elasticsearch-logging:9200
            - name: SERVER_NAME
              value: kibana-logging
            # - name: SERVER_BASEPATH
            #   value: /api/v1/namespaces/kube-system/services/kibana-logging/proxy
            - name: SERVER_REWRITEBASEPATH
              value: "false"
          ports:
            - containerPort: 5601
              name: ui
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /api/status
              port: ui
            initialDelaySeconds: 5
            timeoutSeconds: 10
          readinessProbe:
            httpGet:
              path: /api/status
              port: ui
            initialDelaySeconds: 5
            timeoutSeconds: 10
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf
  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/annotation*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  1
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/tadl*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  1
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/train*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  1
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/serving*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  1
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/modelopt*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  1
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/batchserving*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  1
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/pointcloud*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  1
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/data-rn*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  1
  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           kube.*
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Index           kubelogs
        Replace_Dots    On
        Retry_Limit     False
        Type            doc
  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name   apache2
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
    [PARSER]
        Name   nginx
        Format regex
        Regex  ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
    [PARSER]
        Name   syslog
        Format regex
        Regex  ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key time
        Time_Format %b %d %H:%M:%S
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:1.3.11
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 2020
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch-logging"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - operator: "Exists"
          effect: "NoExecute"
        - operator: "Exists"
          effect: "NoSchedule"

metrics-server http://docs.tianshu.org.cn/docs/setup/deploy-metrics-server

  • metrics-server.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:aggregated-metrics-reader
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
rules:
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods", "nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
        # mount in tmp so we can safely use from-scratch images and/or read-only containers
        - name: tmp-dir
          emptyDir: {}
      containers:
        - name: metrics-server
          image: registry.aliyuncs.com/google_containers/metrics-server-amd64:v0.3.6
          imagePullPolicy: IfNotPresent
          args:
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
          volumeMounts:
            - name: tmp-dir
              mountPath: /tmp
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/name: "Metrics-server"
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    k8s-app: metrics-server
  ports:
    - port: 443
      protocol: TCP
      targetPort: 443
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - nodes
      - nodes/stats
      - namespaces
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
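
Apply the manifest; after the v1beta1.metrics.k8s.io APIService registers, it usually takes a minute before metrics appear:

kubectl apply -f metrics-server.yaml
kubectl top nodes
kubectl top pods -n kube-system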

Prometheus & Grafana http://docs.tianshu.org.cn/docs/setup/monitor-pod-indicator-information

kubectl create namespace monitoring
  • monitor.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
        - image: prom/node-exporter:v1.0.1
          name: node-exporter
          ports:
            - containerPort: 9100
              protocol: TCP
              name: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  ports:
    - name: http
      port: 9100
      nodePort: 31672
      protocol: TCP
  type: NodePort
  selector:
    k8s-app: node-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: "kubernetes-apiservers"
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      - job_name: "kubernetes-nodes"
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics
      - job_name: "kubernetes-cadvisor"
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      - job_name: "kubernetes-service-endpoints"
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
      - job_name: "kubernetes-services"
        kubernetes_sd_configs:
          - role: service
        metrics_path: /probe
        params:
          module: [http_2xx]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox-exporter.example.com:9115
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name
      - job_name: "kubernetes-ingresses"
        kubernetes_sd_configs:
          - role: ingress
        relabel_configs:
          - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
            regex: (.+);(.+);(.+)
            replacement: ${1}://${2}${3}
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox-exporter.example.com:9115
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_ingress_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_ingress_name]
            target_label: kubernetes_name
      - job_name: "kubernetes-pods"
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - image: prom/prometheus:v2.7.1
          name: prometheus
          command:
            - "/bin/prometheus"
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--storage.tsdb.retention=24h"
          ports:
            - containerPort: 9090
              protocol: TCP
          volumeMounts:
            - mountPath: "/prometheus"
              name: data
            - mountPath: "/etc/prometheus"
              name: config-volume
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 2500Mi
      serviceAccountName: prometheus
      volumes:
        - name: data
          emptyDir: {}
        - name: config-volume
          configMap:
            name: prometheus-config
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: monitoring
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30003
  selector:
    app: prometheus
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: grafana
  replicas: 1
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:6.6.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              protocol: TCP
          env:
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: "Viewer"
            - name: GF_SECURITY_ALLOW_EMBEDDING
              value: "true"
      volumes:
        - name: grafana-storage
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 30006
  selector:
    app: grafana
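
Apply the manifest; per the two NodePort Services above, Prometheus is exposed on port 30003 and Grafana on port 30006:

kubectl apply -f monitor.yaml
# then browse to http://<master-ip>:30003 (Prometheus) and http://<master-ip>:30006 (Grafana)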

3. Deploy GPU-related services (skip if you have no GPU)

GPU driver installation http://docs.tianshu.org.cn/docs/setup/deploy-nvidia-graphics-driver

nvidia-docker installation https://github.com/NVIDIA/nvidia-docker

nvidia-device-plugin deployment http://docs.tianshu.org.cn/docs/setup/deploy-kubernetes-cluster#%E5%AE%89%E8%A3%85-nvidia-device-plugin%E4%B8%BA%E4%BA%86%E5%9C%A8%E9%9B%86%E7%BE%A4%E4%B8%AD%E4%BD%BF%E7%94%A8-nvidia-gpu

Label the GPU nodes, and remove the master taint so pods can be scheduled on a single-node cluster:

kubectl label nodes your-gpu-node-name gpu=gpu
kubectl taint node [your master hostname] node-role.kubernetes.io/master-
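
Once the driver, nvidia-docker, and the device plugin are all installed, the GPUs should appear as allocatable nvidia.com/gpu resources on the node — a quick check:

kubectl describe node your-gpu-node-name | grep nvidia.com/gpu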

4. Deploy the Java and other services

1. Download the one-click deployment script

git clone https://gitee.com/zhijiangtianshu/dubhe-deploy.git
# copy the kubeconfig into template/kubeconfig.yaml
cd dubhe-deploy && cat /etc/kubernetes/admin.conf > template/kubeconfig.yaml
# manually check admin.conf and change the domain name to the node IP
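
If the server field in the copied kubeconfig points at a domain (KubeSphere's default control-plane endpoint is lb.kubesphere.local), rewrite it to the master IP — a minimal sketch with a placeholder address:

sed -i 's#https://lb.kubesphere.local:6443#https://192.168.1.10:6443#' template/kubeconfig.yaml
# replace 192.168.1.10 with your master node's IP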

2. Edit the parameters in configmap.yaml, replacing them with your own values

(screenshots: editing configmap.yaml)

3. Build the images (how long this takes depends on your network — be patient; if the build fails, terminate the stuck process with ps -ef | grep dubhe | grep -v grep | awk '{print $2}' | xargs kill -9)

./dubhectl build-image all

4. Install the dubhe services (this step pulls a large number of images; they can be pulled in advance)

Images that can be pulled in advance (a pre-pull loop sketch follows the list):

registry.cn-hangzhou.aliyuncs.com/enlin/imgprocess:v1
registry.cn-hangzhou.aliyuncs.com/enlin/ofrecord:v1
registry.cn-hangzhou.aliyuncs.com/enlin/videosample:v1
registry.cn-hangzhou.aliyuncs.com/enlin/dubhe-java:v1
registry.cn-hangzhou.aliyuncs.com/enlin/visual-server:v1
registry.cn-hangzhou.aliyuncs.com/enlin/model-converter:v1
registry.cn-hangzhou.aliyuncs.com/enlin/model-measuring:v1
registry.cn-hangzhou.aliyuncs.com/enlin/oneflow-gpu:base
registry.cn-hangzhou.aliyuncs.com/enlin/automl-nas-pytorch17:v1
minio/minio:RELEASE.2020-04-28T23-56-56Z
nacos/nacos-mysql:5.7
nacos/nacos-server:2.0.3
redis:5.0.7
registry.cn-hangzhou.aliyuncs.com/enlin/storage-init:v1
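
A minimal pre-pull sketch (assumes Docker as the container runtime, as elsewhere in this guide) — save the list above to images.txt, then:

while read -r img; do docker pull "$img"; done < images.txt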

Install the services (then wait):

./dubhectl install

5. Confirm the services installed successfully

[root@k8s ~]# kubectl get pod -n dubhe-system
NAME READY STATUS RESTARTS AGE
algorithm-imgprocess-7cf8f9b984-fh8cr 1/1 Running 0 41h
algorithm-imgprocess-7cf8f9b984-v49gm 1/1 Running 0 41h
algorithm-ofrecord-9cf987d7b-84vpn 1/1 Running 0 41h
algorithm-ofrecord-9cf987d7b-q5dxz 1/1 Running 0 41h
algorithm-videosample-bb996d54b-8ppbh 1/1 Running 0 41h
algorithm-videosample-bb996d54b-vw64s 1/1 Running 0 41h
backend-68f896cb54-6hfj2 1/1 Running 0 41h
backend-admin-77f5d8cbb-g4m7d 1/1 Running 0 41h
backend-algorithm-5966599b6c-n7vlb 1/1 Running 0 41h
backend-auth-5568f66d45-5h6jn 1/1 Running 0 41h
backend-auth-5568f66d45-bfmm5 1/1 Running 0 17h
backend-data-56c6c77cdd-8nwn2 1/1 Running 0 41h
backend-data-dcm-7ff4d5b757-l6d7d 1/1 Running 0 41h
backend-data-task-77f4bccd88-s9qhq 1/1 Running 0 41h
backend-dcm4chee-67958d9547-skjh8 3/3 Running 0 41h
backend-gateway-6b4c55bbcc-mf54s 1/1 Running 0 41h
backend-image-577468957c-4gvgc 1/1 Running 0 41h
backend-k8s-7f4bfbb674-bf2pl 1/1 Running 0 41h
backend-measure-67bf8cfddb-2fw49 1/1 Running 0 41h
backend-model-6fd6f7b597-vdvgn 1/1 Running 0 41h
backend-model-converter-594f46d85b-ffjmz 1/1 Running 0 41h
backend-model-measure-7db9894d98-xrwfq 0/1 Pending 0 41h
backend-notebook-5c8bc8686f-zld5k 1/1 Running 0 41h
backend-optimize-855898859-snxxb 1/1 Running 0 41h
backend-point-cloud-66ddff8cc6-tk95b 1/1 Running 0 41h
backend-serving-8d4686bc4-xfpf8 1/1 Running 0 41h
backend-serving-gateway-5747b9d495-4nx82 1/1 Running 0 41h
backend-tadl-76d7c5fbb7-t7ssh 1/1 Running 0 41h
backend-terminal-5df96477cf-hs9ms 1/1 Running 0 41h
backend-train-69d85d86b8-wfxvl 1/1 Running 0 41h
backend-visual-5f8774f766-dslpn 1/1 Running 0 41h
minio-758894565d-vvwbq 1/1 Running 0 41h
mysql-6dd845548b-wnch9 1/1 Running 0 41h
nacos-5c9f6c67f8-wrjwl 1/1 Running 0 41h
redis-8c9c45cb6-29txf 1/1 Running 0 41h
web-7d94f7d46-m84pz 1/1 Running 0 41h

6. Create the symlink (mind your own installation directory)

ln -s /data/dubhe-storage/ /nfs

7. Configure Elasticsearch

# first find the NodePort that kibana exposes
kubectl get svc -A | grep kibana | awk '{print $6}' | awk -F ':|/' '{print $2}'
# then open the console in a browser
masterip:{port returned above}/app/kibana#/dev_tools/console?_g=()

Create the index

(screenshots: creating the index in the Kibana console; a minimal sketch follows)
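
A minimal index-creation sketch for the Dev Tools console — assuming the index name should match the one the fluent-bit output writes to (Index kubelogs in EFK.yaml above):

PUT kubelogs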

Set the maximum result window:

PUT _all/_settings
{
  "index": {
    "max_result_window": 2000000000
  }
}

5. Open the login page (default credentials admin/admin)

masterip:30800