Production-ready infrastructure
Key infrastructure elements:
Database — Cloud or self-managed database
Logging — Collect application and cluster logs
Monitoring — Collect, alert, and visualize cluster and application metrics
Security — Vulnerability scanning and policy management
Cluster configuration and tooling
Recommended Kubernetes cluster configuration:
Small and medium workloads — 3 nodes x 4 vCPU, 16 GB RAM
Large workloads — 3 nodes x 8 vCPU, 64 GB RAM
Toolkit required for development and deployment:
AWS, GCP, or Azure - cloud provider CLI and SDK, depending on your cloud provider
kubectl - cluster connection and management
Helm - Kubernetes package manager
Optional development and delivery tooling:
Argo CD - GitOps delivery and management
Flux - a set of continuous and progressive delivery solutions for Kubernetes
Database
Managed solution
Aidbox supports all popular managed PostgreSQL databases, versions 13 and higher. See more details in this article — Run Aidbox on managed PostgreSQL .
Self-managed solution
For a self-managed solution, we recommend using the AidboxDB image . This image contains all required extensions, a backup tool, and pre-built replication support. Read more in the documentation — AidboxDB .
First step — create a volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-master-data
  namespace: prod
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 300Gi
  # depends on your cloud provider; use SSD volumes
  storageClassName: managed-premium
Next, create all required configs: postgres.conf, container parameters, and credentials.
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-pg-config
  namespace: prod
data:
  postgres.conf: |-
    listen_addresses = '*'
    shared_buffers = '2GB'
    max_wal_size = '4GB'
    pg_stat_statements.max = 500
    pg_stat_statements.save = false
    pg_stat_statements.track = top
    pg_stat_statements.track_utility = true
    shared_preload_libraries = 'pg_stat_statements'
    track_io_timing = on
    wal_level = logical
    wal_log_hints = on
    archive_command = 'wal-g wal-push %p'
    restore_command = 'wal-g wal-fetch %f %p'
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-config
  namespace: prod
data:
  PGDATA: /data/pg
  POSTGRES_DB: postgres
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
  namespace: prod
type: Opaque
data:
  POSTGRES_PASSWORD: cG9zdGdyZXM=
  POSTGRES_USER: cG9zdGdyZXM=
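Values under `data` in a Kubernetes Secret must be base64-encoded; the two values above both decode to `postgres`. To encode your own credentials, you can use a one-liner like this (shown with the default value `postgres` as an example):

```shell
# Encode a credential for a Kubernetes Secret.
# printf '%s' avoids base64-encoding a trailing newline,
# which would corrupt the password.
printf '%s' 'postgres' | base64
# -> cG9zdGdyZXM=
```

Alternatively, use the `stringData` field instead of `data` to provide plain-text values and let Kubernetes encode them for you.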
Now we can create a database StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prod-db-master
  namespace: prod
spec:
  replicas: 1
  serviceName: db
  selector:
    matchLabels:
      service: db
  template:
    metadata:
      labels:
        service: db
    spec:
      volumes:
        - name: db-pg-config
          configMap:
            name: db-pg-config
            defaultMode: 420
        - name: db-dshm
          emptyDir:
            medium: Memory
        - name: db-data
          persistentVolumeClaim:
            claimName: db-master-data
      containers:
        - name: main
          image: healthsamurai/aidboxdb:14.2
          ports:
            - containerPort: 5432
              protocol: TCP
          envFrom:
            - configMapRef:
                name: db-config
            - secretRef:
                name: db-secret
          volumeMounts:
            - name: db-pg-config
              mountPath: /etc/configs
            - name: db-dshm
              mountPath: /dev/shm
            - name: db-data
              mountPath: /data
              subPath: pg
Create the master database service
apiVersion: v1
kind: Service
metadata:
  name: db
  namespace: prod
spec:
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
  selector:
    service: db
Replica installation follows the same steps but requires additional configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-replica
  namespace: prod
data:
  PG_ROLE: replica
  PG_MASTER_HOST: db-master
  PG_REPLICA: streaming_replica_streaming
  PGDATA: /data/pg
  POSTGRES_DB: postgres
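As a sketch, the replica StatefulSet mirrors the master definition but references the db-replica ConfigMap and its own data volume. The names prod-db-replica and db-replica-data below are assumptions for illustration, not names fixed by Aidbox:

```yaml
# Hypothetical replica StatefulSet; same shape as the master,
# swapping in the db-replica ConfigMap and a dedicated PVC
# (create db-replica-data the same way as db-master-data).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prod-db-replica   # assumed name
  namespace: prod
spec:
  replicas: 1
  serviceName: db-replica
  selector:
    matchLabels:
      service: db-replica
  template:
    metadata:
      labels:
        service: db-replica
    spec:
      volumes:
        - name: db-data
          persistentVolumeClaim:
            claimName: db-replica-data   # assumed PVC name
      containers:
        - name: main
          image: healthsamurai/aidboxdb:14.2
          ports:
            - containerPort: 5432
              protocol: TCP
          envFrom:
            - configMapRef:
                name: db-replica   # replica role and master host
            - secretRef:
                name: db-secret
          volumeMounts:
            - name: db-data
              mountPath: /data
              subPath: pg
```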
For backups and WAL archiving, we recommend the cloud-native solution WAL-G . You can find full information about its configuration and usage on the documentation page .
Recommended backup policy — a full backup every week and an incremental backup every day.
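The archive_command and restore_command in the postgres.conf above assume WAL-G can reach its storage. As a minimal sketch for S3-based storage — the Secret name, bucket, and credential values are placeholders, and the full variable list is in the WAL-G documentation:

```yaml
# Hypothetical WAL-G environment for S3 storage; mount it into the
# database container via an additional secretRef. All values are
# placeholders to be replaced with your own.
apiVersion: v1
kind: Secret
metadata:
  name: db-walg        # assumed name
  namespace: prod
type: Opaque
stringData:            # stringData accepts plain text (no base64 needed)
  WALG_S3_PREFIX: s3://<bucket>/wal-g   # target bucket and prefix
  AWS_ACCESS_KEY_ID: <access-key-id>
  AWS_SECRET_ACCESS_KEY: <secret-access-key>
```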
Alternative solutions
A set of tools to run HA PostgreSQL with failover and switchover, and automated backups.
Patroni — a template for PostgreSQL HA with ZooKeeper, etcd, or Consul.
Postgres operator — delivers easy-to-run HA PostgreSQL clusters on Kubernetes.
Aidbox
First of all, you need to obtain an Aidbox license on the Aidbox user portal
Create a ConfigMap with all required configuration and database connection settings, and a Secret with credentials. Note that Secret values under data must be base64-encoded (or use stringData for plain text).
apiVersion: v1
kind: ConfigMap
metadata:
  name: aidbox
  namespace: prod
data:
  AIDBOX_BASE_URL: https://my.box.url
  AIDBOX_BOX_ID: aidbox
  AIDBOX_FHIR_VERSION: 4.0.1
  AIDBOX_PORT: '8080'
  AIDBOX_STDOUT_PRETTY: all
  BOX_INSTANCE_NAME: aidbox
  BOX_METRICS_PORT: '8765'
  PGDATABASE: aidbox
  PGHOST: db.prod.svc.cluster.local # database address
  PGPORT: '5432' # database port
apiVersion: v1
kind: Secret
metadata:
  name: aidbox
  namespace: prod
data:
  AIDBOX_ADMIN_ID: <admin_login>
  AIDBOX_ADMIN_PASSWORD: <admin_password>
  AIDBOX_CLIENT_ID: <root_client_id>
  AIDBOX_CLIENT_SECRET: <root_client_password>
  AIDBOX_LICENSE: <JWT-LICENSE> # JWT license from the Aidbox user portal
  PGPASSWORD: <db_password> # database password
  PGUSER: <db_user> # database username
Aidbox Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aidbox
  namespace: prod
spec:
  replicas: 2
  selector:
    matchLabels:
      service: aidbox
  template:
    metadata:
      labels:
        service: aidbox
    spec:
      containers:
        - name: main
          image: healthsamurai/aidboxone:latest
          ports:
            - containerPort: 8080
              protocol: TCP
            - containerPort: 8765
              protocol: TCP
          envFrom:
            - configMapRef:
                name: aidbox
            - secretRef:
                name: aidbox
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 20
            timeoutSeconds: 10
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 12
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 20
            timeoutSeconds: 10
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 6
          startupProbe:
            httpGet:
              path: /health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 20
            timeoutSeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 4
You can find all additional information about HA Aidbox configuration in this article — HA Aidbox
To verify that Aidbox started correctly, you can check the logs:
kubectl logs -f <aidbox-pod-name>
Create the Aidbox k8s service
apiVersion: v1
kind: Service
metadata:
  name: aidbox
  namespace: prod
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  selector:
    service: aidbox
Ingress
The cluster must have an ingress controller installed. We recommend the Kubernetes Ingress NGINX Controller . As an alternative, you can use Traefik . You can find more information about Ingress in k8s in this documentation — Kubernetes Service Networking
Ingress NGINX controller
Ingress-nginx is an Ingress controller for Kubernetes that uses NGINX as a reverse proxy and load balancer.
helm upgrade \
  --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace
CertManager
To provide a secure HTTPS connection, you can use paid SSL certificates issued for your domain, or certificates issued by Let's Encrypt. If using Let's Encrypt, we recommend installing and configuring the Cert Manager Operator
helm repo add jetstack https://charts.jetstack.io
helm repo update
# v1.10.0, or the latest available version
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.10.0 \
  --set installCRDs=true
Configure a ClusterIssuer:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    email: hello@my-domain.com
    preferredChain: ''
    privateKeySecretRef:
      name: issuer-key
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
      - http01:
          ingress:
            class: nginx # Ingress class name
Ingress resource
Now you can create a k8s Ingress for the Aidbox deployment
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aidbox
  namespace: prod
  annotations:
    acme.cert-manager.io/http01-ingress-class: nginx
    cert-manager.io/cluster-issuer: letsencrypt
    kubernetes.io/ingress.class: nginx
spec:
  tls:
    - hosts:
        - my.box.url
      secretName: aidbox-tls
  rules:
    - host: my.box.url
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: aidbox
                port:
                  number: 80
Now you can test the ingress:
curl https://my.box.url
Logging
You can find general logging & audit information in this article — Logging & Audit
Aidbox supports integration with the following systems:
ElasticSearch integration
You can install ECK using the official guide.
Configure the Aidbox and Elasticsearch integration:
apiVersion: v1
kind: Secret
metadata:
  name: aidbox
  namespace: prod
data:
  ...
  AIDBOX_ES_URL: http://es-service.es-ns.svc.cluster.local
  AIDBOX_ES_AUTH: <user>:<password>
  ...
DataDog integration
apiVersion: v1
kind: Secret
metadata:
  name: aidbox
  namespace: prod
data:
  ...
  AIDBOX_DD_API_KEY: <Datadog API Key>
  ...
Monitoring
For monitoring, we recommend the kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack
Create the Aidbox metrics service
apiVersion: v1
kind: Service
metadata:
  name: aidbox-metrics
  namespace: prod
  labels:
    operated: prometheus
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8765
  selector:
    service: aidbox
Create a ServiceMonitor config for scraping metrics data
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/component: metrics
    release: kube-prometheus
    serviceMonitorSelector: aidbox
  name: aidbox
  namespace: kube-prometheus
spec:
  endpoints:
    - honorLabels: true
      interval: 10s
      path: /metrics
      targetPort: 8765
    - honorLabels: true
      interval: 60s
      path: /metrics/minutes
      targetPort: 8765
    - honorLabels: true
      interval: 10m
      path: /metrics/hours
      targetPort: 8765
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      operated: prometheus
Or you can specify the Prometheus scrape configuration directly
global:
  external_labels:
    monitor: 'aidbox'
scrape_configs:
  - job_name: aidbox
    scrape_interval: 5s
    metrics_path: /metrics
    static_configs:
      - targets: [ 'aidbox-metrics.prod.svc.cluster.local:8765' ]
  - job_name: aidbox-minutes
    scrape_interval: 30s
    metrics_path: /metrics/minutes
    static_configs:
      - targets: [ 'aidbox-metrics.prod.svc.cluster.local:8765' ]
  - job_name: aidbox-hours
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: /metrics/hours
    static_configs:
      - targets: [ 'aidbox-metrics.prod.svc.cluster.local:8765' ]
Alternative solutions
Thanos — highly available Prometheus setup with long term storage capabilities.
Grafana Mimir — highly available, multi-tenant, long-term storage for Prometheus.
Export the Aidbox Grafana dashboard
Aidbox metrics integrates with Grafana, which can generate dashboards and upload them to Grafana — Grafana Integration
Additional monitoring
System monitoring:
node exporter — Prometheus exporter for hardware and OS metrics exposed by *NIX kernels
kube state metrics — a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects
PostgreSQL monitoring:
pg_exporter — Prometheus exporter for PostgreSQL server metrics
Alerting
Alerting rules allow you to define alert conditions based on Prometheus expression language expressions and to send notifications about firing alerts to an external service.
Alert rules
Alert for long-running HTTP queries with P99 > 5s over a 5-minute interval
alert: SlowRequests
for: 5m
expr: histogram_quantile(0.99, sum (rate(aidbox_http_request_duration_seconds_bucket[5m])) by (le, route, instance)) > 5
labels: {severity: ticket}
annotations:
  title: Long HTTP query execution
  metric: '{{ $labels.route }}'
  value: '{{ $value | printf "%.2f" }}'
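Following the same pattern, here is a hedged example of a basic availability alert. It assumes the Prometheus scrape job for Aidbox is named aidbox; the alert name and severity label are illustrative:

```yaml
# Hypothetical availability alert: fires when an Aidbox target has
# been unreachable by Prometheus (up == 0) for 2 minutes.
alert: AidboxDown
for: 2m
expr: up{job="aidbox"} == 0
labels: {severity: page}
annotations:
  title: Aidbox instance is down
  metric: '{{ $labels.instance }}'
```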
Alert delivery
An Alertmanager message template for Telegram
global:
  resolve_timeout: 5m
  telegram_api_url: 'https://api.telegram.org/'
route:
  group_by: [alertname, instance]
  # Default receiver
  receiver: <my-ops-chat>
  routes:
    # Mute watchdog alert
    - receiver: empty
      match: {alertname: Watchdog}
receivers:
  - name: empty
  - name: <my-ops-chat>
    telegram_configs:
      - chat_id: <chat-id>
        api_url: https://api.telegram.org
        parse_mode: HTML
        message: |-
          <b>[{{ .CommonLabels.instance }}] {{ .CommonLabels.alertname }}</b>
          {{ .CommonAnnotations.title }}
          {{ range .Alerts }}{{ .Annotations.metric }}: {{ .Annotations.value }}
          {{ end }}
        bot_token: <bot-token>
You can find all other integrations on the Alertmanager documentation page.
Additional tools
Security
Vulnerability and security scanners:
Kubernetes Policy Management:
Advanced: