Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.19.0-11-amd64
Operating System: Debian GNU/Linux 10 (buster)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.992GiB
Name: k01
ID: ZXMU:SN3A:4SXR:ZA2Y:JFUG:MZCK:5DMW:SRE5:SA3P:WLLS:H6QO:CZQC
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
nodes:
  - address: myip
    user: root
    role:
      - controlplane
      - etcd
      - worker
cluster_name: mycluster
kubernetes_version: v1.18.8-rancher1-1
authentication:
  strategy: x509
  sans:
    - "mycluster.mydomain.mytld"
authorization:
  mode: rbac
network:
  plugin: calico
dns:
  provider: coredns
ingress:
  provider: nginx
  options:
    use-forwarded-headers: 'true'
services:
  kube-api:
    secrets_encryption_config:
      enabled: true
haproxy.cfg from the load balancer (included just in case it is relevant; I hope it is not):
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend main
    bind *:6443
    bind *:443
    bind *:80
    mode tcp
    option tcplog
    acl is_kubeapi hdr(host) -i mycluster.mydomain.mytld
    use_backend kubeapi if is_kubeapi
    default_backend kubewrk

backend kubeapi
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server kubeapi-mycluster-mynode myip:6443 check

backend kubewrk
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server kubewrk-mycluster-mynode myip:443 check
Steps to Reproduce:
1. Have two Debian 10 nodes with docker-ce: one running haproxy, the other as the RKE node.
2. Run rke up.
3. Edit the generated kubeconfig's server URL to point to the haproxy load balancer via the SAN hostname instead of the control plane IP.
4. Run kubectl get nodes.
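The kubeconfig edit described above might look like this (file name and values are hypothetical; only the server field changes):

```yaml
# kube_config_cluster.yml as generated by `rke up` (hypothetical values)
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <unchanged>
    # was: server: "https://<controlplane-ip>:6443"
    server: "https://mycluster.mydomain.mytld:6443"
  name: mycluster
```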
Results:
Unable to connect to the server: x509: certificate is valid for ingress.local, not mycluster.mydomain.mytld
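One way to see exactly which SANs a certificate carries is openssl; the same x509 inspection works on a certificate fetched from the load balancer with `openssl s_client -connect mycluster.mydomain.mytld:6443`. A self-contained sketch using a throwaway certificate (paths and the SAN value are illustrative; requires OpenSSL 1.1.1+ for `-addext`/`-ext`):

```shell
# Create a throwaway self-signed cert carrying one DNS SAN.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/san-test.key -out /tmp/san-test.crt \
  -days 1 -subj "/CN=test" \
  -addext "subjectAltName=DNS:mycluster.mydomain.mytld"

# Print the SAN extension; this is where "ingress.local" would show up
# if you were being served the ingress controller's default certificate.
openssl x509 -in /tmp/san-test.crt -noout -ext subjectAltName
```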
Causes that have occurred to me:
- I typo'd the SAN in cluster.yml: unlikely. I have tried this dozens of times with the same outcome, each time making sure the SAN is correct.
- The haproxy load balancer is misconfigured: possible, but the error message points to a problem with the certificate rather than with the connection to kube-apiserver, though perhaps the two are related. I wanted to test (and debug) the load balancing by pointing the kubeconfig produced by rke up at the load balancer, which is how I got here. To rule this out I will verify that the connection itself works, but I am posting this issue already because I suspect this is not the cause.
- I have neglected some detail that is generic to Kubernetes rather than specific to RKE: likely, since I am not an expert on either.
I hope this issue doesn't get buried, and I would be grateful for any attention it gets.
ingress.local means you are hitting the ingress controller on the node, which returns its default certificate when no host is matched. So your ACL is not matching. This should be easy to eliminate: simplify the config so everything defaults to the kubeapi backend, check whether that works, and work from there. Also, a frontend in TCP mode checking host headers doesn't add up; you normally need HTTP mode to inspect host headers. Simplifying your config should make this easier to debug.
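As the comment notes, HAProxy cannot see HTTP Host headers in TCP mode. One way to keep TLS passthrough (so RKE's own certificates reach the client) while still routing by name is to match the TLS SNI instead. A sketch, untested, assuming clients send SNI (kubectl does when the server URL uses a hostname):

```
frontend main
    bind *:6443
    mode tcp
    option tcplog
    # Wait for the TLS ClientHello so req.ssl_sni is populated
    tcp-request inspect-delay 5s
    tcp-request content accept if { req.ssl_hello_type 1 }
    acl is_kubeapi req.ssl_sni -i mycluster.mydomain.mytld
    use_backend kubeapi if is_kubeapi
    default_backend kubewrk
```

This routes API traffic by name at the TCP layer while leaving certificate handling entirely to the backends.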
Hi @ohader @karthicksndr,
I have diverged significantly since then, as I elected to spread traffic via DNS instead. My motivation for distributing traffic was to avoid a single point of failure rather than pure performance.
Note the ugliness in terms of repetition; with that caveat, and without any guarantee, this was my last version of the haproxy.cfg (without the global section) and the RKE config. I can't remember whether it actually worked, but I have a whole Ansible playbook that sets everything up, which I can share after a bit of cleanup. It would, at least initially, make certain assumptions about the OS and DNS provider. So let me know if the code below doesn't work, or if you'd like a ready-to-go Ansible role that sets everything up.
haproxy.cfg
frontend main
    bind *:6443
    bind *:80
    bind *:443
    mode tcp
    option tcplog
    acl is_worker80 dst_port 80
    acl is_worker443 dst_port 443
    use_backend worker80 if is_worker80
    use_backend worker443 if is_worker443
    default_backend master

backend master
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server master.node.cluster.region.mydomain.tld myip:6443 check

backend worker443
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server worker.node.cluster.region.mydomain.tld myip:443 check

backend worker80
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server worker.node.cluster.region.mydomain.tld myip:80 check
rke config.yaml
nodes:
  - address: my_ip
    user: root
    role:
      - controlplane
      - worker
      - etcd
cluster_name: my_cluster
# rke config --list-version --all
kubernetes_version: 1.19
# https://rancher.com/docs/rke/latest/en/config-options/authentication/
authentication:
  strategy: x509
  sans:
    - "master.node.cluster.region.mydomain.tld"
    - "myip"
authorization:
  mode: rbac
# https://rancher.com/docs/rke/latest/en/config-options/add-ons/network-plugins/
network:
  plugin: canal
# https://rancher.com/docs/rke/latest/en/config-options/add-ons/dns/
dns:
  provider: coredns
# Currently only the nginx ingress provider is supported.
# To disable the ingress controller, set `provider: none`.
# `node_selector` controls ingress placement and is optional.
ingress:
  provider: nginx
  options:
    use-forwarded-headers: 'true'
services:
  kube-api:
    secrets_encryption_config:
      enabled: true
  # For Rook
  kubelet:
    extra_args:
      volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
    extra_binds:
      - /usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec