2 of 3 Masters down, my cluster is down too

Hello,
Due to recent events regarding OVH’s datacenter, I’ve lost 2 out of 3 masters of my K8S cluster.

I am trying to add two new masters to replace them, but the “scale Masters” button is greyed out and I cannot add them.

Furthermore, my cluster is down because the workers are not failing over to the remaining master.

On my master I am getting the following logs:
[2021-03-10 14:09:28] Waiting for healthy etcd cluster.
[2021-03-10 14:09:43] Error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:4001 exceeded header timeout
[2021-03-10 14:09:43] ; error #1: client: endpoint http://127.0.0.1:2379 exceeded header timeout
[2021-03-10 14:09:43]
[2021-03-10 14:09:43] error #0: client: endpoint http://127.0.0.1:4001 exceeded header timeout
[2021-03-10 14:09:43] error #1: client: endpoint http://127.0.0.1:2379 exceeded header timeout
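
With two of the three etcd members gone, etcd has lost quorum, which matches the errors above: the remaining member cannot serve reads or writes on its own. For reference, this is the kind of check I would run on the surviving master to confirm it (just a sketch; I am assuming etcdctl is available on the node and that etcd listens on the default local endpoints shown in the logs):

# List members and check cluster health via the v2 API (the client errors above look like v2)
etcdctl --endpoints http://127.0.0.1:2379 member list
etcdctl --endpoints http://127.0.0.1:2379 cluster-health

# Equivalent checks with the v3 API, in case this etcd speaks v3
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 member list
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 endpoint health

With 2 of 3 members unreachable, I would expect these to report the cluster as unhealthy.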

And on my workers:
[2021-03-10 14:06:35] I0310 14:06:35.836103 12134 cached_discovery.go:121] skipped caching discovery info due to Get https://{IP-of-one-of-the-dead-masters}:443/api?timeout=32s: dial tcp {IP-of-one-of-the-dead-masters}:443: i/o timeout
[2021-03-10 14:06:35] I0310 14:06:35.836162 12134 helpers.go:234] Connection error: Get https://{IP-of-one-of-the-dead-masters}:443/api?timeout=32s: dial tcp {IP-of-one-of-the-dead-masters}:443: i/o timeout
[2021-03-10 14:06:35] F0310 14:06:35.836187 12134 helpers.go:115] Unable to connect to the server: dial tcp {IP-of-one-of-the-dead-masters}:443: i/o timeout
[2021-03-10 14:06:35] Waiting for “bin/kubectl -v=8 --kubeconfig=/etc/pf9/kube.d/kubeconfigs/admin.yaml --context=default-context label node {IP-of-one-of-the-dead-masters} --overwrite node-role.kubernetes.io/worker=” to evaluate to true …
[2021-03-10 14:06:55] I0310 14:06:55.882650 13321 loader.go:375] Config loaded from file: /etc/pf9/kube.d/kubeconfigs/admin.yaml
[2021-03-10 14:06:55] I0310 14:06:55.884121 13321 round_trippers.go:420] GET https://{IP-of-one-of-the-dead-masters}:443/api?timeout=32s
[2021-03-10 14:06:55] I0310 14:06:55.884134 13321 round_trippers.go:427] Request Headers:
[2021-03-10 14:06:55] I0310 14:06:55.884143 13321 round_trippers.go:431] Accept: application/json, */*
[2021-03-10 14:06:55] I0310 14:06:55.884151 13321 round_trippers.go:431] User-Agent: kubectl/v1.18.10 (linux/amd64) kubernetes/xxxx
[2021-03-10 14:07:25] I0310 14:07:25.884498 13321 round_trippers.go:446] Response Status: in 30000 milliseconds
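
So the workers keep retrying the API server of a dead master on port 443. In case it helps, this is roughly how I would repoint a worker's kubeconfig at the surviving master (a sketch only: the cluster entry name "default-cluster" is my assumption, and even with this change the API server will stay unusable while etcd has no quorum):

# Show the cluster entry name actually used in the worker kubeconfig
kubectl --kubeconfig=/etc/pf9/kube.d/kubeconfigs/admin.yaml config view -o jsonpath='{.clusters[*].name}'

# Point that entry at the surviving master (the "default-cluster" name is assumed)
kubectl --kubeconfig=/etc/pf9/kube.d/kubeconfigs/admin.yaml config set-cluster default-cluster --server=https://{IP-of-the-surviving-master}:443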

Furthermore, when I run pf9ctl config validate on the last master, I get:
Validation of config:[xxxx.platform9.io] Failed to: https://xxxx.platform9.io
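
I do not know why that validation fails, but a plain reachability check from that node might at least rule out a network or DNS problem towards the management plane (just a generic curl, not a pf9ctl command):

curl -sv https://xxxx.platform9.io -o /dev/null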

If someone can help here, that would be great: I see no way to get my cluster up again, mainly because I cannot add another master to replace the two that are dead.

I am fairly certain I will not be able to restart the old masters, as they have most likely been destroyed by the fire at OVH…

Thanks!

Hey, I know you spoke to Matthew on our Slack channel. Just to complete the loop in case anyone references this post, the discussion is available here.