Kubernetes DNS Service Discovery limitations
Kubernetes Service objects expose applications running in Pods as network services. For each combination of service name, port and associated Pod, the Kubernetes DNS system creates a DNS SRV record that can be used for service discovery.
In this post I demonstrate a deficiency in this system that obstructs important, real-world use cases, and sketch potential solutions.
Overview of Kubernetes Services and DNS §
The following Service definition defines an LDAP service:
$ cat service-test.yaml
apiVersion: v1
kind: Service
metadata:
name: service-test
labels:
app: service-test
spec:
selector:
app: service-test
clusterIP: None
ports:
- name: ldap
protocol: TCP
port: 389
$ oc create -f service-test.yaml
service/service-test created
The Service controller creates Endpoints objects associating each of the Service ports with each Pod matching the Service selector. If there are no matching pods, there are no endpoints:
$ oc get endpoints service-test
NAME ENDPOINTS AGE
service-test <none> 8m1s
If we add a matching pod:
$ cat pod-service-test.yaml
apiVersion: v1
kind: Pod
metadata:
name: service-test
labels:
app: service-test
spec:
containers:
- name: service-test
image: freeipa/freeipa-server:fedora-31
command: ["sleep", "3601"]
$ oc create -f pod-service-test.yaml
pod/service-test created
Then the Service controller creates an endpoint that maps the Service to the Pod:
$ oc get endpoints service-test
NAME ENDPOINTS AGE
service-test 10.129.2.13:389 16m
$ oc get -o yaml endpoints service-test
apiVersion: v1
kind: Endpoints
metadata:
labels:
app: service-test
service.kubernetes.io/headless: ""
...
subsets:
- addresses:
- ip: 10.129.2.13
nodeName: ft-47dev-2-27h8r-worker-0-f8bnl
targetRef:
kind: Pod
name: service-test
namespace: test
resourceVersion: "4556709"
uid: 296030f5-8dff-4f69-be96-ce6f0aa12653
ports:
- name: ldap
port: 389
protocol: TCP
Cluster DNS systems (there are different implementations, e.g. kubedns, and the OpenShift Cluster DNS Operator) use the Endpoints objects to manage DNS records for applications running in the cluster. In particular, they create SRV records mapping each service name and protocol combination to the pod(s) that provide that service. The behaviour is defined in the Kubernetes DNS-Based Service Discovery specification.
The SRV record owner name has the form:
_<port>._<proto>.<service>.<ns>.svc.<zone>.
where <ns> is the project namespace and <zone> is the cluster DNS zone. The objects created above result in the following SRV and A records:
$ oc rsh service-test
sh-5.0# dig +short SRV \
_ldap._tcp.service-test.test.svc.cluster.local
0 100 389 10-129-2-13.service-test.test.svc.cluster.local.
sh-5.0# dig +short A \
10-129-2-13.service-test.test.svc.cluster.local
10.129.2.13
For more information about DNS SRV records (the fields in the answer above are priority, weight, port and target), see RFC 2782.
Kubernetes SRV limitation §
Some services operate over TCP, some over UDP. And some operate over both TCP and UDP. Two examples are DNS and Kerberos. SRV records are of particular importance for Kerberos; they are used (widely, by multiple implementations) for KDC discovery.
So to host a Kerberos KDC in Kubernetes and enable service discovery, we need two sets of SRV records: _kerberos._tcp and _kerberos._udp. And likewise for the kpasswd and kerberos-master service names. There could be (probably are) other protocols where a similar arrangement is required.
So, let’s update the Service object and add the kerberos ServicePort specs:
$ cat service-test.yaml
apiVersion: v1
kind: Service
metadata:
name: service-test
labels:
app: service-test
spec:
selector:
app: service-test
clusterIP: None
ports:
- name: ldap
protocol: TCP
port: 389
- name: kerberos
protocol: TCP
port: 88
- name: kerberos
protocol: UDP
port: 88
$ oc replace -f service-test.yaml
The Service "service-test" is invalid:
spec.ports[2].name: Duplicate value: "kerberos"
Well, that’s a shame. Kubernetes does not admit this important use case.
Endpoints do not have the limitation §
Interestingly, the Endpoints type does not have this limitation. The Service controller automatically creates Endpoints objects for Services. The ServicePorts are (as far as I can tell) copied across to the Endpoints object.
I can manually replace the endpoints/service-test object (see above) with the following spec that includes the “duplicate” kerberos port:
$ cat endpoints.yaml
apiVersion: v1
kind: Endpoints
metadata:
creationTimestamp: "2020-12-07T03:51:30Z"
labels:
app: service-test
service.kubernetes.io/headless: ""
name: service-test
subsets:
- addresses:
- ip: 10.129.2.13
nodeName: ft-47dev-2-27h8r-worker-0-f8bnl
targetRef:
kind: Pod
name: service-test
namespace: test
resourceVersion: "5522680"
uid: 296030f5-8dff-4f69-be96-ce6f0aa12653
ports:
- name: ldap
port: 389
protocol: TCP
- name: kerberos
port: 88
protocol: TCP
- name: kerberos
port: 88
protocol: UDP
$ oc replace -f endpoints.yaml
endpoints/service-test replaced
The object was accepted! Observe that the DNS system responds and creates both the _kerberos._tcp and _kerberos._udp SRV records:
$ oc rsh service-test
sh-5.0# dig +short SRV \
_kerberos._tcp.service-test.test.svc.cluster.local
0 100 88 10-129-2-13.service-test.test.svc.cluster.local.
sh-5.0# dig +short SRV \
_kerberos._udp.service-test.test.svc.cluster.local
0 100 88 10-129-2-13.service-test.test.svc.cluster.local.
Therefore it seems the scope of this problem is limited to validation and processing of the Service object. Other components of Kubernetes (Endpoints validation and the Cluster DNS Operator, at least) can already handle this use case.
Possible resolutions §
Besides manually fiddling with the Endpoints (eww) I am not aware of any workarounds, but I see two possible approaches to resolving this issue.
One approach is to relax the uniqueness check. Instead of checking for uniqueness of the ServicePort name, check for uniqueness of the name/protocol pair. This is conceptually simple, but I am not familiar enough with Kubernetes internals to judge the feasibility or technical tradeoffs of this approach. For users, nothing changes (except that the example above would work!)
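As a rough illustration of what the relaxed rule would mean (a standalone sketch only, not the actual Kubernetes validation code; the ServicePort type here is a simplified stand-in for the real API type):
package main

import "fmt"

// Simplified stand-in for the ServicePort fields relevant to this sketch.
type ServicePort struct {
	Name     string
	Protocol string
	Port     int32
}

// validatePortNames enforces uniqueness of the name/protocol pair,
// rather than of the name alone.
func validatePortNames(ports []ServicePort) error {
	type key struct{ name, protocol string }
	seen := make(map[key]bool)
	for i, p := range ports {
		k := key{p.Name, p.Protocol}
		if seen[k] {
			return fmt.Errorf("spec.ports[%d]: duplicate name/protocol pair: %s/%s", i, p.Name, p.Protocol)
		}
		seen[k] = true
	}
	return nil
}

func main() {
	ports := []ServicePort{
		{Name: "ldap", Protocol: "TCP", Port: 389},
		{Name: "kerberos", Protocol: "TCP", Port: 88},
		{Name: "kerberos", Protocol: "UDP", Port: 88}, // same name, different protocol: now allowed
	}
	fmt.Println(validatePortNames(ports)) // prints <nil>
}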
Another approach is to add a new ServicePort field to specify the actual DNS service label to use. For the sake of discussion I’ll call it serviceName. It would be optional, defaulting to the value of name. This means name can still be the “primary key”, but the approach requires another uniqueness check on the serviceName/protocol pair. In our use case the configuration would look like:
...
ports:
- name: ldap
protocol: TCP
port: 389
- name: kerberos-tcp
serviceName: kerberos
protocol: TCP
port: 88
- name: kerberos-udp
serviceName: kerberos
protocol: UDP
port: 88
From a UX perspective I prefer the first approach, because there are no changes or additions to the ServicePort configuration schema. But to maintain compatibility with programs that assume that name is unique (as is currently enforced), it might be necessary to introduce a new field.
Next steps §
I filed a bug report and submitted a proof-of-concept pull request to bring attention to the problem and solicit feedback from Kubernetes and OpenShift DNS experts. It might be necessary to submit a Kubernetes Enhancement Proposal (KEP), but that seems (as a Kubernetes outsider) a long and windy road to landing what is a conceptually small change.