Kubernetes DNS Service Discovery limitations
Kubernetes Service objects expose applications running in Pods as network services. For each combination of service port name, protocol and associated Pod, the Kubernetes DNS system creates a DNS SRV record that can be used for service discovery.
In this post I demonstrate a deficiency in this system that obstructs important, real-world use cases, and sketch potential solutions.
Overview of Kubernetes Services and DNS §
The following Service definition describes an LDAP service:
$ cat service-test.yaml
apiVersion: v1
kind: Service
metadata:
name: service-test
labels:
app: service-test
spec:
selector:
app: service-test
clusterIP: None
ports:
- name: ldap
protocol: TCP
port: 389
$ oc create -f service-test.yaml
service/service-test created
The Service controller creates Endpoints objects associating each of the Service ports with each Pod matching the Service selector. If there are no matching pods, there are no endpoints:
$ oc get endpoints service-test
NAME ENDPOINTS AGE
service-test <none> 8m1s
If we add a matching pod:
$ cat pod-service-test.yaml
apiVersion: v1
kind: Pod
metadata:
name: service-test
labels:
app: service-test
spec:
containers:
- name: service-test
image: freeipa/freeipa-server:fedora-31
command: ["sleep", "3601"]
$ oc create -f pod-service-test.yaml
pod/service-test created
Then the Service controller creates an endpoint that maps the Service to the Pod:
$ oc get endpoints service-test
NAME ENDPOINTS AGE
service-test 10.129.2.13:389 16m
$ oc get -o yaml endpoints service-test
apiVersion: v1
kind: Endpoints
metadata:
labels:
app: service-test
service.kubernetes.io/headless: ""
...
subsets:
- addresses:
- ip: 10.129.2.13
nodeName: ft-47dev-2-27h8r-worker-0-f8bnl
targetRef:
kind: Pod
name: service-test
namespace: test
resourceVersion: "4556709"
uid: 296030f5-8dff-4f69-be96-ce6f0aa12653
ports:
- name: ldap
port: 389
protocol: TCP
Cluster DNS systems (there are different implementations, e.g. kube-dns, and the OpenShift Cluster DNS Operator) use the Endpoints objects to manage DNS records for applications running in the cluster. In particular, they create SRV records mapping each service name and protocol combination to the pod(s) that provide that service. The behaviour is defined in the Kubernetes DNS-Based Service Discovery specification.
The SRV record owner name has the form:
_<port>._<proto>.<service>.<ns>.svc.<zone>.
where ns is the project namespace and zone is the cluster DNS zone. The objects created above result in the following SRV and A records:
$ oc rsh service-test
sh-5.0# dig +short SRV \
_ldap._tcp.service-test.test.svc.cluster.local
0 100 389 10-129-2-13.service-test.test.svc.cluster.local.
sh-5.0# dig +short A \
10-129-2-13.service-test.test.svc.cluster.local
10.129.2.13
For more information about DNS SRV records, see RFC 2782.
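To illustrate how an application might consume these records, here is a minimal Go sketch (a hypothetical client, not part of the objects above) that performs the same SRV lookup as the dig command, assuming it runs in a pod where cluster DNS resolves the name:
package main

import (
	"fmt"
	"net"
)

func main() {
	// Looks up _ldap._tcp.service-test.test.svc.cluster.local
	_, addrs, err := net.LookupSRV("ldap", "tcp", "service-test.test.svc.cluster.local")
	if err != nil {
		panic(err)
	}
	for _, srv := range addrs {
		// Each SRV target resolves to a backing pod address
		fmt.Printf("%s:%d (priority %d, weight %d)\n",
			srv.Target, srv.Port, srv.Priority, srv.Weight)
	}
}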
Kubernetes SRV limitation §
Some services operate over TCP, some over UDP. And some operate over both TCP and UDP. Examples include DNS, Kerberos and SIP. SRV records are of particular importance for Kerberos; they are used (widely, by multiple implementations) for KDC discovery.
So to host a Kerberos KDC in Kubernetes and enable service discovery, we need two sets of SRV records: _kerberos._tcp and _kerberos._udp. And likewise for the kpasswd and kerberos-master service names. There could be (probably are) other protocols where a similar arrangement is required.
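Following the naming scheme above, and assuming the same service-test Service, the desired record set would look roughly like this (the exact set of service names and protocols depends on the deployment):
_kerberos._tcp.service-test.test.svc.cluster.local.
_kerberos._udp.service-test.test.svc.cluster.local.
_kerberos-master._tcp.service-test.test.svc.cluster.local.
_kerberos-master._udp.service-test.test.svc.cluster.local.
_kpasswd._tcp.service-test.test.svc.cluster.local.
_kpasswd._udp.service-test.test.svc.cluster.local.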
So, let’s update the Service object and add the kerberos
ServicePort specs:
$ cat service-test.yaml
apiVersion: v1
kind: Service
metadata:
name: service-test
labels:
app: service-test
spec:
selector:
app: service-test
clusterIP: None
ports:
- name: ldap
protocol: TCP
port: 389
- name: kerberos
protocol: TCP
port: 88
- name: kerberos
protocol: UDP
port: 88
$ oc replace -f service-test.yaml
The Service "service-test" is invalid:
spec.ports[2].name: Duplicate value: "kerberos"
Well, that’s a shame. Kubernetes does not support this important use case.
Endpoints do not have the limitation §
Interestingly, the Endpoints type does not have this limitation. The Service controller automatically creates Endpoints objects for Services. The ServicePorts are (as far as I can tell) copied across to the Endpoints object.
I can manually replace the endpoints/service-test object (see above) with the following spec that includes the “duplicate” kerberos port:
$ cat endpoints.yaml
apiVersion: v1
kind: Endpoints
metadata:
creationTimestamp: "2020-12-07T03:51:30Z"
labels:
app: service-test
service.kubernetes.io/headless: ""
name: service-test
subsets:
- addresses:
- ip: 10.129.2.13
nodeName: ft-47dev-2-27h8r-worker-0-f8bnl
targetRef:
kind: Pod
name: service-test
namespace: test
resourceVersion: "5522680"
uid: 296030f5-8dff-4f69-be96-ce6f0aa12653
ports:
- name: ldap
port: 389
protocol: TCP
- name: kerberos
port: 88
protocol: TCP
- name: kerberos
port: 88
protocol: UDP
$ oc replace -f endpoints.yaml
endpoints/service-test replaced
The object was accepted! Observe that the DNS system responds and creates both the _kerberos._tcp and _kerberos._udp SRV records:
$ oc rsh service-test
sh-5.0# dig +short SRV \
_kerberos._tcp.service-test.test.svc.cluster.local
0 100 88 10-129-2-13.service-test.test.svc.cluster.local.
sh-5.0# dig +short SRV \
_kerberos._udp.service-test.test.svc.cluster.local
0 100 88 10-129-2-13.service-test.test.svc.cluster.local.
Therefore it seems the scope of this problem is limited to validation and processing of the Service object. Other components of Kubernetes (Endpoint validation and the Cluster DNS Operator, at least) can already handle this use case.
Possible resolutions §
Besides manually fiddling with the Endpoints (eww) I am not aware of any workarounds, but I see two possible approaches to resolving this issue.
One approach is to relax the uniqueness check. Instead of checking for uniqueness of ServicePort name, check for the uniqueness of the name/protocol pair. This is conceptually simple but I am not familiar enough with Kubernetes internals to judge the feasibility or technical tradeoffs of this approach. For users, nothing changes (except the example above would work!)
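As a rough illustration (a standalone Go sketch with assumed types, not the actual kube-apiserver validation code), the relaxed check would key on the name/protocol pair rather than on the name alone:
package main

import "fmt"

// ServicePort carries only the fields relevant to the check;
// it is a stand-in for the real Kubernetes type.
type ServicePort struct {
	Name     string
	Protocol string
	Port     int32
}

// validatePorts rejects a port only when both its name and its
// protocol duplicate an earlier entry.
func validatePorts(ports []ServicePort) error {
	type key struct{ name, protocol string }
	seen := map[key]bool{}
	for _, p := range ports {
		k := key{p.Name, p.Protocol}
		if seen[k] {
			return fmt.Errorf("duplicate name/protocol pair: %q/%s", p.Name, p.Protocol)
		}
		seen[k] = true
	}
	return nil
}

func main() {
	ports := []ServicePort{
		{Name: "ldap", Protocol: "TCP", Port: 389},
		{Name: "kerberos", Protocol: "TCP", Port: 88},
		{Name: "kerberos", Protocol: "UDP", Port: 88},
	}
	fmt.Println(validatePorts(ports)) // <nil>: accepted under the relaxed rule
}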
Another approach is to add a new ServicePort field to specify the actual DNS service label to use. For the sake of discussion I’ll call it serviceName. It would be optional, defaulting to the value of name. This means name can still be the “primary key”, but the approach requires another uniqueness check on the serviceName/protocol pair. In our use case the configuration would look like:
...
ports:
- name: ldap
protocol: TCP
port: 389
- name: kerberos-tcp
serviceName: kerberos
protocol: TCP
port: 88
- name: kerberos-udp
serviceName: kerberos
protocol: UDP
port: 88
From a UX perspective I prefer the first approach, because there are
no changes or additions to the ServicePort configuration schema.
But to maintain compatibility with programs that assume that name is unique (as is currently enforced), it might be necessary to introduce a new field.
Next steps §
I filed a bug report and submitted a proof-of-concept pull request to bring attention to the problem and solicit feedback from Kubernetes and OpenShift DNS experts. It might be necessary to submit a Kubernetes Enhancement Proposal (KEP), but that seems (as a Kubernetes outsider) a long and winding road to landing what is a conceptually small change.