Experimenting with ExternalDNS
DNS is a critical piece of the puzzle for exposing Kubernetes-hosted applications to the Internet. Running the application means nothing if you can’t get traffic to it, so keeping public DNS records in sync with the deployed applications is important. The Kubernetes ExternalDNS project was developed for this purpose.
ExternalDNS exposes Kubernetes Services and Routes by managing records in external DNS providers. It supports many providers, including the DNS services of the popular clouds (AWS, Google Cloud, Azure, …).
I have been experimenting with ExternalDNS. My purpose is not only
to understand installation and basic usage, but also to find out
whether it can meet the specific DNS requirements of FreeIPA, such
as SRV records. This post outlines my findings.
Operator installation §
The ExternalDNS controller is a sub-project of a Kubernetes SIG
(special interest group). In the OpenShift ecosystem, the
ExternalDNS Operator creates and manages ExternalDNS controller
instances, defined by custom resources (CRs) of kind: ExternalDNS.
The ExternalDNS Operator is available as a Tech Preview in OpenShift Container Platform 4.10, so it is visible in the OperatorHub catalogue out of the box. The official docs explain how to install the operator via the OperatorHub web console, and the instructions were easy to follow.
I prefer using the CLI where possible. The OperatorHub machinery is complex, but I eventually worked out which commands and objects are needed to install the ExternalDNS Operator from the CLI.
First, create the operand namespace and RBAC objects. The operand namespace is where the ExternalDNS controllers (as opposed to the ExternalDNS Operator controller) will live.
% oc create ns external-dns
namespace/external-dns created
% oc apply -f \
https://raw.githubusercontent.com/openshift/external-dns-operator/release-0.1/config/rbac/extra-roles.yaml
role.rbac.authorization.k8s.io/external-dns-operator created
rolebinding.rbac.authorization.k8s.io/external-dns-operator created
clusterrole.rbac.authorization.k8s.io/external-dns created
clusterrolebinding.rbac.authorization.k8s.io/external-dns created
Next, create the external-dns-operator namespace where the
operator itself shall live:
% oc create ns external-dns-operator
namespace/external-dns-operator created
Finally, create the OperatorGroup and Subscription objects. Here
are the contents of external-dns-operator.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: external-dns-operator-
  namespace: external-dns-operator
spec:
  targetNamespaces:
  - external-dns-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: external-dns-operator
  namespace: external-dns-operator
spec:
  name: external-dns-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
Create the objects:
% oc create -f external-dns-operator.yaml
operatorgroup.operators.coreos.com/external-dns-operator-8852w created
subscription.operators.coreos.com/external-dns-operator created
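As an aside, the Subscription above does not pin an update channel. If you want to set one, the available channels can be read from the operator’s package manifest; a quick check along these lines should work:
% oc get packagemanifest external-dns-operator \
    -n openshift-marketplace \
    -o jsonpath='{.status.channels[*].name}'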
After a short delay (~1 minute for me) the operator installation should finish. Observe the various Kubernetes objects that represent the running operator:
% oc get -n external-dns-operator all
NAME READY STATUS RESTARTS AGE
pod/external-dns-operator-594b465984-r2pc5 2/2 Running 2 (59s ago) 5m13s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/external-dns-operator-metrics-service ClusterIP 172.30.151.142 <none> 8443/TCP 5m15s
service/external-dns-operator-service ClusterIP 172.30.210.21 <none> 9443/TCP 59s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/external-dns-operator 1/1 1 1 5m14s
NAME DESIRED CURRENT READY AGE
replicaset.apps/external-dns-operator-594b465984 1 1 1 5m15s
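Another way to confirm that the OLM install succeeded is to check the ClusterServiceVersion in the operator namespace; its PHASE column should eventually read Succeeded (the exact CSV name will vary with the operator version):
% oc get csv -n external-dns-operator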
The ExternalDNS custom resource §
Now that the operator is installed, we can define an ExternalDNS
custom resource (CR). The operator creates an ExternalDNS
controller instance for each CR. Here is an example
(externaldns-test.yaml):
apiVersion: externaldns.olm.openshift.io/v1alpha1
kind: ExternalDNS
metadata:
  name: test
spec:
  domains:
  - filterType: Include
    matchType: Exact
    name: ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com
  provider:
    type: GCP
  source:
    type: Service
    service:
      serviceType:
      - LoadBalancer
    labelFilter:
      matchLabels:
        app: echo
    fqdnTemplate:
    - "{{.Name}}.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com"
Breaking down the spec, we see the following fields:
- domains gives a rule for which domains this ExternalDNS controller must manage. In this case, any domain name with a suffix matching the name subfield will match the rule.
- provider specifies the cloud provider, in this case GCP (Google Cloud). For GCP there is nothing else to configure; the controller will use the main cluster secret to authenticate to Google Cloud.
- source specifies which kinds of objects the controller will monitor to determine the DNS records to be created/managed. We configure the controller to watch Service objects. Further configuration is specified in subfields:
  - serviceType restricts the type(s) of Service objects to be considered.
  - labelFilter can be set to further restrict the set of source objects by matching on the label field. In this example, we only match Service objects with label app: echo.
  - fqdnTemplate specifies how to derive the fully qualified DNS name from the Service object.
  - hostnameAnnotation can be set to Allow to allow the FQDN to be specified via the external-dns.alpha.kubernetes.io/hostname annotation on the Service object (see the sketch after this list). The default value is Ignore, in which case fqdnTemplate is required.
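For completeness, here is a sketch of the annotation-based alternative: assuming the ExternalDNS spec set hostnameAnnotation: Allow, a Service could carry its desired FQDN directly (the hostname below is made up):
apiVersion: v1
kind: Service
metadata:
  name: echo-tcp
  labels:
    app: echo
  annotations:
    # hypothetical FQDN; with hostnameAnnotation: Allow the controller
    # takes the desired name from this annotation
    external-dns.alpha.kubernetes.io/hostname: echo.example.com
spec:
  type: LoadBalancer
  selector:
    app: echo
  ports:
  - name: tcpecho
    protocol: TCP
    port: 12345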
Aside from type: Service, the ExternalDNS CR also recognises
type: OpenShiftRoute. This type uses Route objects as the
source, creating CNAME records to alias the FQDN derived from the
Route object to the canonical DNS name of the ingress controller.
This isn’t the behaviour I’m looking for, so the rest of this
article focuses on the behaviour for Service sources.
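For reference, the source stanza for Route-based operation would look roughly like this (a sketch based on my reading of the operator API; I have not tested it, and the routerName value is an assumption):
…
  source:
    type: OpenShiftRoute
    openshiftRouteOptions:
      routerName: default
…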
Creating the ExternalDNS controller §
Now that we have defined an ExternalDNS custom resource, let’s
create it and see what happens. I would like to watch the logs of
the ExternalDNS Operator during this operation.
Earlier we saw that the name of the operator Pod is
pod/external-dns-operator-594b465984-r2pc5. This Pod has two
containers:
% oc get -o json -n external-dns-operator \
pod/external-dns-operator-594b465984-r2pc5 \
| jq '.status.containerStatuses[].name'
"kube-rbac-proxy"
"operator"
The container named operator is the one we are interested in.
We can watch its log output like so:
% oc logs -n external-dns-operator --tail 2 --follow \
external-dns-operator-594b465984-r2pc5 operator
2022-03-22T04:41:06.625Z INFO controller-runtime.manager.controller.external_dns_controller Starting workers {"worker count": 1}
2022-03-22T04:41:06.626Z INFO controller-runtime.manager.controller.credentials_secret_controller Starting workers {"worker count": 1}
... (waiting for more output)
Now, in another terminal, create the ExternalDNS CR object:
% oc create -f externaldns-test.yaml
externaldns.externaldns.olm.openshift.io/test created
Log output shows the ExternalDNS Operator responding to the
appearance of the externaldns/test CR:
controller-runtime.webhook.webhooks received request {"webhook": "/validate-externaldns-olm-openshift-io-v1alpha1-externaldns", "UID": "cf2fb876-9ddd-45a8-88b8-5cc0344fb5cc", "kind": "externaldns.olm.openshift.io/v1alpha1, Kind=ExternalDNS", "resource": {"group":"externaldns.olm.openshift.io","version":"v1alpha1","resource":"externaldnses"}}
validating-webhook validate create {"name": "test"}
controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-externaldns-olm-openshift-io-v1alpha1-externaldns", "code": 200, "reason": "", "UID": "cf2fb876-9ddd-45a8-88b8-5cc0344fb5cc", "allowed": true}
external_dns_controller reconciling externalDNS {"externaldns": "/test"}
…
And if we look in the operand namespace (external-dns) we see
a Pod running:
% oc get -n external-dns pod
NAME READY STATUS RESTARTS AGE
external-dns-test-865ffff756-45d44 1/1 Running 0 54s
And if you want to see what an ExternalDNS controller is up to, you can watch its logs:
% oc logs -n external-dns --tail 1 --follow \
pod/external-dns-test-865ffff756-45d44
time="2022-03-23T12:26:18Z" level=info msg="All records are already up to date"
... (waiting for more output)
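It is also worth looking at the CR itself; the operator should record status information there (I have omitted the output here):
% oc get externaldns test -o yaml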
Observing record creation §
After creating the ExternalDNS instance, I found the Google Cloud DNS zone for my cluster and queried its records. How to interact with the cloud provider depends on which provider the cluster is hosted on, so I won’t provide details. The existing records are:
ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com.
NS 21600 ns-gcp-private.googledomains.com.
ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com.
SOA 21600 ns-gcp-private.googledomains.com.
api.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com.
A 60 10.0.0.2
api-int.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com.
A 60 10.0.0.2
*.apps.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com.
A 30 35.223.148.37
This is a private zone specific to my cluster. Some non-routable addresses appear. I haven’t figured out how to update the records in the public zone yet. I’m confident this is not a problem with ExternalDNS. Rather, I put it down to my lack of familiarity with how to configure it, and with Google Cloud DNS.
We can see that in addition to the expected NS and SOA records,
there are A records for the API server and a wildcard A record
for the main ingress controller.
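For the record, on Google Cloud the zone and its records can be listed with the gcloud CLI; something along these lines should do it (substitute the zone name reported for your cluster):
% gcloud dns managed-zones list
% gcloud dns record-sets list --zone <zone-name>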
Next, I create the following Service:
apiVersion: v1
kind: Service
metadata:
  name: echo-tcp
  labels:
    app: echo
spec:
  type: LoadBalancer
  selector:
    app: echo
  ports:
  - name: tcpecho
    protocol: TCP
    port: 12345
Note that it has the app: echo label and has type: LoadBalancer,
satisfying the match criteria of the externaldns/test controller.
Create the service and observe its public IP address:
% oc create -f service-echo.yaml
service/echo-tcp created
% oc get service/echo-tcp \
-o jsonpath='{.status.loadBalancer}'
{"ingress":[{"ip":"35.188.22.139"}]}
After creating the Service, two new records appeared in the zone:
echo-tcp.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com.
A 300 35.188.22.139
external-dns-echo-tcp.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com.
TXT 300 "heritage=external-dns,external-dns/owner=external-dns-test,external-dns/resource=service/test/echo-tcp"
The A record resolves the DNS name to the load balancer’s IP
address. Nothing surprising here.
The TXT record is for the name external-dns-echo-tcp.… and
contains some metadata about the “owner” of the corresponding A
record. Specifically, it identifies the Service object that is the
source of the record. I am not 100% sure, but it seems to also
contain information about the ExternalDNS controller that created
the record.
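One way to dig into this is to inspect the arguments the operator passed to the controller; upstream ExternalDNS implements ownership via its TXT registry and a --txt-owner-id flag, so I would expect to find that among the container args (the Deployment name here is inferred from the Pod name above):
% oc get deployment -n external-dns external-dns-test \
    -o jsonpath='{.spec.template.spec.containers[0].args}'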
When I first saw the TXT records, I theorised that the ExternalDNS
controller uses the TXT records to find “obsolete” records and
delete them. This would occur, for example, when the Service is
deleted. Indeed, deleting service/echo-tcp resulted in the
removal of both the A and TXT records.
SRV records for LoadBalancer Services §
Kubernetes’ internal DNS system follows a DNS-based service
discovery specification. In addition to A/AAAA
records, SRV records are created to locate service endpoints (port
and target DNS name) based on service name and transport protocol
(TCP or UDP). SRV records are an important part of several
protocols as used in the real world, including Kerberos, SIP, LDAP
and XMPP. SRV records have the following shape:
_<service>._<proto>.<domain> <ttl>
<class> SRV <priority> <weight> <port> <target>
An SRV record to locate an organisation’s LDAP server might look like:
_ldap._tcp.example.net 300
IN SRV 10 5 389 ldap.corp.example.net
Although the current system has a critical deficiency for applications that use SRV records and operate on both TCP and UDP (see my previous blog post), for most applications it works well. Unfortunately, ExternalDNS does not follow the Kubernetes DNS specification in this respect: it does not create SRV records for Services.
I am not sure why this is the case. Perhaps ExternalDNS even pre-dates the SRV aspects of the Kubernetes DNS specification. Or perhaps the need was not recognised, or not deemed critical enough to address.
As it happens, there is an abandoned pull request from two years ago that sought to add SRV record generation to ExternalDNS and bring it in line with the spec. The maintainers seemed receptive, but the PR author no longer needed the feature and closed it. So I think there is reason to hope that the feature might eventually make it into ExternalDNS. Perhaps our team will drive it… we need SRV records, and it would probably be better to enhance ExternalDNS than to build our own solution from scratch.
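For concreteness, the record I would like ExternalDNS to create for the echo-tcp Service above, following the naming scheme of the Kubernetes DNS specification, would look something like this (hypothetical; ExternalDNS does not currently produce it, and the priority/weight values are illustrative):
_tcpecho._tcp.echo-tcp.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com. 300
    IN SRV 0 100 12345 echo-tcp.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com.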
SRV records for NodePort services §
I said that ExternalDNS does not support SRV records, but there is
one exception to that. ExternalDNS does create SRV records for
Services of type: NodePort. This is not an appropriate solution
for our application, but we can still play with it and get a feel
for how it might work similarly for LoadBalancer Services.
First, we have to modify externaldns/test to add NodePort to the
list of Service types. Update externaldns-test.yaml:
…
    service:
      serviceType:
      - LoadBalancer
      - NodePort
…
And apply the updated configuration:
% oc replace -f externaldns-test.yaml
externaldns.externaldns.olm.openshift.io/test replaced
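The operator should roll out a new revision of the controller Deployment to pick up the change; one way to wait for that is (assuming the Deployment is named external-dns-test, as inferred earlier):
% oc rollout status -n external-dns deployment/external-dns-test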
Now create a new NodePort Service. service-nodeport.yaml:
apiVersion: v1
kind: Service
metadata:
  name: nodeport
  labels:
    app: echo
spec:
  type: NodePort
  selector:
    app: echo
  ports:
  - name: nodeport
    protocol: TCP
    port: 12345
% oc create -f service-nodeport.yaml
service/nodeport created
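Kubernetes allocates the node port automatically; it can be read back like so (the value will differ on your cluster):
% oc get service nodeport -o jsonpath='{.spec.ports[0].nodePort}'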
The ExternalDNS controller log output shows it generating an SRV
record for the Service (wrapped for clarity):
…
time="…" level=debug msg="Endpoints generated from service:
default/nodeport:
[ _nodeport._tcp.nodeport.ci-ln-8hkfrzk-72292.origin-ci-int-gce.dev.rhcloud.com 0
IN SRV 0 50 30632
nodeport.ci-ln-8hkfrzk-72292.origin-ci-int-gce.dev.rhcloud.com []
nodeport.ci-ln-8hkfrzk-72292.origin-ci-int-gce.dev.rhcloud.com 0
IN A 10.0.0.4;10.0.0.5;10.0.128.3;10.0.128.2;10.0.128.4;10.0.0.3 []
]"
…
Unfortunately, the SRV record didn’t actually make it to the
Google Cloud DNS zone. I haven’t worked out why, yet. The A
record does get created; it’s only the SRV record that is missing.
I’ll update this article if/when I work out why the SRV record
goes missing.
Conclusion §
The ExternalDNS system is intended to automatically manage public
DNS records for Kubernetes-hosted applications. It can
automatically create CNAME records for OpenShift Routes and
A/AAAA records for Services, including LoadBalancer services.
For applications that use A/AAAA and CNAME records, it works
well.
Unfortunately, SRV records are not well supported. Certainly,
ExternalDNS does not meet the needs of typical applications that
use SRV records. Operators of such applications currently have one
of two options: either manage the records manually (do not want),
or implement the required automation themselves (e.g. in the
application’s operator program).
The best way forward is to implement better support for SRV
records in ExternalDNS itself, so everyone can benefit through
shared effort and maintainership vested in the Kubernetes SIG. I
shall file a ticket and perhaps restart discussions in the
abandoned pull request with a view to getting this
critical feature on the ExternalDNS roadmap. How involved I or my
team will be in implementing or driving the feature remains to be
determined.