Experimenting with ExternalDNS

DNS is a critical piece of the puzzle for exposing Kubernetes-hosted applications to the Internet. Running the application means nothing if you can’t get traffic to it, so public DNS records must be kept in sync with the deployed applications. The Kubernetes ExternalDNS project was developed for this purpose.

ExternalDNS exposes Kubernetes Services and Routes by managing records in external DNS providers. It supports many DNS providers, including the DNS services of the popular cloud providers (AWS, Google Cloud, Azure, …).

I have been experimenting with ExternalDNS. My purpose is not only to understand installation and basic usage, but also to find out whether it can meet the specific DNS requirements of FreeIPA, such as SRV records. This post outlines my findings.

Operator installation §

The ExternalDNS controller is a Kubernetes SIG (special interest group) sub-project. In the OpenShift ecosystem, the ExternalDNS Operator creates and manages ExternalDNS controller instances defined by custom resources (CRs) of kind: ExternalDNS.

The ExternalDNS Operator is available as a Tech Preview in OpenShift Container Platform 4.10. So, it is visible in the OperatorHub catalogue out-of-the-box. The official docs explain how to install the operator via the OperatorHub web console. The instructions were easy to follow.

I prefer using the CLI where possible. The OperatorHub system is complex, but I eventually worked out which commands and objects are needed to install the ExternalDNS Operator from the CLI.

First, create the operand namespace and RBAC objects. The operand namespace is where the ExternalDNS controllers (as opposed to the ExternalDNS Operator controller) will live.

$ oc create ns external-dns
namespace/external-dns created

$ oc apply -f \
role.rbac.authorization.k8s.io/external-dns-operator created
rolebinding.rbac.authorization.k8s.io/external-dns-operator created
clusterrole.rbac.authorization.k8s.io/external-dns created
clusterrolebinding.rbac.authorization.k8s.io/external-dns created

Next, create the external-dns-operator namespace where the operator itself shall live:

% oc create ns external-dns-operator
namespace/external-dns-operator created

Finally, create the OperatorGroup and OperatorHub Subscription objects. Note the contents of external-dns-operator.yaml:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: external-dns-operator-
  namespace: external-dns-operator
spec:
  targetNamespaces:
  - external-dns-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: external-dns-operator
  namespace: external-dns-operator
spec:
  name: external-dns-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace

Create the objects:

% oc create -f external-dns-operator.yaml
operatorgroup.operators.coreos.com/external-dns-operator-8852w created
subscription.operators.coreos.com/external-dns-operator created

After a short delay (~1 minute for me) the operator installation should finish. Observe the various Kubernetes objects that represent the running operator:

% oc get -n external-dns-operator all
NAME                                         READY   STATUS    RESTARTS      AGE
pod/external-dns-operator-594b465984-r2pc5   2/2     Running   2 (59s ago)   5m13s

NAME                                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/external-dns-operator-metrics-service   ClusterIP   <none>        8443/TCP   5m15s
service/external-dns-operator-service           ClusterIP    <none>        9443/TCP   59s

NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/external-dns-operator   1/1     1            1           5m14s

NAME                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/external-dns-operator-594b465984   1         1         1       5m15s

The ExternalDNS custom resource §

Now that the operator is installed, we can define an ExternalDNS custom resource (CR). The operator creates an ExternalDNS controller instance for each CR. Here is an example (externaldns-test.yaml):

apiVersion: externaldns.olm.openshift.io/v1alpha1
kind: ExternalDNS
metadata:
  name: test
spec:
  domains:
    - filterType: Include
      matchType: Exact
      name: ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com
  provider:
    type: GCP
  source:
    type: Service
    service:
      serviceType:
        - LoadBalancer
    labelFilter:
      matchLabels:
        app: echo
    fqdnTemplate:
      - "{{.Name}}.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com"

Breaking down the spec, we see the following fields:

Aside from type: Service, the ExternalDNS CR also recognises type: OpenShiftRoute. This type uses Route objects as the source, creating CNAME records to alias the FQDN derived from the Route object to the canonical DNS name of the ingress controller. This isn’t the behaviour I’m looking for, so the rest of this article focuses on the behaviour for Service sources.
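To make the fqdnTemplate entry in the CR above concrete, here is a minimal sketch of how a template such as "{{.Name}}.<zone>" turns a Service name into a DNS name. ExternalDNS evaluates these as Go templates; the Python function below is only a simplified stand-in that handles the single {{.Name}} placeholder, and the function name render_fqdn_template is my own invention for illustration.

```python
import re


def render_fqdn_template(template: str, service_name: str) -> str:
    """Expand a Go-style {{.Name}} placeholder with the Service name.

    Simplified stand-in (an assumption for illustration): only the .Name
    field is handled, whereas real Go templates support far more.
    """
    # A function replacement avoids any backslash interpretation of the name.
    return re.sub(r"\{\{\s*\.Name\s*\}\}", lambda _match: service_name, template)


template = "{{.Name}}.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com"
print(render_fqdn_template(template, "echo-tcp"))
```

With the template from the CR, a Service named echo-tcp would be published under echo-tcp.ci-ln-053y10k-72292.origin-ci-int-gce.dev.rhcloud.com.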

Creating the ExternalDNS controller §

Now that we have defined an ExternalDNS custom resource, let’s create it and see what happens. I would like to watch the logs of the ExternalDNS Operator during this operation.

Earlier we saw that the name of the operator Pod is pod/external-dns-operator-594b465984-r2pc5. This Pod has two containers:

% oc get -o json -n external-dns-operator \
    pod/external-dns-operator-594b465984-r2pc5 \
    | jq '.status.containerStatuses[].name'

The container named operator is the one we are interested in. We can watch its log output like so:

% oc logs -n external-dns-operator --tail 2 --follow \
    external-dns-operator-594b465984-r2pc5 operator
2022-03-22T04:41:06.625Z        INFO    controller-runtime.manager.controller.external_dns_controller   Starting workers        {"worker count": 1}
2022-03-22T04:41:06.626Z        INFO    controller-runtime.manager.controller.credentials_secret_controller     Starting workers        {"worker count": 1}
... (waiting for more output)

Now, in another terminal, create the ExternalDNS CR object:

% oc create -f externaldns-test.yaml
externaldns.externaldns.olm.openshift.io/test created

Log output shows the ExternalDNS Operator responding to the appearance of the externaldns/test CR:

controller-runtime.webhook.webhooks     received request        {"webhook": "/validate-externaldns-olm-openshift-io-v1alpha1-externaldns", "UID": "cf2fb876-9ddd-45a8-88b8-5cc0344fb5cc", "kind": "externaldns.olm.openshift.io/v1alpha1, Kind=ExternalDNS", "resource": {"group":"externaldns.olm.openshift.io","version":"v1alpha1","resource":"externaldnses"}}
validating-webhook      validate create {"name": "test"}
controller-runtime.webhook.webhooks     wrote response  {"webhook": "/validate-externaldns-olm-openshift-io-v1alpha1-externaldns", "code": 200, "reason": "", "UID": "cf2fb876-9ddd-45a8-88b8-5cc0344fb5cc", "allowed": true}
external_dns_controller reconciling externalDNS {"externaldns": "/test"}

And if we look in the operand namespace (external-dns) we see a Pod running:

% oc get -n external-dns pod
NAME                                 READY   STATUS    RESTARTS   AGE
external-dns-test-865ffff756-45d44   1/1     Running   0          54s

And if you want to see what an ExternalDNS controller is up to, you can watch its logs:

% oc logs -n external-dns --tail 1 --follow \
time="2022-03-23T12:26:18Z" level=info msg="All records are already up to date"
... (waiting for more output)

Observing record creation §

After creating the ExternalDNS instance, I found the Google Cloud DNS zone for my cluster and queried its records. How to interact with the cloud provider depends on which cloud provider the cluster is hosted on, so I won’t provide details. The existing records are:

  NS    21600  ns-gcp-private.googledomains.com.
  SOA   21600  ns-gcp-private.googledomains.com.
  A     60
  A     60
  A     30

This is a private zone specific to my cluster. Some non-routable addresses appear. I haven’t figured out how to update the records in the public zone yet. I’m confident this is not a problem with ExternalDNS. Rather, I put it down to my lack of familiarity with how to configure it, and with Google Cloud DNS.

We can see that in addition to the expected NS and SOA records, there are A records for the API server and a wildcard A record for the main ingress controller.

Next I create the following Service:

apiVersion: v1
kind: Service
metadata:
  name: echo-tcp
  labels:
    app: echo
spec:
  type: LoadBalancer
  selector:
    app: echo
  ports:
  - name: tcpecho
    protocol: TCP
    port: 12345

Note that it has the app: echo label and has type: LoadBalancer, satisfying the match criteria of the externaldns/test controller. Create the service and observe its public IP address:

% oc create -f service-echo.yaml
service/echo-tcp created

% oc get service/echo-tcp \
    -o jsonpath='{.status.loadBalancer}'

After creating the Service, two new records appeared in the zone:

  A     300
  TXT   300    "heritage=external-dns,external-dns/owner=external-dns-test,external-dns/resource=service/test/echo-tcp"

The A record resolves the DNS name to the load balancer’s IP address. Nothing surprising here.
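Why did the controller pick up this Service at all? The match criteria in the ExternalDNS CR (a list of Service types plus a label filter) can be thought of as a simple predicate. The following is a rough sketch of that idea, not the controller's actual code; the function name service_matches and the dictionary layout are assumptions made for illustration.

```python
def service_matches(service: dict, service_types: list[str],
                    match_labels: dict[str, str]) -> bool:
    """Return True if a Service manifest satisfies both filters.

    Illustrative only: models the serviceType list and matchLabels filter
    from the ExternalDNS CR as plain Python data.
    """
    # Kubernetes defaults spec.type to ClusterIP when it is omitted.
    spec_type = service.get("spec", {}).get("type", "ClusterIP")
    labels = service.get("metadata", {}).get("labels", {})
    if spec_type not in service_types:
        return False
    # Every label in the filter must be present with the same value.
    return all(labels.get(key) == value for key, value in match_labels.items())


echo_tcp = {
    "metadata": {"name": "echo-tcp", "labels": {"app": "echo"}},
    "spec": {"type": "LoadBalancer"},
}
print(service_matches(echo_tcp, ["LoadBalancer"], {"app": "echo"}))
```

A Service with the right label but type: NodePort would not match until NodePort is added to the list of Service types.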

The TXT record is for the name external-dns-echo-tcp.… and contains some metadata about the “owner” of the corresponding A record. Specifically, it identifies the Service object that is the source of the record. I am not 100% sure, but it seems to also contain information about the ExternalDNS controller that created the record.

When I first saw the TXT records, I theorised that the ExternalDNS controller uses the TXT records to find “obsolete” records and delete them. This would occur, for example, when the Service is deleted. Indeed, deleting service/echo-tcp resulted in the removal of both the A and TXT records.
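To illustrate the theory, here is a small sketch that splits the TXT value shown above into its key/value fields and checks whether a record belongs to a given owner. It only demonstrates the structure of the metadata; the helper names parse_ownership_txt and is_owned_by are hypothetical, and I am not claiming this is how ExternalDNS implements the check.

```python
def parse_ownership_txt(value: str) -> dict[str, str]:
    """Split a 'key=value,key=value' TXT payload into a dictionary."""
    fields = {}
    for item in value.strip('"').split(","):
        key, sep, val = item.partition("=")
        if sep:  # ignore fragments without an '=' sign
            fields[key.strip()] = val.strip()
    return fields


def is_owned_by(fields: dict[str, str], owner_id: str) -> bool:
    """Hypothetical ownership test based on the heritage and owner fields."""
    return (fields.get("heritage") == "external-dns"
            and fields.get("external-dns/owner") == owner_id)


txt = ('"heritage=external-dns,external-dns/owner=external-dns-test,'
       'external-dns/resource=service/test/echo-tcp"')
info = parse_ownership_txt(txt)
print(info["external-dns/resource"])
print(is_owned_by(info, "external-dns-test"))
```

A controller that keeps such metadata alongside each record can tell which records it created, and which source object they came from, without any state outside the DNS zone.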

SRV records for LoadBalancer Services §

Kubernetes’ internal DNS system follows a DNS-based service discovery specification. In addition to A/AAAA records, SRV records are created to locate service endpoints (port and target DNS name) based on service name and transport protocol (TCP or UDP). SRV records are an important part of several protocols as used in the real world, including Kerberos, SIP, LDAP and XMPP. SRV records have the following shape:

_<service>._<proto>.<domain> <ttl>
    <class> SRV <priority> <weight> <port> <target>

A record to locate an organisation’s LDAP server might look like:

_ldap._tcp.example.net 300
    IN SRV 10 5 389 ldap.corp.example.net
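
The same shape can be written down as a small data structure. This is just a sketch for illustration; the class name SrvRecord and its field names are my own, chosen to mirror the placeholders in the template above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    service: str      # e.g. "ldap" (without the leading underscore)
    proto: str        # "tcp" or "udp"
    domain: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str
    record_class: str = "IN"

    @property
    def owner_name(self) -> str:
        """The name that clients query: _<service>._<proto>.<domain>."""
        return f"_{self.service}._{self.proto}.{self.domain}"

    def to_text(self) -> str:
        """Render the record on a single line in zone-file order."""
        return (f"{self.owner_name} {self.ttl} {self.record_class} SRV "
                f"{self.priority} {self.weight} {self.port} {self.target}")


ldap = SrvRecord("ldap", "tcp", "example.net", 300, 10, 5, 389,
                 "ldap.corp.example.net")
print(ldap.to_text())
```

Note that the port travels with the record: a client only needs to know the service, the protocol and the domain to discover both where to connect and on which port.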

Although the current system has a critical deficiency for applications that use SRV records and operate on both TCP and UDP (see my previous blog post), for most applications it works well. Unfortunately, ExternalDNS does not follow the Kubernetes DNS spec and does not create SRV records for Services.

I am not sure why this is the case. Perhaps ExternalDNS even pre-dates the SRV aspects of the Kubernetes DNS specification. Or the need might not have been recognised or deemed sufficiently critical to address this gap.
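For comparison, this is roughly what the cluster-internal SRV name looks like under the Kubernetes DNS specification, which creates one SRV record per named Service port. The sketch assumes the common default cluster domain cluster.local (your cluster may differ), and the function name cluster_srv_name is hypothetical.

```python
def cluster_srv_name(port_name: str, proto: str, service: str,
                     namespace: str,
                     cluster_domain: str = "cluster.local") -> str:
    """Build _<port-name>._<proto>.<service>.<namespace>.svc.<zone>.

    cluster_domain defaults to "cluster.local", which is an assumption;
    the actual zone is a per-cluster setting.
    """
    return (f"_{port_name}._{proto.lower()}."
            f"{service}.{namespace}.svc.{cluster_domain}")


# The echo-tcp Service from earlier lives in the "test" namespace and
# exposes a TCP port named "tcpecho".
print(cluster_srv_name("tcpecho", "TCP", "echo-tcp", "test"))
```

It is this kind of record, published in the external zone rather than only inside the cluster, that ExternalDNS does not produce for LoadBalancer Services.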

As it happens, there is an abandoned pull request from two years ago that sought to add SRV record generation to ExternalDNS and bring it in line with the spec. The maintainers seemed receptive, but the PR author no longer needed the feature and closed it. So I think there is reason to hope that the feature might eventually make it into ExternalDNS. Perhaps our team will drive it… we need SRV records, and it would probably be better to enhance ExternalDNS than to build our own solution from scratch.

SRV records for NodePort services §

I said that ExternalDNS does not support SRV records, but there is one exception to that. ExternalDNS does create SRV records for Services of type: NodePort. This is not an appropriate solution for our application, but we can still play with it and get a feel for how it might work similarly for LoadBalancer Services.

First, we have to modify externaldns/test to add NodePort to the list of Service types. Update externaldns-test.yaml:

        - LoadBalancer
        - NodePort

And apply the updated configuration:

% oc replace -f externaldns-test.yaml
externaldns.externaldns.olm.openshift.io/test replaced

Now create a new NodePort Service. service-nodeport.yaml:

apiVersion: v1
kind: Service
metadata:
  name: nodeport
  labels:
    app: echo
spec:
  type: NodePort
  selector:
    app: echo
  ports:
  - name: nodeport
    protocol: TCP
    port: 12345

Create the Service:

% oc create -f service-nodeport.yaml
service/nodeport created

The ExternalDNS controller log output shows it generating an SRV record for the Service (wrapped for clarity):

time="…" level=debug msg="Endpoints generated from service:
[ _nodeport._tcp.nodeport.ci-ln-8hkfrzk-72292.origin-ci-int-gce.dev.rhcloud.com 0
    IN SRV  0 50 30632
    nodeport.ci-ln-8hkfrzk-72292.origin-ci-int-gce.dev.rhcloud.com []
  nodeport.ci-ln-8hkfrzk-72292.origin-ci-int-gce.dev.rhcloud.com 0
    IN A;;;;; []

Unfortunately, the SRV record didn’t actually make it to the Google Cloud DNS zone. I haven’t worked out why, yet. The A record does get created; it’s only the SRV record that is missing. I’ll update this article if/when I work out why the SRV record goes missing.
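
For reference, the shape of the SRV endpoint in the debug log above can be reproduced with a few lines of code. This sketch is modelled purely on that log output: the priority of 0 and weight of 50 are the values the log shows, and because the Service and its port are both named nodeport, the log alone cannot tell which of the two supplies the first label. The function therefore takes the label as an explicit parameter. The name nodeport_srv_endpoint is hypothetical.

```python
def nodeport_srv_endpoint(label: str, proto: str, fqdn: str, node_port: int,
                          priority: int = 0,
                          weight: int = 50) -> tuple[str, str]:
    """Return (record name, record data) shaped like the logged endpoint.

    priority=0 and weight=50 mirror the values in the debug log; they are
    not taken from any documentation.
    """
    name = f"_{label}._{proto.lower()}.{fqdn}"
    rdata = f"{priority} {weight} {node_port} {fqdn}"
    return name, rdata


fqdn = "nodeport.ci-ln-8hkfrzk-72292.origin-ci-int-gce.dev.rhcloud.com"
name, rdata = nodeport_srv_endpoint("nodeport", "TCP", fqdn, 30632)
print(name)
print(rdata)
```

The port in the record data is the allocated node port (30632 here), not the Service port 12345, and the target is the name that carries the A records.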

Conclusion §

The ExternalDNS system is intended to automatically manage public DNS records for Kubernetes-hosted applications. It can automatically create CNAME records for OpenShift Routes and A/AAAA records for Services, including LoadBalancer services. For applications that use A/AAAA and CNAME records, it works well.

Unfortunately, SRV records are not well supported. ExternalDNS certainly does not meet the needs of typical applications that use SRV records. Operators of such applications currently have two options: either manage the records manually (do not want), or implement the required automation themselves (e.g. in the application’s operator program).

The best way forward is to implement better support for SRV records in ExternalDNS itself, so everyone can benefit through shared effort and maintainership vested in the Kubernetes SIG. I shall file a ticket and perhaps restart discussions in the abandoned pull request with a view to getting this critical feature on the ExternalDNS roadmap. How far my team or I will be involved in implementing or driving this feature work will be determined later.

Except where otherwise noted, this work is licensed under a Creative Commons Attribution 4.0 International License.