Who controls the internet? - A look at diversity of authoritative NS records in gTLDs
For this post, I am excluding all but one of the author's numerous pie charts and other graphics, since they are not essential for understanding his writing.



-----------------------------------------------------



Written by Jan Schaumann

Published: November 15, 2022


But while the DNS root servers are known to be distributed, I thought it might be interesting to take a closer look at the immediate levels up from the root, and so I went to analyze the diversity or centralization of the authoritative nameservers for the generic top-level domains (gTLDs) and the second-level domains in those gTLDs.

To perform this analysis, I started out with the root zone, which (as of November 2022) contains 1,485 TLDs. As I discussed previously, just what exactly you find in there is already utterly fascinating, but for our purposes here, let's note that you can then request access to all of the gTLD zone files via ICANN's Centralized Zone Data Service, which got me access to 1,165 zones in total. In addition, you can obtain the .gov zone from CISA's GitHub repository, as well as .arpa from most of the root servers.

This leaves us missing the .edu, .int, .mil and .post TLDs, which are not generally available. (If you know how to get access, please let me know.)

For the country-code specific top-level domains (ccTLDs), it's a lot more difficult to gain access: most operators do not provide public access, although some do: you can AXFR some of them or gather some published data from others. Commercial services exist that sell you zone data, but it seems to me that this data ought to be public, so I excluded ccTLDs from my analysis for the time being.

Anyway, so with 1,168 total zone files adding up to around 7 GB of data (of which the .com zone accounts for 4.8 GB alone!), I went ahead and used a variety of shell scripts and some Perl glue to parse out the NS records to then see just what domains those are in, i.e., who controls them.
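
The author's shell and Perl scripts are not shown, so the following is only a minimal Python sketch of the parsing step described above. It assumes one complete record per line in the form "owner [TTL] [class] NS target" with fully qualified names; the function name iter_ns_records is hypothetical, and multi-line records or relative names are not handled.

```python
from typing import Iterable, Iterator, Tuple

CLASSES = {"IN", "CH", "HS"}


def iter_ns_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (owner, nameserver) pairs for every NS record in the input.

    Assumption: each record sits on a single line and uses fully
    qualified names, as in a typical flat zone-file dump.
    """
    for line in lines:
        line = line.split(";", 1)[0].strip()  # drop comments and padding
        if not line or line.startswith("$"):  # skip blanks, $ORIGIN, $TTL
            continue
        fields = line.split()
        # Skip the optional TTL and class fields to find the record type.
        i = 1
        while i < len(fields) and (
            fields[i].isdigit() or fields[i].upper() in CLASSES
        ):
            i += 1
        # An NS record has exactly one field (the target) after its type.
        if len(fields) != i + 2 or fields[i].upper() != "NS":
            continue
        owner = fields[0].rstrip(".").lower() or "."
        target = fields[i + 1].rstrip(".").lower()
        yield owner, target
```

Locating the type field by position, rather than searching the line for "NS", keeps RRSIG and NSEC records that merely mention NS from being counted.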



The Root


The DNS root zone itself is served by 13 root authorities, and as such is obviously and trivially diverse. The 13 authorities are managed by twelve root operators: 9 US organizations (including three US government entities), of which one (Verisign) operates two roots, one Swedish company (Netnod), one organization in Japan (WIDE), and one headquartered in the Netherlands (RIPE NCC). Obviously, all are in the same domain (i.e., root-servers.net).

Now for the root itself, this illustration is of course a bit silly, but it gives you an idea of what I'm looking for in this analysis. And things do get a bit more interesting once we process all the NS records from the root zone itself, where we find 7,507 total NS records across 5,612 unique name servers, which looks reasonably diverse.

But if you look closer, you'll notice that many of the nameservers are in the same domain, so if we then flatten the whole thing, we see a bit more of a centralization. For example, 6.3% of the NS records are under nstld.com, which is operated by Verisign.
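
As a rough illustration of this "flattening", the sketch below reduces each name server to its parent domain and computes each domain's share of the records. It is not the author's code: the helper names are hypothetical, and taking the last two labels is a simplifying assumption that misgroups name servers under multi-label suffixes such as co.uk (a careful analysis would consult the Public Suffix List).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ns_domain(nameserver: str) -> str:
    """Reduce a name server hostname to its last two labels.

    Assumption: the registrable domain is always two labels long.
    """
    labels = nameserver.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def domain_shares(nameservers: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (domain, share of NS records) pairs, largest share first."""
    counts = Counter(ns_domain(ns) for ns in nameservers)
    total = sum(counts.values())
    return [(domain, n / total) for domain, n in counts.most_common()]
```

Feeding in one entry per NS record (not per unique name server) is what makes the resulting shares comparable to the percentages quoted in the text.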

But thinking about this distribution a bit more, you quickly realize that there isn't really an even distribution across the gTLDs, since not all domains have the same footprint. As you may guess, the .com zone has more records than some of the other zones. More specifically, .com has over 164 million NS records, making up 73% of all the NS records in all the gTLDs.

The NS records for .com are in the gtld-servers.net domain, but so are e.g., .net's; similarly, the NS records for .org and .info are in the same domain, so we can flatten this data a little bit more.

In other words, almost 80% of all NS records across all gTLDs are under the gtld-servers.net domain, and thus the control of Verisign -- the same Verisign that also operates two roots.

Ok, so this is the representation of the NS records for the gTLDs within the root zone, but what about the NS records for all the second-level domains within the gTLDs? Parsing all 1,168 zone files, we end up with 2,699,827 unique name servers that we can group under 1,063,092 domains.

This shows a notable centralization of the NS records found in all gTLD zones, with domaincontrol.com accounting for roughly 20% alone.

Another thing that seems interesting here is that some of the cloud companies offering DNS services choose to use a larger number of NS records, spread (in the case of AWS) across thousands of second-level domains in several TLDs.

The data now show that out of the over 534 million NS records across a little over 1 million domains:

* 43% of all NS records (roughly 230 million) are served by only 165 name servers found in just 10 domains

* 52% (~ 278 million) are served by 255 name servers in just 20 domains

* 75% (~ 401 million) are served by 1,580 name servers in just 100 domains

* 99% (~ 529 million) are served by 345,000 name servers in 6,000 domains
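
The breakdown above is a cumulative share: rank the name server domains by how many NS records point at them, then ask what fraction of all records the top N cover. A minimal sketch of that calculation follows; the function name and the toy counts in the usage note are illustrative assumptions, not the author's data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def cumulative_coverage(
    counts: Counter, cutoffs: Iterable[int]
) -> List[Tuple[int, float]]:
    """For each cutoff N, return the share of all NS records that is
    covered by the N most frequently used name server domains."""
    cutoffs = list(cutoffs)
    total = sum(counts.values())
    if total == 0:
        return [(n, 0.0) for n in cutoffs]
    # Record counts per domain, largest first.
    ranked = [n for _, n in counts.most_common()]
    return [(n, sum(ranked[:n]) / total) for n in cutoffs]
```

With toy counts such as Counter({"a.example": 50, "b.example": 30, "c.example": 15, "d.example": 5}) and cutoffs of 1, 2 and 4, the top domain covers half of the records and the top two cover 80%.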


Let's look at these 20 domains and see who controls them, and thus over half of all the domains in all the gTLDs:



[Image: 8IQf15gW_o.jpg]



You may notice that of these 20 organizations, 15 are US entities, 2 Chinese, 1 German, 1 Israeli, and one from Singapore, giving you an idea what governments could -- in theory, at least -- exert control over what percentage of the internet.

Another interesting thing to point out here is that even though the domains are registered by different organizations, the name servers in use may actually be operated from a different entity's networks. In particular, it looks like several of the name servers in these domains are running out of, fronted by, or otherwise utilizing Cloudflare's network, while Wix seems to be using Google Cloud (I'm guessing) to run their name servers.

name.com is owned by Identity Digital, the rebranding of the merged Donuts and Afilias (previously discussed here) registries, which also operates a significant number of TLD domains.

All in all a sign that perhaps we should take a look at the Autonomous System (AS) numbers the various name servers are in, and so, a few thousand lookups later.

That's right: around 34% of the majority of NS records are resolving to IP addresses in Cloudflare's AS13335, and over half of all are ultimately served from only four Autonomous Systems: Cloudflare (AS13335), Alibaba (AS37963), GoDaddy (AS44273), and IONOS (AS8560) (hinting at the other big load-bearing infrastructure pillar that also remains largely insecure by default).

And while that is interesting by itself, just as before, when we looked at the name servers serving the gTLDs themselves and tried to weigh them against how many domains they support, perhaps we should look beyond the NS diversity in the raw gTLDs; after all, control of google.com or facebook.com surely counts more than, say, monkeyjungle.com.

So what do people do when they want to look at popular domains? They go for the "Alexa Top 1 Million Domains" list, of course! Only... Alexa was bought by Amazon, and in a sign of "who controls the internet", Amazon promptly shut it down. (As of November 8th, 2022, the actual list was still available, but it looks like it has since been restricted.) Of course there are other, similar lists (e.g., the Cisco Umbrella or the Majestic Million), all of which intersect to some degree but remain distinct based on the heuristics of the data collection mechanisms used. For this reason, researchers provide a normalized Top 1 Million list (see their paper for more details), which I've used for this project here.

Iterating over that full list and looking up the NS records for 1 million domains then yields a breakdown of 2,636,294 total NS records in 119,291 domains, as well as the insight that spreadsheets are surprisingly bad at handling large data sets even of simple text data.

So we see a very similar distribution to our analysis of all of the NS records in all of the gTLDs here in the top 1 million domains, too: More than half of the NS records used by the top one million domains are found in just 20 of the 120K domains, served by only 1,740 NS records.

The top ten NS record domains are represented by the usual suspects (Cloudflare, Amazon, GoDaddy, Akamai, DigiCert, Google, Microsoft, Alibaba, Network Solutions, and Namecheap), although not identical to those we observed for all of the gTLD records.

Also noteworthy is that the distribution across NS domains shifts somewhat when you look at the top 100 domains (Azure, AWS, Google, Akamai), the top 1,000 domains (AWS, Akamai, NS1, Google), the top 10K domains (AWS, Akamai, Cloudflare, NS1) and the full top 1 million (Cloudflare, Amazon, GoDaddy, Akamai), suggesting that more of the less popular sites use Cloudflare than do the higher ranked sites.

At the same time, when we do the same breakdown by AS number as before (with many thanks to our friends at Team Cymru), we notice even greater centralization.

Out of almost 10,000 IP addresses covering 75% of the top one million domains' NS records, over 40% again land in Cloudflare's AS13335, with most of the others being mere "also-ran"s.
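
The article does not say which Team Cymru interface was used for these lookups. As one possibility, the sketch below assumes the DNS-based IP-to-ASN service, where a TXT query for the reversed IPv4 address under origin.asn.cymru.com returns a pipe-separated answer that starts with the origin AS number. The function names are hypothetical, the actual DNS query is left to whatever resolver you prefer, and the sample values in the usage note come from documentation-only ranges.

```python
import ipaddress
from collections import Counter
from typing import Dict, List, Tuple


def cymru_origin_qname(ip: str) -> str:
    """Build the TXT query name for an IPv4 address."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 addresses only")
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + ".origin.asn.cymru.com"


def parse_origin_txt(txt: str) -> int:
    """Extract the origin ASN from a pipe-separated TXT answer.

    Assumption: the first field holds the ASN; if several ASNs are
    listed there, only the first one is used.
    """
    first_field = txt.strip().strip('"').split("|", 1)[0]
    return int(first_field.split()[0])


def asn_shares(ip_to_asn: Dict[str, int]) -> List[Tuple[int, float]]:
    """Return (ASN, share of IP addresses) pairs, largest share first."""
    counts = Counter(ip_to_asn.values())
    total = sum(counts.values())
    return [(asn, n / total) for asn, n in counts.most_common()]
```

For example, cymru_origin_qname("192.0.2.1") gives "1.2.0.192.origin.asn.cymru.com", and an answer shaped like "64500 | 192.0.2.0/24 | ZZ | example | 2022-11-15" parses to 64500.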

Ok, so that's a whole lot of pie charts, and learning that there is indeed a fair bit of centralization at the gTLD level of the DNS will not come as a surprise to many. However, crunching those numbers still provides for some useful insights. So if we wanted to answer the question "Who controls the internet?", then I think that we may find multiple answers:

1. Verisign -- In addition to operating two of the DNS root authorities, Verisign also controls the gtld-servers.net domain, which we've seen above is home to a whopping 80% of all gTLD NS records! Take out Verisign, and the internet's going to have a bad day.

2. A handful of large companies -- i.e., the usual suspects. With 43% of all NS records in all gTLDs and 44% of those in the Top 1M in a combined 14 domains, any one of those could exert significant control over large chunks of the internet. But amongst those companies, a few stand out:

3. GoDaddy -- as owner of the aptly named domaincontrol.com domain, GoDaddy alone is responsible for 20% of all NS records in all gTLDs.

4. Cloudflare -- responsible for 20% of NS records in the top one million domains, Cloudflare also provides the IP space home to a total 40% of those NS records.

What this centralization means in practice and whether, for example, the US government could realistically exert control over the root operators and companies discussed here, is a different story altogether. But no matter how you look at it, the internet seems increasingly less distributed or decentralized as more and more businesses and organizations appear to concentrate in a handful of registries and cloud service providers.

We don't have a single point of failure just yet, but I do see multiple points of calamity with increasing blast radius...



https://www.netmeister.org/blog/nsauth-diversity.html