Inferring Accurate Geo-Aware PoP-Level Perspective of the Internet's Inter-AS Connectivity
Understanding where and how the 40K+ routed Autonomous Systems (ASes) in today's Internet interconnect is essential for meaningfully investigating a wide range of critical Internet-related problems, such as the vulnerability of the Internet to physical damage. However, much of the published work on Internet topology has focused primarily on discovering the "existence" of such interconnections, e.g., logical connectivity such as AS-to-AS links or physical connectivity such as router-to-router links. Considerably less attention has been paid to where, and in how many different locations, these interconnections have been established. For example, the often-studied AS-level view of the Internet is too coarse, as mapping entire ASes to single geographic locations eliminates essential details (e.g., AS-level path diversity). At the same time, the popular router-level view of the Internet is not only too detailed, but also inherently difficult to capture.
The main goal of this research project is to design, develop, and rigorously evaluate techniques to accurately map the geographic location of all the PoPs of a given target AS and determine the inter-AS connections that are established at each PoP of this AS. In short, we are interested in producing, for any given AS, its corresponding geographically aware map of inter-AS connectivity at the PoP level. The following activities relate to the PoP-level mapping of the Internet topology.
A significant fraction of the Internet's physical infrastructure (e.g., routers, switches, and related equipment) is hosted at a relatively small number of physical building complexes, such as colocation facilities or carrier hotels, and Internet eXchange Points (IXPs). More importantly, these facilities generally have known street addresses and thus can be accurately geo-located. Companies like Equinix, CoreSite, and Telx manage and operate these carrier-neutral colocation facilities (also called colos), where they provide, among other offerings, interconnection services. These facilities supply the infrastructure (e.g., rack space, cabling, power, and physical security) necessary for network operators to colocate their routers for easy interconnection.
This observation motivates our new methodology, which is specifically designed to map a given colocation (or colo) facility. This methodology relies on targeted active measurements to identify not only all the PoPs of all the ASes present in that colo facility, but also the corresponding inter-AS connectivity that is visible to active probing at that location. In turn, this methodology defines a very promising, widely applicable, and highly accurate approach for geo-locating potentially hundreds or thousands of IP addresses (i.e., all the discovered IPs of the interfaces on the routers in the co-located PoPs) to the street address of that facility.
This work focuses on identifying interconnections of the "cross connect" type, i.e., dedicated point-to-point private peering links (which might be used to carry transit traffic or peer-to-peer traffic) that the network operators can buy from the colo providers so that their networks can exchange traffic within the confines of these facilities. In particular, our goal is to infer who is interconnecting with whom in which colos in which cities. Precisely locating the private peering links between two networks is a prerequisite for studying, for example, the root causes of the peering disputes between large content and eyeball providers in recent years. Read more about our xconnect mapping here and in our following technical report.
Large content providers (CPs), such as Google, are responsible for a large fraction of the traffic injected into the Internet. They maintain multiple data centers and peer directly with many ASes to serve requests. Understanding the geo-aware PoP-level connectivity of CPs sheds light on why, how, and where they connect to the rest of the Internet.
In this study, we develop a new method to reveal and characterize the geo-aware PoP-level topology of large Internet content providers. We use application-level probing to identify valid IP addresses associated with CPs, and then deploy large-scale traceroute measurements to infer their PoPs, their locations, and the associated inter-AS connections at each PoP.
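To make the last step concrete, the following is a minimal sketch of how adjacent traceroute hops can be scanned for a change of AS. It is an illustration only, not the project's actual inference method: the `IP_TO_AS` table, the addresses, the AS numbers, and the function name `inter_as_hops` are all hypothetical. Plain IP-to-AS mapping is also known to be unreliable at AS borders, because the interface on one side of a link may be numbered from the neighbor's address space, so a real study would need additional checks.

```python
from typing import List, Optional, Tuple

# Hypothetical IP-to-AS table (documentation/private addresses and AS numbers).
# A real study would derive this mapping from BGP routing data.
IP_TO_AS = {
    "10.0.0.1": 64500,
    "10.0.0.2": 64500,
    "192.0.2.1": 64501,
    "192.0.2.9": 64501,
}

def inter_as_hops(path: List[Optional[str]]) -> List[Tuple[str, str, int, int]]:
    """Return (near_ip, far_ip, near_as, far_as) for adjacent hops in different ASes.

    `path` is the ordered list of hop addresses from one traceroute;
    None marks a hop that did not respond.
    """
    crossings = []
    for near, far in zip(path, path[1:]):
        if near is None or far is None:
            continue  # an unresponsive hop hides the adjacency, so claim nothing
        near_as = IP_TO_AS.get(near)
        far_as = IP_TO_AS.get(far)
        if near_as is None or far_as is None:
            continue  # unmapped address: skip rather than guess
        if near_as != far_as:
            crossings.append((near, far, near_as, far_as))
    return crossings

# Example trace that crosses from AS 64500 into AS 64501.
EXAMPLE_PATH = ["10.0.0.1", "10.0.0.2", "192.0.2.1", "192.0.2.9"]
```

Calling `inter_as_hops(EXAMPLE_PATH)` reports the single candidate border between `10.0.0.2` and `192.0.2.1`. Skipping pairs that contain an unresponsive hop is a deliberately conservative choice in this sketch.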
The AS-level topology has been the focus of much research in the past decade, with studies that range from measurements and inference to modeling and analysis and the development of synthetic topology generators.
ASes are not generic nodes but are entire networks that operate for a purpose and have a rich internal structure. Depending on an AS’s size, its network interconnects a number of geographically dispersed points-of-presence (PoPs), where it connects to its customers or interconnects with other networks, either directly or via Internet eXchange Points (IXPs). The importance of AS geography (i.e., geographic coverage or reach, number and location of PoPs, presence at IXPs) is further highlighted by the fact that the peering contracts of many ASes list explicit and geography-specific requirements for potential peering partners. For example, AS X will only peer with AS Y if Y's geographic reach is sufficiently large, or if X and Y have a certain number of overlapping PoP locations.
This study examines a new approach to determine the geographical footprint of individual Autonomous Systems that directly provide service to end-users, i.e., eyeball ASes. The key idea is to leverage the geo-location of end-users associated with an eyeball AS to identify its geographical footprint. We leverage the kernel density estimation method to estimate the density of users across the area served by individual eyeball ASes. This method enables us to cope with the potential error associated with the location of individual end-users while controlling the level of aggregation among data points to capture a geo-footprint at the desired resolution. We use the resulting geo-footprint of individual eyeball ASes to identify their likely PoP locations.
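As a rough illustration of the idea, the sketch below evaluates a Gaussian kernel density estimate of user locations at a few candidate sites and ranks the sites by density. Everything specific in it is an assumption made for the example: the sample coordinates, the candidate list, the bandwidth value, the function names, and the simplification of measuring distance directly in degrees instead of along the Earth's surface. The bandwidth parameter plays the role of the aggregation control described above: a larger value smooths over location error and merges nearby clusters, while a smaller value yields a finer-grained footprint.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

def kde_density(users: List[Point], at: Point, bandwidth_deg: float) -> float:
    """Gaussian kernel density estimate of user locations, evaluated at `at`.

    Distances are measured in degrees (a flat-earth simplification that is
    only reasonable for this small illustration).
    """
    if not users:
        return 0.0
    h2 = bandwidth_deg ** 2
    total = 0.0
    for lat, lon in users:
        d2 = (lat - at[0]) ** 2 + (lon - at[1]) ** 2
        total += math.exp(-d2 / (2.0 * h2))
    # Normalise by the number of users and the 2-D Gaussian constant.
    return total / (len(users) * 2.0 * math.pi * h2)

def rank_candidates(users: List[Point],
                    candidates: Dict[str, Point],
                    bandwidth_deg: float) -> List[str]:
    """Order candidate sites from highest to lowest estimated user density."""
    scored = [(kde_density(users, loc, bandwidth_deg), name)
              for name, loc in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Made-up sample: three users near Los Angeles, one near San Francisco.
USERS = [(34.05, -118.24), (34.10, -118.30), (33.95, -118.20), (37.80, -122.40)]
CANDIDATES = {
    "Los Angeles": (34.05, -118.24),
    "San Francisco": (37.77, -122.42),
    "Denver": (39.74, -104.99),
}
```

With this sample, `rank_candidates(USERS, CANDIDATES, 0.5)` places Los Angeles first, San Francisco second, and Denver last, since no users fall near Denver.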
The collection of inferred or obtained information about the location of PoPs, colocation facilities, and other network infrastructure leads to a large database. In this effort, we build a web-based portal that allows a user to query this database and learn about the infrastructure nodes around a certain location. Such a portal not only helps us in designing our measurement campaigns, but also gives other researchers valuable information about the partial view of the PoP-level topology.
PTR records resolve an IP address to a domain name or hostname. The names returned by PTR records, often referred to as DNS names or hostnames, are used for reverse DNS (Domain Name System) lookups. Network operators and administrators often embed hints in these names to ease network debugging. For instance, when a traceroute reports ip-64-32-149-181.lax1.megapath.net at a certain hop, a network engineer can infer that the trace reached Los Angeles at that hop.
Automatically extracting router attributes from DNS names has been the focus of a few previous studies. The most notable tool in this domain is UNDNS, which uses network-specific regular expressions to extract geographic information from a DNS name. Because the rule set is network specific, maintaining and expanding it is very tedious. Adding new rules is entirely manual, requires domain knowledge about each network's geography and location code names, and must be done separately for each network. This high labor cost is the main limitation of UNDNS.
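The sketch below shows what a network-specific rule of this kind might look like. It is not taken from UNDNS: the `RULES` table, the regular expression, the code-to-city mapping, and the `locate` function are hypothetical and written only to illustrate why each network needs its own hand-written entry.

```python
import re
from typing import Optional

# Hypothetical per-network rules: domain suffix -> (pattern, location codes).
# Each new network needs its own pattern and its own code table.
RULES = {
    "megapath.net": (
        # Three-letter location code, optional site number, then the domain.
        re.compile(r"\.([a-z]{3})\d*\.megapath\.net$"),
        {"lax": "Los Angeles"},
    ),
}

def locate(hostname: str) -> Optional[str]:
    """Return the city encoded in `hostname`, or None if no rule applies."""
    name = hostname.lower()
    for suffix, (pattern, codes) in RULES.items():
        if not name.endswith("." + suffix):
            continue  # rule belongs to a different network
        match = pattern.search(name)
        if match:
            return codes.get(match.group(1))
    return None
```

For the hostname from the traceroute example, `locate("ip-64-32-149-181.lax1.megapath.net")` yields `"Los Angeles"`, while a hostname from any network without an entry in `RULES` yields `None`.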
Expanding on the same idea, DRoP aims to minimize the labor of adding new rules by looking for patterns in a large pool of DNS names assigned by the same network. It uses various delay-based checks to identify the correct patterns. Once a pattern is detected, it is generalized to increase its coverage. Although this approach potentially reduces the cost of adding rules, the rule set is still network specific and therefore hard to maintain. DNS names and formats may change over time, and some changes will render the pattern written for a network useless. For instance, DRoP does not reveal the association of ip-64-32-149-181.lax1.megapath.net with Los Angeles, but it can show that ip-64-32-149-181.lax.megapath.net resides in Los Angeles. In addition, there is wide variation in the DNS name formats used by system administrators. To avoid erroneous results, the regular expressions have to be very specific in some cases. This defeats the purpose of writing one regular expression for a group of DNS names and leads to a need for a large number of regular expressions.
ALFReD (Acquiring Location FRom DNS) aims to address these shortcomings. ALFReD differs from other DNS parsing tools in that it uses various large dictionaries to extract relevant information from DNS names. We expand on the idea by also extracting interface and router attributes. ALFReD does not use network-specific rules, which makes maintaining the rule base easier. Instead, it tries to extract all possible information. When a DNS name packs conflicting information, ALFReD reports all extractable information, with a confidence level that indicates which piece is most likely accurate. Our tool is available to the public here.
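A dictionary-based lookup of this general kind can be sketched as follows. This is not ALFReD's implementation: the two tiny dictionaries, the confidence weights, the tokenization rule, and the function name `extract_locations` are assumptions chosen for illustration. The sketch splits a hostname into tokens, strips trailing digits, looks each token up in every dictionary regardless of the network, and returns all matches ordered by confidence.

```python
import re
from typing import Dict, List, Tuple

# Tiny illustrative dictionaries; every entry and weight here is an assumption.
AIRPORT_CODES = {"lax": "Los Angeles", "sea": "Seattle", "den": "Denver"}
CITY_NAMES = {"losangeles": "Los Angeles", "seattle": "Seattle", "denver": "Denver"}

# (dictionary, confidence): a full city name is treated as a stronger hint
# than a three-letter code, which can collide with unrelated abbreviations.
DICTIONARIES = [(CITY_NAMES, 0.9), (AIRPORT_CODES, 0.6)]

def extract_locations(hostname: str) -> List[Tuple[str, float]]:
    """Return every location hint found in `hostname`, most confident first."""
    tokens = re.split(r"[.\-_]", hostname.lower())
    best: Dict[str, float] = {}
    for token in tokens:
        stem = token.rstrip("0123456789")  # "lax1" -> "lax"
        for dictionary, weight in DICTIONARIES:
            place = dictionary.get(stem)
            if place is not None:
                # Keep the strongest evidence seen for each place.
                best[place] = max(best.get(place, 0.0), weight)
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

Because no rule is tied to a particular network or name format, both `ip-64-32-149-181.lax1.megapath.net` and `ip-64-32-149-181.lax.megapath.net` map to Los Angeles in this sketch. A name with conflicting hints, such as the made-up `sea1.denver.example.net`, returns both candidates, with Denver ranked above Seattle.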
This project is funded by the National Science Foundation (NSF) grant no. CNS 209490. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.