Independent networks (Autonomous Systems, or ASes) engage in typically voluntary bilateral interconnection ("peering") agreements to provide reachability to each other for some subset of the Internet. The implementation of these agreements introduces a non-trivial set of constraints regarding paths over which Internet traffic can flow, with implications for network operations, research, and evolution. Realistic models of Internet topology, routing, workload, and performance must account for the underlying economic dynamics. Although most interconnection ("peering") agreements between networks are secret, some information can be inferred from traceroute-derived and BGP-derived Internet topology data sets at IP, router, and AS granularities. (For additional details on economic modeling, See our recently proposed research on the economics of Internet transit and peering interconnections.) AS rank uses these data sets to estimate a macroscopic ranking of Autonomous Systems based on a measure of their influence in the global routing system, specifically the size of their customer cone based on inferred business relationships among ASes. We explain these concepts in more detail below.
Figure 1. Types of AS relationships. The ASes at the bottom of the graph, D, E, and F, are customers of those above. ISPs in the middle, B and C, are both providers of ASes below and customers of ISPs above. ISPs B and C are also peers of each other. ISP A at the top is a provider to B and C and a customer of no one.
Although business agreements between ISPs can be complicated, the original model introduced by Gao 1 abstracts business relationships into the following three most common types:
- customer-to-provider (c2p) (or if looked at from the opposite direction, provider-to-customer p2c),
- peer-to-peer (p2p), and
- sibling-to-sibling (s2s)
The justification for this classification is that an AS must buy transit services for any traffic destined to parts of the Internet that this AS neither owns nor can reach through its customers. In Figure 1, arrow directions reflect flows of money -- ASes at lower levels are customers who pay ISPs (providers) at higher levels in exchange for access to the rest of the Internet, also known as transit. We refer to links between a customer and a provider as c2p (p2c) links. In Figure 1, D->B, E->B, F->C, B->A, and C->A are c2p links.
A p2p link connects two ISPs who have agreed to exchange traffic on a quid pro quo basis. Peers should exchange traffic only between each other and each other's customers. Peering allows growing ISPs to save money on transit costs they would otherwise have to pay to deliver traffic to/from their customers. In Figure 1, B-C is a p2p link, unidirectional since neither B nor C pays the other for the traffic they exchange.
An s2s link connects two ASes with a common administrative boundary. Such links usually appear as a result of mergers and acquisitions, or under certain network management scenarios.
Figure 2. The top two paths 1 and 2 are valid, while the bottom two 3 and 4 are invalid.
We use the notion of money transfers between ASes to define valid and invalid AS paths. A valid path between source and destination ASes is one in which for every ISP providing transit (a transit provider), there exists a customer immediately adjacent to the ISP in the AS path. An invalid path has at least one transit provider not paid by a neighbor in the path.
In Figure 2 the top two examples are valid paths, while the bottom two are invalid. In Example 1 the transit providers are A, B, and C. ISPs B and C pay to A, D pays to B, and F pays to C. In Example 2 the transit providers are B and C, and they are paid by D and F, respectively. In contrast, in Example 3 the transit provider is B, but not only does no one pay B, but B itself pays both A and Z. Example 4 also illustrates a situation where nobody pays transit provider B.
In other words, a valid path must have the following valid path pattern: zero or more c2p links, followed by zero or one p2p link, followed by zero or more p2c links. In addition, s2s links can appear in any number anywhere in the path.
History of inference algorithms
Gao's1 pioneering work inspired many researchers to seek approaches to inferring ISP business relationships using information from publicly available BGP routing tables. Gao used the concept of valid paths as the basis for her inference heuristic and identified the "top" provider in a given path based on AS degree (the number of ASes with observed connectivity to a given AS).
Subramanian et al.2 provided a more elegant mathematical formulation based on the concept of valid paths, but they simplified the problem by not inferring s2s links. Using maximization of valid paths as an objective, they formulated the AS relationship inference problem as a combinatorial optimization problem: given an undirected graph G derived from a set of BGP paths P, assign the edge type (c2p or p2p) to every edge in G such that the total number of valid paths in P is maximized. They called the problem the type-of-relationship (ToR) problem, conjectured that it is NP-complete, and provided a heuristic solution.
Di Battista et al.3 and independently Erlebach et al.4 proved that the ToR problem is indeed NP-complete. The latter proved also that it is even harder, APX-complete. More importantly for practical purposes, both studies demonstrated that p2p links cannot be inferred in the ToR problem formulation, and they developed mathematically rigorous approximate solutions to the ToR problem but inferred only c2p and p2c links. No technique thus far reliably identifies p2p links.
Dimitropoulos, et al.6 identified still other issues with the ToR formulation, like the random breaking of ties which can yield obviously incorrect inferences, e.g., well-known large providers are inferred as customers of small ASes. In the first paper6 we handled this issue with multiobjective optimization techniques that incorporated AS degree into the inference. In a subsequent paper7 we introduced improved algorithms that determine not only c2p but also p2p links (for those we can detect from BGP data). These improvements achieved more accurate AS relationship inferences, which we demonstrate against ground truth for a set of ASes. Benjamin Hummel and Sven Kosub 8 introduced the idea that the resulting graph should be acyclic, i.e. should contain no cycles, and presented a new algorithm that does the assignment and reduces the number of cycles. Our current technique creates no cycles, but at the cost of ``valid'' (valley-free) paths in the graph.
One metric of the resulting AS relationship graph that allows comparison across ASes is the customer cone -- the set of ASes, IPv4 prefixes, or IPv4 addresses that can be reached from a given AS following only customer links.
Looking specifically at the AS customer cone, we define an AS A's AS customer cone as the AS A itself plus all the ASes that can be reached from A following only p2c links in BGP paths we observed. In other words, A's customer cone contains A, plus A's customers, plus its customers' customers, and so on.
Each AS announces a set of IPv4 prefixes. Each IPv4 prefix represents a set of contiguous IPv4 addresses which are routed as a unit. Prefixes can be nested, with the most specific prefix used for routing over less specific prefixes. To find the set of prefixes which are reachable in AS A's IPv4 prefix customer cone create the union of prefixe announced by all ASes found in AS A's AS customer cone. AS A's IPv4 address customer cone is the set of addresses covered by AS A's IPv4 prefix customer cone. Prefixes overlap, which represent a set of IPv4 addresses.
The size of the customer cone of an AS reflects the number of other elements (ASes, IPv4 prefixes, or IPv4 addresses) found in it's set. An AS in the customer cone is assumed to pay, directly or indirectly, for transit, and provides a coarse metric of the size or influence of an AS in the routing system.
Figure 3 depicts several AS customer cones, ASes D, F, G, and I all sit at the bottom of the hierarchy and so only have a single AS in their cone. H ranks a little bit higher with 2 ASes. Both C and B tie with 3 ASes. Note that B and C both have F in their respective cones. A is ranked at the top of the hierarchy with 6 ASes in its customer cone.
ASes with large customer cones play an important role in the Internet's capital and governance structure. At the top of this hierarchy are ISPs commonly known as Tier-1 ISPs, which do not pay for transit to upstream providers at all; instead they peer with each other to provide connectivity to all destinations in the Internet. At the bottom of the hierarchy are customer ASes who do not have their own customers and pay providers to reach all destinations in the Internet.
We define peering cone size ratio as the ratio in customer cone sizes of a pair of ASes if they (hypothetically) peered. Similar customer cone sizes will have this ratio closer to 100, also an indication the ASes have more incentive to peer. The closer this ratio is to zero, the larger the difference in customer cone sizes, and the less incentive the larger provider will have to peer with the smaller.
If a given link (AS relationship) is currently a customer link and changes to a peering link, then the provider-turned-peer's customer cone will likely shrink, because the cone will lose any customer ASes that the given AS used to access through that customer-turned-peer. The customer-turned-peer's cone size will not change since the provider-turned-peer is not included in the customer cone. To compare magnitude of differences, the peering cone size ratio always uses the larger customer cone as the denominator. For example, for AS pair S and N, with customer cone sizes C(S) and C(N), respectively , if C'(S) and C'(N) were their respective customer cone sizes if S and N became p2p peers (with other links unchanged), then the peering cone size ratio is C'(N)/C'(S) if C'(S) > C'(N), otherwise C'(S)/C'(N).
Figure 4 illustrates how different relationships affect the customer cone sizes of AS A and B. If the original graph had B as a customer of A then A's cone contains 7 ASes: A,B,C,D,E,F,G. B's cone contains three ASes:B,F,G. If the link between A and B is changed to a peering link, A loses customers B and G, which it had access to exclusively through its customer relationship with B. A's cone does not lose F, since it can still reach it through its customer relationship with C. A's cone size thus shrinks to 4 ASes:A,C,D,F. Since AS B did not previously reach any customers through A, its customer cone is unaffected by this change.
Although we know of no more rigorous empirical analysis of macroscopic Internet topology enriched with AS relationships, we recognize that resource limitations constrain the quality of the science we can do.
- AS relationships are more complex than allowed for in our approach. The semantics of routing relationships between the same two ASes can differ by peering location or even by prefix; our model oversimplifies these cases by assigning a single relationship to each pair of ASes.
- A truly accurate picture of the Internet topology would require collection of data from every AS, while our automated ranking procedure is limited by the measurement data available to us.
- We know of know reliable heuristic for accurately inferring peering relationships. We are still seeking validation of these heuristics and their specific parameters against ground truth from providers.
- Our city-level visualization is limited by the accuracy of IP address geolocation and our sampling only a subset of a given AS's topology. Although the ITDK topology contains exclusively router IP addresses (not end hosts), MaxMind's GeoLite City is optimized for geolocating end hosts and is likely less accurate for router addresses. Although our IPv4 Routed /24 Topology measurements exhaustively probe every routed /24, we do not capture the full IPv4 topology, but a broad sample of paths from monitors to target destinations, which results in a sample of the underlying router topology.
In the future we would like to improve the integrity and utility of the relationship-based AS ranking. with more vantage points, more probing, cross-validation with conjunction with other sources of data, and more powerful data processing techniques to support larger topology samples.
|1||L. Gao, On Inferring Autonomous System Relationships in the Internet, IEEE/ACM Transactions on Networking, December 2001.|
|2||L. Subramanian, S. Agarwal, J. Rexford, and R. H. Katz, Characterizing the Internet Hierarchy from Multiple Vantage Points, IEEE INFOCOM, 2002.|
|3||G. Di Battista, M. Patrignani, and M. Pizzonia, Computing the Types of the Relationships between Autonomous Systems, IEEE INFOCOM, 2003.|
|4||T. Erlebach, A. Hall, and T. Schank, Classifying Customer-Provider Relationships in the Internet, Proceedings of the IASTED International Conference on Communications and Computer Networks (CCN), 2002.|
|5||J. Xia and L. Gao, On the Evaluation of AS Relationship Inferences, IEEE Globecom, 2004.|
|6||X. Dimitropoulos, D. Krioukov, B. Huffaker, kc claffy, and G. Riley, Inferring AS Relationships: Dead End or Lively Beginning, International Workshop on Efficient and Experimental Algorithms (WEA), 2005.|
|7||X. Dimitropoulos, D. Krioukov, M. Fomenkov, B. Huffaker, Y. Hyun, kc claffy, and G. Riley, AS Relationships: Inference and Validation, ACM SIGCOMM Computer Communication Review (CCR), v.37, n.1, pp.29-40, 2007.|
|8||B. Hummel and S. Kosub Acyclic Type-of-Relationship Problems on the Internet: An Experimental Analysis , In Proceedings of the 7th ACM SIGCOMM Internet Measurement Conference (IMC'2007), pages 221-226. ACM Press, New York, NY, 2007.|