## Goal of the AS Rank project

Independent networks (Autonomous Systems, or ASes) engage in typically
voluntary bilateral interconnection ("peering") agreements
to provide reachability to each other for some subset of the Internet.
The implementation of these agreements introduces a non-trivial
set of constraints regarding paths over which Internet traffic can flow,
with implications for network operations, research, and evolution.
Realistic models of Internet topology, routing, workload, and performance
must account for the underlying economic dynamics. Although most
interconnection ("peering") agreements between networks are secret,
some information can be inferred from traceroute-derived and BGP-derived
Internet topology
data sets at IP, router, and AS granularities.
(For additional details on economic modeling, See our recently proposed
research on the economics of Internet transit and peering
interconnections.)
AS rank uses these data sets to estimate a macroscopic ranking of
Autonomous Systems based on a measure of their influence in the
global routing system, specifically the size of their *customer cone*
based on inferred business relationships among ASes. We explain these
concepts in more detail below.

An AS's rank is based on it's customer cone size, which in turn is is
inferred from BGP paths by CAIDA's AS relationships inference algorithm^{9}.

## AS Rank

num. ASNs | 10 | 9 | 9 | 8 | 3 | 2 |
---|---|---|---|---|---|---|

rank | 1 | 2 | 2 | 4 | 5 | 6 |

An AS's rank is equal to one greater than number of ASs with larger customer cone sizes or degrees. So ASes with equally large customer cone sizes and degrees will have the same rank. In such a case, the rank will increase by multiple steps "after" a sequence of equally-ranked ASes.

## AS Degree

**Figure 1.** Comparing the global degree, transit degree, and
outdegrees of ASes.

Degree is the number of neighbors that a node, an AS in our case, has in a graph. There are various types of degrees: global, out, and transit. These degrees are applied to an unlabeled graph and so edge types are ignored. Specifically, a global degree is a standard degree in which all of an AS's neighbors are counted. out and transit degrees, however, change based on the observed paths. For example, out degree only counts neighbors that follow the AS in the path and transit degree only counts neighbors that are found in at least one path where the AS is in between that neighbor and another neighbor of the AS.

In Figure 1, ASes 2, 3, 4, and 5 have an outdegree of 0, since no nodes follow them in any path. ASes 6 and 7, however, have an outdegree of 2, while AS 1 has a high degree of 3 because it hosts a monitor.

If we consider a graph labeled with relationships, then there are additional degree types such as siblings, customers, providers, and peers. An AS's sibling degree is the number of neighbor ASes it shares from the same organizations. Customer, provider, and peer degrees are the number of neighbors that are customers, providers, and peers respectively.

## AS Relationships

**Figure 2.** Types of AS relationships. The ASes at the bottom of
the graph, D, E, and F, are customers of those above. ISPs in
the middle, B and C, are both providers of ASes below and
customers of ISPs above. ISPs B and C are also peers of each
other. ISP A at the top is a provider to B and C and a customer
of no one.

### Background

Although business agreements between ISPs can be complicated, the original
model introduced by Gao ^{1} abstracts
business relationships into the following three most common types:

- customer-to-provider (
*c2p*) (or if looked at from the opposite direction, provider-to-customer*p2c*), - peer-to-peer (
*p2p*), and - sibling-to-sibling (
*s2s*)

The justification for this classification is that an AS must buy
transit services for any traffic destined to parts of the Internet
that this AS neither owns nor can reach through its customers. In
Figure 2, arrow directions reflect flows of money -- ASes at
lower levels are *customers* who
pay ISPs (*providers*) at higher levels in exchange for
access to the rest of the Internet, also known as *transit*.
We refer to links between a customer and a provider as **
c2p (p2c) links**.
In Figure 2, D->B, E->B, F->C, B->A, and C->A are
c2p links.

A ** p2p link** connects two ISPs who have
agreed to exchange traffic on a quid pro quo basis. Peers should exchange
traffic only between each other and each other's customers. Peering
allows growing ISPs to save money on transit costs they would
otherwise have to pay to deliver traffic to/from their customers.
In Figure 2, B-C is a p2p link, unidirectional since
neither B nor C pays the other for the traffic they exchange.

An ** s2s link** connects two ASes
with a common administrative boundary. Such links usually
appear as a result of mergers and acquisitions, or under certain
network management scenarios.

**Figure 3.** The top two paths 1 and 2 are valid, while the bottom two
3 and 4 are invalid.

We use the notion of money transfers between ASes to define
*valid* and *invalid*
AS paths. A **valid path** between source
and destination ASes is one in which for *every*
ISP providing transit (a transit provider), there exists a customer
immediately adjacent to the ISP in the AS path.
An **invalid path** has at least one transit provider
not paid by a neighbor in the path.

In Figure 3 the top two examples are valid paths, while the bottom two are invalid. In Example 1 the transit providers are A, B, and C. ISPs B and C pay to A, D pays to B, and F pays to C. In Example 2 the transit providers are B and C, and they are paid by D and F, respectively. In contrast, in Example 3 the transit provider is B, but not only does no one pay B, but B itself pays both A and Z. Example 4 also illustrates a situation where nobody pays transit provider B.

In other words, a valid path must have the following *
valid path pattern*: zero or more c2p links,
followed by zero or one p2p link, followed by zero or more p2c
links. In addition, s2s links can appear in any number anywhere in
the path.

### History of inference algorithms

Gao's^{1} pioneering work
inspired many researchers to seek
approaches to inferring ISP business relationships using information
from publicly available BGP routing tables. Gao used the concept
of valid paths as the basis for her inference heuristic and identified
the "top" provider in a given path based on AS degree (the number of
ASes with observed connectivity to a given AS).

Subramanian *et al.*^{2}
provided a more elegant mathematical formulation based on the
concept of valid paths, but they simplified the problem by not
inferring s2s links. Using maximization of valid paths as an objective,
they formulated the AS relationship inference problem as a
combinatorial optimization problem: given an undirected
graph *G* derived from a set of BGP paths
*P*, assign the edge type (c2p or p2p)
to every edge in *G* such that the total
number of valid paths in *P* is maximized.
They called the problem the **type-of-relationship (ToR) problem**,
conjectured that it is NP-complete, and provided a heuristic solution.

Di Battista *et al.*^{3} and independently
Erlebach *et al.*^{4} proved that the
ToR problem is indeed NP-complete. The latter proved also that it is even
harder, APX-complete. More importantly for practical purposes, both
studies demonstrated that p2p links *cannot*
be inferred in the ToR problem formulation, and they developed mathematically
rigorous approximate solutions to the ToR problem but inferred
only c2p and p2c links. No technique thus far reliably identifies
p2p links.

Dimitropoulos, *et al.*^{6}
identified still other issues with the ToR formulation, like the random
breaking of ties which can yield obviously incorrect inferences,
e.g., well-known large providers are inferred as customers
of small ASes. In the first paper^{6}
we handled this issue with multiobjective optimization techniques
that incorporated AS degree into the inference. In a subsequent
paper^{7} we introduced improved
algorithms that determine not only c2p but also p2p links (for
those we can detect from BGP data). These improvements achieved
more accurate AS relationship inferences,
which we demonstrate against ground truth for a set of ASes.
Benjamin Hummel and Sven Kosub
^{8} introduced the idea that
the resulting graph should be acyclic, i.e. should contain no cycles,
and presented a new algorithm that does the assignment and reduces
the number of cycles. Our current technique creates no cycles,
but at the cost of ``valid'' (valley-free) paths in the graph.
M. Luckie et al.^{9} dropped the
valley-free assumption and instead
relied on three assumptions about the Internet's inter-domain structure:
(1) an AS enters into a provider relationship to become globally reachable;
and (2) there exists a peering clique of ASes at the top of the hierarchy,
and (3) there is no cycle of p2c links.

## Customer Cone

One metric of the resulting AS relationship graph that allows comparison
across ASes is the **customer cone** -- the set of ASes, IPv4
prefixes, or IPv4 addresses that can be
reached from a given AS following only customer links.

Looking specifically at the **AS customer cone**,
we define an AS *A*'s **AS customer cone**
as the AS *A* itself plus all the ASes that can be reached
from *A* following only p2c links **in BGP paths we observed**.
In other words, *A*'s customer cone contains *A*, plus *A*'s
customers, plus its customers' customers, and so on.

Each AS announces a set of IPv4 prefixes. Each IPv4 prefix represents a
set of contiguous IPv4 addresses which are routed as a unit. Prefixes can
be nested, with the most specific prefix used for routing over less
specific prefixes. To find the set of prefixes which are reachable in
AS *A*'s **IPv4 prefix customer cone** create the union of prefixe
announced by all ASes found in AS *A*'s **AS customer cone**.
AS *A*'s **IPv4 address customer cone** is the set of addresses
covered by AS *A*'s **IPv4 prefix customer cone**.
Prefixes overlap, which represent a set of IPv4 addresses.

The size of the customer cone of an AS reflects the number of other elements (ASes, IPv4 prefixes, or IPv4 addresses) found in it's set. An AS in the customer cone is assumed to pay, directly or indirectly, for transit, and provides a coarse metric of the size or influence of an AS in the routing system.

Figure 4 depicts several AS customer cones, ASes **D, F,
G**, and **I** all sit at the bottom of the hierarchy
and so only have a single AS in their cone. **H** ranks
a little bit higher with 2 ASes. Both **C** and **B**
tie with 3 ASes. Note that **B** and **C** both have
**F** in their respective cones. **A** is ranked at the top
of the hierarchy with 6 ASes in its customer cone.

ASes with large customer cones play an important role in the Internet's capital and governance structure. At the top of this hierarchy are ISPs commonly known as Tier-1 ISPs, which do not pay for transit to upstream providers at all; instead they peer with each other to provide connectivity to all destinations in the Internet. At the bottom of the hierarchy are customer ASes who do not have their own customers and pay providers to reach all destinations in the Internet.

We define **peering cone size ratio** as the ratio in
customer cone sizes of a pair of ASes if they (hypothetically) peered.
Similar customer cone sizes will have this ratio closer to 100,
also an indication the ASes have more incentive to peer.
The closer this ratio is to zero, the larger the difference in
customer cone sizes, and the less incentive the larger provider
will have to peer with the smaller.

If a given link (AS relationship) is currently a customer link
and changes to a peering link, then the provider-turned-peer's customer
cone will likely shrink, because the cone will lose any customer ASes
that the given AS used to access through that customer-turned-peer.
The customer-turned-peer's cone size will not change since the
provider-turned-peer is not included in the customer cone.
To compare magnitude of differences, the ** peering cone size ratio**
always uses the larger customer cone as the denominator.
For example, for AS pair S and N, with customer cone sizes
C(S) and C(N), respectively , if C'(S) and C'(N) were
their respective customer cone sizes if S and N became
p2p peers (with other links unchanged),
then the **peering cone size ratio ** is C'(N)/C'(S)
if C'(S) > C'(N), otherwise C'(S)/C'(N).

**Figure 5.**shows the effect on customer cones of changing the relationship of

**A**to

**B**from its current one to a peering relationship.

Figure 5 illustrates how different relationships affect the customer cone sizes
of AS **A** and **B**. If the original graph had **B** as a
customer of **A** then **A**'s
cone contains 7 ASes: **A,B,C,D,E,F,G**. **B**'s cone
contains three ASes:**B,F,G**. If the link between **A** and
**B** is changed to a peering link, **A** loses customers
**B** and **G**, which it had access to exclusively through
its customer relationship with **B**. **A**'s cone does not
lose **F**, since it can still reach it through its customer
relationship with **C**. **A**'s cone size thus shrinks to 4
ASes:**A,C,D,F**. Since AS **B** did not previously reach
any customers through **A**, its customer cone is unaffected by
this change.

## AS Types

## Caveats and Limitations

Although we know of no more rigorous empirical analysis of macroscopic Internet topology enriched with AS relationships, we recognize that resource limitations constrain the quality of the science we can do.

- AS relationships are more complex than allowed for in our approach. The semantics of routing relationships between the same two ASes can differ by peering location or even by prefix; our model oversimplifies these cases by assigning a single relationship to each pair of ASes.
- A truly accurate picture of the Internet topology would require collection of data from every AS, while our automated ranking procedure is limited by the measurement data available to us.
- We know of know reliable heuristic for accurately inferring peering relationships. We are still seeking validation of these heuristics and their specific parameters against ground truth from providers.
- Our city-level visualization is limited by the accuracy of IP address geolocation and our sampling only a subset of a given AS's topology. Although the ITDK topology contains exclusively router IP addresses (not end hosts), MaxMind's GeoLite City is optimized for geolocating end hosts and is likely less accurate for router addresses. Although our IPv4 Routed /24 Topology measurements exhaustively probe every routed /24, we do not capture the full IPv4 topology, but a broad sample of paths from monitors to target destinations, which results in a sample of the underlying router topology.

In the future we would like to improve the integrity and utility of the relationship-based AS ranking. with more vantage points, more probing, cross-validation with conjunction with other sources of data, and more powerful data processing techniques to support larger topology samples.

## Data Sources

- Netacuity (Digital Elements): is a commercial geolocation service that we use to map ASes and their prefixes to geographic locations.
- Organization: this data comes from CAIDA's Inferred AS to Organization Mapping Dataset that maps mapping Autonomous Systems (AS) to organizations using quarterly bulk dumps of WHOIS databases. We use this to provide names for ASes and their organizations, as well as group ASes together that belong to the same organization.
- AS Relationships contains CAIDA's inferences about AS Relationships, Customer Cones, and Validation using monthly BGP traces. We use it to infer each ASes' links and their business types, degree, and customer cone.

## References

^{1} |
L. Gao, On Inferring Autonomous System Relationships in the Internet, IEEE/ACM Transactions on Networking, December 2001. |

^{2} |
L. Subramanian, S. Agarwal, J. Rexford, and R. H. Katz, Characterizing the Internet Hierarchy from Multiple Vantage Points, IEEE INFOCOM, 2002. |

^{3} |
G. Di Battista, M. Patrignani, and M. Pizzonia, Computing the Types of the Relationships between Autonomous Systems, IEEE INFOCOM, 2003. |

^{4} |
T. Erlebach, A. Hall, and T. Schank, Classifying Customer-Provider Relationships in the Internet, Proceedings of the IASTED International Conference on Communications and Computer Networks (CCN), 2002. |

^{5} |
J. Xia and L. Gao, On the Evaluation of AS Relationship Inferences, IEEE Globecom, 2004. |

^{6} |
X. Dimitropoulos, D. Krioukov, B. Huffaker, kc claffy, and G. Riley, Inferring AS Relationships: Dead End or Lively Beginning, International Workshop on Efficient and Experimental Algorithms (WEA), 2005. |

^{7} |
X. Dimitropoulos, D. Krioukov, M. Fomenkov, B. Huffaker, Y. Hyun, kc claffy, and G. Riley, AS Relationships: Inference and Validation, ACM SIGCOMM Computer Communication Review (CCR), v.37, n.1, pp.29-40, 2007. |

^{8} |
B. Hummel and S. Kosub Acyclic Type-of-Relationship Problems on the Internet: An Experimental Analysis , In Proceedings of the 7th ACM SIGCOMM Internet Measurement Conference (IMC'2007), pages 221-226. ACM Press, New York, NY, 2007. |

^{9} |
M. Luckie, B. Huffaker, k. claffy, A. Dhamdhere, and V. Giotsas, AS Relationships, Customer Cones, and Validation , in Internet Measurement Conference (IMC), Oct 2013, pp. 243--256. |