[그래프데이터분석및응용] Network centrality

해당 포스팅은 [연세대학교 21-1 UT 세미나 그래프데이터분석과응용]을 기반으로 작성되었습니다.

1. Network Centrality in indirected networks

Degree Centrality
- The number of nearest neighbors
- Normalized degree centrality: 노드가 많아지면, nearest neighbor도 많아지기 때문에, 정규화 해주는 것 (0-1 scale로 표시)
- High Degree를 가진 다는 것은 Direct contact with many other actors.
Closeness centrality
- How close an actor to all the other actors in network.
- Normalized closeness centrality
- High closeness를 가진다는 것은 다른 node들까지의 거리가 가장 짧은 경우로, all other nodes로 가기 위한 step이 가장 짧다는 것을 의미한다.
- Disconnected network
  - i에서 j까지 이어지는 경로가 없을 때는 무한대로 계산을 하며, centrality 계산 시에 제외를 해준다.
  - Harmonic centrality를 계산을 할 수도 있음
Betweenness Centrality
- The number of shortest paths going through the node i
  - s와 t를 이어주는 최단 경로들 중에서 i를 지나는 최단 경로의 수
- Normalized betweenness centrality
- High betweenness를 가진다는 것은 vertex lies on many shortest paths.
Eigenvector Centrality
- 어떤 이웃인지를 고려하는 것
  - select an eigenvector associated with the largest eigenvalue
  - Perron-Frobenius theorem에 의함
    - A real square matrix with positive entries has a unique largest absolute real eigenvalue, leading eigenvalue
    - The corresponding eigenvector can be chosen to have strictly positive components, leading eigenvector
    - The other eigenvectors have at least one negative or non-real component.
  - 이를 활용하여 node 수만큼의 eigenvector 중 leading eigenvector를 선택하는 것
- High eigenvector를 가지면 vertex가 중요한 이웃을 가짐을 알 수 있다.

2. Network centralities in directed networks

In and out-degree centrality
Closeness centrality
Betweenness centrality
PageRank
- If the adjacency matrix is asymmetric, the eigenvector centrality is useless.
- A ‘vote’ from an important page is worth more
  - Each link’s vote is proportional to the importance of its source page
  - If page i with importance r(i) has d(i) out links, each links gets r(i)/d(i) votes
  - Page j’s own importance r(j) is the sum of the votes on its in-links
- A page is important if it is pointed to by other important pages
  - Define a ‘rank’ r(j) for node j
    - node가 많아지면 Gaussian elimination으로 해를 구할 수가 없을 것이다.
- Matrix formulation
  - Stochastic adjacency matrix M
    - Let page j have d(j) out-links
    - If j-> i, then M(ij)= 1/d(j)
  - M is a column stochastic matrix where each column sums to 1.
  - Rank vector r: an entry per page
    - r(i) is the importance score of page i
    - sigma(r(i))=1
  - The flow equations
    - r=M*r
    - rank vector r을 찾아야 함
- Eigenvector Formulation
  - Rank vector r is an eigenvector of the stochastic matrix M.
  - Power iteration: efficient solution to obtain r
    - produces the leading eigenvalue and its corresponding eigenvector
    - A simple iterative scheme (약 50번 정도의 iteration을 거치면 limiting solution을 얻을 수 있음)
- Rankdom walk interpretation
  - Imagine a random web surfer
    - At any time t, surfer is on some page i
    - At time t+1, the surfer follows an out-link from i uniformly at random.
    - Ends up on some page j linked from i
    - Process repeats indefinitely
  - Let P(t): vector whose i-th coordinate is the probability that the surfer is at page i at time t.
    - P(t)는 probability distribution over pages
  - Where is the surfer at time t+1?
    - Follow a link uniformly at random: p(t+1) = Mp(t)
  - Suppose the random walk reaches a state p(t+1)=Mp(t)=p(t)
  - Our original rank vector r satisfies r=Mr
    - r is a stationary distribution for the random walk
- Does this converge?
  - Problem 1: Spider trap
    - PageRank scores are not what we want
    - Solution: teleport를 사용하여 beta의 확률로 link를 따라가거나 (1-Beta)의 확률로 random page로 이동
  - Problem 2: Dead end
    - The matrix is not column stochastic (out link가 없으므로, 어느 열에서도 transition될 확률이 0이 됨)
    - solution: teleport를 사용하여 dead end의 경우 1의 확률로 teleport를 시키는 것 (adjust matrix accordingly)

3. Centralities in a weighted network

Degree & Eigenvector centrality
- Replace elements of the adjacency matrix with weights
Closeness & betweenness centrality
- Issue: definition of a distance between two adjacent nodes i and j depends on the definition of weight
- If the weight indicates strength or frequency of the interaction, then the distance is where alpha is hyperparameter
  - Epidemic network
- If the weight indicates cost of the interaction, the distance is
  - Road traffic network

1. Network Centrality in indirected networks

2. Network centralities in directed networks

3. Centralities in a weighted network

Share this post