[Graph Data Analysis and Applications] Random walk based network embedding
This post is based on the Yonsei University 21-1 UT seminar [Graph Data Analysis and Applications].
1. End-to-end Node Embedding
1. Traditional machine learning for graphs
- Given an input graph
 - Extract node-, link-, and graph-level features
 - Learn a model that maps the features to labels
 
2. Graph representation learning
- Alleviates the need to do feature engineering every single time by automatically learning the features
 - Goal
- Efficient, task-independent feature learning for machine learning with graphs
 - Find a vector that represents each node well
 
 
3. Why embedding?
- Task: Map nodes into an embedding space
 - Similarity of embeddings between nodes = similarity of the nodes in the network
- e.g., two nodes that are close to each other in the network (connected by an edge) should have similar embeddings
 
 - Encode the network information
 - Potentially used for many downstream tasks
 
4. Node embeddings
- Setup
- Assume that we have a graph G
 - V is the vertex set
 - A is the binary adjacency matrix
 - For simplicity, assume that there is no additional information
 
 - Goal: encode nodes so that similarity in the embedding space approximates similarity in the graph 

- The dot product of vectors in the embedding space should approximate the similarity in the original network.
 - Then what exactly is similarity in the original network?
 
 - Overall Steps
- Encoder maps from nodes to embeddings
 
- Define a node similarity function
- i.e. a measure of similarity in the original network
 
- Decoder DEC maps from embeddings to the similarity score in the original network
 
- Optimize the parameters of the encoder so that:
 - similarity(u, v) in the original network ≈ z_v^T z_u in the embedding space
 - Here, the two key components are the Encoder and the Similarity function.
 
 - “Shallow” Encoding
- Encoder is just an embedding lookup: the embedding of each node is stored in a dictionary-like table (see the sketch after this list)
 - Z: embedding matrix in which each column is a node embedding
 - v: indicator vector, all zeroes except a one in the column indicating node v

 - Each node is assigned a unique embedding vector
 - Representative methods:
- 1. DeepWalk
 - 2. Node2Vec
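
As a rough illustration of the lookup encoder and the dot-product decoder, here is a minimal NumPy sketch (the graph size, dimension, and node indices are placeholders chosen for the example, not values from the seminar):

```python
import numpy as np

num_nodes, dim = 5, 3                 # |V| = 5 nodes, embedding dimension d = 3
Z = np.random.randn(dim, num_nodes)   # each column is one node's embedding (the parameters to learn)

def encode(node_id):
    """Shallow encoder: ENC(v) = Z @ v, i.e. just look up node v's column."""
    v = np.zeros(num_nodes)
    v[node_id] = 1.0                  # one-hot indicator vector for node v
    return Z @ v                      # equivalent to Z[:, node_id]

def decode(z_u, z_v):
    """Decoder: similarity score in the embedding space = dot product."""
    return float(z_u @ z_v)

print(decode(encode(0), encode(1)))   # estimated similarity between node 0 and node 1
```

Training then amounts to adjusting the entries of Z so that these dot products match the chosen notion of similarity in the original network.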
 
 
 - How to define the similarity in the original network?
- Similarity function specifies how the relationships in vector space map to the relationships in the original network.
 
 
5. Framework Summary
- Encoder+Decoder Framework
 - Shallow encoder: embedding lookup
- Parameters to optimize: Z, which contains the node embeddings for all nodes
 
 - Deep encoder: Graph Neural networks
 - Decoder: based on node similarity
 - Objective: Maximize the dot product of node embeddings for node pairs (u,v) that are similar
 
2. Node similarity definition that uses random walks
1. Overview
- unsupervised/self-supervised way of learning node embeddings
 - Task independent
 
2. Formulating the optimization problem
- Three components to define 

- 1) Decoder function
 - 2) S[u,v]: A graph-based similarity measure between nodes
 - 3) l: A loss function measuring the discrepancy between the decoded similarity values and the true values (these three pieces combine into the objective sketched right below)
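
One compact way to write how the three components fit together is the following objective, summed over a set D of training node pairs (this is the standard shallow encoder-decoder formulation from the graph representation learning literature; the symbol D is my notation, not the seminar's):

L = Σ_{(u,v) ∈ D} l( DEC(z_u, z_v), S[u,v] )

Minimizing L pushes the decoded similarity of each pair towards its graph-based similarity S[u,v].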
 
 
3. Random Walk
- Starting from a node, repeatedly move to a randomly selected neighbor; the (random) sequence of nodes visited this way is a random walk
 - Random walk embeddings
- probability that u and v co-occur on a random walk over the graph
 - The goal is to make this co-occurrence probability equal to the dot product in the embedding space.
 - Steps
- Estimate probability of visiting node v on a random walk starting from node u using some random walk strategy R
 
- Optimize embeddings to encode these random walk statistics
 
 
 - Why use random walks?
- Expressivity: Flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information
- Idea: if random walk starting from node u visits v with high probability, u and v are similar (high-order multi-hop information)
 
 - Efficiency: Do not need to consider all node pairs while training; only the pairs that co-occur on random walks need to be considered
- Especially useful when the network is large
 
 
 - Unsupervised feature learning
- Idea: Learn node embeddings such that nodes that are nearby in the network end up close together in the embedding space
 - Given a node u, N_R(u): Neighborhood of u obtained by some random walk strategy R
 
 - Strategies used to run random walks
- DeepWalk: run fixed-length, unbiased random walks from each node (a small simulation sketch follows below)
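
A minimal sketch of this DeepWalk-style walk generation in plain Python (the toy adjacency list and the walk length/count are illustrative choices, not values from the seminar):

```python
import random

# toy undirected graph as an adjacency list (illustrative only)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}

def unbiased_walk(start, length):
    """Fixed-length, unbiased walk: at every step, jump to a uniformly random neighbor."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

# DeepWalk runs several such walks from every node; nodes that co-occur
# within a window of a walk become the (u, v) pairs in N_R(u).
walks = [unbiased_walk(u, length=6) for u in adj for _ in range(3)]
print(walks[0])
```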
4. Optimization
 
- P_R(v|u): probability of visiting v on a random walk starting from u under random walk strategy R
 
- Goal: learn embeddings so that the following holds, i.e. maximize the log-likelihood of each node's random walk neighborhood:
 - max_z Σ_{u∈V} log P(N_R(u) | z_u)
 - Equivalently, minimize L = Σ_{u∈V} Σ_{v∈N_R(u)} -log P(v | z_u)
 
- The probability P(v | z_u) is parameterized by the softmax function:
 - P(v | z_u) = exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n)

- The softmax is used so that the scores are expressed as values between 0 and 1, i.e. as a probability distribution over nodes (a small numeric sketch follows below).
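
A tiny NumPy sketch of this full-softmax probability and its loss (the embeddings are random placeholders; the point is that the denominator sums over all |V| nodes, which is what makes the naive objective expensive):

```python
import numpy as np

num_nodes, dim = 1000, 16
Z = np.random.randn(num_nodes, dim)      # placeholder embeddings, one row per node

def prob_v_given_u(u, v):
    """P(v | z_u): softmax of z_u^T z_v, normalized over ALL nodes (O(|V|) work per pair)."""
    scores = Z @ Z[u]                    # z_u^T z_n for every node n
    scores -= scores.max()               # numerical stability
    return np.exp(scores[v]) / np.exp(scores).sum()

# negative log-likelihood contribution of one observed co-occurrence pair (u, v)
u, v = 0, 42
print(-np.log(prob_v_given_u(u, v)))
```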
 
 - Negative Sampling
- However, the normalization term of the softmax, a sum over all |V| nodes, is the expensive part
 - Instead of normalizing w.r.t. all nodes, just normalize against k random 'negative samples'

 - sample k negative nodes each with probability proportional to its degree
 - Two considerations for k
- Higher k gives more robust estimates
 - Higher k corresponds to higher bias on negative events
 - In practice, k is typically between 5 and 20
 
 
 - The resulting objective can be optimized with stochastic gradient descent (see the sketch below)
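
A minimal PyTorch sketch of one negative-sampling SGD step (the sizes, degree distribution, and the single (u, v) pair are placeholders; the exact objective in DeepWalk/node2vec differs in details such as the noise distribution, so treat this only as the shape of the computation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_nodes, dim, k = 1000, 16, 10
emb = nn.Embedding(num_nodes, dim)                 # shallow encoder: one learnable vector per node
opt = torch.optim.SGD(emb.parameters(), lr=0.01)   # stochastic gradient descent

def negative_sampling_loss(u, v, degrees):
    """Replace the full softmax with k degree-weighted negative samples."""
    z_u, z_v = emb(u), emb(v)
    pos = F.logsigmoid((z_u * z_v).sum(-1))                            # attract the co-occurring pair
    neg_idx = torch.multinomial(degrees.float(), k, replacement=True)  # negatives ~ degree
    neg = F.logsigmoid(-(emb(neg_idx) @ z_u)).sum()                    # push the k negatives away
    return -(pos + neg)

degrees = torch.randint(1, 50, (num_nodes,))       # placeholder degree distribution
u, v = torch.tensor(0), torch.tensor(42)           # one pair observed on a random walk
loss = negative_sampling_loss(u, v, degrees)
opt.zero_grad(); loss.backward(); opt.step()       # one SGD update
print(float(loss))
```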
 
3. Node2Vec (another random walk method)
1. Overview
- Key observation: Flexible notion of network neighborhood of node u, N_R(u) leads to rich node embeddings
 - Develop a biased, 2nd-order random walk strategy R to generate the network neighborhood N_R(u)
 - Idea: use flexible, biased random walks that can trade off between local and global views of the network
 
2. Biased Walks
- Two classic strategies
- BFS explores the direct neighbors of u first (local), while DFS walks outward to indirect, farther-away neighbors (global)
- BFS: micro-view of neighborhood
 - DFS: macro-view of neighborhood 

 
 
 - Interpolating BFS and DFS
- Biased, fixed-length random walk R that, given a node u, generates the neighborhood N_R(u)
 - Two hyperparameters
- Return parameter p: (unnormalized weight 1/p of) returning back to the previous node
 - In-out parameter q: moving outwards (DFS-like) vs. inwards (BFS-like); intuitively, q is the 'ratio' of BFS vs. DFS (a walk sketch with p and q follows below)
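
A minimal sketch of the biased 2nd-order walk in plain Python (the toy adjacency list and parameter values are illustrative; real node2vec precomputes the transition probabilities and samples them with alias tables, which this sketch skips):

```python
import random

# toy undirected graph as an adjacency list (illustrative only)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}

def node2vec_walk(start, length, p, q):
    """2nd-order walk: the transition weight depends on where the walk just came from.
    Unnormalized weights: 1/p to return, 1 to a node at the same distance, 1/q to move outward."""
    walk = [start, random.choice(adj[start])]
    while len(walk) < length:
        prev, cur = walk[-2], walk[-1]
        weights = []
        for nxt in adj[cur]:
            if nxt == prev:             # step back to the previous node
                weights.append(1.0 / p)
            elif nxt in adj[prev]:      # same distance from prev: BFS-like move
                weights.append(1.0)
            else:                       # farther from prev: DFS-like move
                weights.append(1.0 / q)
        walk.append(random.choices(adj[cur], weights=weights, k=1)[0])
    return walk

print(node2vec_walk(0, length=8, p=1.0, q=0.5))   # small q biases the walk outwards (DFS-like)
```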
 
 
 
3. Node2Vec algorithm
- 1) Compute random walk probabilities
 
- 2) Simulate r random walks of length l starting from each node u
 - 3) Optimize the node2vec objective using stochastic gradient descent
 
- Properties
- Linear time complexity
 - All 3 steps are individually parallelizable (a compact end-to-end sketch follows below)
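
One way to realize the three steps end to end, as a sketch only: uniform walks stand in for the biased (p, q) walks above, and gensim's Word2Vec is used as the skip-gram/negative-sampling optimizer (the seminar only says the objective is optimized with SGD and does not prescribe a library):

```python
import random
from gensim.models import Word2Vec

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}   # toy graph

def walk(start, length):
    """Uniform fixed-length walk; a full node2vec run would plug in the biased (p, q) walk."""
    path = [start]
    for _ in range(length - 1):
        path.append(random.choice(adj[path[-1]]))
    return [str(n) for n in path]                 # gensim expects string tokens

r, l = 10, 8                                      # step 2: r walks of length l from each node
walks = [walk(u, l) for u in adj for _ in range(r)]

# step 3: skip-gram with negative sampling over the walk "sentences"
model = Word2Vec(walks, vector_size=16, window=3, min_count=1, sg=1, negative=5, epochs=5)
print(model.wv["0"][:4])                          # learned embedding for node 0 (first 4 dims)
```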
 
 
4. Other random walk ideas
- Different kinds of biased random walks
 - Based on node attributes (e.g., Metapath2vec)
 - Based on learned weights
 
 - Alternative optimization schemes
- LINE: directly optimizes based on 1-hop and 2-hop random walk probabilities
 
 - Network preprocessing techniques
- Struc2vec, HARP: run random walks on modified versions of the original network.
 
 
5. Limitations of shallow embedding methods
1. Limitations
- O(|V|) parameters are needed: no sharing of parameters between nodes
 - Every node has its unique embedding
 
- Inherently transductive
- Cannot generate embeddings for nodes that are not seen during training
 
 - Do not incorporate node features
- Many graphs have features that we can and should leverage.
 
 

