[그래프데이터분석및응용] 네트워크의 Motifs, graphlets and structural roles
해당 포스팅은 [연세대학교 21-1 UT 세미나 그래프데이터분석과응용]을 기반으로 작성되었습니다.
1. Motifs
- subgraphs
- building blocks of networks: discriminate or characterize networks
 - consider all possible (non-isomorphic) directed subgraphs of size 3
 
 - graph isomorphism
- Graph G와 H는 f:V(G) -> V(H)가 존재하면 isomorphic이라고 한다
 - any two nodes u and v of G are adjacent in G if f(u) and f(v) are adjacent in H
 
 - Network significance profile
- A feature vector with values for all subgraph types
 - 같은 network의 도메인들은 similar significant profiles를 가진다.
 
 - motifs
- network가 어떻게 작동하는 지 이해할 수 있게 보조하는 기능
 - 예시
- Feed-forward loops: 신경망에서 주로 많이 발견이 됨
 - Parallel loops: food webs
 - Single-input modules: gene control networks 
 
 
 * induced subgraph of interest라고 한다. 
 * Allow overlapping of motifs * 특정 network가 유의한지를 보기 위해서, random graph보다 motif가 많은지를 확인하고자 한다. $N_ireal$은 number of subgraphs of type i in network $G_real$ $N_irand$은 number of subgraphs of type i in the randomized network $G_rand$ * network significance profile (SP) 
 
 - Generation of random graphs
- generate a random graph with a given degree sequence k1, k2, …, kn
 - Useful as a ‘null’ model of networks
 - Configuration model 

 
- Configuration model 
 - Switching
 
start from a given graph G and repeat the switching step Q* E times select a pair of edges A-B, C-D at random
exchange the endpoints to give A-D, C-D
exchange edges only if no multiple edges or self-edges are generated
A randomly rewired graph as a result
Q is chosen large enough for the process to converge
100 정도..?
 - Variations of the Motif Concept
- Canonical definition
- directed and undirected
 - colored and uncolored
 - temporal and static motifs
 
 - Variations on the concept
- Different frequency concepts
 - Different significance metrics
 - Under-representation (anti-motifs)
 - Different constraints for null model
 
 
 - Canonical definition
 
2. Graphlets
개념
Connected non-isomorphic subgraphs
Graphlet Degree Vector
Used to obtain a node-level subgraph metric
Degree : the number of edges that a node touches *
Generalized notion of ‘degree’ for graphlets
the number of graphlets that a node touches
Automorphism orbit
takes symmetries of a subgraph into account
고유한 orbit이 무엇인가?
Graphlet Degree vector의 재정의
A vector with the frequency of the node in each orbit position

Finding Motifs and Graphs
- Network-centric approaches
- Enumerating: all k-sized connected subgraphs
 - Counting: the number of occurrences of each subgraph type
 
 - Computation time grows exponentially as the size of the motif/graphlet increases
 - Algorithms
- Exact subgraph enumeration (ESU)
 - Kavosh
 - Subgraph sampling
 
 
ESU algorithm
- node v에서 시작하여 nodes u를 V_extension set에 아래의 2가지 특성을 가진다면 추가한다
- u의 node_id가 v의 node_id보다 더 커야한다
 - u는 새로운 node의 w의 이웃일 수도 있지만 V_subgraph에 이미 있는 node와의 이웃은 아니어야 한다.
 
 - recursive function의 일종이며, tree-like structure of depth k (ESU-Tree) 

 - After enumeration of all k-sized subgraphs, the graphs should be counted
 
- node v에서 시작하여 nodes u를 V_extension set에 아래의 2가지 특성을 가진다면 추가한다
 
3. Structural Roles
Roles
measured by structural behaviors
A group of nodes with similar structural properties
Cf. Communities/groups: A group of nodes that are well-connected to each other
- Roles and communities are complementary
 
Structural Equivalence
Nodes u and v are structurally equivalent if they have the same relationships to all other nodes
왜 중요한가?
- 변화나, outlier, identity resolution
 
Structural Role Discovery Method
- RoIX: Authomatic discovery of nodes’ structural roles in networks (Henderson et al., 2011)
 - Neighborhood features
- Node’s connectivity pattern
 - Base set of a node’s neighborhood features
- Local features: all measures of the node degree
 - Egonet features: the number of within-egonet edges, edges entering or leaving egonet
 
 
 
- Recursive Feature Extraction
- To what kind of nodes is a node connected?
 - Aggregate features of a node and use them to generate new recursive features
- set of current node features to generate additional features
 - correlation은 발생할 수 있지만 제외할 수 있음
 
 
 - Role Extraction Clustering method를 사용할 수 있다.
 
