[Graph Data Analysis and Applications] Graph Neural Networks


This post is based on the [Yonsei University 21-1 UT Seminar: Graph Data Analysis and Applications].

1. Graph Neural Networks

1. Deep Graph Encoders

  • multiple layers of non-linear transformations based on graph structure
  • Output: node embeddings, subgraph or graph embeddings
  • Difficulties
    • No fixed node ordering or reference point
    • Dynamic and have multimodal features
  • Local network neighborhoods
  • Stacking multiple layers

2. Setup

  • V: vertex set
  • A: adjacency matrix (assume binary)
  • X: matrix of node features
  • v: node in V; N(v): the set of neighbors of v
  • Node features
    • Social networks: user profile, user image
    • Biological networks: gene expression profiles, gene function information
    • When there is no node feature in the graph dataset
      • Indicator vectors (one-hot encoding of a node)
      • Vector of constant 1: [1,1,…,1] (see the sketch after this list)
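A minimal NumPy sketch of this setup; the graph and feature values below are made up purely for illustration:

```python
import numpy as np

# Toy undirected graph with |V| = 4 nodes and edges 0-1, 0-2, 1-2, 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])           # binary adjacency matrix

# Node feature matrix X: one row per node (2 made-up attributes here)
X = np.array([[0.5, 1.0],
              [0.1, 0.2],
              [0.9, 0.4],
              [0.3, 0.7]])

# Fallbacks when the dataset has no node features
X_onehot   = np.eye(A.shape[0])        # indicator vector (one-hot) per node
X_constant = np.ones((A.shape[0], 1))  # vector of constant 1: [1, 1, ..., 1]

def neighbors(v):
    """N(v): indices u with A[v, u] = 1."""
    return np.flatnonzero(A[v])

print(neighbors(2))  # [0 1 3]
```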

3. Naive approach

  • Join adjacency matrix and features
  • Feed them into a deep neural net
    • Issues with this idea (illustrated in the sketch after this list)
      • O(|V|) parameters
      • Not applicable to graphs of different sizes
        Even a change in the input graph requires training a new neural network.
      • Sensitive to node ordering
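A sketch of this naive idea on the toy graph from the setup section; the layer sizes are arbitrary and only chosen to make the issues concrete:

```python
import numpy as np

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])  # adjacency
X = np.array([[0.5, 1.0], [0.1, 0.2], [0.9, 0.4], [0.3, 0.7]])          # node features

inputs = np.concatenate([A, X], axis=1)      # each row: [adjacency row | node features]

rng = np.random.default_rng(0)
W1 = rng.normal(size=(inputs.shape[1], 8))   # input width depends on |V|  ->  O(|V|) parameters
W2 = rng.normal(size=(8, 2))

hidden = np.maximum(inputs @ W1, 0)          # simple two-layer MLP with ReLU
scores = hidden @ W2                         # per-node outputs

# Issues visible here: W1's shape is tied to |V| (a graph of another size needs a new
# network), and permuting the node ordering permutes A's columns, changing `scores`.
```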

4. Convolutional networks

  • Goal: to generalize convolutions beyond simple lattices
  • Leverage node features/attributes
  • From Images to graphs
    • Transform information at the neighbors and combine it (see the formula after this list)
      • Transform ‘messages’ h_i from neighbors: W_i · h_i
      • Add them up
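In symbols, this transform-and-sum idea can be sketched as follows (with $W^{(l)}$ the trainable transform at layer $l$; this is the simplified form before any normalization is added):

$$
h_v^{(l+1)} = \sigma\left( \sum_{u \in N(v)} W^{(l)} h_u^{(l)} \right)
$$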

5. Graph Convolutional networks

  • Nodes’ neighborhood defines a computation graph
  • Learn how to propagate information across the graph to compute node features
  • Local network neighborhoods
    • Aggregate neighbors
    • generate node embeddings based on local network neighborhoods
      • Every node defines a computation graph based on its neighborhood
  • Many layers
    • Models can be of arbitrary depth
      • Nodes have embeddings at each layer
      • Layer 0 embedding of node u is its input features, x_u.
      • Layer k embedding gets information from nodes that are k hops away.
  • Neighborhood Aggregation
    • Key distinctions are in how different approaches aggregate information across the layers
    • Average information from neighbors, then apply a neural network
  • Model parameters: the trainable weight matrices of each layer (e.g., W_l for aggregating neighbors and B_l for transforming the node’s own embedding)
  • Matrix Formulation (see the sketch after this list)
    • Not all GNNs can be expressed in a simple matrix form (e.g., when the aggregation function is complex)
  • How to train GNN
    • Node embedding is a function of input graph
    • Supervised Setting: minimize the loss
      • Directly train the model for the supervised task
    • Unsupervised setting
      • use the graph structure as the supervision
      • Similar nodes have similar embeddings
      • Node similarity can be anything, including random walks or node proximity in the graph
  • Biggest advantage: inductive capability
    • Generate embeddings for nodes as needed, even for nodes we never trained on
    • The same aggregation parameters are shared for all nodes
    • The number of model parameters is sublinear in |V|, so the model can generalize to unseen nodes
    • The number of parameters to train does not depend on which node we compute embeddings for
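A minimal NumPy sketch of one such layer in matrix form (one reading of the matrix formulation referenced above): the mean over neighbors is computed with $D^{-1}A$, and W, B are the trainable weights, initialized randomly here only for illustration.

```python
import numpy as np

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])  # adjacency
H = np.array([[0.5, 1.0], [0.1, 0.2], [0.9, 0.4], [0.3, 0.7]])          # layer-l embeddings (layer 0: X)

rng = np.random.default_rng(0)
W = rng.normal(size=(H.shape[1], 3))   # transforms the averaged neighbor messages
B = rng.normal(size=(H.shape[1], 3))   # transforms the node's own embedding

D_inv = np.diag(1.0 / A.sum(axis=1))   # inverse degree matrix, so D_inv @ A averages over N(v)
H_next = np.maximum(D_inv @ A @ H @ W + H @ B, 0)   # sigma = ReLU

# W and B are shared by all nodes, so the parameter count does not grow with |V|,
# and the same layer can be applied to nodes (or graphs) unseen during training.
# Stacking k such layers gives every node a k-hop receptive field.
```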

6. General perspectives on GNN

  • GNN layer = message + aggregation
    • GCN, GraphSAGE, GAT: how to design the message and define the function to aggregate the messages
  • A single GNN Layer
    • Compress a set of vectors into a single vector
    • Two step process
      • Message
      • Aggregation
  • 1) Message Computation
    • Message function: m_u^{(l)} = MSG^{(l)}(h_u^{(l-1)})
    • Intuition: Each node will create a message, which will be sent to other nodes later
  • 2) Aggregation
    • Intuition: node v aggregates the messages from its neighbors, h_v^{(l)} = AGG^{(l)}({m_u^{(l)} : u ∈ N(v)})
  • Issue
    • information from node v itself could get lost
    • computation of h_v^{(l)} does not directly depend on h_v^{(l-1)}
    • solution: include h_v^{(l-1)} when computing h_v^{(l)}
    • Fix on the message side: compute a message from node v itself
      • a different message computation will be performed
    • Fix on the aggregation side: after aggregating from the neighbors, we can aggregate the message from node v itself via concatenation or summation (see the sketch below)
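A small sketch of this two-step view for a single node, including the self-message fix just described; the concrete choices here (linear messages, sum aggregation, concatenation with the self-message) are only one possible instantiation:

```python
import numpy as np

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])  # adjacency
H = np.array([[0.5, 1.0], [0.1, 0.2], [0.9, 0.4], [0.3, 0.7]])          # h_u^{(l-1)} for every node u

rng = np.random.default_rng(0)
W_nbr  = rng.normal(size=(2, 3))   # message weights for neighbor messages
W_self = rng.normal(size=(2, 3))   # a different message computation for node v itself

def gnn_layer(v):
    nbrs = np.flatnonzero(A[v])
    msgs = H[nbrs] @ W_nbr                  # 1) message: one vector per neighbor
    agg  = msgs.sum(axis=0)                 # 2) aggregation: sum the neighbor messages
    self_msg = H[v] @ W_self                #    message from node v itself
    return np.maximum(np.concatenate([agg, self_msg]), 0)  # concatenate, then nonlinearity

h_v = gnn_layer(2)   # new embedding of node 2, now depends directly on h_2^{(l-1)}
```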

7. Classical GNN Layers: GCN

  • GCN
    • Message: each neighbor’s embedding is normalized by 1/|N(v)| and transformed by W^{(l)}
    • Aggregation: sum over messages from neighbors, then apply the activation function (written out below)
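Written out (a reconstruction of the commonly used simplified GCN update; the lecture slides may use a slightly different normalization):

$$
h_v^{(l)} = \sigma\left( \sum_{u \in N(v)} \frac{1}{|N(v)|} \, W^{(l)} h_u^{(l-1)} \right)
$$

Here the message is $m_u^{(l)} = \frac{1}{|N(v)|} W^{(l)} h_u^{(l-1)}$ and the aggregation is the sum over $u \in N(v)$ followed by the activation $\sigma$.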

8. GraphSAGE

  • GraphSAGE (Sample and aggreGatE)
  • Message is computed within the AGG.
  • Two-stage aggregation
    • Stage 1: Aggregate from the node’s neighbors
    • Stage 2: Further aggregate over the node itself (see the sketch after this list)
  • Neighbor Aggregation
    • Mean: take a weighted average of neighbors
    • Pool: transform neighbor vectors and apply a symmetric vector function (mean, max, sum, or std)
    • LSTM
  • L2 Normalization
    • Apply ℓ2 normalization to h_v^{(l)} at every layer, i.e., rescale each embedding to a unit-length vector
    • Without ℓ2 normalization, the embedding vectors can have very different scales
  • Neighborhood sampling
    • Randomly sample a node’s neighborhood for message passing
    • When we compute the embeddings, we can sample different neighbors
    • greatly reduces computational cost and avoids overfitting
  • GCN and GraphSAGE assign a common weight to all neighbors
    • i.e., all neighbors are treated as (almost) equally important to node v
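A compact NumPy sketch of one GraphSAGE-style layer with a mean aggregator, neighbor sampling, and ℓ2 normalization; the aggregator choice and sample size are arbitrary here:

```python
import numpy as np

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])  # adjacency
H = np.array([[0.5, 1.0], [0.1, 0.2], [0.9, 0.4], [0.3, 0.7]])          # current embeddings

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # applied to [h_v || AGG(neighbors)], hence input dim 2 + 2

def sage_layer(v, num_samples=2):
    nbrs = np.flatnonzero(A[v])
    sampled = rng.choice(nbrs, size=min(num_samples, len(nbrs)), replace=False)  # neighborhood sampling
    agg = H[sampled].mean(axis=0)                          # stage 1: aggregate from (sampled) neighbors
    h = np.maximum(np.concatenate([H[v], agg]) @ W, 0)     # stage 2: concatenate with the node itself
    return h / (np.linalg.norm(h) + 1e-12)                 # L2-normalize to a unit-length vector

h_v = sage_layer(2)
```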

9. Graph Attention Networks (GAT)

  • Assign attention weights to the neighbors
  • Not all of a node’s neighbors are equally important; attention focuses on the important parts of the input data and fades out the rest
  • Idea: more computing power on that small but important part of the data
  • Goal: Specify arbitrary importance to different neighbors of each node in the graph
  • How?
    • Compute embedding of each node in the graph following an attention strategy
  • Attention mechanism
    • Let a compute attention coefficients e_vu across pairs of nodes u, v based on their messages, e.g. e_vu = a(W^{(l)} h_u^{(l-1)}, W^{(l)} h_v^{(l-1)})
    • e_vu indicates the importance of u’s message to node v
    • Normalize e_vu into the final attention weight α_vu
    • Use the softmax function, so that α_vu = exp(e_vu) / Σ_{k∈N(v)} exp(e_vk) and Σ_{u∈N(v)} α_vu = 1
    • Weighted sum based on the final attention weights: h_v^{(l)} = σ(Σ_{u∈N(v)} α_vu W^{(l)} h_u^{(l-1)})
    • What is the form of attention mechanism a?
      • The approach is agnostic to the choice of a
      • e.g., use a simple single-layer NN as a; a has trainable parameters
      • Parameters of a are trained jointly with weight matrices in an end-to-end fashion
  • Multihead attention
    • Allows the model to capture different versions of attention
    • Stabilizes the learning process of attention mechanism
    • Create multiple attention scores, each replica with a different set of parameters (see the sketch after this list)
    • Outputs are aggregated by concatenation or summation.
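A NumPy sketch of a single attention head following this recipe; the attention mechanism a is taken here to be one linear layer over the concatenated messages (its parameter vector `a_params` is a name chosen for this sketch), and multi-head attention would simply repeat this with independent W and a_params and concatenate the results:

```python
import numpy as np

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])  # adjacency
H = np.array([[0.5, 1.0], [0.1, 0.2], [0.9, 0.4], [0.3, 0.7]])          # current embeddings

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))        # shared linear transform of node messages
a_params = rng.normal(size=(6,))   # single-layer attention NN over [W h_u || W h_v]

def gat_layer(v):
    nbrs = np.flatnonzero(A[v])
    Wh = H @ W
    # e_vu: importance of u's message to node v, computed by the attention mechanism a
    e = np.array([a_params @ np.concatenate([Wh[u], Wh[v]]) for u in nbrs])
    alpha = np.exp(e) / np.exp(e).sum()          # softmax -> final attention weights, summing to 1
    return np.maximum((alpha[:, None] * Wh[nbrs]).sum(axis=0), 0)  # weighted sum, then nonlinearity

h_v = gat_layer(2)
# Multi-head attention: run several such heads (independent W, a_params),
# then concatenate or sum their outputs.
```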

10. Oversmoothing problem in GNN

  • Overview
    • The issue of stacking many GNN layers
    • All node embeddings converge to the same value
  • Receptive Field
    • the set of nodes that determine the embedding of the node of interest
    • In a k-layer GNN, each node has a receptive field of its k-hop neighborhood
    • The receptive fields of two nodes overlap, and the number of shared neighbors grows quickly as we increase the number of hops (in other words, the number of GNN layers)
    • If two nodes have highly-overlapped receptive fields, then their embeddings are highly similar
    • If there are many GNN layers, node embeddings will be highly similar and suffer from the over-smoothing problem
  • Design GNN layer connectivity
    • Lesson1: Be cautious when adding GNN layers
      • Step 1: Analyze the necessary receptive field to solve your problem
      • Step 2: Set number of GNN layers to be a bit more than the receptive field we like
      • Do not set L unnecessarily high
  • Expressive power for shallow GNN
    • How to make a shallow GNN more expressive
    • Idea1
      • Increase the expressive power within each GNN layer
      • Make the aggregation/transformation a deep neural network
    • Idea 2
      • A GNN does not necessarily only contain GNN layers
      • We can add MLP layers before and after the GNN layers, as pre- and post-process layers (see the sketch below)
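A structural sketch of Idea 2, using small hypothetical `mlp` and `gnn_layer` helpers just to show where the pre- and post-process layers sit relative to a small number of GNN layers:

```python
import numpy as np

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])  # adjacency
X = np.array([[0.5, 1.0], [0.1, 0.2], [0.9, 0.4], [0.3, 0.7]])          # raw node features
rng = np.random.default_rng(0)

def mlp(H, d_out):                    # hypothetical helper: one dense layer + ReLU
    W = rng.normal(size=(H.shape[1], d_out))
    return np.maximum(H @ W, 0)

def gnn_layer(H, d_out):              # mean-over-neighbors aggregation, as in the GCN sketch above
    W = rng.normal(size=(H.shape[1], d_out))
    D_inv = np.diag(1.0 / A.sum(axis=1))
    return np.maximum(D_inv @ A @ H @ W, 0)

H = mlp(X, 8)                # pre-process layers: encode the raw node features
for _ in range(2):           # keep L close to the receptive field the task needs (2 layers -> 2 hops)
    H = gnn_layer(H, 8)      # message-passing layers
Z = mlp(H, 2)                # post-process layers: e.g. map embeddings to task outputs
```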




