These data structures are far simpler than graphs, which can have arbitrary size, multimodal features, complex topology, and no fixed node ordering.
Various deep learning methods have been proposed to address this. In this article, we’ll cover one of the core deep learning approaches to processing graph-structured data: graph convolutional networks.
Let’s get to it.
Before we dig into graph processing, we should talk about message passing. Generally, message passing means that each node in a graph sends information about itself to its neighbors and receives messages from them, which it uses to update its state and understand its environment.
For example, in a basic label propagation algorithm, every node in the graph starts with some initial state, receives a signal from its neighbors to update it, and then runs the same process in the next iteration, this time with an updated starting value. The number of these iterations (timesteps) is a tunable hyperparameter.
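A minimal sketch of this idea, assuming a tiny made-up graph where each node's label score is repeatedly replaced by the average of its neighbors' scores (the adjacency list, initial labels, and timestep count are all illustrative):

```python
import numpy as np

# adjacency list: node -> neighbors (a made-up 4-node undirected graph)
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# initial state: only node 0 carries the label signal
labels = np.array([1.0, 0.0, 0.0, 0.0])

def propagate(labels, graph, timesteps=3):
    """Each timestep, every node takes the mean label of its neighbors."""
    current = labels.copy()
    for _ in range(timesteps):
        current = np.array([
            np.mean([current[nb] for nb in graph[node]])
            for node in sorted(graph)
        ])
    return current

smoothed = propagate(labels, graph)
```

After a couple of timesteps, the signal that started at node 0 has reached node 3 two hops away, which is the behavior the hyperparameter controls.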
Graph neural networks do the same thing, except not just for label values – they spread messages containing entire feature vectors. If label propagation can be compared to simple label smoothing, then what a graph neural network does can be considered feature smoothing.
How does a graph convolutional network model work?
First, each node gets information about all its connected nodes’ features and applies an aggregation function such as sum or average to these values, which ensures that all representations come out the same size regardless of how many neighbors a node has. Whatever function we end up choosing, it must be permutation-invariant, i.e. independent of the order of the neighbors. This is crucial.
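A quick check of why sum and mean qualify: shuffling the neighbor order leaves the aggregated message unchanged, and the output size is fixed no matter how many neighbors there are (the feature values below are made up):

```python
import numpy as np

# three neighbors, each with a 2-d feature vector (illustrative values)
neighbors = np.array([[1.0, 2.0],
                      [3.0, 4.0],
                      [5.0, 6.0]])
shuffled = neighbors[[2, 0, 1]]  # same neighbors, different order

# permutation invariance: the aggregate ignores neighbor ordering
assert np.allclose(neighbors.mean(axis=0), shuffled.mean(axis=0))
assert np.allclose(neighbors.sum(axis=0), shuffled.sum(axis=0))

# fixed output size: a (2,) vector regardless of neighbor count
agg = neighbors.mean(axis=0)
```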
Afterward, the resulting vector is passed through a dense neural network layer (which means it is multiplied by a weight matrix). Then a non-linear activation function is applied on top to get a new vector representation.
Next, we keep looping through these three steps:
- For every node, aggregate its neighbors’ vectors (now the updated ones).
- Pass the resulting value through a dense neural net layer.
- Apply non-linearity.
We do this as many times as our network has layers.
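The loop above can be sketched as a multi-layer forward pass, assuming mean aggregation via a row-normalized adjacency matrix (the graph, feature values, and layer sizes here are illustrative, and the weights are random rather than trained):

```python
import numpy as np

rng = np.random.default_rng(0)

# a made-up undirected graph over 4 nodes, with 8 features per node
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 8))

# row-normalizing A turns "sum over neighbors" into "average over neighbors"
A_mean = A / A.sum(axis=1, keepdims=True)

def gcn_forward(X, A_mean, layer_dims, rng):
    H = X
    in_dim = X.shape[1]
    for out_dim in layer_dims:                    # one iteration per layer
        W = rng.normal(size=(in_dim, out_dim)) * 0.1  # dense-layer weights
        # aggregate neighbors, transform, apply ReLU non-linearity
        H = np.maximum(A_mean @ H @ W, 0.0)
        in_dim = out_dim
    return H

embeddings = gcn_forward(X, A_mean, layer_dims=[16, 4], rng=rng)
```

With `layer_dims=[16, 4]`, each node's 8-d feature vector ends up as a 4-d embedding after two rounds of aggregate-transform-activate.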
It’s important to remember that the nodes in graph neural networks will have a different representation at each layer.
- At the 0-th layer, they will be the same as the node’s features.
- At layer k, to compute a representation for a node, we’ll go over its neighbors, take their representations from the previous layer k-1, and average those together. Then we’ll transform them using a parameter matrix and add the node’s own message from layer k-1. The resulting value will be passed through a non-linear function such as ReLU.
- Finally, after the node’s representation has gone through all the transformations in the hidden layers, we’ll get the final embedding.
There are two trainable parameter matrices that we apply at each layer – one to transform the aggregated messages from the neighboring nodes (taken from layer k-1), and another to transform the node’s own representation (again from k-1).
During training, we’re figuring out the optimal way to enrich the information a node learns about itself. It boils down to how much of a non-linear transformation we want to do over the node’s feature vectors vs. how much of a modification we need to apply to the neighbors’ feature vectors. Depending on the matrices we use, we can make the node focus entirely on the information about itself and ignore the messages from its environment, the other way around, or a little bit of both, whichever helps us get the best predictions.
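The trade-off described above can be made concrete in a per-layer update sketch. Here `W_k` transforms the averaged neighbor messages and `B_k` transforms the node's own previous representation; the shapes and random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def layer_update(H_prev, A_mean, W_k, B_k):
    """h_v^k = ReLU(mean of neighbor h_u^(k-1) · W_k  +  h_v^(k-1) · B_k)."""
    neighbor_msg = A_mean @ H_prev               # averaged neighbor vectors
    return np.maximum(neighbor_msg @ W_k + H_prev @ B_k, 0.0)

# a made-up 3-node graph (row-normalized adjacency) with 4-d features
A_mean = np.array([[0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.5],
                   [0.5, 0.5, 0.0]])
H_prev = rng.normal(size=(3, 4))
W_k = rng.normal(size=(4, 4)) * 0.1   # transforms neighbor messages
B_k = rng.normal(size=(4, 4)) * 0.1   # transforms the node's own vector

# Zeroing W_k makes each node ignore its neighbors entirely;
# zeroing B_k makes it rely on neighbor messages alone.
H_next = layer_update(H_prev, A_mean, W_k, B_k)
```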
Note: the more timesteps we set up, the farther the signal from our nodes will travel. For an average-sized graph, 3 to 5 hops through the edges will be enough for a node to affect the representation of pretty much every other node in the same graph (to a varying extent that depends on the distance).
If all this sounds familiar, here’s why: In standard convnets, we typically slide a convolutional operator over a pixel grid and process one sub-patch of an image at a time. We combine and transform pixel value information to create new representations, and, in a way, we treat groups of pixels as neighborhoods of nodes. So, overall, the logic behind graph convolutional networks is highly similar to that of conventional neural nets: the output of one layer is fed to the next as input. However, there are some significant distinctions too.
- In graph convolutional networks, we run the preprocessing step of aggregating the values of a node’s neighbors and averaging them out on every layer.
- The local network neighborhood of a node defines its computation graph, which will generally be different from that of every other vertex. Therefore, every node effectively has its own neural network architecture that captures and reflects the structure of its environment.
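The second point can be illustrated by unrolling a node's k-hop neighborhood into the tree of representations it aggregates from (the node ids and edges below are made up for illustration):

```python
# Sketch: unroll the computation graph a node aggregates over, layer by layer.
def computation_graph(graph, node, depth):
    """Return the tree of neighbors feeding into a node's depth-layer update."""
    if depth == 0:
        return node
    return (node, [computation_graph(graph, nb, depth - 1)
                   for nb in graph[node]])

# adjacency list for a made-up 4-node graph
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}

# Two different nodes unroll into two different trees,
# i.e. two different per-node architectures.
tree_0 = computation_graph(graph, 0, depth=2)
tree_3 = computation_graph(graph, 3, depth=2)
```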