Results on synthetic data
The Xor model outputs the parameters related to the community (u, v, w), the hierarchical structure (s, c) and the node types (sigma) from the observed network data A. When the ground-truth values of (theta) and (sigma) are available, we can measure the performance of the model in recovering the community structure, the ranking of nodes and their type. For these three tasks we consider as performance metrics the cosine similarity (CS), the Pearson’s correlation (PC) and Area Under the Curve (AUC), respectively, between the ground-truth and the inferred values. In the absence of ground-truth, we can indirectly evaluate the model fitness via edge prediction tasks in cross-validation settings where we hide a subset of the matrix A (test set), fit the model in the remaining subset (training set) and test the ability to predict the missing edges (test set).
Performances on synthetic networks for different tasks. The mean value across folds is reported, indicating also the standard deviations. For the edge prediction task, our model is compared with the baseline methods MultiTensor and SpringRank. We vary the proportion of expected nodes with (sigma _i =1), i.e. preferring a hierarchy-based interaction. Only the not regularised version of Xor is shown since it is the most effective in this scenario.
We validate the model on synthetic data generated using the Xor generative model with (N=500,) average degree (langle k rangle =20), (beta =5) and varying the ground-truth value of (mu _{gt} in [0,1]). Specifically, we generate networks with (K=3) communities of equal-size unmixed group membership, a hierarchy with (l=3) leagues, i.e. the scores (left{ s_{i}right} _{i=1}^{N}) are drawn from a mixture of Gaussians with means ({ -4, 0, 4}) and standard deviations ({1, 0.5, 1}); we set (delta _0=0.01). We draw five different independent samples for each set of parameters. The inference algorithm is tested by using 5-fold cross-validation for splitting the data into train and test sets, and ran with five different random initializations on each graph instance. We set the hyperparameters K and (beta) equal to the ground-truth values, while we use grid search for selecting the best value of the regularization (lambda = lambda _v = lambda _u = lambda _w times 0.1).
We find that Xor predicts missing edges robustly and consistently across different values of (mu _{gt}) and better than baseline models that consider only community structure (MT) or only hierarchical structure (SR), see Fig. 2 (left). The performance is not monotonic in (mu _{gt}): we obtain high values of AUC when (mu _{gt}) is close to 0 or when (mu _{gt}) is close to 1. These are extreme scenarios where a large majority of the nodes are predominantly interacting either via communities or hierarchy. As one of these two mechanisms dominates, it is also easier to infer the parameters, hence the higher AUC values. The intermediate region (0.2 le mu _{gt}le 0.8) where the nodes distribute more evenly between the two mechanisms corresponds to cases in which inference is harder. Nevertheless, the model shows stable performance in this range, with AUC mean values never dropping below 0.7 and always comparable or better than MT, the best performing among the two baselines. A similar non-monotonic behaviour is observed for the node classification task, where we aim at predicting the node type (sigma) using Q. While we observe a similar performance drop in the same intermediate regime, performance is robust, as the average AUC is always higher than 0.85. Finally, the performance in recovering communities is good in the regime that most favours community structure ((mu _{gt}<0.4)) while recovering of the hierarchy is poor, and vice-versa in the opposite regime ((mu _{gt}0.6)). This is intuitive, as when most of the nodes predominantly interact via communities, their score is irrelevant, and therefore cannot be recovered well. What matters is the ability to recover the structure corresponding to the main mechanism at play, i.e. high cosine similarity when (mu _{gt}<0.5) or high Pearson’s coefficient when (mu _{gt}0.6). We find that Xor performs well in this task, with a boost in performance in recovering the ranking to values above 0.7 in the regime where hierarchy dominates. Similarly, cosine similarity increases above 0.7 when communities dominate. These results suggest that the model is not only able to predict missing edges and nodes’ type, but also to distinguish the node-level latent features that determine how nodes interact, regardless of their type.
Results on real data
Application on High School network
Application on High School network: (a), (b) comparison between the communities detected by MultiTensor and Xor, (c) node types Q visualization and (d) hierarchy in the subnetwork of nodes preferring the competitive mechanism. Nodes’ positions are assigned: (a)–(c) using a spring layout and (d) using the scores s inferred by the Xor algorithm. Communities are selected by normalizing the v membership vector (similar results are obtained with u) and colors of the pie markers are assigned according to the community (mixed) membership in each group. The dark grey nodes have null membership vector. Both the algorithms select (K=4) as optimal number of communities, as output using a five-fold cross-validation scheme and grid search.
As a first example of application of this model, we consider a dataset of a network of high school students29. This describes the perceived interactions of a group of 67 high school students. Each student is being asked “who are you friend of” in the fall of 1957 and the spring of 1958, and the answers are aggregated on the same edge allowing weights with values 1 or 2. Agreement in the response is not ensured, hence the network is directed. It is reasonable to expect that students belong to groups (communities) and this influences the answers they give. However, a fraction of the students may not belong to any group and instead nominate others based on their perceived ranking of the students. For instance, one may nominate whom they aspire to befriend. Figure 3 shows what happens when we apply Xor to this dataset. The figure shows the estimates of node type (sigma _{i}), describing the preferred behavior of each student. As we can see, most of the students have a (sigma _1approx 0), their nominations follow a community structure. However, we obtain four individuals with high (sigma _i), meaning that their preferences are mainly based on ranking. Even though they have a degree similar to that of other students, they are not well connected with the rest of the network as they mostly interact among themselves (there are only 5 other students that nominate one of them as friends). Comparing the reciprocity coefficient for the whole set of students with the one for the subgraph interacting mainly via hierarchy, we find that it increases from a value of 0.51 to a 0.88. In fact, this smaller network is missing only three edges for being a (directed) clique. Considering the direction and weight of the edges of the subgraph made of these four individuals, the ranking is meaningful as it reveals insights on the group social dynamics, as shown in Fig. 3d, where the hierarchy highlights consistency between inferred score (the position of the nodes) and the weight and direction of the interactions. Namely, the individual with highest ranking in that subset ((i=41)), is also the individual that makes fewer friendship nominations, while the others tend to nominate them more often.
In the same figure, we show the community structure inferred by our model and that using MultiTensor, which considers only this mechanism for edge generation. As we can see, Xor outputs slightly different communities, as the yellow and blue nodes are partially mixed in the two cases. What is interesting, is that most of the nodes that have (sigma _i=1) are not assigned to any community: again, we observe that the latent variable…
0.5)0.4)
Read More: The interplay between ranking and communities in networks – Scientific Reports