Correlation, covariance and distance matrices are among the most commonly used data types in network neuroscience.1–4 These matrices are typically built via pairwise comparisons of functional and/or structural neuroimaging data. In these matrices, one entry stores a numerical value quantifying the similarity between two spatial locations (i.e., brain voxels, vertices, regions, channels) and the pattern of these entries reflects an estimate of brain network organization.
In network neuroscience, matrices—also called here data tables—are typically obtained from sets of variables collected on the same individuals (e.g., multiple scans or sessions, multiple imaging modalities),5–9 or from the same variables collected on different individuals (e.g., one type of imaging scan on several participants).10–13 Data tables are then compared to one another to assess temporal network structure,14–17 multi-modal network organization,4,18,19 individual differences,20–25 and group or population effects.26–30 Data tables are also contrasted with one another to investigate the statistical reliability of patterns derived from network neuroscience methods.31,32 Similarity analyses among multiple data tables thus constitute the cornerstone of network neuroscience research.
In network neuroscience, similarity analyses are most often conducted via mass-univariate approaches that contrast one edge at a time, via aggregate information within data tables (e.g., graph theory analyses33) or across data tables (e.g., categorical groupings), or via single tables with reduced dimensions.34 While these approaches reduce the high dimensionality of network neuroscience data and simplify the analytic landscape, they may limit the statistical robustness of network neuroscience findings and obscure important properties of brain function that multivariate methods applied to full data tables could reveal. Multivariate approaches able to align and compare relational information from full data tables across multiple observations can augment the utility, precision, and applicability of network neuroscience data and advance our understanding of brain network organization in health and disease.
Statistical methods explicitly designed for multivariate similarity analyses of full data tables already exist: multi-table methods.35–43 Their goal is to identify structured patterns within high-dimensional multi-table data, kept intact, and to explain and visualize their statistical dependencies. Multi-table methods serve as the basis of network investigations across scientific disciplines,44 yet they are not well known in network neuroscience and therefore remain underused.
A few multi-table methods are currently used in network neuroscience, including machine learning and deep learning tools,45,46 graph neural networks,47,48 multi-layer and multiplex network approaches,18,49 similarity network fusion techniques,50 and non-linear matrix decomposition algorithms.51–53 These approaches have been applied to a variety of research questions about brain network organization in health and disease.54–61 Yet they often yield complex results that are challenging to interpret, in part because these methods rely on complex mathematical implementations62,63 and supervised analytical frameworks51 that do not allow results to be traced back to the original data. There is therefore a pressing need in network neuroscience for multi-table methods that preserve data fidelity and enhance interpretability. CovSTATIS addresses this need by analyzing intact data tables in a linear, unsupervised manner, thus allowing for the simultaneous extraction and interpretation of both individual- and group-level features.
The covSTATIS method (and its variant DISTATIS) is a three-way extension of multidimensional scaling and Principal Component Analysis.64–66 The name, covSTATIS, combines “covariance” with “STATIS” (a French acronym for “structuring three-way statistical tables”). CovSTATIS takes as input symmetric, positive semi-definite matrices (i.e., symmetric matrices with non-negative eigenvalues, such as cross-product, covariance, and correlation matrices) and assesses their similarity.67–69 While covSTATIS is specifically designed for correlation/covariance matrices, an equivalent approach exists for distance matrices, called DISTATIS.64 CovSTATIS and DISTATIS belong to the STATIS family of multi-table approaches.
In covSTATIS, the pairwise similarity among I correlation/covariance matrices of dimension J × J (with J being the number of variables) is quantified by the RV coefficient—a measure analogous to a squared Product-Moment correlation coefficient with values in the interval [0,1].67,70 Given I correlation/covariance matrices, the RV coefficients are stored in an I × I similarity matrix C, where the rows/columns correspond to the I data tables (Figure 1, step 1). Next, covSTATIS performs an eigenvalue decomposition (EVD) on the C matrix and takes its first eigenvector (of dimension I × 1) to derive weights for each data table. Note that, because the RV coefficient is always positive, all the entries of the first eigenvector of C are positive—a consequence of the Perron-Frobenius theorem. The first eigenvector maximally explains the variance in C and quantifies how similar each table is to the common pattern. Weights for each data table are derived by rescaling the first eigenvector to sum to 1. Higher weights identify tables that are more similar to the common pattern, whereas lower weights identify tables less similar to the common pattern. These weights are then used to linearly combine the data tables: each table is multiplied by its weight, and the weighted tables are summed. This step generates a J × J weighted group matrix—called the compromise matrix (Figure 1, step 2)—which is next decomposed by EVD. Orthogonal components extracted from this second EVD serve as the main output of covSTATIS (Figure 1, step 3). Components reveal the similarity between the J variables with regard to the compromise—the ensemble of the similarity patterns across all I data tables. Global factor scores of dimension J × L (with L being the number of components) capture the relationship between variables with respect to this compromise. For each of the I correlation/covariance matrices, J × L partial factor scores can be obtained to project table-specific relationships between variables onto the same component space (Figure 1, step 4). By quantifying the deviation of each partial factor score from its corresponding global factor score, we can assess differences between each table and the compromise.
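To make these four steps concrete, the following is a minimal sketch in base R (the language of our tutorial). It is illustrative only, not the DistatisR implementation: the function name covstatis_sketch is ours, and the input is assumed to be a list of I symmetric, positive semi-definite J × J tables.

```r
## Minimal covSTATIS sketch (illustrative; assumes `tables` is a list of
## I symmetric, positive semi-definite J x J correlation/covariance matrices).
covstatis_sketch <- function(tables) {
  I <- length(tables)
  ## Step 1: I x I similarity matrix C of RV coefficients between tables
  rv <- function(A, B) sum(A * B) / sqrt(sum(A * A) * sum(B * B))
  C <- outer(seq_len(I), seq_len(I),
             Vectorize(function(i, k) rv(tables[[i]], tables[[k]])))
  ## Step 2: weights from the first eigenvector of C (entries share one sign
  ## by Perron-Frobenius), rescaled to sum to 1; compromise = weighted sum
  u1 <- abs(eigen(C, symmetric = TRUE)$vectors[, 1])
  alpha <- u1 / sum(u1)
  compromise <- Reduce(`+`, Map(`*`, alpha, tables))
  ## Step 3: eigenvalue decomposition of the compromise -> components
  evd <- eigen(compromise, symmetric = TRUE)
  keep <- evd$values > 1e-10                     # drop null components
  P <- evd$vectors[, keep, drop = FALSE]
  Li <- diag(1 / sqrt(evd$values[keep]), nrow = sum(keep))
  F_global <- compromise %*% P %*% Li            # global factor scores (J x L)
  ## Step 4: project each table onto the same component space
  F_partial <- lapply(tables, function(X) X %*% P %*% Li)
  list(C = C, alpha = alpha, compromise = compromise,
       global = F_global, partial = F_partial)
}
```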
For example, given a covSTATIS analysis of multiple functional connectivity matrices in which only positive connectivity values are considered, global factor scores represent the brain regions in the component space and illustrate the associations in their connectivity profiles across the whole sample. Partial factor scores represent the brain regions of each individual’s connectivity matrix and illustrate how regions are associated with each other in relation to the group pattern. Factor scores of any two components can be used as coordinates to draw scatter plots in the component space, where the distance between two scores represents their similarity. Two global factor scores close to each other indicate high similarity in their respective connectivity patterns across the whole sample, while a partial factor score close to its corresponding global factor score indicates high similarity between an individual’s regional connectivity profile and the regional profile of the whole sample. In sum, covSTATIS provides an unsupervised, linear framework to characterize the similarity among sets of correlation/covariance matrices, and it allows for a one-to-one mapping between input (i.e., the whole set of tables and single tables) and output (i.e., global and partial factor scores). This approach can both identify group-level patterns and provide individual-specific expressions of those patterns.
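For readers who want to reproduce such plots, a minimal base R illustration follows, using the output of the covstatis_sketch() helper defined above (the names and plotting choices are ours, not the tutorial’s):

```r
## Illustrative component-space plot: components 1 and 2 as coordinates.
res <- covstatis_sketch(tables)
plot(res$global[, 1:2], pch = 19,
     xlab = "Component 1", ylab = "Component 2",
     main = "Global (filled) and partial (open) factor scores")
## Overlay one individual's partial factor scores and connect each to its
## corresponding global factor score
points(res$partial[[1]][, 1:2], pch = 1)
segments(res$global[, 1], res$global[, 2],
         res$partial[[1]][, 1], res$partial[[1]][, 2], lty = 3)
```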
CovSTATIS has both commonalities and differences with other dimensionality reduction methods used in neuroscience: (1) Principal Component Analysis (PCA)71,72 and Multidimensional Scaling (MDS)73—these two techniques take a single data table; PCA is performed on an observation-by-variable table, and MDS on a dissimilarity matrix between observations. They rely on the same dimensionality reduction technique as covSTATIS, but covSTATIS extends MDS by analyzing multiple similarity matrices; (2) Multiple Factor Analysis (MFA)74,75 and STATIS64—these component-based methods also take multiple data tables, and they are particularly suitable for rectangular data structures. Such data structures are less common in network neuroscience, making MFA and STATIS less desirable than covSTATIS in this context. CovSTATIS combines MDS and STATIS to handle multiple square, symmetric correlation/covariance matrices—the standard format of network neuroscience data; (3) Representational Similarity Analysis (RSA)76—this method computes a dissimilarity matrix, akin to the matrix of RV coefficients in covSTATIS, from a set of correlation/covariance matrices. RSA extracts components from this dissimilarity matrix to characterize differences between the correlation/covariance matrices; these components are the main output of RSA. CovSTATIS, on the other hand, derives weights from these components to compute the compromise—the linear combination of all correlation/covariance matrices—and extracts components from it; these components are the main output of covSTATIS; (4) Graph theory approaches33—while these provide meaningful statistical descriptors of network neuroscience data, they operate on aggregated pairwise information (e.g., connectivity), not on full data tables; and (5) Gradient Analysis51–53—this method is the most similar to covSTATIS in functionality and implementation. It offers outputs complementary to those of covSTATIS, albeit further removed from the original data: Gradient Analysis relies on non-linear dimensionality reduction of affinity matrices built from correlation/covariance profiles. Because Gradient Analysis does not construct a compromise-like space, each correlation/covariance matrix must be projected onto the affinity-derived component space to infer similarities and differences in the data. CovSTATIS, instead, obtains the component space from a direct decomposition of the compromise—a linear combination of the original correlation/covariance matrices. As such, covSTATIS provides a multivariate framework that promotes more interpretable links between the output and the original data.
There are numerous other dimensionality reduction techniques, each with its particular strengths and weaknesses. Though almost any dimensionality reduction approach can be applied to neural data, we believe that covSTATIS is particularly well suited to the structure and format of network neuroscience data, given its flexibility in handling multiple symmetric correlation/covariance tables within a linear framework.
Applications of covSTATIS in network neuroscience
In neuroimaging, STATIS-based methods have been applied in a limited capacity.77–83 As such, the potential of covSTATIS as a tool for network neuroscience remains largely untapped. Recent work from our group applied covSTATIS to compare spatial patterns of fMRI connectivity across task states,84 and to estimate resting-state fMRI connectivity dynamics.85 Here, we guide the reader through the former application and provide a step-by-step tutorial with data86 and code. The tutorial can be accessed here: https://giuliabaracc.github.io/covSTATIS_netneuro/pages/tutorial.html.
Briefly, in this application of covSTATIS, we used task-based fMRI functional connectivity data from a healthy adult lifespan sample of 144 individuals to examine how fMRI-derived functional connectivity reconfigures across task conditions with different cognitive loads. Note that the original paper included three different tasks,84 but here and in the tutorial we focus on only one (i.e., the n-back task with 0-, 1-, and 2-back conditions).
Individual-specific functional connectivity matrices were calculated for each task condition. These tables were weighted based on their respective RV similarity coefficients, and linearly combined to create the compromise matrix—the optimum weighted average of all connectivity tables across task conditions and individuals. The compromise was then submitted to an eigenvalue decomposition to assess the similarity of the connectivity patterns across task conditions and subjects. Global factor scores represented the average connectivity profile of single brain regions across task conditions and individuals, and partial factor scores represented how the regional connectivity profile for each individual and task condition mapped onto the group average. In sum, covSTATIS allowed us to estimate (1) regional differences in functional connectivity patterns across all tasks and individuals, and (2) individual differences in functional connectivity patterns across task conditions. A more detailed breakdown of all steps involved in covSTATIS can be found in our tutorial, including guidelines on the choice of covSTATIS parameters based on the type of input data.
Other examples of potential applications of covSTATIS include investigations of individual and group differences in spatial and/or temporal network structure in health and disease (Figure 2, bottom panel), deep phenotyping of connectivity metrics, and multimodal assessments of network measures within and across individuals (Figure 2, top panel). Another promising avenue for covSTATIS is the exploration of brain-behavior relationships within a single framework. Through covSTATIS, participants’ correlations—computed from high-dimensional brain and behavioral data tables—can be integrated into a unified compromise space from which the shared variance between tables can be extracted.
Importantly, as covSTATIS only requires symmetric positive semi-definite matrices, it provides a general framework to assess shared and distinct patterns across many types of similarity matrices, beyond functional connectivity. With proper matrix preprocessing steps (e.g., taking the Laplacian of a connectivity graph), covSTATIS can analyze other network matrices, such as structural connectivity, similarly to other methods (e.g., network portrait divergence,87 netsimile,88 and others; see this paper89 for a comprehensive list).
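As one illustration of such a preprocessing step (our sketch; the Laplacian is one option among several), a non-negative, symmetric structural connectivity matrix can be converted into a valid covSTATIS input as follows:

```r
## Illustrative preprocessing: graph Laplacian of a non-negative, symmetric
## connectivity matrix W. L = D - W is positive semi-definite, so it
## satisfies the input requirement of covSTATIS.
graph_laplacian <- function(W) {
  D <- diag(rowSums(W))  # degree matrix
  D - W
}
```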
While not exhaustive, these applications of covSTATIS highlight the versatility of the method. We hope that the network neuroscience community will benefit from covSTATIS and collectively further refine and expand the approach. Ongoing developments of covSTATIS can be found on our website: https://giuliabaracc.github.io/covSTATIS_netneuro/.
Conclusions
CovSTATIS serves as a theoretically and computationally accessible tool for similarity analyses, capable of preserving and integrating the high-dimensional, complex multi-table data typical of network neuroscience. Its linear, unsupervised, user-independent implementation makes covSTATIS a more interpretable and versatile tool than other commonly used approaches, ultimately paving the way for new discoveries in network neuroscience.
Methods
Notations
A matrix is denoted by a bold, uppercase letter (e.g., X), a vector is denoted by a bold, lowercase letter (e.g., x), and an element of a matrix is denoted by a lowercase italic letter (e.g., x). The cardinal of a set is denoted by an uppercase italic letter (e.g., I). Given I data tables, we use the subscript i to identify individual data tables (e.g., Xi). The boldface capital letter I denotes the identity matrix. The transpose of a matrix is denoted by the superscript ⊤ (e.g., X⊤). The jth column of matrix X is denoted by xj, and the value on the kth row and the jth column is denoted by xk,j. For an I × J matrix X, the largest possible rank of X, denoted by L, is the minimum of I and J. The trace(X) operator gives the sum of the diagonal elements of the square matrix X.

CovSTATIS
To generate the compromise space that best represents the common pattern across all data tables (e.g., correlation/covariance matrices), covSTATIS first derives weights from a pairwise similarity matrix, called the RV matrix, which quantifies the similarity between data tables via the RV coefficient. Formally, given two J × J data tables Xi and Xi′ (e.g., two connectivity matrices with J ROIs from two observations i and i′, such as participants or tasks), the RV coefficient between these two matrices is computed as:
\[\begin{aligned} R_V &= \frac{\operatorname{trace}\left(\mathbf{X}_i^{\top} \mathbf{X}_{i^{\prime}}\right)}{\sqrt{\operatorname{trace}\left(\mathbf{X}_i^{\top} \mathbf{X}_i\right) \operatorname{trace}\left(\mathbf{X}_{i^{\prime}}^{\top} \mathbf{X}_{i^{\prime}}\right)}}\\ & =\frac{\sum_j^J \sum_{j^{\prime}}^J x_{j, j^{\prime}, i} x_{j, j^{\prime}, i^{\prime}}}{\sqrt{\left(\sum_j^J \sum_{j^{\prime}}^J x_{j, j^{\prime}, i}^2\right)\left(\sum_j^J \sum_{j^{\prime}}^J x_{j, j^{\prime}, i^{\prime}}^2\right)}} \end{aligned}\tag{1} \]
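For reference, Equation 1 translates directly into base R (an illustrative helper; the function name is ours):

```r
## RV coefficient of Equation 1 (illustrative helper).
rv_coefficient <- function(Xi, Xip) {
  sum(diag(crossprod(Xi, Xip))) /
    sqrt(sum(diag(crossprod(Xi))) * sum(diag(crossprod(Xip))))
}
## For symmetric tables this equals the element-wise form:
## sum(Xi * Xip) / sqrt(sum(Xi^2) * sum(Xip^2))
```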
Akin to a squared correlation coefficient, the RV coefficient takes values in the interval [0, 1]. The RV coefficients between all pairs of matrices are then stored in an I × I matrix, denoted by C, where cell ci,i′ stores the value of the RV coefficient between Xi and Xi′. As C stores the similarity between data tables, the first component of C best represents the common pattern across tables, and the first eigenvector of C (u1) quantifies how similar each table is to this common pattern. As a result, to build the compromise space, weights for each data table are derived from u1, rescaled to sum to 1. Formally, C undergoes an eigenvalue decomposition (EVD):
\[ \mathbf{C}=\mathbf{U} \boldsymbol{\Omega} \mathbf{U}^{\top} \text { such that } \mathbf{U}^{\top} \mathbf{U}=\mathbf{I} \text {, } \tag{2}\]
where Ω is an R × R diagonal matrix of the eigenvalues of C, with R denoting the rank of C, and U is an I × R matrix of the eigenvectors of C. Next, the weight of Xi (denoted by αi) is obtained as:
\[ \alpha_i=\sqrt{\frac{u_{i, 1}}{\sum_{i=1}^I u_{i, 1}}}=\left(u_{i, 1}\left(\mathbf{1}^{\top} \mathbf{u}_1\right)^{-1}\right)^{\frac{1}{2}}, \tag{3} \]
where ui,1 is the ith value of u1, the value that corresponds to Xi.
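A sketch of these two steps in base R, continuing from the rv_coefficient() helper above (we apply the sum-to-one rescaling described in the text; `tables` is again the assumed list of I matrices):

```r
## Equations 2-3 (sketch): RV matrix, its EVD, and the table weights.
I <- length(tables)
C <- outer(seq_len(I), seq_len(I),
           Vectorize(function(i, k) rv_coefficient(tables[[i]], tables[[k]])))
evd_C <- eigen(C, symmetric = TRUE)  # Equation 2: C = U Omega U^T
u1 <- abs(evd_C$vectors[, 1])        # one sign by Perron-Frobenius
alpha <- u1 / sum(u1)                # weights rescaled to sum to 1
```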
The compromise (X+) is then computed as the weighted sum of all data matrices:
\[ \mathbf{X}_{+}=\sum_{i=1}^I \alpha_i \mathbf{X}_i , \tag{4} \]
and decomposed by an EVD:
\[ \mathbf{X}_{+}=\mathbf{P} \boldsymbol{\Lambda} \mathbf{P}^{\top} \text { such that } \mathbf{P}^{\top} \mathbf{P}=\mathbf{I}, \tag{5} \]
where Λ is an L × L diagonal matrix of the eigenvalues of X+, with L denoting the rank of X+, and P is a J × L matrix of the eigenvectors of X+. From this EVD, the global factor scores F (i.e., the factor scores of the compromise) are computed as:
\[ \mathbf{F}=\mathbf{X}_{+} \mathbf{P} \boldsymbol{\Lambda}^{-\frac{1}{2}} \tag{6} \]
and the partial factor scores Fi (i.e., the factor scores derived from the projection of individual tables onto the compromise) are computed as:
\[ \mathbf{F}_i=\mathbf{X}_i \mathbf{P} \boldsymbol{\Lambda}^{-\frac{1}{2}} .\tag{7} \]
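Continuing the base R sketch for Equations 4 through 7, with alpha and tables from above:

```r
## Equations 4-7 (sketch): compromise, its EVD, global/partial factor scores.
X_plus <- Reduce(`+`, Map(`*`, alpha, tables))             # Equation 4
evd_X <- eigen(X_plus, symmetric = TRUE)                   # Equation 5
keep <- evd_X$values > 1e-10                               # non-null components
P <- evd_X$vectors[, keep, drop = FALSE]
Li <- diag(1 / sqrt(evd_X$values[keep]), nrow = sum(keep)) # Lambda^(-1/2)
F_global <- X_plus %*% P %*% Li                            # Equation 6
F_partial <- lapply(tables, function(Xi) Xi %*% P %*% Li)  # Equation 7
```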
It is worth noting that these partial factor scores have a barycentric property; that is, their weighted sum equals the global factor scores:
\[ \mathbf{F}=\sum_i^I \alpha_i \mathbf{F}_i . \tag{8} \]
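This property is easy to verify numerically with the sketch above:

```r
## Barycentric property (Equation 8): the alpha-weighted sum of partial
## factor scores recovers the global factor scores.
F_bary <- Reduce(`+`, Map(`*`, alpha, F_partial))
all.equal(F_bary, F_global)  # TRUE up to floating-point precision
```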
The optimization problem in covSTATIS
Optimization in covSTATIS is a two-part problem. The first part is akin to the optimization problem of Principal Component Analysis; the second part is the same as the optimization problem of an eigenvalue decomposition.
First part
First, weights for each data table are obtained to compute the compromise, such that the similarity between the compromise and all input matrices is maximal. Second, components are computed that best explain the compromise’s inertia (i.e., the generalization of variance to multiple dimensions). Formally, the first optimization problem can be written as the following maximization problem:
\[ \underset{\boldsymbol{\alpha}}{\arg \max } \sum_{i=1}^I\langle\mathbf{X}_i, \underbrace{\sum_{i^{\prime}}^I \alpha_{i^{\prime}} \mathbf{X}_{i^{\prime}}}_{\mathbf{X}_{+}}\rangle^2 \quad \text { with } \quad \boldsymbol{\alpha}^{\top} \boldsymbol{\alpha}=1, \tag{9} \]
and the sum of squared scalar products can be expanded and simplified:
\[ \begin{aligned} \sum_{i=1}^I\left\langle\mathbf{X}_i, \mathbf{X}_{+}\right\rangle^2 & =\sum_{i=1}^I \sum_{i^{\prime}=1}^I \alpha_i \alpha_{i^{\prime}} \sum_{i^{\prime \prime}=1}^I \operatorname{trace}\left\{\mathbf{X}_i \mathbf{X}_{i^{\prime \prime}}\right\}\\ & \quad \times \operatorname{trace}\left\{\mathbf{X}_{i^{\prime \prime}} \mathbf{X}_{i^{\prime}}\right\} \\ & =\sum_{i=1}^I \sum_{i^{\prime}=1}^I \alpha_i \alpha_{i^{\prime}} \sum_{i^{\prime \prime}=1}^I c_{i, i^{\prime \prime}} \times c_{i^{\prime \prime}, i^{\prime}} \\ & =\boldsymbol{\alpha}^{\top} \mathbf{C}^2 \boldsymbol{\alpha} \end{aligned} \tag{10} \]
Therefore, the solution of the optimization problem defined by Equation 9 is the first eigenvector of C², which is the same as the first eigenvector of C. According to the Perron-Frobenius theorem, the elements of α all have the same sign (chosen as positive). These elements are then scaled to sum to 1 to ensure that partial factor scores are barycentric with respect to their corresponding global factor scores (cf. Equation 8). Note that we ignored the denominator of the RV coefficient in the first line because it is a fixed scalar (equal to J²) and has no effect on the maximization problem.
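As a quick numerical sanity check of this result (our illustration, using the C matrix from the sketch above): the first eigenvector of C should attain a value of α⊤C²α at least as large as that of any random unit-norm weight vector.

```r
## Check that u1 maximizes alpha' C^2 alpha over unit-norm vectors (sketch).
quad_form <- function(a) as.numeric(t(a) %*% C %*% C %*% a)
u1_unit <- eigen(C, symmetric = TRUE)$vectors[, 1]
set.seed(1)
rand_unit <- replicate(1000, { a <- rnorm(nrow(C)); a / sqrt(sum(a^2)) })
max(apply(rand_unit, 2, quad_form)) <= quad_form(u1_unit)  # expected: TRUE
```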
This optimization problem is similar to the optimization problem of Principal Component Analysis (PCA). In PCA, weights are sought for each variable to compute factor scores, which are linear combinations of these variables. In covSTATIS, weights are sought for each data table to compute the compromise—the linear combination of these data tables.
Second part
The second optimization problem is equivalent to the optimization problem of an eigenvalue decomposition. In the eigenvalue decomposition, for each component, weights (i.e., loadings) of each row/column are sought to compute factor scores (F). Factor scores are linear combinations of the loadings that have the largest possible variance (as evaluated by their associated eigenvalues). This optimization problem can be written as the minimization problem of finding the factor scores whose cross-product best approximates the compromise:
\[ \small{\begin{array}{rll} \underset{\mathbf{F}}{\arg \min }\left\|\mathbf{X}_{+}-\mathbf{F F}^{\top}\right\|_2^2 & \text {such that} & \mathbf{F}^{\top} \mathbf{F}=\boldsymbol{\Lambda} \\ \Leftrightarrow \underset{\mathbf{P}}{\arg \min }\left\|\mathbf{X}_{+}-\mathbf{P} \boldsymbol{\Lambda} \mathbf{P}^{\top}\right\|_2^2 & \text {such that} & \mathbf{P}^{\top} \mathbf{P}=\mathbf{I} . \end{array}} \tag{11} \]
Here, P is the matrix of eigenvectors and Λ is the diagonal matrix of eigenvalues.
The detailed proofs and descriptions of the optimization problems of covSTATIS can be found in the Appendix of a previous publication.64
CRediT author statement
Giulia Baracchini: Conceptualization, Software, Formal Analysis, Data Curation, Writing – Original Draft, Writing – Review and Editing, Visualization. Ju-Chi Yu: Conceptualization, Software, Formal Analysis, Writing – Original Draft, Writing – Review and Editing. Jenny Rieck: Software, Formal Analysis, Data Curation, Writing – Review and Editing. Derek Beaton: Software, Writing – Review and Editing. Vincent Guillemot: Software, Writing – Review and Editing. Cheryl Grady: Resources, Writing – Review and Editing. Hervé Abdi: Software, Writing – Review and Editing. R. Nathan Spreng: Writing – Review and Editing, Supervision.
Data availability
Data used in the tutorial are available online (https://osf.io/hnj7s/) and are described in detail in our previous publication.86
Code availability
The original source code for covSTATIS can be found here: https://cran.r-project.org/web/packages/DistatisR/ and its helper file here: https://cran.r-project.org/web/packages/DistatisR/DistatisR.pdf. All code used to apply covSTATIS in network neuroscience and replicate our tutorial, along with all documentation, can be accessed here: https://giuliabaracc.github.io/covSTATIS_netneuro/pages/tutorial.html. For the tutorial, a downloadable qmd file can be accessed here: https://github.com/giuliabaracc/covSTATIS_netneuro/blob/main/pages/tutorial.qmd. The following GitHub (https://github.com/giuliabaracc/covSTATIS_netneuro/tree/main) and website (https://giuliabaracc.github.io/covSTATIS_netneuro/) links serve as centralized resources for covSTATIS’ applications in network neuroscience.
Funding Sources
This research was supported in part by grants from the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the National Institutes of Health (NIA R01 AG068563). G.B. and R.N.S. are supported in part by the Fonds de recherche du Québec (FRQS). J.-C.Y. receives grant support from the Discovery Fund of the Centre for Addiction and Mental Health (CAMH).
Conflicts of Interest
The authors declare no competing interests.