In Ooi et al., the authors present a compelling and comprehensive cost-effectiveness analysis to optimize the balance between scan duration and sample size for brain-behavior prediction studies.1 This paper follows prior work from Marek et al., which suggested that a sample size of at least 2,000 subjects is required for robust brain-wide associations, given realistic effect sizes of r=0.1.2 Ooi et al. find that sample size and scan duration are broadly interchangeable and therefore sample size requirements can be partly mitigated with longer functional magnetic resonance imaging (fMRI) scans. The goal of this perspective is to summarize the core takeaways from Ooi et al. and discuss some nuances and extensions of this important study.

The core recommendation for most neuroimaging studies resulting from the analyses in Ooi et al. is a duration of at least 30 minutes per subject for fMRI scans.1 This recommendation is longer than the durations suggested by several prior studies that focused on reliability3–5 and in line with more recent estimates.6 Importantly, Ooi et al. move beyond reliability by quantifying the prediction accuracy achievable at a given scan duration as a percentage of the ‘maximum’ prediction accuracy obtained with the full available scan time (focusing on phenotypes with maximum accuracy r>0.1). Here, Table 1 provides a summary of the general recommendation for balancing per-subject scan duration and sample size and scenarios that warrant prioritizing longer per-subject scans (left) or larger sample sizes (right).

Table 1. Overview of recommendations.
Typical study recommendation: 30-minute fMRI scans (ideally with N=900+)

When to prioritize longer scans | When to prioritize larger sample size
Recruiting from rare (sub-)population | Prospective studies (e.g., ABCD, UK Biobank)
Mitigating low quality data points (e.g., in children)7 | High patient heterogeneity
Non-stationarity or dynamics studies | Interest in subtype/clustering analyses
Precision studies | External validation usage8
Interest in subcortical brain regions | Normative studies

Why are longer scans helpful?

A key advantage of longer fMRI scans is the resulting increase in temporal degrees of freedom, which is known to improve the reliability of resting-state measures of interest including partial correlation9 and subject-specific resting-state network maps derived from independent component analysis (ICA) and dual regression.10 More generally, any analysis pipeline that includes a within-subject multiple linear regression with a relatively large number of regressors (such as the number of brain regions in partial correlation analysis or the number of ICA maps in dual regression) benefits from the increased temporal degrees of freedom achieved by longer scans and/or shorter repetition times. As such, longer scan times improve statistical power and thereby enhance the accuracy of within-subject estimates of functional connectivity.10
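To make the degrees-of-freedom argument concrete, the following is a minimal sketch of how the nominal residual degrees of freedom of a within-subject regression scale with scan duration and repetition time (TR). The function name, the TR of 2 seconds, and the 100 regressors are illustrative assumptions rather than values from Ooi et al. The sketch also ignores temporal autocorrelation and preprocessing steps such as filtering and nuisance regression, all of which reduce the effective degrees of freedom below this nominal count.

```python
import math


def nominal_temporal_dof(scan_minutes, tr_seconds, n_regressors):
    """Nominal residual degrees of freedom for a within-subject regression.

    Hypothetical helper for illustration only. Assumes independent
    timepoints, which real fMRI data do not satisfy.
    """
    # Number of acquired volumes for the given duration and TR.
    n_volumes = math.floor(scan_minutes * 60 / tr_seconds)
    # One degree of freedom is used per regressor, plus one for the intercept.
    return n_volumes - n_regressors - 1


# Example: 100 regressors (e.g., brain regions in a partial correlation
# analysis) at an assumed TR of 2 seconds.
for minutes in (10, 30):
    dof = nominal_temporal_dof(minutes, tr_seconds=2.0, n_regressors=100)
    print(f"{minutes}-minute scan: {dof} nominal degrees of freedom")
```

Under these assumptions, tripling the scan duration more than triples the nominal residual degrees of freedom, because the fixed cost of the regressors is paid only once. Shortening the TR has a similar nominal effect, although the gain in effective degrees of freedom is smaller when neighboring volumes are strongly autocorrelated.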

Do longer scans impact external validation?

Ooi et al. focus on the size of the training sample in their cost-benefit analysis of sample size versus scan duration. In addition to well-powered training data and traditional cross-validation techniques,11 there is increasing awareness of the need to replicate prediction results in out-of-sample and out-of-distribution data.12 Importantly, recent work emphasized the importance of the sample size of datasets used for such external validation.8 Rosenblatt et al. replicated the finding from Ooi et al. that longer scans improved prediction accuracy for training and internal validation data, but longer scan duration did not improve performance for external validation data. As such, large cohorts play an important role in the external validation of predictive models.

Can longer scans help diversity and representation?

There is increasing awareness of the need for representation of diverse and intersectional sociodemographic identities in neuroimaging studies.13,14 An appealing suggestion from Ooi et al. is that underrepresented subpopulations (e.g., racial minorities) could be scanned for longer than overrepresented groups within the same cohort to achieve diverse representation.1 However, preliminary work has reported higher inter-individual variability in neuroimaging and behavior among Black participants compared to White participants,15 which may not be captured in a longer-scan study design without adding more participants. Therefore, future work is needed to understand the trade-offs between mixed scan durations within a study, recruitment strategies and costs, sampling schemes, and other approaches to improve diverse representation in neuroimaging datasets.

How do longer scans and/or bigger samples inform individual difference research?

The study by Ooi et al. focuses on cost effectiveness of fMRI studies. As noted by the authors, there are study design choices that intrinsically require a (much) larger sample size such as prospective epidemiological studies where only a small proportion of participants is expected to endorse any particular symptom or behavior of interest.16 On the other hand, some study design choices require a (much) longer scan duration such as precision studies.17 Intriguingly, these two extremes of study design choices share a common goal of mapping individual differences in symptoms or behaviors onto individualized neural patterns.18

A key challenge for all individual difference research is that the degree of variance in brain-behavior associations may differ between groups of individuals, leading to idiosyncrasies that are challenging to capture. For example, normative modeling approaches have shown a lack of overlap in normative deviations across patients,19–21 suggesting particularly high variance. Similarly, the so-called ‘Anna Karenina effect’ suggests the presence of substantially greater variance among patients than among healthy controls.22,23 Although scanning fewer people for longer may maximize cost effectiveness, scanning more participants enables a comprehensive mapping of the full breadth of idiosyncrasies across individuals. Alternatively, precision studies offer an opportunity for in-depth mapping of intricate idiosyncrasies in a limited set of individuals.

One popular approach for parsing high degrees of heterogeneity within a patient group is to assume the presence of biological subtypes (or ‘biotypes’).24 A broad range of data-driven clustering approaches have been developed to identify biotypes from neuroimaging and/or clinical data.25 Although all clustering methods will return a solution that minimizes the cost function, it is more challenging to determine whether resulting biotypes are clinically and/or etiologically meaningful.26,27 To assess biotype validity, post-hoc comparisons between biotypes are often performed to assess differences in treatment effects or other clinically or etiologically relevant variables that were not used for biotype determination. Notably, depending on the inclusion/exclusion criteria, cluster dimensionality, and validation measure missingness, these post-hoc analyses can suffer from relatively low power even when starting from a large sample size.28 As such, cohorts with larger sample sizes are needed to robustly study heterogeneity and subtype validity.

Where does behavior fit in?

Even with 30-minute scans, Ooi et al. show that N=900 is required to achieve 80% of the maximum prediction accuracy and N=2,500 is required to achieve 90% of the maximum prediction accuracy. As such, there is a need to explore additional avenues for optimizing brain-behavior associations. Ooi et al. show that phenotypic reliability affects the overall prediction accuracy (i.e., effect size) but is not related to the tradeoff between sample size and scan duration.1 This is expected because the overall accuracy is capped by the joint reliability of brain and phenotypic measures,29,30 yet scan time is independent of the out-of-scanner phenotypic measures and therefore only impacts the reliability of brain measures. As an extension of the work by Ooi et al., it is likely that brain-behavior association effect sizes can be improved by predicting the averaged phenotypic measure across several repeats instead of a one-off phenotypic assessment.31 Analogous to longer scans, repeated phenotypic measures are expected to enhance the accuracy of within-subject estimates of phenotypic traits. Combined with long scans, this has the potential to raise the joint reliability and thereby increase brain-behavior association effect sizes. Future work is needed to explore strategies in the phenotypic space that are equivalent to longer scans, and their contributions to improving maximum prediction accuracy.
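One standard way to formalize the reliability ceiling and the benefit of repeated phenotyping is classical test theory, sketched below. The attenuation relation and the Spearman-Brown formula are textbook results, but their use here is an assumption for illustration and not necessarily the formulation used by Ooi et al. The function names and all numeric values (true association, brain reliability, phenotypic reliability, number of repeats) are hypothetical. The Spearman-Brown formula further assumes that the repeats are parallel measurements with equal reliability and uncorrelated errors.

```python
import math


def spearman_brown(reliability, n_repeats):
    """Reliability of the average of n_repeats parallel measurements."""
    return n_repeats * reliability / (1 + (n_repeats - 1) * reliability)


def attenuated_r(true_r, rel_brain, rel_pheno):
    """Expected observed correlation given measurement reliabilities.

    The observed association is the true association scaled by the
    square root of the product of the two reliabilities.
    """
    return true_r * math.sqrt(rel_brain * rel_pheno)


# Hypothetical values chosen only to illustrate the direction of the effect.
true_r = 0.3      # assumed error-free brain-behavior association
rel_brain = 0.6   # assumed reliability of the brain measure
rel_pheno = 0.5   # assumed reliability of a one-off phenotypic assessment

rel_pheno_avg = spearman_brown(rel_pheno, n_repeats=4)
r_single = attenuated_r(true_r, rel_brain, rel_pheno)
r_averaged = attenuated_r(true_r, rel_brain, rel_pheno_avg)

print(f"phenotypic reliability: {rel_pheno:.2f} -> {rel_pheno_avg:.2f}")
print(f"expected observed r: {r_single:.3f} -> {r_averaged:.3f}")
```

In this sketch, averaging four repeats raises the phenotypic reliability and therefore the expected observed correlation, while the brain reliability is untouched. Longer scans act on the other term of the product, which is why the two strategies are complementary.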

Conclusion

In conclusion, the paper by Ooi et al. provides an important framework and online calculator to maximize the cost efficiency of fMRI studies by optimizing the balance between scan duration and sample size. The recommendation of 30-minute fMRI scans is suitable for many studies. At the same time, there are valid reasons for prioritizing longer scans (e.g., precision studies) or for prioritizing more participants (e.g., prospective studies, external validation, and subtyping studies). In an era of declining public trust in higher education and increasing pressures on research funding, objective efforts to maximize returns on investment, such as the study by Ooi et al., are critically important to guide study design and funding decisions, improve transparency, and inform advocacy.


Funding Sources

Janine Bijsterbosch was supported by the NIH (NIMH R01 MH128286 & NIMH R01 MH132962).

Conflicts of Interest

Janine Bijsterbosch reports no conflicts of interest.