The landmark discovery that neutrinos have mass and can change type (or flavour) as they propagate -- a process called neutrino oscillation1,2,3,4,5,6 -- has opened up a rich array of theoretical and experimental questions being actively pursued today. Neutrino oscillation remains the most powerful experimental tool for addressing many of these questions, including whether neutrinos violate charge-parity (CP) symmetry, which has possible connections to the unexplained preponderance of matter over antimatter in the Universe7,8,9,10,11. Oscillation measurements also probe the mass-squared differences between the different neutrino mass states (Δm2), whether there are two light states and a heavier one (normal ordering) or vice versa (inverted ordering), and the structure of neutrino mass and flavour mixing12. Here we carry out the first joint analysis of datasets from NOvA13 and T2K14, the two currently operating long-baseline neutrino oscillation experiments (hundreds of kilometres of neutrino travel distance), taking advantage of our complementary experimental designs and setting new constraints on several neutrino sector parameters. This analysis provides new precision on the mass difference, finding in the normal ordering and in the inverted ordering, as well as a 3σ interval on δCP of [-1.38π, 0.30π] in the normal ordering and [-0.92π, -0.04π] in the inverted ordering. The data show no strong preference for either mass ordering, but notably, if inverted ordering were assumed true within the three-flavour mixing model, then our results would provide evidence of CP symmetry violation in the lepton sector.
The standard model of particle physics, extended to include neutrino mass, describes three-flavour eigenstates of neutrinos (ν, ν, ν) that are related to three mass eigenstates (ν, ν, ν) by the 3 × 3 complex Pontecorvo-Maki-Nakagawa-Sakata unitary mixing matrix U (refs. ). This mixing, together with non-zero neutrino mass, allows for the phenomenon of neutrino oscillation, in which, during propagation, the flavour content of a neutrino evolves at a rate that depends on neutrino mass-squared splittings () and the U matrix elements. Apart from these oscillation parameters, the rate depends on neutrino energy E and neutrino propagation distance L (baseline). Although experiments studying this process in recent decades have provided insights into the details of neutrino masses and mixings, many open questions remain.
The mixing matrix U is typically parameterized in terms of three mixing angles (θ, θ, θ) and at least one complex phase δ (ref. ). It is unknown whether sin δ is non-zero; if it is, neutrinos -- and thus leptons -- violate charge-parity (CP) symmetry and thereby provide a source of matter-antimatter asymmetry in nature, which is of great interest given the connection between CP violation and the unexplained matter dominance in the Universe. Separately, oscillation experiments have established that the mass-squared splitting is roughly 30 times larger in magnitude than , but the sign of the former is at present unknown. That is, ν may be heavier or lighter than the ν-ν pair, with these two options termed, respectively, the normal () and inverted () mass orderings. Knowledge of the mass ordering can constrain experimental searches and theory development in a wide range of physics, including absolute neutrino mass measurements, neutrinoless double beta decay searches to investigate the nature of neutrino mass, models of supernova explosion and detection, and the cosmological evolution evidenced in cosmic microwave background and large-scale structure measurements. For the mixing angles, current data suggest θ is near 45°, a notable value hinting at a μ/τ flavour symmetry. Improved precision on this and other mixing angles is essential for gaining a clearer view of flavour mixing and to probe the validity of the three-flavour model.
Long-baseline accelerator neutrino oscillation experiments are well suited to address the above questions. In these, a high-intensity neutrino beam enriched in muon neutrinos (ν) or muon antineutrinos () is produced at a particle accelerator and directed through the crust of Earth towards a massive far detector located hundreds of kilometres away. Note that the word 'neutrino' is used to mean both neutrino and antineutrino unless stated otherwise. The far detector measures the event rates of ν and ν -- the latter primarily from ν → ν oscillation -- as a function of neutrino energy, from which the oscillation parameters above can be determined. These experiments use near detectors, sited a short distance from the beam source, in which oscillation effects are negligible and a very high neutrino event rate can be measured. The near detectors provide vital control measurements that substantially mitigate large systematic uncertainties in the initial neutrino flux, neutrino-on-nucleus interaction cross-sections and in some cases detector response (for example, energy reconstruction and event selection efficiencies).
Two such experiments are in operation today, T2K and NOvA. Each experiment uses a narrow-band off-axis beam, whose peak energy is near the first oscillation maximum, , at the far detector. Note that natural units, where ħ = c = 1, are used throughout. T2K uses an approximately 0.6 GeV neutrino beam from J-PARC in Tokai, Japan, and the 50-kt Super-Kamiokande water Cherenkov detector for its far detector located 295 km away. In the United States, an approximately 2 GeV beam of NOvA is produced at Fermilab near Chicago, and the 14-kt tracking calorimeter far detector is located 810 km away in northern Minnesota. Further details on the designs of NOvA and T2K and on long-baseline experiments can be found in the Methods and refs. .
We report here a combined analysis of the datasets from T2K and NOvA previously analysed independently in refs. . This combination takes advantage of marked complementarity in the sensitivities of the two experiments to the oscillation parameters. In particular, the ν → ν oscillation probability is a function of (among other things) both δ and the neutrino mass ordering, and these two effects must be teased apart.
Figure 1 shows the complementarity between the experiments in a simplified case. Sets of oval curves indicate the energy-integrated total ν and event counts expected in the far detectors under various mass ordering and δ scenarios, with other oscillation parameters held fixed. The measured event counts in NOvA and T2K are shown as black points with error bars.
As shown in Fig. 1a, there is stronger separation between the mass ordering ovals for NOvA, because of higher beam energies, but as the NOvA data lie near the overlap of the ellipses, there can be ambiguity as to which ordering is correct and (in a correlated way) which values of δ are preferred. By contrast, T2K has less sensitivity to the mass ordering, but points with similar values of δ in each hierarchy sit close to one another, and the data lie closest to , regardless of mass ordering. Combining these datasets can provide simultaneous mass ordering and δ information with substantially less ambiguity, maximizing the use of current data and informing data-taking strategies for current and future experiments.
This discussion points to a more general observation that the oscillation parameters of interest represent a highly correlated multidimensional space. The analysis reported here calculates a joint Bayesian posterior, using the likelihoods of the experiments defined over the full parameter space. Moreover, we use the full suite of analysis tools from both experiments: detector response models, neutrino energy estimators, near-detector measurements and systematic uncertainties, all within a unified framework for statistical inference. This level of integration is the first for accelerator neutrino experiments, to our knowledge.
The posterior calculation is based on detailed parameterized models of the neutrino flux, cross-sections and detectors that predict the binned spectra of neutrino events in each of our selected samples as a function of the oscillation parameters and a large number of nuisance parameters mostly related to systematic uncertainties in the models. A likelihood is constructed from Poisson probability terms describing the compatibility between the prediction and the observed data in bins of relevant variables. Prior probabilities are set on all parameters as detailed in the Methods.
Both T2K and NOvA have software that explores the posterior using Markov chain Monte Carlo (MCMC) methods (ARIA for NOvA and MaCh3 for T2K). By containerizing the likelihood and prior portions of the code, we can construct and analyse the joint posterior using either of the original MCMC frameworks, in spite of the very different software environments involved. For each fitting framework, ARIA or MaCh3, the native likelihood and priors of the fitter are calculated directly, whereas the likelihood and priors of other experiments are accessed using the software container. In this way, either framework can be used, providing valuable redundancy and thus cross-checks of all statistical inferences.
Although a single set of oscillation parameters naturally applies to both experiments in the joint posterior, the treatment of the many nuisance parameters related to systematic uncertainties is more subtle. Both measurements of the oscillation parameters at present have statistical uncertainties larger than the systematic uncertainties, but the latter are not negligible. We thoroughly surveyed the flux, cross-section and detector models and their systematic uncertainties to determine whether correlations between the experiments affect the analysis at a significant level. Our conclusions from this effort are summarized in the following paragraphs.
Both T2K and NOvA use beams produced by directing accelerated protons onto graphite targets. The hadrons are charge-selected with magnetic horns: positively charged hadrons decay to produce neutrinos, and negatively charged hadrons produce antineutrinos. Many uncertainties on these beam fluxes stem from processes unrelated between the two experiments, for example, the alignment of beam components. Yet, uncertainties on the rate of hadron production in the graphite targets are substantial, and the underlying physics is the same. However, the two experiments use proton beams of different energies (30 GeV for T2K and 120 GeV for NOvA), and the external datasets used to tune the hadron production models of both experiments are different. Moreover, the ultimate impact of flux uncertainties on far-detector predictions in NOvA is much smaller than other uncertainties. We, therefore, conclude that at current experimental exposures, the flux uncertainties of the two experiments need not be correlated.
Given the different detector technologies involved, most detector-related uncertainties are independent between the experiments. Furthermore, the very different energy estimation techniques used, combined with model tuning and uncertainty estimation using in situ calibration samples in each experiment, including for the lepton and neutron energy scales, lead to independence between the two detector uncertainty models. We conclude that there are no significant correlations in the detector models.
For neutrino-on-nucleus cross-sections, the underlying physics is the same; in many cases, the same external datasets are used by both experiments to tune and set prior uncertainties on model parameters. Thus, cross-section model correlations are expected. However, in the specific case of NOvA and T2K, the description of this physics differs markedly. The simulation packages differ, the physical models implemented in them differ in many places, the parameterizations differ almost entirely, and customized tunings are necessary and applied, given the specific energies of the experiments, detector technologies and approaches to systematic uncertainty mitigation.
Proper correlations between experiments could be implemented by starting from a common cross-section model spanning different energy ranges and able to describe both the leptonic and hadronic parts of the final state. A joint description is not yet mature and is one of the focuses of the community in the years to come. Given the differences in the models, a direct mapping of their parameters was deemed not practical at this time. Instead, we studied whether neglecting these correlations could appreciably affect our measurements of the oscillation parameters. The studies are limited to our current experimental exposures and models and would need re-evaluation if applied to any other context.
First, we assessed whether correlations between single systematic parameters in our models could have a substantial impact on our results. For each of , θ and δ, we identified the systematic parameter in each experiment with the largest impact on the measurement of that oscillation parameter. Then, regardless of whether those two systematic parameters made physical sense to correlate, we performed fits to simulated pseudo-data with the parameters fully correlated, uncorrelated and fully anticorrelated. Details of these studies, including how we identified the most impactful parameters, are shown in the Methods. In summary, we saw no case in which the choice of correlation of individual systematic parameters significantly affected the oscillation parameter measurements.
Checking individual parameters does not rule out effects from a mix of systematic parameter variations that combine to produce a net effect that is larger and possibly more degenerate with oscillation effects, representing a potential worst-case scenario for the analyses. Rather than seeking such a set of variations, we directly identified, or in some cases constructed, single systematic parameters for each experiment that have effects similar to each oscillation parameter of interest. We then adjusted the size of the priors on these 'nightmare' parameters such that their impact on the measurements is comparable to that of statistical errors and therefore larger than the net effect of all our regular systematic parameters combined. These nightmare parameters were added to our nominal uncertainty models to create augmented models, allowing us to study a case in which systematic effects are comparable to statistical uncertainty. Next, we constructed simulated pseudo-datasets with the nightmare parameters increased in both experiments by one standard deviation above their prior central values. These simulated pseudo-data were then fit three times using the augmented model: once with the nightmare parameters of the experiments fully correlated (matching the pseudo-data), once fully anticorrelated and finally uncorrelated. We find that the oscillation parameter constraints extracted in the fully correlated and uncorrelated cases have negligible differences. However, the incorrect anticorrelated case yields a large bias. We expect that with even larger systematic uncertainties, differences between the correlated and uncorrelated cases would eventually become relevant. However, this study indicates that we are not in such a regime with the current exposures and systematic uncertainties (see the Methods for further results).
Given that no significant biases are seen from neglecting correlations between actual systematic parameters, and the only bias seen with the nightmare parameters comes not from neglecting a correlation but from adding an incorrect one, we choose in most cases to neglect the correlations between the systematic uncertainties of the two experiments. The one exception relates to the approximately 2% normalization uncertainties on all ν and events described in ref. . In this case, the uncertainties are implemented identically by T2K and NOvA, and we have correlated them.
We also perform studies in which the joint fit is tested against pseudo-data constructed with a set of discrete model variations not directly accessible using the nominal uncertainty models of the experiments. This procedure was used in the earlier independent T2K analysis, and we include in the present analysis those model variations seen as most impactful previously. Similarly, we studied a secondary set of variations based on extrapolating the cross-section model of each experiment to the context of the other experiment. Predefined thresholds were used to establish that no substantive changes in the central values or interval widths of the oscillation parameters were seen under these tests, as described in the Methods. For all tested alternative models, all observed changes in credible intervals were within thresholds (see the Methods for further details). Each experiment continues to investigate improvements in its cross-section models, and the studies described here would warrant repeating for larger data exposures and/or updated theoretical understanding. Continued theoretical and experimental effort in this direction is important.
With the joint likelihood and systematic uncertainty model defined, we use our fitting frameworks to analyse the combined datasets of refs. , finding consistent results between the two frameworks. Unless stated otherwise, we report results using an external constraint on θ (named the 'reactor constraint' below) and external constraints on and θ. The values used for these constraints correspond to the 2020 Particle Data Group summary values and are given in the Methods.
We tested the goodness of fit (Methods) of our model to data using the P-value method, both overall and for each individual sample in the far detectors. All the P-values are within an acceptable range (>0.05 after the look-elsewhere-effect adjustment described in the Methods). The overall P-value to describe all NOvA and T2K samples is 0.75 for full spectral analysis and 0.40 for rate-only analysis, marginalized over both mass orderings. Similar results were obtained without the reactor constraint and in each mass ordering. Thus, the joint oscillation model simultaneously fits T2K and NOvA data well. The P-values are also consistent with those of previous T2K-only and NOvA-only analyses.
We produce parameter estimations using the highest-posterior-density credible intervals and perform discrete hypothesis tests using the Bayes factor formalism. Conclusions related to CP conservation or violation, , and mass ordering have been tested to be robust under the alternative model variations described previously. For the measured oscillation parameters, we report 1σ (68.27%) credible intervals unless noted.
We find without any assumptions on the ordering of the neutrino masses. The fit weakly prefers the upper octant of θ () over the lower octant with a Bayes factor of 3.5. Removing the reactor constraint gives no statistically significant preference for either octant (Bayes factor 1.2 for the lower octant compared with the upper octant). We also find assuming the normal ordering and assuming the inverted ordering. This is at present the smallest experimental uncertainty on (Fig. 2), to our knowledge. This conclusion also applies when the reactor constraint is replaced by a flat prior.
There is no statistically significant preference obtained for either of the mass orderings, with a Bayes factor of 1.3 in favour of the inverted ordering with reactor θ constraint and 2.5 without reactor θ constraint. Although the two experiments individually prefer the normal ordering, the values of other oscillation parameters are more consistent in the inverted ordering, leading to a different ordering preference in the joint fit, although still not statistically significant. The effect on mass ordering preference when additionally incorporating reactor measurements is discussed in the Methods.
With no assumption on the true mass ordering, we find the 1σ credible interval on δ to contain [-0.81π, -0.26π] with the highest posterior probability value being -0.47π. We also find that values of δ around +π/2, an extremum of sin δ, are outside our 3σ (99.73%) credible intervals, which also holds for either mass ordering separately. Figure 3 shows the joint fit result compared with the individual measurements of NOvA and T2K in the plane, as well as one-dimensional (1D) uniformly binned posterior probability distributions for both mass ordering cases. Assuming the normal ordering, the joint analysis allows a wide range of δ values, giving a 3σ credible interval of δ ∈ [-1.38π, 0.30π]. In the case of the inverted ordering δ ∈ [-0.92π, -0.04π], excluding 56% of the parameter space, the CP-conserving values of δ = 0 and π are outside the 3σ credible interval. A consistent picture is seen when analysing the Jarlskog invariant, J (ref. ), which is a parametrization-independent measure of CP violation. The CP-conserving value of J = 0 falls outside the 3σ credible interval for the inverted ordering, and the above statements are true whether the prior used is uniform in δ or sin δ (Fig. 4). This analysis, therefore, provides evidence for CP violation in the lepton sector if the inverted ordering is assumed to be true. However, we do not see a significant preference at present for either mass ordering. Future mass ordering measurements will, therefore, influence the interpretation of these results. See the Methods for more data projections and comparisons.