Overview of model structure and virtual population calibration
The QSP model consists of several nonlinear ordinary differential equations that link receptor binding with BsAbs to form dimers and trimers with tumor cells and relevant MM biomarkers commonly used to assess clinical response. A schematic of the interactions in the three-compartment model is shown in Fig. 1. The three compartments in the model are: central or circulatory, the bone marrow (BM) or the tumor site, and the peripheral compartment (not shown in Fig. 1). In the bone marrow (BM) compartment, which serves as the primary site of action, a BsAb binds CD3 receptors on T cells and BCMA receptors on MM cells to form BsAb-CD3-BCMA trimers, which in turn initiate tumor cell death. BCMA receptors can be shed and released from the tumor cell's surface into the BM and central compartment, thereafter referred to as soluble BCMA (sBCMA), in agreement with prior in vivo and clinical studies. We assume that sBCMA can bind to one arm of the BsAb in the central and BM compartments, forming nonfunctional dimers or trimers, which may ultimately inhibit antitumor efficacy due to a potential drug-sink effect (not shown in the model schema). However, when a trimer complex is successfully formed with tumor-bound BCMA, the T cells become activated and trigger cytokine release, pushing the system to a pro-inflammatory state that alters the passage of T cells between compartments via IL-6. Lastly, MM cells shed M-protein and FLC paraproteins into the circulating system. These paraproteins were added to the model since they are used for monitoring therapy response in the clinic. Although not shown in the model schema, the BsAb also binds to BCMA, sBCMA, or CD3 to form dimer complexes. More details of the QSP model, including model equations and descriptions of parameters, can be found in the Supplementary Materials.
We use a virtual population simulation framework with the QSP model to facilitate model calibration to clinical data and assess model prediction uncertainty. At a high level, a virtual population (VPop) in this work corresponds to a multistep algorithm for sampling and selecting model parameter sets that yield reasonable model outputs under different treatment regimens. The model was initially simulated with 10,000 random parameter sets and then passed through two filtration/optimization steps that used a genetic algorithm (see Supplementary Fig. 3) to facilitate: i) selection of plausible patients corresponding to QSP model parametrizations that, upon simulation, produce acceptable ranges of untreated tumor M-protein dynamics and ii) selection of virtual patients through an optimization algorithm that selects subsets of plausible patients probabilistically to match treated tumor-associated paraprotein dynamics obtained from elranatamab studies (see Supplementary Table 4). Thus, each VPop consists of approximately 120 different parametrizations of the QSP model such that their summary statistics (described below) match those observed in MagnetisMM-1 and MagnetisMM-3 subcohorts when simulated under the same schedules administered in the trial. Each of these parametrizations, referred to as a virtual patient, represents a realistic model output.
A subset of nine model parameters was selected to be varied for plausible and virtual populations by using local sensitivity analysis and evidence from the literature about anticipated sources of biological heterogeneity (see Supplementary Table 1 and Supplementary Fig. 1). The parameters identified from the local sensitivity analysis were mostly associated with trimer formation (i.e., receptor densities), tumor killing (i.e., n, α), and drug resistance (α). These nine parameters were sampled from uniform distributions to build 10,000 parametrizations. Next plausible patients were selected from the parametrizations based on evaluation of model-simulated untreated serum M-protein doubling times and cross-checking that they were within clinically sensible published ranges (see Supplementary Fig. 2 for parameter distributions). A schema of the model calibration workflow is shown in Supplementary Fig. 3a. The objective function compared summary statistics of interim VPops to observed summary statistics in clinical data and selected the virtual cohort with the closest fit to clinical data. Specifically, the objective function included three efficacy metrics of serum paraprotein response (see Supplementary Figs. 3b, 4 for details): (1) serum integrated paraprotein dynamics, (2) best overall response (BOR) based on serum biochemical response, and (3) biochemical response rate (BRR) stratified by BL sBCMA levels. The integrated paraprotein was defined as either M-protein or FLC, selected per patient by observation of the serum paraproteins at BL, prioritizing M-protein if it was measurable ( ≥ 0.5 g/dL) and otherwise using FLC as a biomarker of response. A set of 120 virtual patients (corresponding to one VPop) was selected from the plausible patients. More details on the estimation of these metrics are found in the Methods section. The distribution of perturbed parameters from plausible patients to virtual patients remained mostly unchanged, indicating that parameter filtration was not biased (see Supplementary Fig. 2). The BL characteristics of the resulting VPops covered the ranges observed in clinical data, specifically for M-protein, FLC, and sBCMA measurements as shown in Fig. 2b.
Higher levels of sBCMA have been associated with poorer patient outcomes, potentially due to a reduction in target availability for BsAb binding or being a marker of higher disease burden. Across the doses administered in the MagnetisMM-1 and MagnetisMM-3 studies, objective responses were found to be lower among patients with high sBCMA levels ( ≥ 100 ng/mL) compared with patients with low sBCMA levels ( < 100 ng/mL) at BL (see Fig. 2a). We used 100 ng/mL as the threshold since it was identified as the most significant cutoff that predicted objective response (see Methods for more details). We adjusted for this confounder by calibrating the model to two distinct dose-response curves dependent on BL sBCMA levels.
To facilitate model output calibration to data, we defined a biochemical response as a decrease of ≥50% in serum paraprotein levels that was persistent (i.e., response is maintained for two consecutive observed or simulated tumor assessments). The BRR was estimated from data for some cohorts in MagnetisMM-1 (parts 1 and 2 A) and MagnetisMM-3 (Cohort A) studies, which included different administered doses, stratified by sBCMA levels at BL. Our virtual trial simulations built a cohort of 120 virtual patients that closely matched the estimated BRR of a study cohort when simulated under similar conditions. Simultaneously, within each cohort, median paraprotein dynamics were calibrated to median observed percent change from BL, and the information from each study was weighted by trial size. Fig. 2c illustrates good agreement of median paraprotein dynamics between a fitted VPop and patients from MagnetisMM-3 Cohort A (patients with no prior exposure to BCMA-targeted therapy), and Supplementary Fig. 9 shows that the same VPop appropriately fits a smaller cohort from MM-1. Lastly, BOR from virtual patients was matched to observed BOR, as shown in Fig. 2d, to build a VPop that matched the efficacy summary metrics, such as BRR and median change in paraprotein levels, in the higher-dose cohorts. To fit the VPop's BOR to the observed BOR, we ranked BORs by decreasing order, split them into 10 groups, and estimated the median of each group. Then, each 10th percentile of a VPop was matched to the corresponding 10th percentile in the data (see Supplementary Fig. 6). We also explored dynamics of other variables that were not used for model calibration but were measurable in the clinic. Among these variables were the concentrations of the therapeutic BsAb and sBCMA. In the clinic, these values are measured using two different assays that report a total concentration of analyte (unbound and bound analyte) or concentration of the analyte in free, or unbound form.
After calibrating 10 VPops (each with 120 virtual patients), we simulated a dose-escalation virtual trial using doses ranging from 16 mg to 152 mg once weekly (QW). These simulations help strengthen our understanding of the dose-response relationship for the MagnetisMM-1 study as the clinical dose-escalation cohorts included small numbers of patients with varying BL sBCMA levels, making it difficult to discern the role of sBCMA on efficacy for different dose levels. The lowest dose, 16 mg QW, corresponds to the fixed-dose equivalent of the lowest efficacious dose (215 μg/kg) observed in MagnetisMM-1. The virtual patient BL sBCMA levels were calibrated to the available clinical data, such that the proportion of patients with low or high sBCMA levels at BL was comparable to the proportion of patients with the corresponding sBCMA levels in the MagnetisMM-1 and MagnetisMM-3 Cohort A studies. The simulated doses with a weekly dosing frequency were 16 mg, 28 mg, 44 mg, 76 mg, and 152 mg. The weekly dose of 76 mg, or 1000 μg/kg, was proposed as the recommended phase 2 dose (RP2D), and a dose of 1000 μg/kg every 2 weeks (Q2W) achieved a wide PK exposure between that observed for 360 μg/kg QW and 1000 μg/kg QW, but the efficacy of Q2W dose frequency from the start of treatment remained a question of interest. To address this, we simulated an additional schedule of 76 mg Q2W to explore how it compared with 44 mg or 76 mg QW. Lastly, we explored the regimen of 152 mg QW to evaluate the added benefit of a dose higher than the RP2D.
For each dose regimen and BL sBCMA subgroup, we calculated the BRR to compare simulated efficacy from the QSP model across evaluated regimens, shown in Fig. 3a. Virtual patients with low sBCMA levels at BL (70% of VPop) showed higher BRR than patients with high sBCMA levels at BL (30% of VPop) for regimens from 16 mg QW to 76 mg QW. Based on these simulations, the 76 mg QW showed the highest efficacy for patients with low sBCMA levels, corresponding to a BRR of 80% in our simulations, which was similar to the BRR observed at 44 mg QW and 76 mg Q2W (~80%) and higher than that at 152 mg QW (72%). For patients with high sBCMA levels, efficacy peaked at 152 mg QW with a BRR of 53%, compared to the 76 mg QW regimen that resulted in a BRR of 38%; both regimens resulted in higher BRRs than lower doses of 44 mg QW and 76 mg Q2W (~25%) (see Fig. 3a). Figure 3b shows the total BRR of the same schedules across all virtual patients (i.e., pooled high and low sBCMA levels). The regimen of 76 mg QW resulted in the highest BRR across all virtual patients (68%) (see Fig. 3b). Considering these simulations, clinical evaluation of the 152 mg QW regimen was deemed unnecessary as it is unlikely to provide additional clinical benefit vs the 76 mg QW regimen for the overall population. The subset of virtual patients with high BL sBCMA levels still achieves clinically meaningful benefit with elranatamab 76 mg QW.
We developed a metric to enable assessment of drug efficacy in our model corresponding to the "effective binding ratio," which is defined as the number of trimers divided by the number of all BCMA receptors (see Equation 39 in Supplementary Materials). Our computed metric is similar to an effective receptor occupancy metric used in Abrams et al. This computed effective binding ratio measures antitumor efficacy at distinct doses for each virtual patient. A high effective binding ratio might indicate optimal antitumor efficacy in alignment with a bell-shaped dose-response curve for BsAbs. More in general, a bell-shaped concentration-response relationship is a well-described phenomenon for ternary complexes. A low effective binding ratio suggests two scenarios: (1) a small amount of drug available that does not build enough trimers for antitumor efficacy or (2) a large amount of drug has saturated the majority of cell receptors (BCMA and CD3), so there is an excess of dimers, which hinders the formation of trimers. We simulated five different doses that were administered weekly for each virtual patient, and we calculated average drug concentration and average effective binding ratio in the first three cycles of therapy. Fig. 4a shows these two metrics for four selected virtual patients with different concentration-response curves. The first and third panels show virtual patients 9 and 14, whose optimal efficacy binding ratios are achieved at drug concentrations corresponding to 76 mg QW, indicating no added benefit from higher doses. In contrast, virtual patient 13, who responded biochemically to 16 mg QW, shows a dose-dependent increase in trimer to BCMA ratio, suggesting potential for deeper response at higher doses. Virtual patient 22, despite having <100 ng/mL BL sBCMA, did not respond at any dose due to limited CD3 receptors, placing them on the declining side of the bell-shaped curve. Overall, while optimal binding ratios vary by individual, simulations support 76 mg QW as the lowest clinical dose providing broad coverage across the virtual population (see Fig. 3b).
To further investigate the determinants of response across doses, we calculated and analyzed the binding ratio versus average BsAb concentration for all virtual patients and catalogued them as illustrated in Fig. 4a and Supplementary Fig. 10b. Based on the shape of the curves, we identified three distinct response patterns: increasing, bell-shaped, and decreasing (Supplementary Fig. 10a). Virtual patients with increasing dose-binding ratio curves are predicted to benefit from higher dosing or drug exposure, whereas those with bell-shaped or decreasing curves may experience reduced efficacy at higher doses (see Supplementary Fig. 10c for BRR trends across groups).
To explore the underlying drivers of these patterns, we stratified virtual patients by dose-binding ratio-curve shape and examined the distributions of model parameters and initial state variables within each group. Using Kruskal-Wallis tests, we identified the most predictive features to be free BL sBCMA and T cell concentration in the bone marrow (Fig. 4b, c). When these variables were assessed jointly, their association with dose-binding ratio shape remained statistically significant (likelihood ratio test, P = 0.0023), and, as expected, they were not correlated (Supplementary Fig. 11a). Additional variables evaluated in this analysis are shown in Supplementary Fig. 11b.
Besides the total amount of antibody impacting trimers, the dosing frequency can also affect trimer and dimer formation in non-intuitive ways. To this end, we used the model to evaluate the maintenance of response in scenarios when the dose regimen is reduced from QW to Q2W to Q4W after initial response (i.e., from once a week to every other week to once a month dosing regimens). This dose reduction has now been implemented in the clinic for persistent responders, or patients who show a confirmed objective response after six cycles of QW therapy. To facilitate the evaluation of maintenance of response, we computed a simulated progressive disease (PD) event that was derived from International Myeloma Working Group (IMWG) progression criteria. To translate our findings into clinical meaningful outcomes, we defined simulated PD among persistent responders as a 25% increase from the lowest value of an integrated biomarker of a tumor assessment. We set our analysis to only virtual persistent responders because they were candidates for switching to Q2W starting on treatment cycle 7 (C7, each cycle consists 28 days of therapy), so we were able to track their tumor dynamics under a scenario in which they did not switch (continued receiving 76 mg QW after response was confirmed), and one in which they did switch to Q2W (switch to 76 mg Q2W starting at cycle 7 after response was confirmed). Virtual patients that are persistent responders may switch to Q2W at different times starting at cycle 7, since the dose reduction depends on the individual patient's response. Most virtual patients achieved a confirmed response by cycle 7 while receiving 76 mg QW, with an initial step-up priming regimen, with the median of virtual patients showing almost a clearance in integrated paraproteins by day 100 of therapy, as shown by the red solid line in Fig. 2c. Thus, most virtual patients that were persistent responders transitioned from QW to Q2W starting at cycle 7, as consistent with observed data in MagnetisMM-3. We observed that there was a greater tumor shrinkage in 85.2% of virtual patients with persistent response after a dose reduction (QW to Q2W) compared with a regimen constant dose at QW by cycle 13, as shown in Fig. 5a. A similar analysis showed that in a Q2W to Q4W transition starting at cycle 13, 87.2% of persistent responders remaining in the trial would have greater tumor reduction with Q4W dosing compared with Q2W dosing, also shown in Fig. 5a, b shows that only 17.4% of persistent responders who switched to Q2W showed PD between the time that they switched until day 1095 (3 years) of treatment. The scenario with dose reduction to Q4W after cycle 12 showed a similar result. Under the scenario of no switching, 22.8% of persistent responders showed PD over a period of 3 years of treatment. These simulations suggest that less drug among persistent responders is not expected to be detrimental to maintenance of response and can even result in a longer or better maintenance of response. Our model showed that the average number of trimers to tumor cells significantly increased after the point of dose reduction until 18 cycles of therapy (see Fig. 5c), supporting the evidence of more trimer formation following a dose reduction. The two scenarios with dose reductions showed similar trimer:tumor cell ratios at 18 cycles of therapy, but differences became significant following a total of 36 cycles of therapy (P < 0.001 by Wilcoxon test). However, the differences between the QW/Q2W and QW/Q2W/Q4W regimens were considered minimal due to the equal simulated PD.
We took a closer look at the time dynamics of drug trimers to further investigate why less frequent dosing could help increase efficacy (see Supplement). Our model suggests that a switch from QW to Q2W will decrease the ratio of drug-CD3 dimers to T-cell concentrations, thereby increasing availability of CD3 receptors, as shown in Supplementary Fig. 5a. In Supplementary Fig. 5b, we show an increase in the trimer:tumor cell ratio after the Q2W switch, suggesting that trimer numbers were increasing under Q2W compared with QW dosing, thus increasing tumor-killing activity. There was little difference in the ratios of BsAb-BCMA dimer:tumor cell between the dosing schedules, as shown in Supplementary Fig. 5c, suggesting that most BCMA receptors were engaged with BsAb in a dimer complex. Thus, under a Q2W schedule when tumor burden is sufficiently low, BCMA-BsAb dimers are more likely to find an available CD3 receptor to bind and form a trimer, leading to better maintenance of response.
Next we investigated whether a more stringent criterion of a deeper simulated clinical response to transition to Q2W would influence the maintenance of efficacy, thus effectively testing the robustness of the model prediction with respect to clinical switching criteria. As outlined above, a dose reduction from QW to Q2W starting on C7 happens once a patient reaches partial response (PR) or better (see Methods for criteria and calculation of response). To investigate an alternative threshold for switching from QW to Q2W dosing, we tested another criterion of a very good partial response (VGPR) or better (i.e., ≥90% decrease in paraprotein levels from BL). We compared this scenario requiring VGPR or better to switch with the one that requires PR or better after cycle 6. In both scenarios, a two-step-up priming regimen was simulated (i.e., 12 mg on cycle 1 day 1, 32 mg on cycle 1 day 4, and 76 mg on cycle 1 day 8 with QW thereafter until eligible for transition to Q2W). As expected, the BRR did not change between the two scenarios since a biochemical response in our algorithm is defined as PR or better, which is reached before a VGPR level is reached (see Fig. 6a). Maintenance of response analysis, in Fig. 6b, shows that transitioning to Q2W following a PR or better results in fewer PD events than transitioning to Q2W following a VGPR or better criterion (17.4% vs 19.6% PD among responders, respectively). In other words, the QW to Q2W transition criterion of achieving VGPR or better had 17 additional PD events (2.18% of persistent responders) compared with the reference (recommended) scenario. This finding suggests that earlier dose reductions might be better, contrary to the perception of waiting for a deeper response to reduce the dose. Despite seeing a better maintenance of response when transitioning to Q2W after PR or better, there was no statistically significant difference in the number of trimers per tumor cell between scenarios, suggesting very little difference in tumor-killing optimization between schedules across all virtual patients (see Fig. 6c).
To further evaluate the robustness of our VPop calibration framework, we explored various simulated outputs not used during model calibration and compared them with observed data. First, we compared free sBCMA and drug concentrations with the simulated values in the central compartment of one VPop, as shown in Supplementary Fig. 7. The 95% CIs of the free sBCMA simulations captured the observed median of the MagnetisMM-3 study, while the free drug concentration was initially overestimated and then underestimated after day 50 of treatment. To examine whether our model accurately described IL-6 trends, we used the cohort arm that received the RP2D in the MagnetisMM-1 study to tune model IL-6 peaks. This cohort followed a one-step priming regimen: 44 mg on cycle 1 day 1 and 76 mg on cycle 1 day 8 and weekly thereafter. We aimed to have median projections match median observations. Supplementary Fig. 8 shows the model predictions but following a two-step priming regimen: 12 mg cycle 1 day 1, 32 mg cycle 1 day 4, 76 mg cycle 1 day 8, and weekly thereafter. Compared with the observed IL-6 levels from the MagnetisMM-3 Cohort A study, the median of the simulated IL-6 peaks matched the data median, especially during the priming doses, but variability of observed IL-6 levels was underpredicted.
To validate our model, we tested the predictions against another arm within the MagnetisMM-1 study. In the MagnetisMM-1 Q2W priming arm, participants with RRMM (n = 13) received 44 mg on cycle 1 day 1 and 76 mg on cycle 1 day 8 and every 2 weeks thereafter. The enrollment criteria for this arm were the same as those from the MagnetisMM-1 QW and MagnetisMM-3 Cohort A studies. We simulated this dosing regimen using one of the ten fitted VPops that was chosen randomly. Fig. 6d shows the paraprotein dynamics of MagnetisMM-1 Q2W participants compared with the simulated dynamics of a VPop, stratified by biochemical response. The 95% prediction interval of our model covers most of the observed percent changes from BL. Our model accurately predicted the year long-term dynamics for patients who responded to treatment under a reduced dosing interval, and it was able to predict the quick progression of disease as measured by the increase of paraprotein levels among those that did not reach an objective biochemical response.