Progesterone to Prevent Recurrent Preterm Delivery – How Much Do We Know? (Part 3)
Scroll down to the bottom of this post for the essentials.
At last we come to a trial which was published within the last year (October 2019), in the American Journal of Perinatology. The present trial, which I’ll refer to as the “confirmatory trial,” was meant to confirm the results of the Meis trial (see my Part 2 post). After the Meis trial was published, the manufacturer of 17 alpha-hydroxyprogesterone caproate (17-OHPC) applied to the FDA for approval to market it as an agent to prevent preterm delivery. After some years of discussion, in 2011 the FDA granted accelerated approval for this purpose. Since the manufacturer had relied primarily on the Meis trial to make the case in its new drug application (NDA), the FDA conditioned this approval on the conduct of a confirmatory trial. They further required that the confirmatory trial enroll at least 10% of its participants from the United States.
An interesting aspect of the confirmatory trial was that its methods mirrored those of the Meis trial, with just a few exceptions. Eligible women were ≥18 years old, had a previous pregnancy that ended in spontaneous preterm delivery due to preterm labor or preterm rupture of membranes, and had a current singleton pregnancy between 16 weeks 0 days and 20 weeks 6 days of gestation. (In the Meis trial the lower limit for gestational age was 15 weeks.) A planned cervical cerclage (a procedure using stitches or tape to try to keep the cervix closed), a fetus with a known congenital anomaly, and some other criteria similar to those of the Meis trial would exclude a woman.
Unlike the confirmatory trial, the Meis trial didn’t state a lower age limit for inclusion. They may have had one, but I note that women in the Meis trial were on average 26 years old, while women in the confirmatory trial had a mean age of 30. The standard deviation for age was about 5.5 years in the Meis trial and 5.2 years in the confirmatory trial. If ages are roughly normally distributed, about 95% of individuals fall within two standard deviations of the mean (a reference range – not to be confused with a confidence interval, which describes uncertainty about the mean itself). This suggests that about 95% of the women in the Meis trial were between 15 and 37 years old, while in the confirmatory trial they were about 20 to 40 years old. This isn’t a given, just a rough estimate.
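As a back-of-the-envelope check, the mean ± 2 SD range can be computed directly. This is just a sketch using the means and standard deviations reported by the two trials, under an assumed roughly normal age distribution:

```python
def reference_range(mean, sd, k=2):
    """Return (mean - k*SD, mean + k*SD), an approximate 95% reference
    range for individuals when the distribution is roughly normal."""
    return (mean - k * sd, mean + k * sd)

# Figures from the two trials' reported age distributions
meis = reference_range(26, 5.5)          # ≈ (15.0, 37.0)
confirmatory = reference_range(30, 5.2)  # ≈ (19.6, 40.4)
print(meis, confirmatory)
```

This reproduces the rough 15–37 and 20–40 ranges quoted above.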
Participants followed the same weekly schedule of intramuscular injections, either 17-OHPC or placebo, as in the Meis trial. Randomization, however, was done differently in this trial. First, it was stratified by enrolling site, meaning each site had its own randomization list. This is commonly done in multicenter trials, because each site has its own particular ways of caring for patients. With simple randomization, a given site might contribute more patients to one study group than the other. Suppose that site doesn’t follow some best practice, so the outcomes of its patients aren’t as good as at other sites. The study group that got more patients from that site will then have worse outcomes because of that site’s practices – and not because of the intervention being studied. By developing a separate randomization list for each site, we reduce the chance of that happening.
But wait! Each site might enroll a fairly small number of patients – in fact, this is commonly the case. Simple randomization at that site might still place more patients in one study group than the other. So another technique, called block randomization, is used to guarantee that won’t happen. A block is a small group of patients – e.g., two, four, or six. With two study groups, the block size will always be an even number; with three groups, it will be a multiple of three.
Suppose you choose a block size of four patients, and your trial has two study groups, A and B. There are a limited number of ways of ordering group A and group B for four patients: AABB, ABAB, BABA, BBAA, ABBA, BAAB. Each one of those sequences is a block. If your trial requires 48 patients, the statistician creates the treatment assignment list by randomly selecting a block from the preceding six sequences 12 times. Assuming you enroll all 48 patients, this will force the trial to randomly assign 50% of the patients to study group A and 50% to study group B.
As a side note, if the statistician only uses one block size, let’s say four, then the treatment assignment for every 4th patient enrolled in the trial will be known. That is, if a clever clinician notes that the first three patients went A – B – A, then they know the next patient will go into the B group. The allocation will be revealed for 25% of the patients. Statisticians get around this problem by choosing random block sizes when they create the list. Maybe they will have block sizes of four, six and eight. The clinician won’t know what the block size is, and so won’t know when the “last” treatment assignment in that block is coming up.
So the confirmatory trial used what is called stratified, blocked randomization. This is known as a “fixed” randomization scheme because the treatment assignments are laid out in advance, unlike the adaptive randomization method used in the Meis trial. Both are perfectly good ways to randomize in a large, multicenter trial.
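The scheme described above can be sketched in a few lines of code. This is illustrative only – the block sizes, site names, and random seed are my own choices, not the trial’s actual procedure:

```python
import itertools
import random

# The six possible orderings of a block of four with two study groups:
ORDERINGS = sorted(set(itertools.permutations("AABB")))
print(len(ORDERINGS))  # 6

def make_site_list(n_patients, block_sizes=(4, 6), rng=None):
    """Build one site's (stratum's) assignment list from random blocks.

    Each block holds equal numbers of A's and B's in random order, so
    the study groups stay nearly balanced within every site. Mixing
    block sizes keeps a clever clinician from predicting the last
    assignment in a block.
    """
    rng = rng or random.Random()
    assignments = []
    while len(assignments) < n_patients:
        size = rng.choice(block_sizes)           # random block size
        block = ["A"] * (size // 2) + ["B"] * (size // 2)
        rng.shuffle(block)                       # random order within block
        assignments.extend(block)
    return assignments[:n_patients]

# Stratified: a separate randomization list for each enrolling site.
rng = random.Random(42)
site_lists = {site: make_site_list(24, rng=rng) for site in ("site_1", "site_2")}
for site, seq in site_lists.items():
    print(site, seq.count("A"), seq.count("B"))
```

Each site’s list comes out nearly 50/50 by construction, which is exactly the guarantee that simple randomization can’t make for a small site.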
As in the Meis trial, with the exception of the weekly intramuscular injections, the participants’ medical care was determined by their physicians. Neither the patients nor their physicians, other caregivers, or study personnel knew which agent was being given (i.e., 17-OHPC or placebo) – double-blinding. Also as in the Meis trial, compliance with the weekly shots was high (91.4% for 17-OHPC and 92.4% for placebo).
The primary outcomes were preterm delivery at <35 weeks of gestation and a composite neonatal morbidity and mortality outcome. “Composite” means that several different outcomes were combined into one summary “yes/no” outcome, in this case neonatal death (death <28 days after birth), grade 3 or 4 intraventricular hemorrhage, respiratory distress syndrome, bronchopulmonary dysplasia, necrotizing enterocolitis, or sepsis. The reason for combining these into one “yes/no” outcome is to increase the statistical power of the study for a given number of patients enrolled. Each of these outcomes is rare and would require many thousands of women to be enrolled in order to have a decent chance of finding a statistically significant difference between study groups.
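As a toy illustration (the field names here are made up for the example, not taken from the trial’s dataset), a composite outcome is simply an “any of these occurred” flag computed per infant:

```python
# Hypothetical component names, mirroring the composite described above
COMPONENTS = ["neonatal_death", "ivh_grade_3_4", "rds", "bpd", "nec", "sepsis"]

def composite_outcome(record):
    """Return True if any component outcome occurred for this infant."""
    return any(record.get(c, False) for c in COMPONENTS)

# An infant with respiratory distress syndrome counts as a composite "yes"
infant = {"rds": True, "sepsis": False}
print(composite_outcome(infant))  # True
```

Because the composite fires when any one component occurs, its event rate is higher than any single component’s, which is what buys the extra statistical power.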
So how did they decide the required number of women to enroll in the confirmatory trial (the sample size)? They used the results from the Meis trial, i.e., the proportion of women in the Meis trial who delivered at <35 weeks and the proportion of infants who had at least one of the conditions in the neonatal index. The downside is that the Meis trial enrolled a group of women whose outcomes were much worse than expected based on the inclusion/exclusion criteria and what the Meis investigators had seen in previous trials; these were not necessarily realistic assumptions. The “upside” is that it likely required enrolling fewer women, because the Meis trial’s outcomes were more common. Their calculations determined that they needed 1,707 women for the confirmatory study.
The trial randomized 1,708 women. As with any trial, they screened more women than that for eligibility, but the authors couldn’t say how many, stating that, “Due to the nature of chart screening and eligibility assessment, the specific number of women who were evaluated for potential eligibility and/or declined participation was not tracked.” Huh? In any sponsored trial, it’s standard to keep a screening log containing the name of each patient screened – at least those approached and/or tested beyond chart review – whether they were eligible for the trial, and if not, why not. This is part of the “Good Clinical Practices” the investigators said they followed. (“Good Clinical Practices” concerns best practices for conducting clinical trials and should really be called “Good Clinical Trial Practices.”) One purpose of screening logs is to assess the potential for selection bias in terms of who eventually ends up participating in the trial. Patients who are eligible but don’t enroll may differ from those who do in terms of important risk factors or other characteristics, in ways that may affect the applicability of the trial’s results to those non-enrolled patients.
While the Meis trial had been conducted exclusively at academic medical centers throughout the United States, the confirmatory trial was conducted at sites both inside and outside the U.S., and not necessarily at academic medical centers. Despite their very similar inclusion and exclusion criteria, the two trials did not enroll similar patient populations (see table below).
| Characteristic | Meis 17-OHPC | Meis Placebo | Confirmatory 17-OHPC | Confirmatory Placebo |
|---|---|---|---|---|
| Age (years, mean ± SD) | 26 ± 5.6 | 26.5 ± 5.4 | 30 ± 5.2 | 29.9 ± 5.2 |
| Race/ethnicity: White | 79 (25.5%) | 34 (22.2%) | 1,004 (88.8%) | 504 (87.2%) |
| Race/ethnicity: Black | 183 (59%) | 90 (58.8%) | 73 (6.5%) | 41 (7.1%) |
| Race/ethnicity: Latino | 43 (13.9%) | 26 (17.0%) | 101 (8.9%) | 54 (9.3%) |
| Race/ethnicity: Asian | 2 (0.6%) | 1 (0.7%) | 23 (2.0%) | 22 (3.8%) |
| Gestational age at previous preterm delivery (weeks; mean ± SD for Meis, median (range) for confirmatory) | 30.6 ± 4.6 | 31.3 ± 4.2 | 32 (28–35) | 33 (29–35) |
| >1 prior preterm delivery | 86 (27.7%) | 63 (41.2%) | 148 (13.1%) | 70 (12.1%) |
Women in the confirmatory trial were older, more likely to be white, a bit farther along in their prior pregnancies when the preterm delivery happened, and less likely to have had more than one previous preterm delivery. It’s fascinating how the same eligibility criteria could produce such different groups of trial participants.
What did the confirmatory trial find? No difference in outcomes between patients who received 17-OHPC and those who received placebo. Preterm delivery at <35 weeks occurred in 11.0% of the 17-OHPC group and 11.5% of the placebo group (vs. 20.6% and 30.7%, respectively, in the Meis trial). The frequency of the composite neonatal index was 5.6% in the 17-OHPC group and 5.0% in the placebo group. Of the 35 outcomes they analyzed, the only statistically significant one was miscarriage (pregnancy loss prior to 20 weeks of gestation), favoring 17-OHPC. However, the stillbirth rate was non-significantly worse for 17-OHPC, and one significant difference out of 35 could very well be due to chance alone (a type I error).
So the investigators were left trying to explain what happened. The first argument was that the trial was “underpowered” given the low event rates in the confirmatory trial. According to the investigators, if they had enrolled more patients, they would have shown that 17-OHPC is effective. Well, maybe. They would have needed about 2,150 patients to have 80% power to find a difference of 11% vs. 7.5% in the rate of preterm delivery at <35 weeks, and they may or may not have ended up finding such a difference. But the larger point is that they weren’t looking for that kind of difference in the first place. They wanted to target a higher-risk group of pregnant women, but they didn’t enroll that population.
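The ~2,150 figure can be reproduced with the standard normal-approximation sample size formula for comparing two proportions. This is a sketch assuming equal group sizes, two-sided α = 0.05, and 80% power (the actual trial used 2:1 allocation, so its exact calculation would differ somewhat):

```python
from math import sqrt

# Hard-coded standard normal quantiles to keep the sketch dependency-free
Z_ALPHA = 1.959964  # two-sided alpha = 0.05
Z_BETA = 0.841621   # power = 0.80

def n_per_group(p1, p2, z_a=Z_ALPHA, z_b=Z_BETA):
    """Patients per group needed to detect event rates p1 vs. p2
    (normal approximation, equal allocation)."""
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

total = 2 * n_per_group(0.11, 0.075)
print(round(total))  # roughly 2,150 in total
```

With the confirmatory trial’s observed rates plugged in, the formula lands right around the ~2,150 total the investigators cited.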
The next argument was that, since 17-OHPC had already been recommended to prevent preterm delivery by the American College of Obstetricians and Gynecologists, many centers did not want to participate in the confirmatory trial. It’s not stated, but I assume the idea is that they didn’t want to enroll their high-risk patients in the trial. According to the investigators, the centers that did participate (inside and outside the U.S.) did not have access to 17-OHPC and were amenable to enrolling patients in the confirmatory trial. If that’s the case, wouldn’t they want to enroll their highest-risk patients, who would be most in need of an effective intervention? Or did they send those highest-risk patients to the major medical centers?
The investigators pointed out that the women in the confirmatory trial were less likely to have had >1 previous preterm delivery and so were not as high risk. Okay, but in the Meis trial when they looked at the effects of 17-OHPC by number of previous preterm deliveries, 17-OHPC was superior in all categories. So that doesn’t seem to explain much either. They also discussed how few women in the confirmatory trial had a short cervix (this wasn’t reported in the Meis trial). However, their exclusion criteria would have eliminated a lot of women with a short cervix (i.e., planned cervical cerclage) so apparently they weren’t looking to apply the results to that population.
The next argument was that the confirmatory trial had a lot of women from non-United States populations. However, the rate of preterm delivery at <35 weeks among U.S. patients was 15.6% for 17-OHPC and 17.6% for placebo – not a very impressive difference and not significant at p=0.6633.
You can tell the investigators really wanted to show that 17-OHPC is effective for preventing preterm delivery because they repeatedly discussed how the trial was “underpowered” and inconclusive (thus we should pay more attention to the Meis trial, which had “robust” findings).
Certainly an effective intervention is needed for pregnant women who have previously had a preterm delivery. It’s hard to say if progesterone (17-OHPC) is beneficial for some subgroup of this population. Preterm delivery itself is known to have multiple causal pathways. Given the heterogeneous populations enrolled in the Meis and confirmatory trials, I suggest more work needs to be done on figuring out which patients, among the general category of “history of preterm delivery” might benefit from progesterone during a subsequent pregnancy, before another trial is launched.
Okay, that’s enough about progesterone for a while.
The Essentials
| Concept or Issue | Description | Why It’s Important |
|---|---|---|
| Stratified randomization | A separate randomization sequence (allocation list) is generated for each subgroup of participants to be enrolled in a clinical trial. | Since it’s a chance procedure, simple randomization may still produce study groups that are not balanced or equivalent with respect to a key factor that affects the outcome. When this happens, it may be very difficult to determine why the outcomes of the study groups differ – it might be because of the intervention or because of the underlying risk factor. Stratifying the randomization by an important risk factor ensures that that factor will not be a reason the study groups have different outcomes. |
| Blocked randomization | Patients to be enrolled in a trial are taken a few at a time, depending on the number of study groups – this is called a “block.” Within this small number (typically <10), all possible orderings of the study groups are spelled out (e.g., for a block size of four: AABB, BBAA, ABAB, BABA, ABBA, BAAB). Blocks are randomly selected from these orderings to build the randomization list until every planned patient has an assignment. | As above, we cannot count on simple randomization to produce study groups with equal numbers of patients, which reduces the statistical power of the trial. Block randomization guarantees that the intended percentage of patients in each study group (e.g., 50% experimental and 50% control) will be achieved. To avoid revealing a subset of the randomization list, blocks of different sizes are usually created and selected when building the list. Another benefit is that, if inclusion or exclusion criteria change during the trial, the study groups will remain evenly balanced throughout. |
| Composite outcome | When outcomes of interest are rare, they are combined into one summary “yes/no” category to create an outcome that is more frequent. | The advantage for investigators conducting a randomized trial is that the composite outcome requires fewer patients for the same statistical power to find a difference between study groups. Caveat: the composite outcome’s results are often heavily influenced by whichever component is more common than the others. For example, in a composite of subsequent percutaneous coronary intervention (PCI), myocardial infarction, stroke, and death, PCI will be much more common than the other outcomes. |
| Good Clinical Practices | A set of guidelines produced by the International Conference on Harmonisation for the conduct of clinical trials, covering design, ethics, conduct, monitoring, and reporting. | Consistent standards for the conduct of clinical trials should result in reliable, high-quality evidence and should also promote patient safety. FDA requirements for conducting clinical trials align with Good Clinical Practices. |
References
Blackwell SC, Gyamfi-Bannerman C, Biggio JR, Chauhan SP, Hughes BL, Louis JM, et al. 17-OHPC to Prevent Recurrent Preterm Birth in Singleton Gestations (PROLONG Study): A Multicenter, International, Randomized Double-Blind Trial. Am J Perinatol 2020 Jan;37(2):127–136.