Downloaded by [Constance Mara] at 13 June
Equivalence tests are available for several research designs; however, paired-samples equivalence tests that are accessible and relevant to the research performed by psychologists have been understudied.
This study evaluated parametric and nonparametric two one-sided paired-samples equivalence tests and a standardized paired-samples equivalence test developed by Wellek.
Keywords: Test of equivalence; Paired-samples.
Equivalence Testing
Psychologists often investigate differences between the means of two or more conditions or groups on some outcome variable. Traditional tests of differences (e.g., …)
However, when the research is investigating the equivalence of group means, researchers still commonly use traditional difference-based tests, treating non-rejection of the null hypothesis as grounds to conclude equivalence.
One problem with using a traditional difference-based test to assess equivalence is that, whenever the population means differ at all (however trivially), the probability of rejecting the null hypothesis that they are equal increases as sample size increases. Conversely, when studies are under-powered, traditional difference-based tests will usually fail to reject the null hypothesis, so "equivalence" will usually be found.
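Both halves of this argument can be seen in a quick power calculation. The sketch below is an illustration under stated assumptions, not part of the original study: it uses the large-sample normal approximation to the power of a two-sided paired test (rather than the exact noncentral t), and `approx_power_paired` and the effect size `dz = 0.05` (a standardized mean difference most researchers would call inconsequential) are hypothetical choices.

```python
from statistics import NormalDist


def approx_power_paired(dz, n, alpha=0.05):
    """Large-sample (normal) approximation to the power of a two-sided paired test.

    dz is the standardized population mean difference mu_D / sigma_D.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96 for alpha = .05
    shift = dz * n ** 0.5            # expected value of the test statistic
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# A trivially small true difference (dz = 0.05) at increasing sample sizes.
for n in (20, 100, 1000, 10000):
    print(n, round(approx_power_paired(0.05, n), 3))
```

Under this approximation the chance of "detecting" the trivial difference rises from roughly 0.06 at n = 20 to above 0.99 at n = 10,000. A small study will therefore almost never reject, which is exactly when non-rejection gets misread as equivalence.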
However, this recommendation has not been widely adopted by researchers in psychology, as discussed later. Tests of equivalence have been used in biopharmaceutical studies for several decades to assess the equivalence of different medications (Seaman and Serlin, …). For example, a new drug might be less expensive than the currently recommended drug, but before the new drug can be recommended, its effects must be shown to be equivalent to those of the older, reliably used drug.
More recently, tests of equivalence have been introduced into psychological research, as their relevance to behavioural research has been recognized (Cribbie et al., …). Researchers use tests of equivalence, rather than traditional difference-based tests, to determine whether the population mean difference between two or more groups or conditions is small enough to be considered inconsequential.
In traditional difference-based tests, the null hypothesis states (as mentioned previously) that the difference between the group or condition population means is equal to zero. In a test of equivalence, the null and alternative hypotheses are essentially the reverse of those in traditional difference-based tests. For tests of equivalence, the null hypothesis states that the difference between the group or condition population means falls outside a predetermined equivalence interval (i.e., …).
It is important to point out that the equivalence interval does not need to be symmetric (i.e., …). The equivalence interval is set by the researcher and represents the maximum difference between the population means that would be considered inconsequential for the research being conducted. The alternative hypothesis for an equivalence test states that the difference between the population means falls within the equivalence interval (i.e., …). The null hypothesis is laid out as two one-sided hypotheses that must both be rejected in order to declare the means equivalent.
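For the paired design, the two one-sided null hypotheses and the alternative can be written compactly. This is a sketch in our own notation, not necessarily the authors': μ_D is the population mean of the difference scores D = X_post − X_pre, and (a_1, a_2), with a_1 < a_2, is the researcher's equivalence interval, which need not be symmetric about zero.

```latex
\begin{aligned}
H_{01}&:\ \mu_D \le a_1, \qquad H_{02}:\ \mu_D \ge a_2,\\
H_{1}&:\ a_1 < \mu_D < a_2 .
\end{aligned}
```

Only rejecting both H_{01} and H_{02} supports H_1; rejecting just one of them leaves open the possibility that μ_D lies beyond the other bound.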
It is important to note again that both null hypotheses must be rejected in order to declare the means equivalent. Using a traditional difference-based t-test to address questions of equivalence will often lead to faulty conclusions (Cribbie et al., …).
With a small sample, treating a non-significant t-test as evidence of equivalence will too often lead researchers to declare groups equivalent when they are not.
For example, an equivalence interval of one standard deviation might be inconsequential in one study but represent a meaningful difference (i.e., …). As an illustration, Norlander et al. … They measured numerous personality traits at pretest and administered an intensive year-long training designed to alter personality characteristics (e.g., …).
They found that several personality traits were equivalent at pretest and posttest (i.e., …). However, to support this conclusion, a paired-samples test of equivalence would have been more appropriate. In another example, Greig et al. … These researchers compared baseline and posttest scores on these measures.
They used a paired-samples t-test to establish that there was no change from baseline to posttest. Further, many of the existing tests are not easily adopted by psychological researchers. It is also important to highlight that a paired-samples test of equivalence should take into account that observations across conditions are correlated.
For example, the traditional difference-based paired-samples t-test accounts for the correlation between observations and removes variability due to inter-subject differences from the error term. Thus, the paired-samples t-test is more powerful than the independent-samples t-test when observations are positively correlated (see Zimmerman, …, for a discussion). Consequently, a paired-samples test of equivalence should also take into account the non-independence of the observations in order to have greater power.
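The reason pairing helps is that the error term of a paired test is built from the standard deviation of the difference scores, σ_D = sqrt(σ_pre² + σ_post² − 2ρσ_preσ_post), which shrinks as the correlation ρ grows. The short sketch below (the function name is hypothetical) just evaluates that identity for unit variances.

```python
import math


def sd_of_differences(sd_pre, sd_post, rho):
    """SD of D = post - pre when pre and post scores correlate rho."""
    return math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * rho * sd_pre * sd_post)


# Unit variances: the paired error term shrinks as the correlation grows.
for rho in (0.0, 0.3, 0.6, 0.9):
    print(rho, round(sd_of_differences(1.0, 1.0, rho), 3))
```

With unit variances, σ_D falls from about 1.41 at ρ = 0 (the same error term one gets by ignoring the pairing) to about 0.45 at ρ = .9. The standard error σ_D/√n, and hence the power of any paired test, equivalence tests included, improves accordingly. With a negative correlation the error term would grow instead.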
Psychological researchers are typically interested in mean differences or equivalence, so a test based on ratios is usually not practical in behavioural research. Again, these methods are often not relevant to behavioural research. Further, recent articles discussing more powerful methods for conducting tests of equivalence (e.g., …) … Wellek developed a test of equivalence that assesses the mean of the difference scores for paired observations, which is more relevant to the work behavioural scientists perform.
Although the Wellek test is designed to evaluate hypotheses framed in standardized units, it is one of the only paired-samples tests of equivalence available, so researchers might conceivably also use it for hypotheses about raw mean differences by simply estimating the population standard deviation of the differences. Therefore, although psychological researchers rarely have information about the population standard deviation of the differences, we felt it was important to evaluate this procedure in situations in which researchers estimate it.
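The consequence of plugging in an estimate can be made explicit. This is a sketch in our own notation, assuming the symmetric form of a standardized equivalence hypothesis: ε is the standardized margin, Δ is the raw margin the researcher actually cares about, and σ̃_D is the researcher's guess at the population standard deviation of the differences σ_D (Δ and σ̃_D are illustrative symbols, not the authors').

```latex
\begin{aligned}
H_0&:\ \left|\frac{\mu_D}{\sigma_D}\right| \ge \varepsilon
\quad\text{vs.}\quad
H_1:\ \left|\frac{\mu_D}{\sigma_D}\right| < \varepsilon,\\[4pt]
\varepsilon &= \frac{\Delta}{\tilde{\sigma}_D}
\;\Longrightarrow\;
H_1:\ |\mu_D| < \varepsilon\,\sigma_D = \Delta\,\frac{\sigma_D}{\tilde{\sigma}_D}.
\end{aligned}
```

The raw interval the test actually targets equals ±Δ only when σ̃_D = σ_D. For example, if the guess is half the true value, the implied raw interval is ±2Δ, and the test can declare "equivalence" for differences twice as large as intended.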
The two one-sided tests procedure for paired samples (TOST-P) frames the hypotheses in terms of raw mean differences, not standardized mean differences. Consequently, the alternative hypothesis states that the population mean of the difference scores is small enough to fall within the equivalence interval, and thus that the population means are equivalent (i.e., …).
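A minimal Python sketch of the TOST-P logic follows. It rests on several assumptions: it is not the authors' R code, the names `t_cdf` and `tost_paired` are hypothetical, the Student t CDF is computed by Simpson's-rule integration so that only the standard library is needed, and the example data are invented. Each one-sided t-test compares the mean difference score with one bound of the equivalence interval, using the standard error s_d/√n of the difference scores. Equivalence is declared only if both one-sided p-values fall below α, which gives the same decision as checking that the 100(1 − 2α)% confidence interval for μ_D lies inside the interval.

```python
import math
from statistics import mean, stdev


def t_cdf(x, df, steps=2000):
    """CDF of Student's t with df degrees of freedom, by Simpson's rule on [0, |x|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def tost_paired(pre, post, lower, upper, alpha=0.05):
    """TOST on D = post - pre for H01: mu_D <= lower and H02: mu_D >= upper.

    Returns (equivalent, p_lower, p_upper).
    """
    d = [b - a for a, b in zip(pre, post)]
    n = len(d)
    d_bar = mean(d)
    se = stdev(d) / math.sqrt(n)  # standard error of the mean difference score
    p_lower = 1 - t_cdf((d_bar - lower) / se, n - 1)  # small when d_bar is well above lower
    p_upper = t_cdf((d_bar - upper) / se, n - 1)      # small when d_bar is well below upper
    # Equivalence requires rejecting BOTH one-sided null hypotheses.
    return max(p_lower, p_upper) < alpha, p_lower, p_upper


# Hypothetical pretest/posttest scores with an equivalence interval of (-1, 1).
pre = [10, 12, 11, 13, 12, 11, 10, 12]
post = [10.2, 12.1, 10.9, 13.1, 12.0, 11.2, 9.9, 12.1]
print(tost_paired(pre, post, lower=-1.0, upper=1.0))
```

Because the bounds enter separately, an asymmetric interval such as (-0.5, 1.0) needs no special handling.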
To evaluate the properties of the Wellek, TOST-P, and NPAR procedures, the next section of the article uses a simulation study to examine how each test performs under data conditions thought to be common in psychological studies.
Instead, we demonstrate the properties of this test when it is used to evaluate questions of equivalence, as discussed previously. Several variables were manipulated in this study, including the correlation between paired observations, the mean differences, the distribution shapes, and the relationship between the true and the estimated population variance (see Table 1). The difference between the means was varied in order to examine both power and Type I error control.
The sets of means used in this study can be found in Table 1. For example, simulations were conducted with the estimated population variance and the true population variance both set to 1 (i.e., …). The correlation between paired observations was also manipulated in order to determine what effect, if any, different magnitudes of correlation would have on the tests. In particular, we ran simulations with the correlation between observations set at …. These conditions were investigated both when the underlying distributions of the pretest and posttest variables were normal and when they were positively skewed.
Given that distributions in psychology are frequently nonnormal (Micceri, …), it is important to investigate these procedures under common conditions of non-normality as well as under the optimal condition of normality. The alpha level was set to …. Therefore, with an alpha level of …
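A single cell of a simulation like this one can be sketched in a few lines. The sketch is written in Python rather than the R code the authors used, and its design choices are assumptions: pre and post scores are standard normal with correlation ρ (built as post = Δ + ρ·z1 + sqrt(1 − ρ²)·z2), only the normal condition is shown (no skewed generator), and all function names are hypothetical. Placing the true mean difference exactly on an equivalence bound makes the rejection rate an estimate of the Type I error rate; placing it inside the interval makes it an estimate of power.

```python
import math
import random
from statistics import mean, stdev


def t_cdf(x, df, steps=2000):
    """CDF of Student's t via Simpson's rule on [0, |x|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_crit(alpha, df):
    """Upper-alpha critical value of t, found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


def simulate_tost_rejection_rate(n, rho, true_diff, lower, upper,
                                 alpha=0.05, reps=4000, seed=1):
    """Share of replications in which a paired TOST declares equivalence.

    Pre and post scores are standard normal with correlation rho, and the
    population mean of D = post - pre is true_diff.
    """
    rng = random.Random(seed)
    crit = t_crit(alpha, n - 1)  # computed once per cell
    hits = 0
    for _ in range(reps):
        d = []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            pre = z1
            post = true_diff + rho * z1 + math.sqrt(1 - rho ** 2) * z2
            d.append(post - pre)
        d_bar = mean(d)
        se = stdev(d) / math.sqrt(n)
        if (d_bar - lower) / se > crit and (d_bar - upper) / se < -crit:
            hits += 1
    return hits / reps


# True difference sitting exactly on the upper bound: the rate estimates Type I error.
print(simulate_tost_rejection_rate(n=30, rho=0.5, true_diff=0.5, lower=-0.5, upper=0.5))
```

With ρ = .5 and unit variances, σ_D = 1, so the interval (-0.5, 0.5) is half a standard deviation of the differences on each side; a well-behaved procedure should give a rate close to, and not above, α at the boundary.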
The simulations were conducted with the open-source statistical software R (R Development Core Team, …).
Paired-Samples Tests of Equivalence
Type I Error Control
Normal Distribution
Nonnormal Distribution
Power
Empirical Examples
In order to clarify the nature of the paired-samples equivalence tests, and to demonstrate that the paired-samples t-test is inappropriate for questions of equivalence, we present two empirical examples.
Example
In other words, the paired-samples t-test is unable to detect a difference in the means, and a researcher might be tempted to conclude that the means are therefore equivalent.
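The trap can be reproduced with a handful of invented scores. The sketch below runs a two-sided paired t-test and a paired TOST on the same hypothetical data. These are not the data from the article's examples, the helper names are ours, and the equivalence interval (-1, 1) is an arbitrary illustrative choice.

```python
import math
from statistics import mean, stdev


def t_cdf(x, df, steps=2000):
    """CDF of Student's t via Simpson's rule on [0, |x|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def paired_t_and_tost(pre, post, lower, upper):
    """Return (two-sided paired t-test p, TOST p) for D = post - pre."""
    d = [b - a for a, b in zip(pre, post)]
    n = len(d)
    d_bar, se = mean(d), stdev(d) / math.sqrt(n)
    p_diff = 2 * (1 - t_cdf(abs(d_bar / se), n - 1))  # H0: mu_D = 0
    p_lower = 1 - t_cdf((d_bar - lower) / se, n - 1)  # H01: mu_D <= lower
    p_upper = t_cdf((d_bar - upper) / se, n - 1)      # H02: mu_D >= upper
    return p_diff, max(p_lower, p_upper)


# Hypothetical, small and noisy sample.
pre = [12, 15, 9, 14, 11, 13]
post = [14, 12, 11, 17, 10, 15]
p_diff, p_tost = paired_t_and_tost(pre, post, lower=-1.0, upper=1.0)
print(f"paired t-test p = {p_diff:.2f}; TOST p = {p_tost:.2f}")
```

Here both p-values exceed .05: the t-test finds no difference, yet the TOST cannot establish equivalence either. The data are simply too noisy to support either conclusion, which is what the non-significant t-test alone conceals.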
A researcher is interested in the stability of personality traits. The research hypothesis is that optimism is a stable personality trait, and thus optimism scores are expected to be similar from Time 1 to Time 2.
4. Discussion
It is important that researchers use the correct statistical tests for the research questions they address. As equivalence tests become more popular in psychological research, recommendations and guidelines for their appropriate use should be established.
Generally, it is inappropriate to use non-rejection of the null hypothesis in a traditional difference-based test as grounds to conclude that means are equivalent. The current study examined the paired-samples test of equivalence developed by Wellek and alternative parametric and nonparametric versions of the two one-sided tests procedure proposed by Schuirmann. As mentioned previously, information about the population standard deviation of the differences is typically not available to researchers in psychology.
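For readers who want a rank-based counterpart to the TOST-P, one common nonparametric variant applies a Wilcoxon signed-rank test to the shifted differences d − lower and d − upper. This is a swapped-in technique shown under loud assumptions: we do not claim it is the NPAR procedure evaluated in the article, it uses the large-sample normal approximation (dropping zero differences, giving tied absolute values average ranks, and applying no tie correction to the variance), its hypotheses concern the location (median) of the differences rather than the mean, and the function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def signed_rank_z(x):
    """Normal-approximation z for the Wilcoxon signed-rank statistic W+ (H0: location 0).

    Zeros are dropped; tied |x| values get average ranks; no tie correction.
    """
    x = [v for v in x if v != 0]
    n = len(x)  # must be > 0 after dropping zeros
    order = sorted(range(n), key=lambda i: abs(x[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(x[order[j + 1]]) == abs(x[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, x) if v > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean_w) / sd_w


def rank_tost_paired(pre, post, lower, upper, alpha=0.05):
    """Signed-rank TOST on D = post - pre. Returns (equivalent, p_lower, p_upper)."""
    d = [b - a for a, b in zip(pre, post)]
    phi = NormalDist().cdf
    p_lower = 1 - phi(signed_rank_z([v - lower for v in d]))  # H01: location <= lower
    p_upper = phi(signed_rank_z([v - upper for v in d]))      # H02: location >= upper
    return max(p_lower, p_upper) < alpha, p_lower, p_upper


# Hypothetical scores: 15 pairs whose differences all lie well inside (-1, 1).
pre = [float(v) for v in range(15)]
offsets = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.05, 0.12,
           -0.08, 0.03, -0.12, 0.07, 0.18, -0.15, 0.02]
post = [p + o for p, o in zip(pre, offsets)]
print(rank_tost_paired(pre, post, lower=-1.0, upper=1.0))
```

For small samples an exact signed-rank distribution would be preferable to the normal approximation used here.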
To summarize, the results of the current study suggest that the TOST-P or NPAR paired-samples tests of equivalence are the most appropriate procedures with normal distributions, and that the NPAR procedure is most appropriate with nonnormal distributions. Further research could extend the current work to designs in which it is desirable to establish equivalence over multiple time points. For example, researchers might be interested in demonstrating that mean depression scores do not differ over multiple follow-up investigations (e.g., …).
Paired Samples t-test
Peter Samuels, Mollie Gilchrist
All content following this page was uploaded by Peter Samuels on 07 April.
The marks for a group of students before (pre) and after (post) a teaching intervention are recorded below.

Student  Before mark  After mark  Difference
9        23           19          -4
10       18           20          2
11       14           15          1

Marks are continuous scale data. It depends upon the mean difference, the standard deviation of the differences and the number of cases. For the paired samples t-test to be valid, the differences between the paired values should be approximately normally distributed.
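The dependence on the mean difference, the standard deviation of the differences and the number of cases is visible in the test statistic itself, t = mean(Diff) / (SD(Diff) / √n) with n − 1 degrees of freedom. The sketch below is a Python alternative to the SPSS steps, not part of the worksheet; the function name is hypothetical, and it uses only the three rows of the table that survive (students 9 to 11), so the result is purely illustrative.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, df) for the paired samples t-test on Diff = after - before."""
    diff = [a - b for b, a in zip(before, after)]  # matches the Difference column above
    n = len(diff)
    t = mean(diff) / (stdev(diff) / math.sqrt(n))
    return t, n - 1


# Only the three recovered rows (students 9 to 11); a real analysis uses every student.
before = [23, 18, 14]
after = [19, 20, 15]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f} on {df} df")
```

A larger mean difference increases |t|, a larger standard deviation of the differences decreases it, and more cases shrink the standard error and so increase it.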
To calculate the differences between pre- and post-marks, from the Data Editor in SPSS choose Transform - Compute Variable and complete the boxes as shown below right.
Check the test assumptions
The normality of Diff should first be checked (see the Checking normality for parametric tests worksheet). There is no evidence to suggest that the differences are not normally distributed.