Chapter 3: Methodology

The Pilot Study


In the fall of 1998, I taught two sections of first-semester calculus for one week at Richland Community College (RCC). Each section contained 12 students. All 24 subjects signed the consent form given in Appendix A. The night section used motion sensors to work through an instructional unit (Appendix A) that relies on graphs to introduce the concept of the derivative. The day section worked through the same instructional unit, but produced graphs by using a Java applet that I wrote, available at https://mste.illinois.edu/murphy/MovingMan/default.html. The applet draws a stick figure, which a student can drag back and forth across the top of the computer screen. Below the figure, the applet produces a graph similar to the graph produced by the motion sensor software. The applet allows the student the same control over the motion that is available with the motion sensor, but reduces the kinesthetic component of the experience from whole-body motion over a few meters to the motion of one hand over a few centimeters. The use of identical exercises in the two sections minimized other differences. In both sections, students worked in groups, choosing their own partners. Due to equipment restrictions, most students using the motion sensors worked in groups of three. The remaining students in that section, like all of the students in the Java applet section, worked in pairs.

Subjects were tested before the instruction began and after it was concluded. The tests (Appendix B) were based on the Test of Graphing in Science (TOGS) (McKenzie & Padilla, 1986), which has been used in several studies on graph instruction and is designed to measure graph interpretation skills. Questions were added to measure mastery of the calculus concepts covered in the sessions and attitude toward mathematics. Attitudinal questions were taken from the Second International Mathematics Study (SIMS). Subjects took the tests at the RCC Testing Center at their own convenience, which saved class time and simplified scheduling for the instructors, the students, and the researcher. Unfortunately, some students did not take the post-test, and this was not discovered until it was too late to remedy the omission.

In the analysis that follows, I use α = 0.10 for significance. I have chosen this loose standard because this is a pilot study. The main purpose of this work is to identify areas that merit greater attention in the main study. By using a large value of α, I increase the number of areas that receive attention in this analysis, and thus reduce the likelihood of overlooking something that may be important to the main study. In the main study I will use the more conventional value of α = 0.05.

Achievement Data

Tests for preexisting achievement differences between sections
Since I worked with intact classes, rather than randomly assigning subjects to treatments, it is reasonable to consider whether the classes were significantly different in ways that would affect the students’ performance on the assessment instruments used in this study or their ability to benefit from the instruction presented. One of the classes met during the day, and the other during the evening. Typically, day classes contain students of traditional college age (18 to 22) who are enrolled in college full-time (at least 12 hours per semester) and work at most part-time. In contrast, night students are more likely to be older, working full-time, and taking only one or two courses per semester. Since the two groups are known to differ in a variety of ways, it is necessary to carefully consider all evidence of preexisting differences between the groups.

Although Richland does not require its students to take the ACT or SAT exams, ACT results were available for 15 of the 24 subjects, six from the night section and nine from the day section. That fewer night students than day students had taken the ACT reflects the demographics of the two groups. Night students often have been out of high school for several years. In their junior year of high school, when the ACT is usually taken, many of these students were not planning to go to college and so did not take the ACT. In contrast, day students are more likely to be recent high school graduates who plan to go on for bachelor’s degrees, and thus are more likely to have taken the ACT exam. The instructor of the night class noted that he had more “day” students in that class than usual, which may account for the fact that half of the students in the night class did have ACT scores on file.

The ACT scores include subscores for English, Math, Reading, and Scientific Reasoning, along with a Composite score. I used these scores to test the hypothesis that the two sections were similar in prior achievement as measured by the ACT test. Since night students and day students are drawn from two different populations, and thus may have different variances on some or all of these scores, I used unpooled two-sample t-tests rather than pooled t-tests. Independent two-sample t-tests with a per-contrast error rate of α = 0.10 failed to find significant differences between sections on any of these scores.
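
As an illustration of this choice, the following Python sketch computes an unpooled (Welch) two-sample t-test with SciPy. The score lists are hypothetical placeholders, not the actual ACT data from this study.

    # Unpooled (Welch) two-sample t-test, which does not assume that the
    # two sections have equal variances. Scores below are hypothetical.
    from scipy import stats

    night_math = [24, 27, 22, 26, 25, 28]            # hypothetical night-section ACT Math scores
    day_math = [23, 21, 25, 22, 24, 20, 26, 23, 22]  # hypothetical day-section ACT Math scores

    # equal_var=False selects Welch's t-test rather than the pooled test.
    t_stat, p_value = stats.ttest_ind(night_math, day_math, equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
    # A difference is declared significant here only if p < 0.10.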

The lowest p-value (p = 0.1279) was for the comparison of math scores, with the night students scoring slightly higher than the day students. This suggests that with a larger sample one might find a difference in math scores between day and night students, but this is far from certain. This difference, if it exists, could be due to a difference in math achievement between day and night students, or it could be due to the fact that most day students took the ACT, while only half of the night students did so. It may be that those night students most likely to take the ACT are also those most likely to achieve high scores, so that the “top” half of the night class is being compared to the majority of the day class, biasing the results in favor of higher scores for the night class.

In order to enroll in first-semester calculus, RCC students must either earn grades of C or higher in both College Algebra and Trigonometry at RCC or score appropriately on level 3 of the RCC placement exam. Six of the night students and seven of the day students had entered the course as a result of taking the level 3 placement exam. I used these scores to test the hypothesis that the two sections were similar in prior achievement as measured by the placement test. A two-sample t-test with α = 0.10 failed to find a significant difference between the day section placement test score mean of 61.86 and the night section mean of 63.50. For students not taking the placement test, grades earned in College Algebra and Trigonometry were nearly identical between the two sections.

Five of the twelve night students and three of the twelve day students were repeating the course. The other seven night students and nine day students were taking first-semester calculus for the first time at the college level, although several may have taken a calculus course in high school. Experience suggests that more day students than night students will have recent calculus experience from their high school work. Since the number of day students repeating the course is slightly smaller than the number of night students repeating the course, and the number of day students who have taken calculus in high school is expected to be slightly larger than the number of night students who have taken calculus in high school, it appears likely that roughly similar numbers of students in each class have had some prior exposure to the concept of derivative.

Prior to receiving instruction, all subjects took a pre-test on graphing concepts, with emphasis on interpreting graphs of distance vs. time and velocity vs. time. I used these scores to test the hypothesis that the two sections were similar in prior knowledge as measured by the pre-test. A two-sample t-test with α = 0.10 failed to find a significant difference between the day section pre-test mean of 18.75 and the night section mean of 19.92.

In short, the ACT scores, placement tests, prior course grades, proportions of students entering the course via the placement test as opposed to prerequisite courses, course repeating patterns, and pre-tests all failed to uncover significant differences between the two sections. The slight differences found, although not significant, generally appeared to favor the night class, which had slightly higher ACT scores, placement test scores, and pre-test scores than did the day class. This agrees with the conventional wisdom among community college instructors, which holds that one sees slightly stronger students, on average, in evening classes than during the day. If the study were repeated with larger samples, I would not be surprised to find that the night population had a higher prior achievement level, as measured by the ACT, placement test, and pre-test scores, than the day class, but from the limited data here no such conclusion can be drawn.


Tests for differences in overall achievement produced by instruction
The most obvious data to test for possible differences produced by the instruction are the post-test scores. A two-sample t-test with α = 0.10 found that the night class post-test mean of 21.18 was significantly greater than the day class mean of 19.33 (p < 0.10). However, this does not mean as much as it might first appear. Despite being told that their instructor intended to use the post-test for extra credit in the course, only half of the twelve day students took the post-test. By contrast, all but one of the twelve night students took the post-test. This may be due to the reputed reliability and work ethic of the older, working students in the night class, or it may be that the instructor in the night class reminded the students of the post-test more often than did the instructor in the day class. Whatever the reason, the six day students who took the post-test cannot be regarded as a random sample of the day class, since it is likely that the students who chose to take the post-test were either more diligent or felt more in need of extra credit than the ones who did not take the test. This self-selection factor alone could account for the difference in post-test scores.

To examine the hypothesis that the day students who did take the post-test differed in some relevant way from the day students who did not take the post-test, I compared pre-test scores, placement exam scores, grades in prerequisite courses, and ACT scores of these two groups of students. The groups appeared similar with regard to grades, placement exams, and pre-tests, but on the ACT scores there was a significant difference. The students who took the post-test had higher ACT scores than those who did not. The difference was significant at a per-contrast error rate of α = 0.10 in every area of the ACT test except mathematics (see Table 1). If the four subscores and the composite score were independent, which they are not, one could control the familywise error rate for these five comparisons at 0.10 by using α = 0.02 on each contrast. The lack of independence makes this method inaccurate. However, using α = 0.02 as an approximate figure, we can see that the Reading and Scientific Reasoning score differences are still significant, while the Math and English differences are not. The Composite score is not significant at α = 0.02, but would be significant at α = 0.03, so it is the one that is most sensitive to the inaccuracy in this method of computing α.
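
The familywise adjustment above is the usual Bonferroni division of α by the number of contrasts. The following Python sketch applies it to illustrative p-values chosen only to fall within the ranges reported in Table 1; the exact p-values are not reproduced here.

    # Bonferroni control of the familywise error rate: each of the k
    # contrasts is tested at alpha/k. The p-values are illustrative,
    # chosen only to be consistent with the ranges in Table 1.
    familywise_alpha = 0.10
    k = 5  # English, Math, Reading, Scientific Reasoning, Composite
    per_contrast_alpha = familywise_alpha / k  # 0.02

    p_values = {"English": 0.09, "Math": 0.15, "Reading": 0.005,
                "Scientific Reasoning": 0.008, "Composite": 0.025}
    for category, p in p_values.items():
        verdict = "significant" if p < per_contrast_alpha else "not significant"
        print(f"{category}: p = {p} -> {verdict} at familywise alpha = 0.10")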


Table 1: Comparison of ACT scores of day students who took the post-test to day students who did not take the post-test.

Category               Took Post-Test             Did Not Take Post-Test     p-value   Significant
English                mean = 25.00, s.d. = 3.16  mean = 18.00, s.d. = 7.07  < 0.10    Yes
Math                   mean = 25.57, s.d. = 3.30  mean = 21.80, s.d. = 4.49  > 0.10    No
Reading                mean = 31.25, s.d. = 4.19  mean = 19.60, s.d. = 4.67  < 0.01    Yes
Scientific Reasoning   mean = 28.50, s.d. = 1.73  mean = 22.00, s.d. = 3.32  < 0.01    Yes
Composite              mean = 27.75, s.d. = 0.96  mean = 20.40, s.d. = 4.98  < 0.05    Yes

Comparisons of post-test scores and ACT scores for the 10 subjects who took both of these tests show very weak correlations between the various ACT subscores and the post-test score. This is displayed in Figures 1-5, where circles represent day class students and slashes represent night class students. The only one of these correlations that might approach significance is between post-test scores and ACT Math subscores, which is the one area of the ACT for which the group taking the post-test did not differ significantly from the group not taking the post-test. Since ACT score is not correlated with post-test score, it appears that ACT score is not useful as an indicator of the direction of the effect of the self-selection bias on post-test score.

The strongest correlation between the post-test scores and any of the prior achievement data (ACT scores, placement test scores, prerequisite course grades, or pre-test scores) was with the pre-test scores (see Figure 6). However, there was little difference in pre-test score between those who took the post-test and those who did not. The day students who took the post-test had a mean pre-test score of 18.5, which was not significantly different from the mean of 20.5 for those who did not take the post-test.

Figure 1: Post-test score vs. ACT English subscore, with regression line. Pearson product-moment correlation 0.159.

Figure 2: Post-test score vs. ACT Math subscore, with regression line. Pearson product-moment correlation 0.517.

Figure 3: Post-test score vs. ACT Reading subscore, with regression line. Pearson product-moment correlation -0.170.

Figure 4: Post-test score vs. ACT Scientific Reasoning subscore, with regression line. Pearson product-moment correlation 0.140.

Figure 5: Post-test score vs. ACT Composite score, with regression line. Pearson product-moment correlation -0.170.

Figure 6: Post-test score vs. Pre-test score, with regression line. Pearson product-moment correlation 0.560.
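
For reference, the following sketch computes a Pearson product-moment correlation and the coefficients of the plotted regression line in Python; the paired scores are hypothetical stand-ins for the 10 subjects who took both the ACT and the post-test.

    # Pearson product-moment correlation and regression line of the sort
    # shown in Figures 1-6. Paired scores below are hypothetical.
    import numpy as np
    from scipy import stats

    act_math = np.array([21, 23, 24, 25, 26, 27, 27, 28, 29, 30])   # hypothetical
    post_test = np.array([18, 19, 21, 20, 22, 21, 23, 22, 24, 23])  # hypothetical

    r, p = stats.pearsonr(act_math, post_test)
    print(f"r = {r:.3f}, p = {p:.4f}")

    # Slope and intercept of the least-squares line drawn in each figure.
    slope, intercept = np.polyfit(act_math, post_test, 1)
    print(f"post-test ~= {slope:.2f} * ACT Math + {intercept:.2f}")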


Since the students in the day class who did take the post-test had significantly higher ACT scores than those who did not, one might expect that, if the self-selection bias had any effect at all, it would raise the day class post-test mean above what it would have been had all day class students taken the post-test. On the other hand, the correlation between ACT score and post-test score is weak to non-existent. The strongest correlation is between pre-test score and post-test score. Since the group that did not take the post-test had scored slightly higher on the pre-test than the group that did take the post-test, this suggests that the day class mean on the post-test might have been higher if all students had taken the post-test, rather than just half the class taking it. However, the pre-test score difference between these two groups was not significant. These two indicators, both weak at best, point in opposite directions, leaving doubt as to the direction of the effect, if any, of the self-selection bias on the post-test scores.

For another take on the post-test data, I compared changes in score from the pre-test to the post-test for those students who did take both tests. Since the pre-test was the prior achievement measure that best predicted the post-test score, examining the difference should allow me to find that part of the post-test score attributable to the instruction rather than to prior achievement. A two-sample t-test with α = 0.10 failed to find a significant difference between the day class mean gain of 0.83 and the night class mean gain of 0.82. This suggests that the significant difference in post-test scores discussed above was a result of the self-selection bias created when half of the day class chose not to take the post-test.
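
The gain-score comparison can be sketched as follows: each student's gain is the post-test score minus the pre-test score, and the section means are then compared with a two-sample t-test. The individual scores below are invented, chosen only so that the mean gains match the reported 0.83 and 0.82.

    # Gain-score comparison: per-student gain = post-test - pre-test,
    # then a Welch two-sample t-test on the section gains.
    # All individual scores below are hypothetical.
    import numpy as np
    from scipy import stats

    day_pre = np.array([18, 20, 17, 19, 18, 19])
    day_post = np.array([19, 21, 18, 20, 19, 19])
    night_pre = np.array([20, 19, 21, 18, 20, 19, 22, 20, 19, 21, 20])
    night_post = np.array([21, 20, 22, 19, 21, 20, 23, 21, 19, 22, 20])

    day_gain = day_post - day_pre      # mean 0.83
    night_gain = night_post - night_pre  # mean 0.82
    t_stat, p_value = stats.ttest_ind(day_gain, night_gain, equal_var=False)
    print(f"day mean gain = {day_gain.mean():.2f}, "
          f"night mean gain = {night_gain.mean():.2f}, p = {p_value:.4f}")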

In short, I have not found any differences in overall achievement that can be attributed to the instruction. In the main study, I will have a larger and more homogeneous sample, random assignment to groups, ACT scores for all subjects, an improved assessment instrument, and, it is hoped, a larger proportion of the sample taking both pre- and post-tests. This should better enable me to find differences produced by the instruction, if such differences exist. It remains possible that the two forms of instruction are equivalent.

Achievement Instrument Items
The first eleven questions on each of the pre-test and post-test were taken from TOGS. Question 12 was adapted from one used by Mokros and Tinker (1987). I composed the remaining 12 pairs of questions myself. Each of the 24 questions on the post-test corresponds directly to one of the 24 questions on the pre-test. In most cases, the pairs of corresponding questions have the same number. For example, question 10 on the pre-test is very similar to question 10 on the post test. However, there are three question pairs for which the numbers are different. Question 2 on the pre-test corresponds to question 4 on the post-test. In what follows, that question pair will be referred to as question 2/4. In the same way, question 3/2 refers to question 3 on the pre-test and question 2 on the post-test, which form a pair of similar questions, and similarly for question 4/3.

As discussed earlier, it is not clear what factors may have caused any observed differences between sections. The self-selection bias introduced when only half of the day class took the post-test may account for any or all differences. For that reason, in the following analysis of the responses to individual questions I will give the bulk of my attention to the overall change for all subjects between pre-test and post-test. Differences between the two sections will be noted, but not considered in depth.

The following questions on the achievement pre-test were answered correctly by at least 22 of the 24 subjects: 3/2, 4/3, 5, 9, 14, 15, 18, 22. The following achievement pre-test questions were answered correctly by at least 16 of the 17 subjects who later took the post-test: 2/4, 3/2, 4/3, 5, 7, 14, 15, 17, 18, 22. These questions indicate areas that the subjects appear to have mastered. As a result, these questions did not allow improvement to be measured in these areas.

Questions 1, 6, 8, and 10 ask the student to select the most appropriate set of axes for displaying the given data. The available options differ in the scales and labels on the axes and in which variable is assigned to which axis. This is not directly related to the content of the instruction, but does test knowledge necessary for work with graphs. On the pre-test, the 24 subjects averaged 53% correct on these four questions, as opposed to 86% for the other 20 questions. (The 17 subjects who went on to take the post-test performed similarly to the whole sample, scoring 54% correct on these four questions and 88% correct on the other 20 questions.) These scores suggest that, prior to the instruction provided with this study, the students lacked understanding of the significance of the axes, labels, and scales, or perhaps were not giving attention to these features of graphs. The post-test scores showed improvement on all four of these items, with significant improvement on questions 6 (p < 0.001), 8 (p < 0.05), and 10 (p < 0.05). On the post-test, the subjects averaged 85% correct on these four questions, as compared to 86% for the other 20 questions. This indicates greater learning in the area of scales, labels, and meaning of axes than in the other areas tested. This is somewhat surprising, since these concepts were not explicitly covered in the instruction. These questions, possibly with minor variations, will be used in the next version of the test. There was no significant difference between the two sections in improvement on any of these four questions.
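
This chapter does not specify the test behind these per-item p-values; a standard choice for paired correct/incorrect responses is McNemar's test, sketched below in Python with hypothetical counts for a single question. It is offered only as an illustration of how such an item-level comparison can be made, not as the test actually used here.

    # McNemar's test for paired right/wrong responses on one question,
    # pre-test vs. post-test. Counts below are hypothetical; they are
    # not the actual response counts for any item in this study.
    from statsmodels.stats.contingency_tables import mcnemar

    # 2x2 table for the 17 students who took both tests:
    # rows = pre-test (correct, incorrect); columns = post-test.
    table = [[5, 1],   # correct on pre-test:   5 stayed correct, 1 became incorrect
             [9, 2]]   # incorrect on pre-test: 9 became correct, 2 stayed incorrect
    result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
    print(f"p = {result.pvalue:.4f}")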

Questions 2/4, 3/2, 4/3, and 5 relate to a graph of five points plotted on a pair of axes. These questions ask about individual points represented in the graph, and ask students to estimate the y-values for x-values that do not correspond to any of the five plotted points. On the pre-test, 21 of the 24 subjects answered question 2/4 correctly, and all 24 subjects answered questions 3/2, 4/3, and 5 correctly. This indicates that the subjects had a good understanding of individual points plotted on graphs. This is to be expected, since graphing instruction often includes substantial time spent plotting points. In my own teaching experience, I have observed that many students would rather plot 20 points to get a graph than use other information, such as the vertex of a parabola or the asymptotes of a hyperbola. Plotting points is familiar, so they feel more comfortable and confident with it. Further instruction cannot be expected to produce much improvement in this area, since the subjects apparently have already mastered the skills tested in these questions. Since the instruction provided as part of this study involves graphs of curves, not discrete points, these questions are not directly related to the instruction. For these reasons, questions 2/4, 3/2, 4/3, and 5 will not be used in the next version of the tests.

Question 7 gives four graphs of lines or connected segments through sets of points, and asks which is the most appropriate best-fit line. Twenty-one of the 24 subjects, including 16 of the 17 who went on to take the post-test, answered this item correctly. This does not imply that they knew anything about best-fit lines, since a simple definition of best-fit line was given in the question. It does indicate that they can understand and apply the definition. The instruction did not involve lines through sets of points, nor did it involve the concept of best-fit, so it is not to be expected that the instruction would improve student performance in this area. Since only one of the subjects who took the post-test missed this question on the pre-test, there is not much room to show improvement even if the unit did cover best-fit lines. On the post-test, two students missed this question. As expected, this is not significantly different from the results of the pre-test. Since the subjects appeared to have grasped this concept before the instruction, and since the instruction does not cover this subject, this question will not be used on the next version of the test.

Question 9 asks the subjects to select the statement that best describes the relationship between two curves drawn on the same set of axes. Twenty-two of the 24 subjects, including 15 of the 17 who later took the post-test, answered this question correctly on the pre-test. This indicates that these students are able to understand relationships between graphs. This is important, because in the calculus course they will need to understand the relationships between graphs of functions and graphs of the derivatives of the functions. All 17 of the subjects who took the post-test answered this question correctly, which suggests that the instruction might have helped the students somewhat in this area. This is to be expected, since this item is directly related to the instruction, where the students compared distance and velocity graphs and studied their relationship. Thus, some question or questions along this line should be included in the next version of the test. However, the results indicate that this question was not sufficiently challenging.

Questions 11 and 13 provide descriptions of situations and ask students to select graphs with appropriate shapes to fit the situations. In problem 11, each choice is a pair of graphs with no scales given on any of the axes. The question describes the relationship between the two dependent variables. Twenty-one of the 24 subjects, including 15 of the 17 who went on to take the post-test, answered this question correctly on the pre-test. All 17 of the subjects who took the post-test answered this question correctly. This agrees with the result of question 9, which indicated that the subjects were able to understand the relationships between graphs, and that the instruction seemed to improve their performance in this area. As with question 9, a more challenging version of question 11 should be written for the next version of the test.

In problem 13, each choice is a single graph depicting the relationship between distance and time of a moving person. This relates very directly to the instruction, yet the students performed much better on this item before instruction than after it. Significantly fewer (p < 0.01) students answered question 13 correctly on the post-test than on the pre-test. This surprising result may be due to a slight difference between the pre-test and post-test versions of the question. The pre-test version asked for a graph of distance from the starting point, while the post-test version requested a graph of distance from the goal. The most popular incorrect answer on the post-test shows a correct graph of the distance from the starting point. If the difference in the versions of the question is responsible for the difference in the students’ performance, then it appears that the choice of reference point in the graph strongly influences the students’ ability to interpret the graph. Motion away from the reference point appears easier to understand than motion toward the reference point. In an effort to remedy this problem, the instructional unit will be revised to use many examples of both types of graphs. When the test is revised, this question will be modified so that both pre-test and post-test versions involve graphs of distance from the same point. Other questions will be added that use other reference points.

There was a significant (p < 0.05) difference between sections in changes from pre-test to post-test on question 13. All six of the day students who later took the post-test answered question 13 correctly on the pre-test. Only one of those answered it correctly on the post-test. The night section also showed worse performance on the post-test than on the pre-test, but the change was not as large. Eight of the eleven night students who later took the post-test answered question 13 correctly on the pre-test; five of those answered it correctly on the post-test. As noted, it is impossible to be certain that any differences between sections are due to the instruction, but it may be that the physical motion impresses upon the student the relevance of the starting point more thoroughly than does the work with the mouse, so that the night students were less affected by the change in the question.

Question 12 was adapted from a question used by Mokros and Tinker (1987). This item is intended to check for graph-as-picture (or road map) misinterpretations. Three of the 17 subjects who went on to take the post-test missed this item on the pre-test. Only one missed it on the post-test. This improvement was not statistically significant. While this question did not reveal much use of graph-as-picture interpretations, it is possible that more challenging questions will uncover this misinterpretation in use. Since other studies have shown that the graph-as-picture misinterpretation is common, and since this is an area in which the motion sensor instruction has been shown to be useful, this question will be retained in the next version of the test, and other, more challenging, questions on graph-as-picture will be added.

Questions 14 through 18 refer to a graph of distance vs. time for a moving object. This relates directly to the instruction, so some questions on this topic will be used on the next version of the test. However, since all 24 subjects answered questions 14 and 15 correctly on the pre-test, and 22 of the 24 answered question 18 correctly, it appears that more challenging questions should be asked. There was no significant change on either question 16 or question 17.

Questions 14 and 15 require the subject to interpret the information given in the right endpoint of the graph. As shown by the results of questions 2 through 5, these subjects were skilled in interpreting individual points on a graph, so it comes as no surprise that they were all able to answer these questions correctly. They found questions 16 through 18, which involve the slope of the curve at various points, to be only slightly more challenging. On both questions 17 and 18, 16 of the 17 subjects who later took the post-test gave the correct answer on the pre-test. Question 16 appears to have been the most difficult, with only 13 of the 17 who went on to take the post-test giving the correct answer on the pre-test. Sixteen of these 17 subjects answered question 16 correctly on the post-test. This indicates that the subjects had a reasonably good understanding of the meaning of the slope of a curve before receiving the instruction. Instruction may have been helpful, but more challenging questions would be needed to determine whether this is true.

Questions 19 through 24 refer to a graph of velocity vs. time for a moving person. This relates directly to the instruction, so some version of these questions will be used on the next version of the test. Post-test scores were significantly higher than pre-test scores for questions 23 (p < 0.10) and 24 (p < 0.005), and significantly lower for questions 20 (p < 0.10) and 22 (p < 0.001).

Question 19 asks the student to determine from the velocity graph the number of times that the moving person changed direction. Of the 17 students who took both pre-test and post-test, 14 answered this question correctly on each test. Question 21 gets at the change of direction in a different way, by asking the student to identify the point at which the person was farthest from the starting point. Of course, the correct answer is that the moving person was farthest from the starting point at the time when s/he turned around to begin the return trip. Although question 21 is less direct, more subjects answered it correctly than answered question 19 correctly, with 15 of the 17 subjects who took both tests answering question 21 correctly on each test. The lack of improvement on these two items is disturbing, since the instruction was expected to improve students’ ability to interpret significant features of the velocity graph. The instruction will be revised to include more work with velocity in both directions, and more questions on this topic will be added to the test.

Question 20 asks the students to identify the point on the graph corresponding to the fastest motion. On the pre-test, the correct answer involved motion in the positive direction, while the post-test correct answer involved motion in the negative direction. This may have made the post-test question more difficult. However, even combining the correct answers with the answer that identifies the fastest motion in the positive direction, the post-test scores on question 20 are not as high as the pre-test scores.

Questions 22, 23, and 24 each give a time and ask the student to choose the description of the action that is most likely to have been occurring at that time. To answer these questions, the student has to be able to determine the relative velocity (fast vs. slow) and the change in velocity (speeding up vs. slowing down) from the shape of the graph. Questions 23 and 24 showed significant improvement in students’ performance from pre-test to post-test, while question 22 showed the reverse. This discrepancy may be due to a typographical error in the post-test version of question 22, which made the correct answer a little less clear than it was intended to be. Combining the number of correct responses with the number of responses giving the answer that was made more plausible by the typo still leaves fewer correct answers on the post-test than on the pre-test, but the difference is not significant. Taking these three questions together, it appears that the instruction helped the students learn to better interpret slope and concavity on velocity graphs.

There was a significant (p < 0.10) difference between sections in changes from pre-test to post-test on question 22. All six of the day students who later took the post-test answered question 22 correctly on the pre-test. Only one of those answered it correctly on the post-test. The night section also showed worse performance on the post-test than on the pre-test, but the change was not as large. Ten of the eleven night students who later took the post-test answered question 22 correctly on the pre-test; six of those answered it correctly on the post-test. It is impossible to say what may result when the typo and the self-selection bias are removed, but this is certainly an area to watch carefully in the main study.

Taking questions 20 and 22 together and examining the patterns in the incorrect answers suggests that several students were interpreting the given graph of velocity vs. time as if it were a graph of distance vs. time, so that they selected points of high slope when points of high absolute value were appropriate. This error has been documented in the literature under the name “slope/height confusion,” although it is not clear that students are actually confusing slope and height. The instruction involved more work with distance than with velocity, which may have induced the students to treat all graphs as distance graphs. The instruction will be revised to provide more work with the relationship between distance and velocity, with the intention of helping students to usefully distinguish between the two. The flaws in questions 20 and 22 will be removed, and the questions will be used on the next version of the test, perhaps with a few more questions in the area of the so-called “slope/height confusion.”

Questions 23 and 24 showed significant improvement from pre-test to post-test. For question 23, this should be interpreted with caution, since the two questions were not quite alike. The pre-test asked about a point where the motion was in the negative direction and slowing down, while the post-test asked about a point where the motion was stopped. As previously noted, the students may find motion in the negative direction confusing, so the pre-test version of question 23 may have been more difficult than the post-test version. However, the two versions of question 24 are extremely similar, so the improvement found probably is a real improvement.

There was a significant (p < 0.01) difference between sections in changes from pre-test to post-test on question 24. All six of the day students who later took the post-test answered question 24 incorrectly on the pre-test. Five of the six answered it correctly on the post-test. The night section also showed better performance on the post-test than on the pre-test, but the change was not as large, in part because most of the night students answered this question correctly on the pre-test. Nine of the eleven night students who later took the post-test answered question 24 correctly on the pre-test; all eleven answered it correctly on the post-test. It appears that this difference is due to a preexisting difference between sections, rather than a difference produced by the instruction.

Student performance in several areas is summarized in Table 2. The changes from the original test to the next version will be as shown in Table 3. The next version of the test, with pre-test and post-test versions of all questions, should be given to a population as similar as possible to the population of the main study, so that accidental dissimilarities between paired questions can be discovered and corrected.

Table 2: Student performance as measured by the pre- and post-tests.
Topic                                      Initial     Change                    Questions
Scale, labels, indep. vs. dep. variables   Poor        Significant improvement   1, 6, 8, 10
Interpreting plotted points                Excellent   No significant change     2/4, 3/2, 4/3, 5
Best-fit lines                             Very good   No significant change     7
Relating two graphs                        Good        Slight improvement        9, 11
Graph-as-picture                           Fair        Slight improvement        12
Shape of graph                             Good        Uncertain                 11, 13
Features of distance graph                 Excellent   No significant change     14-18
Features of velocity graph                 Fair        Uncertain                 19-24

Table 3: Proposed changes to achievement test questions.
Question   Pre-test    Change           Reason
number     # correct
1          12          Keep or revise   Relevant topic, but no significant change
2/4        21          Delete           Not relevant, too easy
3/2        24          Delete           Not relevant, too easy
4/3        24          Delete           Not relevant, too easy
5          24          Delete           Not relevant, too easy
6          9           Keep             Relevant, showed significant change
7          21          Delete           Not relevant, too easy
8          11          Keep             Relevant, showed significant change
9          22          Revise           Relevant, but too easy
10         19          Keep or revise   Relevant, showed significant change, but too easy
11         21          Revise           Relevant, but too easy
12         20          Keep             Very relevant
13         19          Revise           Relevant, but pre-test version too easy and not similar enough to post-test version
14         24          Revise           Relevant, but too easy
15         24          Revise           Relevant, but too easy
16         20          Revise           Relevant, but too easy
17         21          Revise           Relevant, but too easy
18         22          Revise           Relevant, but too easy
19         19          Keep             Very relevant
20         18          Revise           Relevant, but pre-test version not similar enough to post-test version
21         20          Keep             Very relevant
22         22          Revise           Relevant, but pre-test version too easy and post-test version contains typo
23         13          Revise           Relevant, showed significant change, but pre- and post-test versions too dissimilar
24         14          Keep             Relevant, showed significant change

Add questions on: relationship between two graphs, especially function and its derivative; graph-as-picture; “slope/height confusion”; distance vs. time; velocity vs. time; relationship between distance and velocity.
