In one of my earlier posts, I wrote about using tests as a learning instrument – the testing effect. When I still used lectures as a means of teaching statistics and research methods, I used weekly open-book tests as a way to get students to learn what information their books contained. It worked.
Open-book examinations or tests are defined as testing situations where students can use textbooks, notes, journals, and reference materials during a test (Eilertsen & Valdermo, 2000). A number of studies have identified benefits in the use of open-book tests (cf. Francis, 1982; Theophilides & Dionysiou, 1996). The advantages include: a reduction in examination tension and stress (Brockbank, 1968; Feldhusen, 1961; Gaudry & Spielberger, 1971; Gupta, 1975; Jehu, Picton & Futcher, 1970; Michaels & Kieren, 1973; Tussing, 1951); a greater learning effect (Michaels & Kieren, 1973); a reduction in rote memorisation (Bacon, 1969; Betteridge, 1971); a reduction in cheating (Feldhusen, 1961; Gupta, 1975; Tussing, 1951); more constructive preparation – depending on the way open-book tests are structured (Feldhusen, 1961); and the promotion of active learning during the testing process (Feldhusen, 1961). The disadvantages of open-book testing include: students wasting time during a test looking up information (Bacon, 1969; Jehu et al., 1970); and a reduction in the amount of factual information learned from the study material (Kalish, 1958; Tanner, 1970). The research also includes findings of no difference between open-book and traditional tests. Two of these are that there is no difference in attainment between the two types of tests (Brockbank, 1968; Feldhusen, 1961; Kalish, 1958; Jehu et al., 1970); and that examination preparation methods do not differ between the two types of tests – depending on the way open-book tests are structured (Feldhusen, 1961).
One of the benefits of open-book tests is that reduced anxiety increases confidence. Although increased confidence does not necessarily lead to better performance (Kalish, 1958; Krarup, Naeraa, & Olsen, 1974), as Pan & Tang (2004) observed, anything that increases students’ confidence in their ability will effectively reduce statistical anxiety (a real problem in the area).
Increasing student engagement with the learning resources provided is another benefit of open-book tests. As students experience difficulty with a subject, they begin to disengage and stop trying to learn it. Phillips (1995) found that the use of open-book tests linked to specified textbooks and textbook chapters is effective in increasing student engagement. He observed that through open-book testing, students read and studied their assigned textbook, and the students’ grades increased overall. Improving student engagement with the subject through engagement with the learning resources provided is a good way to both reduce anxiety and improve student performance.
Open-book tests linked to specific learning resources reduce stress among students, increase students’ confidence, and increase student engagement with learning resources.
Multiple Tests
Frequent classroom testing is one of the methods that can be used to improve both student satisfaction and student performance (cf. Bangert-Drowns, Kulik, & Kulik, 1991; Glover, 1989; Roediger & Karpicke, 2006). Although normally viewed as a necessary evil to effectively assess student performance, tests can also be used as learning instruments. Tests are usually used infrequently in a class, often only once or twice in a semester (Roediger & Karpicke, 2006). When students know that a test is to occur, they will revise and study material in preparation (Bangert-Drowns et al., 1991; Leeming, 2002). The frequency of classroom testing is directly related to the frequency of student revision; more frequent testing leads to more frequent revision.
In a meta-analysis of studies on the effects of frequent classroom testing, Bangert-Drowns et al. (1991) found that student performance increases as the number of tests increases, but with diminishing returns: the gain in achievement from each additional test becomes smaller as the number of tests grows.
The effect on student performance is not solely due to the extra revision that students do because of testing; testing itself has a direct effect on learning. If a student has successfully recalled material for a test, they have a greater chance of remembering it in the future than if they had not been tested. This is called the testing effect and was studied as early as 1917 by Gates. Glover (1989) observed that although the testing phenomenon has been studied extensively in cognitive psychology, there has been very little interest in it from the educational establishment. Indeed, the current mantra in educational settings is a call to reduce the assessment load on students to a minimum, which would preclude the use of tests as instruments of learning.
Roediger & Karpicke (2006) found that students who were repeatedly tested on material as part of the learning process had better long-term retention of the information than students who were given repeated opportunities to study the material before testing. They set up an experiment in which students were given a short passage to study for later testing. In one group, the students had four study sessions, followed by a test. In the second group, the students had a single study session, then three testing sessions, followed by a final testing session (five sessions for each group). In the fifth session, when both groups were tested, the students who were given repeated study sessions scored higher than the students who had repeated test sessions as a part of their learning process (83% vs. 71%). However, when tested on the same material one week later, the students who had repeated testing sessions outperformed the students who had repeated study sessions (61% vs. 40%). Clearly, testing during learning enhances retention.
In addition to increasing student performance, Bangert-Drowns et al. (1991) found that overall student satisfaction with a class improves with frequent classroom testing. The effect size for the increase is large, with an overall increase of 0.59 standard deviations in student satisfaction for the studies that measured and reported on student satisfaction.
Clearly, the use of multiple testing in a statistics class has the potential to both improve student performance and increase student satisfaction.
Distributed Practice
Another benefit of multiple testing sessions spread across the semester is a more distributed model of learning. The number and spacing of tests during an academic semester will determine how distributed the learning process will be. For more than a century the advantage of distributed learning has been repeatedly demonstrated in memory research (cf. Dempster, 1996). When learning is compared between massed practice and distributed practice conditions, there is a marked difference in performance. On short-term measures of performance (measures taken immediately after practice), massed practice produces much better results than distributed practice. However, long-term recall or retention is much better when distributed practice sessions are used than when massed practice sessions are used in learning.
Methods
The performance measure used to judge the success of the innovation was a closed-book, two-hour final examination that was identical to the examination that had been used in the previous year. Any difference in performance between the two years (309 students in the non-open-book year and 287 in the open-book year) would be attributed to the new assessment method. In addition, to gauge student satisfaction, a question asking specifically about the weekly open-book tests was included on the module evaluation form the students normally complete.
The measure used was student performance in a closed-book multiple-choice examination taken by the students. During the first academic year, the students took a 50-item mid-term examination and a 40-item final examination. In the second academic year, these two examinations were combined to form an 87-item final examination. Two questions from the mid-term and one question from the final examination had been eliminated from the first year's results following standard item analysis, and so were not included in the second examination.
In addition, comments from the student evaluations from the module specifically relating to the weekly, open-book tests were analysed to provide a measure of student satisfaction for the weekly tests.
Results
Student Performance
The performance for each student was expressed as a percentage score for the purposes of the analysis. The students were ranked according to their performance and then divided into smaller cohorts to examine the differences between the two years (using a between-groups t-test) across the range of abilities.
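The rank-then-compare analysis described above can be sketched roughly as follows. This is a minimal illustration and not the code used for the module: the scores are randomly generated stand-ins, the helper names `top_fraction` and `pooled_t` are hypothetical, and it assumes that students were ranked within their own year and that the test was a pooled-variance t-test.

```python
import random
import statistics
from math import sqrt


def top_fraction(scores, fraction):
    """Return the highest-scoring `fraction` of a list of percentage scores."""
    ranked = sorted(scores, reverse=True)
    keep = max(2, int(len(ranked) * fraction))  # a variance needs at least 2 scores
    return ranked[:keep]


def pooled_t(group_a, group_b):
    """Between-groups t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_b) - statistics.mean(group_a)) / se
    return t, df


# Made-up scores for illustration only; these are NOT the module's data.
rng = random.Random(0)
year_one = [min(100.0, max(0.0, rng.gauss(56, 13))) for _ in range(309)]
year_two = [min(100.0, max(0.0, rng.gauss(60, 13))) for _ in range(287)]

for label, fraction in [("Top 10%", 0.10), ("Top quartile", 0.25), ("Overall", 1.0)]:
    t, df = pooled_t(top_fraction(year_one, fraction),
                     top_fraction(year_two, fraction))
    print(f"{label}: t({df}) = {t:.2f}")
```

With the real scores in place of the simulated lists, the same loop would produce one t statistic per ability band. The exact cut-offs used to form the bands in the original analysis are not stated, so the rounding in `top_fraction` is an assumption.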
Student performance across two years, with the second cohort being required to take weekly open-book exams prior to the closed book final. The percentage scores represent student performance on the closed book, final examination.
| Rank | 2005 | 2006 | Increase (percentage points) | SD | t-test results |
| --- | --- | --- | --- | --- | --- |
| Top 10% | 75.6% | 82.4% | 6.8 | 4.0% | t(57) = 6.57, p < .001 |
| Top Quartile | 70.6% | 76.3% | 5.7 | 5.8% | t(145) = 6.37, p < .001 |
| Middle 50% | 56.0% | 60.3% | 4.3 | 5.1% | t(297) = 7.48, p < .001 |
| Bottom Quartile | 40.9% | 43.2% | 2.3 | 5.7% | t(148) = 2.62, p = .009 |
| Bottom 10% | 36.6% | 37.4% | 0.8 | 3.6% | t(59) = 0.74, p = .47 |
| Overall | 55.8% | 60.0% | 4.2 | 12.9% | t(594) = 4.16, p < .001 |
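To put the overall result on a standardised scale, the reported t statistic can be converted to Cohen's d using the identity d = t × sqrt(1/n1 + 1/n2), which holds for a pooled-variance between-groups t-test. The sketch below is an illustration that assumes the overall test was of that kind (the reported 594 degrees of freedom match 309 + 287 − 2); the function name `cohens_d_from_t` is hypothetical.

```python
from math import sqrt


def cohens_d_from_t(t, n_a, n_b):
    """Convert a pooled-variance, between-groups t statistic to Cohen's d."""
    return t * sqrt(1 / n_a + 1 / n_b)


# Values reported in this post: overall t(594) = 4.16, with 309 students in the
# first year and 287 in the second.
d_overall = cohens_d_from_t(4.16, 309, 287)
print(f"Overall d = {d_overall:.2f}")  # prints "Overall d = 0.34"
```

On this reading, the overall improvement is about a third of a standard deviation, which sits between the conventional "small" (0.2) and "medium" (0.5) benchmarks. The same conversion cannot be applied to the individual ability bands without knowing how many students from each year fell into each band.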
Student Satisfaction
The results from the student satisfaction questions showed overwhelming support for the weekly open-book tests:
Responses to the end of semester module evaluation are tabulated, along with some of the phrases used to describe the weekly, open-book tests.
| Rating | Example Adjectives | Number |
| --- | --- | --- |
| Excellent | Brilliant, Excellent, Very Good | 70 |
| Good | Good, Liked them | 63 |
| Neutral | Made me read | 5 |
| Negative | Too early in the morning, Hated them | 2 |
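The counts above are easier to compare as shares of all responses. The short sketch below does that arithmetic; it assumes each count represents one student's response, which the evaluation form data does not state explicitly.

```python
# Counts taken from the satisfaction table in this post.
ratings = {"Excellent": 70, "Good": 63, "Neutral": 5, "Negative": 2}

total = sum(ratings.values())
shares = {rating: 100 * count / total for rating, count in ratings.items()}

for rating, share in shares.items():
    print(f"{rating}: {share:.1f}%")

# Combined share of the two positive categories.
positive = shares["Excellent"] + shares["Good"]
print(f"Positive (Excellent + Good): {positive:.1f}% of {total} responses")
```

Under that assumption, 133 of the 140 responses (95%) fall in the two positive categories.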
Introducing weekly open-book tests accomplished several positive outcomes for our students. The measurable outcomes were higher grades, with a disproportionate benefit accruing to higher-performing students, and strong overall student satisfaction. Although we have not measured it, because of the cognitive benefits of distributed practice, we are hoping that the learning is long term and will stay with the students throughout their studies. Again, we have not directly measured statistical anxiety, but we feel that the innovation has reduced statistical anxiety among our students and fostered greater engagement with the subject than in the past. These results suggest that a broad adoption of the model would greatly benefit students studying statistics.
I used this system for about five years before moving to a different method of teaching and assessing statistics and research methods. I think it worked out well for both the students and me as their instructor.
References
Bacon, F. (1969). Open-book examinations. Education and Training, 9, 363.
Bangert-Drowns, R. L., Kulik, J. A., & Kulik, C. C. (1991). Effects of frequent classroom testing. Journal of Educational Research, 85(2), 89–99.
Betteridge, D. (1971). Open-book exams. Education in Chemistry, 8(2), 68–69.
Brockbank, P. (1968). Examining Exams. Times Literary Supplement, 25th July.
Dempster, F. N. (1996). Distributing and managing the conditions of encoding and practice. In E. L. Bjork & R. A. Bjork (Eds.), Memory: Vol. 10. Handbook of Perception and Cognition (pp. 317–344). New York: Academic Press.
Eilertsen, T., & Valdermo, O. (2000). Open-Book assessment: A contribution to improved learning? Studies in Educational Evaluation, 26(2), 91–103.
Feldhusen, J. F. (1961). An evaluation of college students’ reactions to open-book examinations. Educational and Psychological Measurement, 21, 637–646.
Francis, J. (1982). A case for open-book examinations. Educational Review, 34(1), 13–26.
Gates, A. I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 6(40).
Gaudry, E., & Spielberger, C. D. (1971). Anxiety and Examining Procedures. New York: Wiley & Sons.
Glover, J. A. (1989). The “testing” phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81(3), 392–399.
Gupta, A. K. (1975). Open-book examinations in India – some reflections. In A. K. Gupta (Ed.), Examination Reform, Directions research and Implications. New Delhi: Sterling.
Jehu, D., Picton, C. J., & Futcher, S. (1970). The use of notes in examinations. British Journal of Educational Psychology, 40, 335–337.
Kalish, R. A. (1958). An experimental evaluation of the open examination. Journal of Educational Psychology, 40, 200–204.
Krarup, N., Naeraa, N., & Olsen, C. (1974). Open-book tests in a university course. Higher Education, 3, 157–164.
Leeming, F. C. (2002). The exam-a-day procedure improves performance in psychology classes. Teaching of Psychology, 29, 210–212.
Michaels, S., & Kieren, T. R. (1973). Investigation of open-book and closed-book examinations in mathematics. Alberta Journal of Educational Research, 19, 202–207.
Pan, W., & Tang, M. (2004). Examining the effectiveness of innovative instructional methods on reducing statistics anxiety for graduate students in the social sciences. Journal of Instructional Psychology, 31, 149–159.
Phillips, G. (1995). Using open book tests to encourage textbook reading in college. Journal of Reading, 38(6), 484.
Roediger, H. L. III, & Karpicke, J. D. (2006). Test enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255.
Tanner, L. (1970). Performance in open-book tests. Journal of Geological Education, 18, 166–167.
Theophilides, C., & Dionysiou, O. (1996). The major functions of the open-book examination at the university level: A factor analytic study. Studies in Educational Evaluation, 22(2), 157–170.
Tussing, L. (1951). A consideration of the open-book examination. Educational and Psychological Measurement, 2, 597–602.