From Text to Audio and Back Again: Providing Students with Good Feedback

Here is a short report I published in 2009 on the use of voice recognition software to provide feedback to students on their work. The approach worked, and it has since been adopted by a number of lecturers in my institution.

In recent years, a number of educational innovators have been experimenting with providing students with feedback on their work using audio podcasts, and a number of problems with this approach have been identified. By using voice recognition software to turn speech into text, the speed and ease of providing high-quality audio feedback can be combined with the advantages of printed text, giving students the best of both worlds.

Since the National Student Survey began four years ago, feedback on assessment has been one of the lowest-scoring categories (Attwood, 2009). One of the hoped-for answers to the problem has been the use of audio podcasts to provide students with feedback on both their formative and summative work. A number of high-profile research projects into the use of audio podcasting to provide students with feedback (e.g. Sounds Good; Audio Supported Enhanced Learning) have been undertaken in recent years.

Generally, the evaluation of providing audio feedback to students has been positive (Ice, Curtis, & Wells, 2007; Merry & Orsmond, 2008; Rotheram, 2009). Students have enjoyed a more in-depth experience and appreciate the detailed explanations the audio recordings provide. In addition, they feel that the recordings are more personal and are evidence of their tutor’s engagement with the process. The challenges students identified included a reluctance to listen to feedback in public places (even with earphones), a reluctance to mix feedback recordings with music on personal players, difficulty referencing particular parts of a paper in the feedback, and difficulty returning to specific points made in the feedback.

Staff members involved in the initial work have also responded positively to the experience (Ice et al., 2007; King, McGugan, & Bunyan, 2008; Rotheram, 2009), and most of the initial users intended to continue with the method. However, the lecturers identified some real limitations (Merry & Orsmond, 2008). The biggest problems surrounded the administration of the files (loading, naming, and distributing them to students). Although the reports were generally positive, there is real doubt about the scalability of the method. Rotheram reported that one lecturer who decided not to continue using audio recordings did not feel it was practical with 80 students. Because of the administrative difficulties, users did not report any time savings from using audio files.

Although the advantages of using audio files are worth pursuing, the problems, especially when dealing with large groups of students, are very real (King, McGugan, & Bunyan, 2008; Rotheram, 2009). Most of the problems, however, are specific to the medium: although much easier to work with than previously, sound recordings are not the same as text files and paper.

However, there are significant advantages to using audio for student feedback. The time savings are significant (in the simple recording of feedback; see Merry & Orsmond, 2008), the depth and scope of the feedback are greater, and the feedback can be explained and understood, as opposed to illegible scrawling in the margins (Ice et al., 2007; King et al., 2008; Rotheram, 2009). The problems appear to be limited to the manipulation and administration of the audio files, not to the recordings themselves.

My recent work combining printed text feedback with audio recordings has demonstrated that the advantages of both types of feedback can be realised. Voice recognition software translates speech into text in real time, allowing a lecturer to provide feedback in the same manner as making an audio recording, taking advantage of the speed, depth, scope, and explanatory power of the recording. The result is a text file that can be distributed to students in the same way a recording might be or, more simply, by printing the feedback and stapling it to the students’ work.

Method of Use

For this pilot, an Apple Power Mac (dual 2.8 GHz quad-core, OS X 10.5) running MacSpeech Dictate 1.4 voice recognition (speech-to-text) software was used. Training the voice recognition software took about ten minutes to reach an acceptable degree of accuracy.

While reading the students’ work, I provided comments referring specifically to passages within the text. When I finished a paper, I provided a summary of my thoughts on its strengths and weaknesses, along with a grade for the work.

Thirty-three students from two modules had feedback provided in this manner. For the 17 First Year students’ work, the process was timed to provide a comparison. The First Year scripts took on average 12.5 minutes each to provide the feedback, explanations, and grade, and then print and staple the feedback to the back of the students’ work. The text averaged about 1.5 pages of single-spaced text, broken up into about 30 one-to-three-line paragraphs. Marking these First Year reports had previously taken between 15 and 20 minutes each, giving an average time saving of about five minutes per script. Similar time savings were estimated to have been realised with the 16 Second Year scripts.


The 33 students in the initial trial were asked to comment on the feedback they received with their work (handwritten text in the body of the work, or printed text at the end). Fifteen students responded: six commented that the type of feedback they received did not matter to them; four preferred comments interspersed within the text on the paper; and five liked the depth and quantity of the feedback printed and stapled to the back of their work. Two students commented on the programme’s misinterpretation of the audio, with some of the resulting comments being fairly humorous.

Overall, no strong feelings were expressed either way (somewhat disappointingly). It is assumed that the 18 students who did not respond had no preference for the form of their feedback (or did not look at it). The overall interpretation is that the method is effective, but not one that has elicited strong feelings either for or against from the students.


Using speech-to-text software to provide feedback on student work has been successful in this pilot. Students did not express strong preferences for the system, but neither did they express any strong dislike of it. However, significant time savings were realised by the staff member using it, and the feedback was both quantitatively and qualitatively enriched.

From the results of this study, audio-to-text feedback on student work takes advantage of some of the numerous benefits outlined in previous studies of audio feedback. From the students’ perspective, these include the depth and quantity of the feedback. From the lecturers’ perspective, the positive experiences include providing more depth in the students’ feedback and being able to fully explain how a comment might improve a piece of work, while enjoying considerable time savings.

There is a real loss in the audio-to-text approach where the personalised nature of audio feedback is concerned: the nuances and warmth of voice that students reported enjoying are not available in the textual translation (Ice et al., 2007; Rotheram, 2009).

The drawbacks identified with the provision of audio recordings are almost all associated with the administration of the sound files. The simplicity of audio-to-text translation eliminates these administrative problems.

This simplicity is also scalable, whereas the advantages found with the use of audio files are not enough to offset the higher administrative burden as the system is scaled up. Providing audio recordings for 20 students may be worthwhile, but as the number of students increases, the cost becomes too high. This is not the case with audio-to-text translation: because of the time savings, the value of the system increases with the number of students. Saving an average of five minutes per script when providing detailed feedback for 20 students is dwarfed by the time savings if similar gains are realised when providing feedback for 100 or more students.
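The scaling arithmetic can be sketched in a few lines. This is purely illustrative: the five-minutes-per-script figure is the pilot's average saving, and the cohort sizes are the examples used above.

```python
# Illustrative calculation of marking time saved by dictated feedback,
# using the pilot's average saving of about five minutes per script.

def total_saving_minutes(num_scripts, saving_per_script=5.0):
    """Total minutes saved across a cohort at a fixed per-script saving."""
    return num_scripts * saving_per_script

for cohort in (20, 100):
    saved = total_saving_minutes(cohort)
    print(f"{cohort} scripts: {saved:.0f} minutes saved ({saved / 60:.1f} hours)")
```

At 20 scripts the saving is 100 minutes; at 100 scripts it is over eight hours, which is the sense in which the method scales rather than breaks down.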

Using audio-to-text translation to provide detailed feedback on students’ work has the potential to change the nature of marking. Taking advantage of most of the benefits of audio recordings, while minimising the drawbacks associated with their administration, means that lecturers can enjoy the best of both worlds in their marking.


References

Attwood, R. (2009, March 5). Institutions hear consumers when students speak. Times Higher Education, 1886, 10.

Ice, P., Curtis, R., & Wells, J. (2007). Using asynchronous audio feedback to enhance teaching presence and students’ sense of community. Journal of Asynchronous Learning Networks, 11(2), 3–25.

King, D., McGugan, S., & Bunyan, N. (2008). Does it make a difference? Replacing text with audio feedback. Practice and Evidence of the Scholarship of Teaching and Learning in Higher Education, 3(2), 145–163.

Merry, S., & Orsmond, P. (2008). Students’ attitudes and usage of academic feedback provided via audio files. Bioscience Education e-Journal, 11(3). doi:10.3108/beej.11.3

Rotheram, B. (2009). Sounds Good: Quicker, better assessment using audio feedback. Final report.

