Moving educational assessment from paper to online is much more than a digitisation project. It is an opportunity to transform the relationship between educators and test-takers. Did you miss our Q&A Coffee Break about the pedagogical impact of online assessment? Don’t worry. In this article the panellists answer 8 questions on the topic.

The panellists:

  • Damion Young, Learning Technologist at the University of Oxford, United Kingdom
  • Magnus Widqvist, Administrator at Uppsala University, Sweden
  • Niels Goet, Data Scientist, Inspera, Norway

Q1: Although it is a very good thing to rethink the questions you use, why did moving to digital exams require or lead to this change?

Damion Young: As well as the possibility of students cheating in the exam itself, another concern with remote exams is the exposure of the item bank in situations where we have less control over students recording questions. We have had to remove all questions reused from a national medical assessment bank, which has meant some fairly significant rewrites in certain cases. We are also planning to decrease question reuse rates going forwards, on the assumption that our banks are likely to have been compromised. That said, face-to-face assessments are not immune either: there are orchestrated attempts to reconstruct them, where each student is asked to remember a certain number of questions.

Q2: What are your experiences with psychometrics, not only for multiple-choice questions (MCQ) or constrained answer questions?

Niels Goet: In principle, psychometric analyses are not limited to auto-marked questions alone (e.g. MCQ, multiple-response, drag-and-drop). Open-ended questions can also be analysed. There is, however, a mathematical limit to how easily we can use traditional psychometric models (such as IRT) to evaluate open-ended questions. If open-ended questions have a clearly right or wrong answer, we can model them like any other “constrained answer question”.

This is much more challenging when a student’s “score” on an open-ended question is judged against a wide range of criteria and the score can vary over a wide range. The challenges here are two-fold. First, there are many potential scoring categories. If students can get a mark ranging from 0 to 10 on an open-ended question, you would need sufficient response data in each category to estimate a model. Often, these questions are multi-dimensional in nature too, since they typically evaluate multiple skills. This means you would need a lot of data to get any meaningful estimates. The second challenge is that open-ended questions are more likely (though not necessarily) to be affected by marker bias, simply because they are not auto-marked. This can be compensated for using, for example, a grader-compensatory psychometric model.
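To make the point about scoring categories concrete, here is a sketch in standard IRT notation. The panellist does not name a specific model; the two-parameter logistic (2PL) model and the graded response model below are common textbook choices and are used here purely as illustrative assumptions. For a question that is simply right or wrong, the 2PL model needs only two parameters per question $j$ (discrimination $a_j$ and difficulty $b_j$), with $\theta_i$ the ability of student $i$:

```latex
P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\left[-a_j(\theta_i - b_j)\right]}
```

For a question marked in ordered categories $0, 1, \dots, K_j$, the graded response model replaces the single difficulty with one threshold $b_{jk}$ per category boundary:

```latex
P(X_{ij} \ge k \mid \theta_i) = \frac{1}{1 + \exp\left[-a_j(\theta_i - b_{jk})\right]}, \qquad k = 1, \dots, K_j

P(X_{ij} = k \mid \theta_i) = P(X_{ij} \ge k \mid \theta_i) - P(X_{ij} \ge k + 1 \mid \theta_i)
```

with $P(X_{ij} \ge 0 \mid \theta_i) = 1$ and $P(X_{ij} \ge K_j + 1 \mid \theta_i) = 0$. Under these assumptions, a question marked from 0 to 10 has ten thresholds plus a discrimination, so eleven parameters instead of two, and each threshold can only be estimated well if enough students actually land on either side of it.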

Q3: How has moving tests online affected the test construct and specifications for you? And if test constructs are impacted has that had a pedagogical impact?

Magnus Widqvist: Some test authors have been very careful in their approach and tried not to deviate from their usual way of constructing tests. Others have embraced the wide array of tools that the digital exam offers. Either method has its pros and cons: we have (anecdotal) testimonies that struggling students are having difficulties, probably due to the stress of having to learn a whole new way of studying and taking exams. This is in no small part because the move to digital had to be implemented rapidly due to COVID-19. The adjustment period was extremely short for us all.

Q4: Talking about adaptation, has COVID-19 affected the length of your exams? Is there an ideal length in minutes of Inspera exams?

Damion Young: For large-scale open-book remote, handwritten exams, we have allowed extra time for potential technical problems. For our closed-book, remote Inspera assessments, we have not allowed extra exam time, just a longer lead-in for checking IDs, cameras, etc. The few exams we have that last longer than 2 hours have a minute screen break, where we pause the assessment and ask students to stand up, stretch and look away from their computers for a while.

Niels Goet: In terms of an “ideal length of exams”: adaptivity is designed to improve the measurement properties of an assessment. The main advantage of an adaptive test is that the questions are tailored to the student’s ability level. This means that we gather more “information” about the student’s ability with each question, and therefore can use shorter tests. The “ideal length” in this case depends on how well the questions that you have are suited for the target group (i.e. if you have more questions that are at the right level for your students, you can expect to be able to run shorter tests).
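The link between well-targeted questions and shorter tests can be illustrated with a small Python sketch. It assumes a 2PL IRT model with a discrimination of 1.0 for every question and a target standard error of 0.5; these numbers, and the function names, are illustrative assumptions rather than anything Inspera is stated to use. Each question contributes Fisher information a² · p · (1 − p), and the standard error of the ability estimate is roughly 1 / √(total information).

```python
import math


def p_correct(theta, a, b):
    """Probability of a correct answer under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information a 2PL question provides at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def items_needed(theta, difficulties, a=1.0, target_se=0.5):
    """Number of questions (taken in order) until the standard error
    of the ability estimate drops to target_se, or None if it never does."""
    total_information = 0.0
    for count, b in enumerate(difficulties, start=1):
        total_information += item_information(theta, a, b)
        if 1.0 / math.sqrt(total_information) <= target_se:
            return count
    return None


theta = 0.0                   # hypothetical student of average ability
matched = [0.0] * 40          # questions pitched at the student's level
mismatched = [2.0] * 40       # questions that are much too hard

matched_count = items_needed(theta, matched)
mismatched_count = items_needed(theta, mismatched)
print("matched questions needed:", matched_count)
print("mismatched questions needed:", mismatched_count)
```

In this toy setup the matched bank reaches the target precision with far fewer questions than the mismatched one, which is the sense in which the "ideal length" depends on how well the questions suit the target group.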

Q5: What guides the teacher/lecturer in the selection of test questions? Do they use the purpose of the test, learning outcomes?

Damion Young: In medicine, in particular, there have been increasing efforts to map questions to the syllabus to ensure good coverage. However, in some subjects, examiners are asked to write questions in groups of five questions on a single subject. This has helped discourage a scattergun, surface appraisal of students’ knowledge which is a potential side effect of item banking and is something we perhaps need to think about as we move into more automated test creation.

Q6: As we move to assessment online, how do we manage academic integrity? Your views on prevention of impersonation, online proctoring (both efficacy and ethics), and plagiarism would be helpful.

Magnus Widqvist: This is a can of worms, indeed. As we are in the process of determining the practical and judicial boundaries here, I cannot give a good answer. But we run all the exams through Urkund as a plagiarism check.

Damion Young: Inspera provides some built-in security: it detects large amounts of text suddenly being pasted in, and SEB (Safe Exam Browser) means students can’t do anything else on the computer. For invigilation, we ask students to allow us to watch them via Zoom from the side (so we can see hands and face, not screen), and we are investigating Inspera’s Remote Exam, which would give us continued face and screen recording in the event of an internet outage. One thing to mention on academic integrity is that one of the drivers for introducing computer-marked questions as a replacement for hand-marked, short-answer questions is the objectivity of marking – it’s all too easy for human markers to be influenced by mood, handwriting, etc.

Niels Goet: I think this is a really important question. A lot of companies have gone to market with online proctoring tools lately, and many use machine learning to identify “anomalous behaviour” based on video material. Similarly, plagiarism tools are based on identifying statistical anomalies (here, in the degree of overlap between the answer/text that a student has submitted and pre-existing content). From a purely data science perspective, and to manage the balance between efficacy and ethics, I would say it’s important that we are transparent about what these tools can and cannot do. Online proctoring and plagiarism tools are designed to detect anomalies, not to judge whether malpractice has actually occurred. I think it’s important that we are open about that fact and recognise that there always has to be human intervention to make the call on whether malpractice has actually taken place. These tools should be complementary to, but never replace, processes for maintaining academic integrity.
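As a rough illustration of "detect anomalies, then let a human decide", the following Python sketch scores the overlap between a submission and some pre-existing texts and returns only a list of items for human review. It is a deliberately simple toy based on shared three-word sequences; it is not how Urkund, Inspera, or any other product is stated to work, and the function names, the 0.5 threshold, and the sample texts are all hypothetical.

```python
def word_ngrams(text, n=3):
    """Set of lowercase n-word sequences found in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(submission, source, n=3):
    """Share of the submission's n-grams that also appear in the source."""
    submitted = word_ngrams(submission, n)
    if not submitted:
        return 0.0
    return len(submitted & word_ngrams(source, n)) / len(submitted)


def flag_for_review(submission, sources, threshold=0.5):
    """Return {source name: score} for sources with unusually high overlap.

    A flag only marks the submission as worth a human look; it is not a
    verdict that malpractice took place.
    """
    scores = {name: overlap_score(submission, text) for name, text in sources.items()}
    return {name: score for name, score in scores.items() if score >= threshold}


# Hypothetical example texts.
sources = {
    "textbook": "the mitochondria is the powerhouse of the cell and produces energy",
    "lecture_notes": "enzymes lower the activation energy of a reaction",
}
submission = "The mitochondria is the powerhouse of the cell and it makes ATP"

flagged = flag_for_review(submission, sources)
print(flagged)
```

The output is a shortlist with scores, not a decision: where the threshold sits and what happens to a flagged script remain matters for the institution's academic integrity process.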

Q7: How can online courses and assessment affect students’ thinking and their learning efficiency? What role does technology play in that, and its accessibility?

Damion Young: One dream, of course, is easier delivery of the sort of adaptive learning that you would be able to do easily with a one-to-one tutor – matching teaching exactly to the student’s needs – but mediated by some sort of question/test and a person, or even totally managed by the computer. Of course, online courses and assessment make it much easier to learn part-time around a job and, as we’ve seen, to carry on education during lockdown! However, generating a spirit of involvement that makes students want to excel is much harder – at its most extreme, look at MOOCs, where drop-out rates are so high… at least in part because of lack of human contact, whether face-to-face or virtual, peer-to-peer or staff-student.

Magnus Widqvist: This is a bit speculative, but as I see it, as the exams and the types of questions move more towards “Explain how”, “Discuss”, and “Analyse”, and away from “Who”, “When”, and “Where”, this also moves the students towards a different mode of studying and thinking. Pure fact-cramming gives way to learning methods of fact-gathering and compilation. Here technology plays a vital role in making facts and research accessible at the student’s fingertips. It will also make it important for the student to be able to analyse the sources with a critical mindset.

Q8: What can you do to enhance the positive pedagogical impact from your particular role?

Magnus Widqvist: I try to set a good example. I make myself available, and try to impart my own (rather nerdy) enthusiasm for the toolbox that is Inspera. I find it rewarding to interact and to brainstorm methods and unconventional solutions for presenting the exams. If something is presented as fun, rather than a chore, it will be easier to both teach and learn. And I make a point of saying that I don’t know everything, but I will always try to find out.

Damion Young: Making sure that examiners have the tools – item banks, understanding of what the data mean, etc. – to make it easy for them to put together fair assessments. In particular, when it comes to question analysis, asking examiners to remember that performance – particularly with University-size groups as opposed to national exams – can also be about the teaching/students’ understanding, rather than the question itself!

Do you want to learn more?

Follow this link to watch a full recording of the Q&A Coffee Break held on 20 October 2020. Enjoy!