Generative AI tools to propose summative assessment tasks

DOI:

https://doi.org/10.5324/zy7yn618

Keywords:

Programming, Exam Questions, Generative AI, LLMs, Summative Assessments

Abstract

Generative artificial intelligence (GenAI) is increasingly considered for educational purposes, including the design of programming assessments. While prior work has focused on introductory-level courses, little is known about the suitability of GenAI for generating examination tasks at the master’s level. This study investigates the potential of ChatGPT 4.0, Bing Copilot, and Google Gemini to generate programming exam questions and model solutions. Using a fixed prompt, four researchers collected responses monthly from July 2024 to March 2025. The resulting questions and solutions were analyzed through a mixed-methods approach, combining semantic similarity analysis with Sentence-BERT embeddings, automated code quality metrics, and manual evaluation of task appropriateness. Findings indicate substantial variation across systems: ChatGPT and Gemini generated a diverse set of tasks, including some suitable for advanced assessment, whereas Bing Copilot displayed strong topical convergence, with limited variation and several tasks more appropriate for undergraduate levels. While the vague prompt limited contextual alignment with intended learning outcomes, results show that GenAI can assist educators by providing a pool of candidate tasks and solutions, albeit requiring expert curation and refinement. Future work should examine richer prompting strategies, inclusion of course-specific learning objectives, and student-centered evaluations of generated tasks.
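The semantic similarity analysis mentioned in the abstract could, in principle, be sketched as pairwise cosine similarity over sentence embeddings. The snippet below is a minimal illustration only: the toy vectors stand in for Sentence-BERT embeddings of generated exam questions and are assumptions, not values from the study.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between row-vector embeddings.

    In the study's setting, each row would be a Sentence-BERT
    embedding of one generated exam question; here toy vectors
    keep the sketch self-contained.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return unit @ unit.T

# Toy "embeddings" for three generated questions (assumed values).
emb = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])

sim = cosine_similarity_matrix(emb)
# High off-diagonal values would indicate topically convergent
# questions; low values indicate a diverse task pool.
print(np.round(sim, 2))
```

A convergence-prone system (such as Bing Copilot in the study's findings) would yield many high off-diagonal similarities, whereas a diverse generator would not.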


Published

2025-11-26

How to Cite

[1]
“Generative AI tools to propose summative assessment tasks”, NIKT, vol. 37, no. 4, Nov. 2025, doi: 10.5324/zy7yn618.