Generative AI tools to propose summative assessment tasks
DOI: https://doi.org/10.5324/zy7yn618
Keywords: Programming, Exam Questions, Generative AI, LLMs, Summative Assessments
Abstract
Generative artificial intelligence (GenAI) is increasingly considered for educational purposes, including the design of programming assessments. While prior work has focused on introductory-level courses, little is known about the suitability of GenAI for generating examination tasks at the master’s level. This study investigates the potential of ChatGPT 4.0, Bing Copilot, and Google Gemini to generate programming exam questions and model solutions. Using a fixed prompt, four researchers collected responses monthly from July 2024 to March 2025. The resulting questions and solutions were analyzed through a mixed-methods approach, combining semantic similarity analysis with Sentence-BERT embeddings, automated code quality metrics, and manual evaluation of task appropriateness. Findings indicate substantial variation across systems: ChatGPT and Gemini generated a diverse set of tasks, including some suitable for advanced assessment, whereas Bing Copilot displayed strong topical convergence, with limited variation and several tasks more appropriate for undergraduate levels. While the vague prompt limited contextual alignment with intended learning outcomes, results show that GenAI can assist educators by providing a pool of candidate tasks and solutions, albeit requiring expert curation and refinement. Future work should examine richer prompting strategies, inclusion of course-specific learning objectives, and student-centered evaluations of generated tasks.
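
The abstract describes measuring topical convergence via semantic similarity of Sentence-BERT embeddings. Below is a minimal sketch of how such a pairwise comparison could be computed with the sentence-transformers library; the model name and the example questions are illustrative assumptions, not taken from the paper.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative sketch: pairwise semantic similarity between generated exam
# questions using Sentence-BERT embeddings. The model choice below is an
# assumption; the paper does not specify which SBERT model was used.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical generated exam questions (placeholders, not from the study).
questions = [
    "Implement a thread-safe LRU cache and analyze its time complexity.",
    "Design a REST API endpoint that streams large files efficiently.",
    "Implement an LRU cache with O(1) get and put operations.",
]

embeddings = model.encode(questions, convert_to_tensor=True)

# Cosine similarity matrix: high off-diagonal values indicate topical
# convergence between questions, as reported for Bing Copilot.
similarity = util.cos_sim(embeddings, embeddings)
print(similarity)
```

In a setup like this, repeated high similarity scores between questions collected in different months would signal the kind of limited variation the study attributes to Bing Copilot, while lower scores would indicate the more diverse task pools attributed to ChatGPT and Gemini.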
License
Copyright (c) 2025 Guttorm Sindre, Line Kolås, Robin Isfold Munkvold, Mariusz Nowostawski, Håkon Einar Stokke Ringen, Joakim Pettersen Vassbakk

This work is licensed under a Creative Commons Attribution 4.0 International License.