Generative AI tools to propose summative assessment tasks

DOI:

https://doi.org/10.5324/zy7yn618

Keywords:

Programming, Exam Questions, Generative AI, LLMs, Summative Assessments

Abstract

Generative artificial intelligence (GenAI) is increasingly considered for educational purposes, including the design of programming assessments. While prior work has focused on introductory-level courses, little is known about the suitability of GenAI for generating examination tasks at the master’s level. This study investigates the potential of ChatGPT 4.0, Bing Copilot, and Google Gemini to generate programming exam questions and model solutions. Using a fixed prompt, four researchers collected responses monthly from July 2024 to March 2025. The resulting questions and solutions were analyzed through a mixed-methods approach, combining semantic similarity analysis with Sentence-BERT embeddings, automated code quality metrics, and manual evaluation of task appropriateness. Findings indicate substantial variation across systems: ChatGPT and Gemini generated a diverse set of tasks, including some suitable for advanced assessment, whereas Bing Copilot displayed strong topical convergence, with limited variation and several tasks more appropriate for undergraduate levels. While the vague prompt limited contextual alignment with intended learning outcomes, results show that GenAI can assist educators by providing a pool of candidate tasks and solutions, albeit requiring expert curation and refinement. Future work should examine richer prompting strategies, inclusion of course-specific learning objectives, and student-centered evaluations of generated tasks.
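The semantic similarity analysis mentioned in the abstract could, in principle, be sketched as pairwise cosine similarity over sentence embeddings. The snippet below is a minimal illustration only: the toy vectors stand in for Sentence-BERT embeddings of generated exam questions and are assumptions, not values from the study.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between row-vector embeddings.

    In the study's setting, each row would be a Sentence-BERT
    embedding of one generated exam question; here toy vectors
    keep the sketch self-contained.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return unit @ unit.T

# Toy "embeddings" for three generated questions (assumed values).
emb = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])

sim = cosine_similarity_matrix(emb)
# High off-diagonal values would indicate topically convergent
# questions; low values indicate a diverse task pool.
print(np.round(sim, 2))
```

A convergence-prone system (such as Bing Copilot in the study's findings) would yield many high off-diagonal similarities, whereas a diverse generator would not.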


Published

2025-11-26

How to Cite

[1]
“Generative AI tools to propose summative assessment tasks”, NIKT, vol. 37, no. 4, Nov. 2025, doi: 10.5324/zy7yn618.