In the wake of NTNU's merger with HiG, HiÅ and HiST, the university's course catalogue has expanded drastically. The merged institutions naturally offer courses with overlapping curricula, and identifying these overlaps is a necessary step towards organizing and improving the course catalogue of the new NTNU. This specialization project aims to automate the identification of such courses using Natural Language Processing (NLP). Using the course descriptions in the catalogue, it should be possible to gain an understanding of each course's content and to compare courses with each other. The project approaches the problem with an ensemble of comparators, and investigates how combining classical information retrieval techniques with advanced NLP methods can improve the comparison.
Keywords: Natural Language Processing, Information Retrieval, Keyword Extraction
You can follow the implementation on GitHub. It is written in Python and uses the Natural Language Toolkit (NLTK). Course descriptions are downloaded automatically through IME's Data API.
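To make the idea of an ensemble of comparators concrete, the following is a minimal sketch and not the project's actual implementation. It combines two simple comparators, TF-IDF cosine similarity and Jaccard overlap of terms, into one weighted score. Everything in it is an assumption made for illustration: the course names and descriptions are invented, the stop word list is a tiny hand-written one, the equal weights are arbitrary, and it uses only the Python standard library rather than NLTK or IME's Data API.

```python
import math
import re
from collections import Counter

# Hypothetical course descriptions, invented for illustration only.
COURSES = {
    "COURSE_A": "Introduction to algorithms and data structures, "
                "including sorting, searching and graphs.",
    "COURSE_B": "Algorithms and data structures: sorting, searching, "
                "trees and graphs.",
    "COURSE_C": "Organic chemistry with emphasis on reaction mechanisms "
                "and synthesis.",
}

# Tiny hand-written stop word list (an assumption, not NLTK's list).
STOPWORDS = {"a", "an", "and", "the", "to", "of", "on", "in", "with", "including"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf-idf weight} vector."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        # Smoothed idf, so terms present in every document keep some weight.
        vectors[name] = {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard overlap between the sets of terms in two token lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def ensemble_similarity(x, y, tokens, vectors, weights=(0.5, 0.5)):
    """Weighted average of the two comparator scores for courses x and y."""
    scores = (cosine(vectors[x], vectors[y]), jaccard(tokens[x], tokens[y]))
    return sum(w * s for w, s in zip(weights, scores))


TOKENS = {name: tokenize(text) for name, text in COURSES.items()}
VECTORS = tfidf_vectors(TOKENS)

if __name__ == "__main__":
    for x, y in [("COURSE_A", "COURSE_B"), ("COURSE_A", "COURSE_C")]:
        print(x, y, round(ensemble_similarity(x, y, TOKENS, VECTORS), 3))
```

In this sketch the two algorithm courses score higher against each other than either does against the chemistry course, which shares no terms with them. A fuller ensemble would add further comparators alongside these two and tune the weights rather than fixing them at equal values.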