Session Goals
- To define the basic framework of a topic model, including its inputs, outputs, and underlying logic
- To apply knowledge of topic models by using RapidMiner to generate topics from a data corpus, evaluate topic quality, and extract top-scoring words and texts
Key Ideas
- Topic models view an individual text as a ‘bag of words’: an unordered collection of tokens without inherent meaning. A corpus of these texts is viewed as a mixture of ‘topics’, each of which is a probability distribution over words. From this perspective, each text is seen as having its bag of words drawn from the word distributions of its topics. Through this simplistic model of language, a topic model seeks to discover (a) how words are distributed within each topic, and (b) which topics constitute each individual text in a corpus.
- While topic modelling proceeds in an automated fashion, several manual decisions need to be made. The number of topics must be provided by the analyst, and it may be chosen through a combination of quantitative and qualitative assessments of topic quality. The ‘meaning’ of a topic also cannot be discerned without human interpretation.
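The generative view described above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the two topics, their word probabilities, and the function name `generate_text` are hypothetical toy values, not part of the session materials or of RapidMiner. The code runs the model "forwards" (from topics to a bag of words); an actual topic model works in the opposite direction, inferring the distributions from the observed texts.

```python
import random
from collections import Counter

# Hypothetical toy topics: each topic is a probability distribution over words.
TOPICS = {
    "sports": {"game": 0.4, "team": 0.4, "vote": 0.1, "law": 0.1},
    "politics": {"game": 0.1, "team": 0.1, "vote": 0.4, "law": 0.4},
}


def generate_text(topic_mixture, n_words, rng):
    """Draw a bag of words: pick a topic per token, then a word from that topic."""
    topic_names = list(topic_mixture)
    topic_weights = [topic_mixture[t] for t in topic_names]
    bag = Counter()  # a Counter keeps word counts only, not word order
    for _ in range(n_words):
        topic = rng.choices(topic_names, weights=topic_weights, k=1)[0]
        words = list(TOPICS[topic])
        word_weights = [TOPICS[topic][w] for w in words]
        word = rng.choices(words, weights=word_weights, k=1)[0]
        bag[word] += 1
    return bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# A hypothetical text that is 80% 'sports' and 20% 'politics'.
bag = generate_text({"sports": 0.8, "politics": 0.2}, n_words=50, rng=rng)
print(bag)
```

Because the result is stored as word counts, any information about word order is discarded, which is exactly the ‘bag of words’ simplification.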
Homework and Activity
- (Model Selection and Evaluation) Load your chosen dataset into RapidMiner. Run the topic modelling procedure for 2 to 20 topics. Evaluate topic coherence for each run and choose the ideal number of topics.
- (Model Parameters) For each topic, produce the following: (a) a list of the 30 highest-scoring words and (b) a list of the 30 highest-scoring texts. Report the score of each word and each text on these lists.
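For those who want to double-check their lists outside RapidMiner, the ranking logic behind both tasks can be sketched in a few lines. This is a minimal sketch, assuming the coherence values, word scores, and text scores have already been exported from the tool; all numbers, document identifiers, and the helper name `top_n` are hypothetical and are not RapidMiner output.

```python
def top_n(scores, n):
    """Return the n highest-scoring (item, score) pairs, highest first."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical coherence score for each candidate number of topics.
coherence_by_k = {2: 0.31, 5: 0.42, 10: 0.47, 15: 0.44, 20: 0.40}
best_k = max(coherence_by_k, key=coherence_by_k.get)

# Hypothetical word scores and text scores for a single topic.
word_scores = {"vote": 0.12, "law": 0.09, "team": 0.01, "game": 0.02}
text_scores = {"doc_03": 0.91, "doc_17": 0.15, "doc_08": 0.64}

print(best_k)
print(top_n(word_scores, 2))
print(top_n(text_scores, 2))
```

For the homework, `n` would be 30 rather than 2. The number of topics with the highest coherence is only a quantitative starting point; the final choice should also reflect a qualitative reading of the topics.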
Readings
- Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., Pfetsch, B., Heyer, G., Reber, U., Häussler, T., Schmid-Petri, H., & Adam, S. (2018). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. Communication Methods and Measures, 12(2-3), 93-118. https://doi.org/10.1080/19312458.2018.1430754
MaierETAL_CMMApplyingLDA.pdf
Slides
2022-03-18 ADL RapidMiner Hands-On.pdf
2022-04-01 ADL LDA Homework Sample.docx
Session Recording: https://drive.google.com/drive/folders/1OXwc5a2SrQoO4PMU2QI5I6dmW71l9sfJ?usp=sharing