Abstract
Multimodal video analysis is a complex and time-consuming process for researchers; it entails capturing, watching, and re-watching video data to identify which segments best inform or address the questions that drive the research. Modern AI applications can alleviate the challenges that arise during the fine-grained analysis of learners' multimodal interactions captured through video. In this study, we present a supervised approach to training a deep neural network to analyze children's computational thinking (CT) captured through multimodal video data. The approach uses a set of images extracted from the video data to train the AI to map each image to a label derived from a priori theory. Confusion matrices were then used to evaluate the AI's performance by comparing its predictions with human analysis on a validation set. The findings suggested that the AI classified several aspects of children's CT in a way that was highly consistent with human analysis, demonstrating how the AI could serve as an additional team member during multimodal analysis. Implications for using AI to ease the challenges of multimodal analysis of video data are discussed.
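The comparison between AI predictions and human analysis described in the abstract can be illustrated with a minimal sketch of a confusion matrix. This is not the authors' code: the label names, the sample data, and the helper functions `confusion_matrix` and `accuracy` are all hypothetical, chosen only to show how agreement between a human coder and a classifier might be tabulated on a validation set.

```python
from collections import Counter


def confusion_matrix(human, predicted, labels):
    """Return a matrix whose rows are human codes and columns are AI predictions."""
    if len(human) != len(predicted):
        raise ValueError("human and predicted must have the same length")
    counts = Counter(zip(human, predicted))
    return [[counts[(h, p)] for p in labels] for h in labels]


def accuracy(matrix):
    """Share of validation items where the AI agrees with the human coder."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical CT labels and validation data (assumptions, not from the study).
labels = ["sequencing", "debugging", "other"]
human = ["sequencing", "sequencing", "debugging", "other", "debugging", "other"]
ai = ["sequencing", "debugging", "debugging", "other", "debugging", "sequencing"]

matrix = confusion_matrix(human, ai, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>10}: {row}")
print(f"agreement: {accuracy(matrix):.2f}")
```

Diagonal cells count the images on which the AI and the human coder agree; off-diagonal cells show which categories the AI tends to confuse with one another.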
Original language | English |
---|---|
Article number | 100146 |
Journal | Computers and Education: Artificial Intelligence |
Volume | 5 |
State | Published - Jan 2023 |
Keywords
- Artificial Intelligence (AI)
- Computational thinking (CT)
- Embodied interactions
- Machine learning
- Multimodality