Abstract
Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models’ language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated scoring was applied to five science items. Results indicate that including item descriptions before student responses provides additional contextual information to the transformer model, allowing it to generate automated scoring models with improved performance. These automated scoring models achieved scoring accuracy comparable to that of human raters. However, they struggled to evaluate responses containing complex scientific terminology and to interpret responses containing unusual symbols, atypical language errors, or logical inconsistencies. These findings underscore the importance of efforts by both researchers and teachers in advancing the accuracy, fairness, and effectiveness of automated scoring.
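The following is a minimal sketch, not the authors' implementation, of the idea described in the abstract: prepending the item description to each student response so that a transformer-based scoring model receives the item context along with the response. It assumes the Hugging Face `transformers` library with a generic BERT checkpoint; the item text, the example response, and the number of score levels are hypothetical placeholders.

```python
# Sketch only: pair the item description with the student response so a
# transformer classifier sees the item context before the response.
# Assumes `transformers` and `torch` are installed; checkpoint, item text,
# response, and label count are illustrative, not from the study.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=4  # e.g., four learning-progression levels
)

item_description = (
    "Explain why ice melts faster on a metal tray than on a plastic tray."
)
student_response = (
    "Metal conducts heat from the room into the ice more quickly than plastic."
)

# Encode the item description and the response as a sentence pair, so the
# item context precedes (and is separated from) the student response.
inputs = tokenizer(item_description, student_response,
                   truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_level = logits.argmax(dim=-1).item()
print(predicted_level)
```

In practice such a model would be fine-tuned on human-scored responses before use; the sketch only illustrates how item context can be supplied alongside each response at encoding time.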
| Original language | English |
|---|---|
| Pages (from-to) | 25-37 |
| Number of pages | 13 |
| Journal | Educational Measurement: Issues and Practice |
| Volume | 44 |
| Issue number | 3 |
| DOIs | |
| State | Published - Aug 13 2025 |
| Externally published | Yes |
Scopus Subject Areas
- Education
Keywords
- automated scoring
- constructed responses
- learning progression