Abstract
Text-to-video generation has rapidly evolved as a groundbreaking application of generative AI, with the potential to revolutionize both creative and industrial sectors. Despite these advancements, the fidelity, performance, and real-world applicability of current models remain inadequately explored. This research addresses that gap by evaluating three cutting-edge text-to-video models: Runway Gen-2, CogVideoX-2B, and CogVideoX-5B. The primary objectives of this study are to (1) conduct a comprehensive evaluation of these models using quantitative metrics such as Fréchet Inception Distance (FID), Fréchet Video Distance (FVD), and CLIPScore to measure video quality, realism, and alignment with text input; (2) gather human perceptual data to assess perceived realism, quality, and accuracy; and (3) compare the models to identify strengths, weaknesses, and areas for improvement. To determine how AI-generated videos measure up to human expectations, the study asked 60 participants to rate outputs from the three models using a 7-point Likert scale, 10 diverse prompts, and 10 real-world benchmarks. While CogVideoX-2B impressed with its precision and alignment, CogVideoX-5B stood out for its striking realism in the eyes of human viewers. These findings reveal a trade-off between technical accuracy and perceptual appeal, which highlights the need for evaluation methods that balance both.
| Original language | English |
|---|---|
| Pages (from-to) | 377-393 |
| Number of pages | 17 |
| Journal | Issues in Information Systems |
| Volume | 26 |
| Issue number | 1 |
| DOIs | |
| State | Published - Jan 1 2025 |
Scopus Subject Areas
- General Business, Management and Accounting
Keywords
- CogVideoX
- CogVideoX-2B
- CogVideoX-5B
- Generative AI
- Runway Gen-2
- TTV
- text-to-video generation
- text-to-video generative models
- transformer models
Title
TTV: Towards advancing text-to-video generation with generative AI models and a comprehensive study of model fidelity, performance, and human perception