TTV: Towards advancing text-to-video generation with generative AI models and a comprehensive study of model fidelity, performance, and human perception

Research output: Contribution to journal › Article › peer-review

Abstract

Text-to-video generation has rapidly evolved as a groundbreaking application of generative AI, with the potential to revolutionize both creative and industrial sectors. Despite this progress, the fidelity, performance, and real-world applicability of current models remain inadequately explored. This research addresses that gap by evaluating three cutting-edge text-to-video models: Runway Gen-2, CogVideoX-2B, and CogVideoX-5B. The primary objectives of this study are to (1) conduct a comprehensive evaluation of these models using quantitative metrics, namely Fréchet Inception Distance (FID), Fréchet Video Distance (FVD), and CLIPScore, to measure video quality, realism, and alignment with the text input; (2) gather human perceptual data to assess perceived realism, quality, and accuracy; and (3) compare the models to identify strengths, weaknesses, and areas for improvement. To uncover how AI-generated videos measure up to human expectations, this study asked 60 participants to rate outputs from the three models using a 7-point Likert scale, 10 diverse prompts, and 10 real-world benchmarks. While CogVideoX-2B impressed with its precision and alignment, CogVideoX-5B stood out for its striking realism in the eyes of human viewers. These findings reveal a compelling trade-off between technical accuracy and perceptual appeal, which highlights the need for evaluation methods that balance both.
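As a rough illustration of the metrics named in the abstract, the sketch below shows the standard definitions behind them. It is not the authors' evaluation pipeline. FID and FVD both reduce to a Fréchet distance between Gaussians fitted to feature vectors (image features for FID, video features for FVD), and CLIPScore is commonly defined as a scaled, clipped cosine similarity between text and image embeddings. The function names, the toy inputs, and the choice of NumPy are assumptions made for illustration, and the feature extractors themselves are omitted.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    Both inputs have shape (n_samples, n_features). In FID the features would
    come from an Inception network and in FVD from a video network; here they
    are simply assumed to be given.
    """
    feats_real = np.asarray(feats_real, dtype=np.float64)
    feats_gen = np.asarray(feats_gen, dtype=np.float64)
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative in theory;
    # clipping guards against tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


def clip_score(text_emb, image_emb, w=2.5):
    """CLIPScore for one text/image embedding pair: w * max(cos, 0).

    w = 2.5 is the commonly used rescaling constant. For video, one common
    choice (assumed here, not taken from the paper) is to average this score
    over sampled frames.
    """
    text_emb = np.asarray(text_emb, dtype=np.float64)
    image_emb = np.asarray(image_emb, dtype=np.float64)
    cos = text_emb @ image_emb / (np.linalg.norm(text_emb) * np.linalg.norm(image_emb))
    return float(w * max(cos, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 4))   # toy stand-in for real-video features
    fake = real + 3.0                  # same spread, shifted mean
    print(frechet_distance(real, fake))
    print(clip_score([1.0, 0.0], [1.0, 0.0]))
```

Lower Fréchet distances mean the generated feature distribution is closer to the real one, while higher CLIPScore values mean closer text-to-frame alignment. This difference in direction matters when comparing models across metrics.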

Original language: English
Pages (from-to): 377-393
Number of pages: 17
Journal: Issues in Information Systems
Volume: 26
Issue number: 1
State: Published - Jan 1 2025

Scopus Subject Areas

  • General Business, Management and Accounting

Keywords

  • CogVideoX
  • CogVideoX-2B
  • CogVideoX-5B
  • Generative AI
  • Runway Gen-2
  • TTV
  • text-to-video generation
  • text-to-video generative models
  • transformer models
