Application of Regression Analysis on Text-Mining Data Associated with Autism Spectrum Disorder from Twitter: A Pilot Study

Chen Mo, Jingjing Yin, Isaac Chun-Hai Fung, Zion Tsz Ho Tse

Research output: Contribution to conference › Presentation

Abstract

Social media has become a popular resource for health data analysis. However, the mathematical and computational techniques needed to work with massive social media data are challenging for public health practitioners, and results from traditional machine learning techniques are difficult to interpret. This study proposes a simple new solution: regressing the primary outcome of interest (e.g., the number of retweets of a tweet, or whether a tweet contains certain keywords) on the frequency of common terms that appear in the tweet. The method reduces the term matrix based on the fitted regression scores, such as the relative risk or odds ratio. It also addresses the data sparsity issue and transforms text data into continuous summary scores. With the proposed scores, it becomes easier to analyze social media data and interpret the results. We used a Twitter dataset on Autism Spectrum Disorder (ASD) and applied regression models, including the Poisson model, the hurdle model, and the logistic model, with model selection based on the Youden index. We found that the terms with significant results generally represent the key factors associated with ASD in the existing literature.
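The general idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a binary outcome, uses term presence rather than term frequency, and estimates a smoothed log odds ratio per term from a 2x2 table (without smoothing, this equals the slope of a univariate logistic regression on a binary predictor). The toy tweets, the `min_count` filter, the 0.5 continuity correction, and the summary score defined as a sum of log odds ratios are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy data, invented for illustration: (tweet text, binary outcome).
tweets = [
    ("autism research funding announced", 1),
    ("research update for school", 1),
    ("autism awareness walk today", 0),
    ("autism therapy at school", 0),
    ("new research published", 1),
    ("autism support group meeting", 0),
]

def term_log_odds_ratios(data, min_count=2):
    """Smoothed log odds ratio of outcome=1 for each common term.

    Adds 0.5 to every cell of the 2x2 table (an assumed continuity
    correction) so that empty cells do not give infinite estimates.
    """
    docs = [(set(text.split()), y) for text, y in data]
    counts = Counter(term for terms, _ in docs for term in terms)
    scores = {}
    for term, n in counts.items():
        if n < min_count:
            continue  # drop rare terms to reduce sparsity
        a = sum(1 for terms, y in docs if term in terms and y == 1)
        b = sum(1 for terms, y in docs if term in terms and y == 0)
        c = sum(1 for terms, y in docs if term not in terms and y == 1)
        d = sum(1 for terms, y in docs if term not in terms and y == 0)
        scores[term] = math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))
    return scores

def tweet_score(text, scores):
    """Continuous summary score: sum of log odds ratios of retained terms."""
    return sum(scores.get(term, 0.0) for term in set(text.split()))

def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cut and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < cut and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

scores = term_log_odds_ratios(tweets)
values = [tweet_score(text, scores) for text, _ in tweets]
labels = [y for _, y in tweets]
cut, j = youden_cutoff(values, labels)
print(scores, cut, j)
```

For count outcomes such as the number of retweets, the per-term score would instead come from a Poisson or hurdle regression (for example, a relative risk), which this sketch does not cover.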

Original language: American English
State: Published - Mar 26 2018
Event: Eastern North American Region International Biometric Society (ENAR)
Duration: Mar 25 2018 → …

Conference

Conference: Eastern North American Region International Biometric Society (ENAR)
Period: 03/25/18 → …

Keywords

  • Autism Spectrum Disorder
  • Text-Mining Data
  • Twitter

DC Disciplines

  • Biostatistics
  • Public Health
