American Journal of Management Science and Engineering
Volume 4, Issue 3, May 2019, Pages: 49-55
Received: Apr. 7, 2019;
Accepted: Jun. 5, 2019;
Published: Jul. 26, 2019
Zhenhuan Sui, Department of Integrated Systems Engineering, The Ohio State University, Columbus, Ohio, USA
Owing to the rapid growth of large-scale text data from Internet sources such as Twitter, social media platforms have become increasingly popular sources for information extraction. The extracted text is converted to numerical form through a series of data transformations and then analyzed with text analytics models to support decision making. Among these models, a particularly common one is Latent Dirichlet Allocation (LDA), a topic modeling method in which topics are clusters of words in the documents, associated with fitted multivariate statistical distributions. However, these models are often poor estimators of topic proportions. This paper therefore proposes a timely topic score technique for social media text data visualization, based on a point system built on topic model outputs, to support text signaling. The importance score system is intended to mitigate this weakness of topic models by taking the topic proportion outputs and assigning importance points that reveal text topic trends. The technique then generates visualizations that show topic trends over the studied time period and thereby support decision making. Finally, the paper studies two real-life case examples from Twitter text sources and illustrates the efficiency of the methodology.
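The idea of turning topic proportions into importance points per time period can be illustrated with a minimal sketch. The abstract does not state the actual point rule, so the rank-based rule below (3, 2, 1 points for the three highest-proportion topics in each document), the function name `timely_topic_scores`, and the sample data are all assumptions made purely for illustration, not the paper's method.

```python
def timely_topic_scores(docs, points=(3, 2, 1)):
    """Aggregate importance points per topic and time period.

    docs: iterable of (period, proportions) pairs, where proportions is the
    list of topic proportions for one document (e.g. from a fitted LDA model).

    Assumed point rule (hypothetical, not taken from the paper): within each
    document the highest-proportion topic earns points[0], the next
    points[1], and so on; topics beyond len(points) earn nothing.
    """
    scores = {}
    for period, proportions in docs:
        totals = scores.setdefault(period, [0] * len(proportions))
        # Rank topic indices by descending proportion; ties go to the lower index.
        ranked = sorted(range(len(proportions)), key=lambda k: (-proportions[k], k))
        for pts, k in zip(points, ranked):
            totals[k] += pts
    return scores


# Made-up topic proportions for four tweets over two months (illustrative only).
docs = [
    ("2019-01", [0.70, 0.20, 0.10]),
    ("2019-01", [0.15, 0.60, 0.25]),
    ("2019-02", [0.10, 0.30, 0.60]),
    ("2019-02", [0.05, 0.15, 0.80]),
]
trend = timely_topic_scores(docs)
for period in sorted(trend):
    print(period, trend[period])
```

Plotting each topic's score against the period would give a simple trend chart of the kind the abstract describes; a different point rule could be substituted without changing the aggregation step.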
Social Media Text Data Visualization Modeling: A Timely Topic Score Technique, American Journal of Management Science and Engineering. Vol. 4, No. 3, 2019, pp. 49-55.