Social Media Data Extraction Method Benchmarking Comparison
International Journal on Data Science and Technology
Volume 5, Issue 2, June 2019, Pages: 40-44
Received: Apr. 7, 2019; Accepted: Aug. 13, 2019; Published: Aug. 28, 2019
Author
Zhenhuan Sui, Department of Integrated Systems Engineering, The Ohio State University, Columbus, USA
Abstract
Social media is now used ever more widely. Twitter, as the most popular of these media, spreads a great deal of information, especially since U.S. President Trump has used it as his main free outlet for publishing official news. Social media platforms such as Twitter have therefore become important sources from which information can be extracted and then analyzed with text analytics models to support decision-making. In this paper, we first review several text analytics methods and then investigate several methods and software tools for retrieving tweets: Twitter Analytics, Application for Twitter, Python plus Tweepy, and Next Analytics. The methods are compared on seven feature-related criteria covering ease of use, extraction timing, and the capacity to handle big data. Our results may be approximate, since we may not have been able to observe every capability and feature of each tool; with that caveat, they show that the Python plus Tweepy method is the best choice for big data projects (millions of tweets or more) and for real-time text data extraction. Next Analytics retrieves historical text messages more conveniently through Excel and can reach further back in time, which could substantially strengthen social media analysis.
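The abstract notes that extracted tweets are then analyzed with text analytics models. As a minimal sketch of that downstream step, and not the method used in this paper, the Python snippet below turns a batch of already-retrieved tweet texts into bag-of-words term frequencies. The stopword list, the tokenizing regular expressions, the function names `tokenize` and `term_frequencies`, and the sample tweets are all illustrative assumptions. Retrieval itself (for example through Tweepy) is deliberately left out so the snippet runs without API credentials.

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative stopword list (an assumption; real projects usually use a
# fuller list). "rt" is included to drop the retweet marker.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "on", "for", "rt"}

# URLs and @mentions are blanked out before tokenizing.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
# Runs of lowercase letters, digits, or apostrophes; a hashtag such as
# "#bigdata" keeps its word but loses the "#".
TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(tweet: str) -> list[str]:
    """Lowercase a tweet, strip URLs and mentions, and drop stopwords."""
    text = URL_OR_MENTION.sub(" ", tweet.lower())
    return [tok for tok in TOKEN.findall(text) if tok not in STOPWORDS]


def term_frequencies(tweets: list[str]) -> Counter:
    """Count how often each remaining term appears across all tweets."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts


if __name__ == "__main__":
    # Hypothetical sample tweets, invented for illustration only.
    sample = [
        "RT @user: Big data needs fast extraction https://example.com",
        "Real-time extraction of tweets with Python #bigdata",
        "Historical tweets are easier to pull into Excel",
    ]
    for term, count in term_frequencies(sample).most_common(3):
        print(term, count)
```

Term counts like these are a common starting input for richer text analytics models; a fuller stopword list or an added stemming step would change the resulting counts.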
Keywords
Natural Language Processing, Text Analytics, Twitter Analysis, Social Media, Software Analysis, Big Data Analysis
To cite this article
Zhenhuan Sui, Social Media Data Extraction Method Benchmarking Comparison, International Journal on Data Science and Technology. Vol. 5, No. 2, 2019, pp. 40-44. doi: 10.11648/j.ijdst.20190502.12
Copyright
Copyright © 2019. The authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Publisher: Science Publishing Group, New York, NY, U.S.A.