Social Media Data Extraction Method Benchmarking Comparison
International Journal on Data Science and Technology
Volume 5, Issue 2, June 2019, Pages: 40-44
Received: Apr. 7, 2019; Accepted: Aug. 13, 2019; Published: Aug. 28, 2019
Zhenhuan Sui, Department of Integrated Systems Engineering, The Ohio State University, Columbus, USA
Social media has become increasingly widely used. Twitter, one of the most popular platforms, spreads a great deal of information, especially given that U.S. President Trump has used it as his main official and free outlet for publishing news. Social media platforms such as Twitter have therefore become important sources from which information can be extracted and then analyzed with text analytics models to support decision-making. In this paper, we first review several text analytics methods and then investigate multiple tweet retrieval methods and software packages: Twitter Analytics, Application for Twitter, Python plus Tweepy, and Next Analytics. Seven feature-related criteria are applied to compare the methods in terms of ease of use, extraction timing, and capability to accommodate big data. Our results may be approximate, because we may not have been able to observe all of the capabilities and features of each software package. With that caveat, our results show that the Python plus Tweepy method is the best suited for big data projects (millions of tweets or more) and for real-time text data extraction. Next Analytics retrieves historical text messages more conveniently through Excel and can reach further back in time, which could provide much better capabilities for social media analysis.
Natural Language Processing, Text Analytics, Twitter Analysis, Social Media, Software Analysis, Big Data Analysis
Zhenhuan Sui, Social Media Data Extraction Method Benchmarking Comparison, International Journal on Data Science and Technology. Vol. 5, No. 2, 2019, pp. 40-44. doi: 10.11648/j.ijdst.20190502.12
