With recent advances in information and communication technology, Sina Weibo has attracted growing attention from scholars in China. The big data analytics platform at Sina Weibo has grown tremendously over the past few years in size, complexity, number of users, and variety of use cases. Without a clear description of how the underlying data were collected, stored, cleaned, and analyzed, however, Weibo network analysis and modeling become difficult. To analyze Weibo data, the structural framework of Weibo must first be known, and the composition and characteristics of Weibo data must be understood. Then, by comparing different application programming interfaces (APIs), a more efficient and convenient method of data collection is identified. Moreover, exploring cleaning methods and strategies suited to the characteristics of Weibo data facilitates further processing of the data. Finally, big data mining is integrated with the properties of Weibo to find the most effective methods for large-scale Weibo data, and future research is discussed.
Survey on Sina Weibo Research Based on Big Data Mining. International Journal of Data Science and Analysis, vol. 1, no. 1, 2015, pp. 1-7.