A Text-Mining Framework for Supporting Systematic Reviews
American Journal of Information Management
Volume 1, Issue 1, November 2016, Pages: 1-9
Received: Jul. 21, 2016; Accepted: Aug. 3, 2016; Published: Aug. 31, 2016
Authors
Dingcheng Li, Department of Health Sciences Research, Mayo Clinic, Rochester, USA; Watson Health Cloud, IBM, Rochester, USA
Zhen Wang, Department of Health Sciences Research, Mayo Clinic, Rochester, USA; Robert D. and Patricia E. Kern Centre for the Science of Health Care Delivery, Mayo Clinic, Rochester, USA
Liwei Wang, Department of Health Sciences Research, Mayo Clinic, Rochester, USA
Sunghwan Sohn, Department of Health Sciences Research, Mayo Clinic, Rochester, USA
Feichen Shen, Department of Health Sciences Research, Mayo Clinic, Rochester, USA
Mohammad Hassan Murad, Robert D. and Patricia E. Kern Centre for the Science of Health Care Delivery, Mayo Clinic, Rochester, USA; Division of Preventive Medicine, Mayo Clinic, Rochester, USA
Hongfang Liu, Department of Health Sciences Research, Mayo Clinic, Rochester, USA
Abstract
Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured, reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden of abstract screening in SRs and to provide a high-level information summary. We propose a text-mining framework for supporting SRs that comprises three self-defined, semantics-based ranking metrics: keyword relevance, indexed-term relevance, and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from the indexed vocabulary that domain experts develop for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian model for discovering topics. We tested the proposed framework on three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer, and Influenza Vaccine). When 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved for the three cases, respectively, recall remained as high as 100%. Topic analysis showed strong topic similarity among the relevant studies identified manually, which supports including topic analysis as a relevance metric. These results demonstrate that advanced text-mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.
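To make the reported quantities concrete, the following is a minimal sketch of how abstracts might be ranked by a keyword-based score and how the screening labor saved at 100% recall could then be computed. It is an illustration under stated assumptions, not the authors' implementation: the scoring rule (the fraction of search keywords found in an abstract) is a simple stand-in for the keyword relevance metric, the definition of labor saved (the share of abstracts ranked below the last relevant one) is assumed, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence, Set


def keyword_relevance(abstract: str, keywords: Sequence[str]) -> float:
    """Assumed stand-in score: fraction of keywords found in the abstract
    (case-insensitive substring match)."""
    if not keywords:
        return 0.0
    text = abstract.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def rank_by_score(scores: Sequence[float]) -> List[int]:
    """Return abstract indices sorted by descending score.
    Python's sort is stable, so ties keep their original order."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def labor_saved_at_full_recall(ranking: Sequence[int], relevant: Set[int]) -> float:
    """Assumed definition: a reviewer screens from the top of the ranking
    until every relevant abstract has been seen; everything ranked below
    that point counts as saved labor."""
    if not relevant:
        raise ValueError("at least one relevant abstract is required")
    last_pos = max(pos for pos, idx in enumerate(ranking) if idx in relevant)
    return 1.0 - (last_pos + 1) / len(ranking)


# Hypothetical toy data, not taken from the three SRs in the study.
keywords = ["stigma", "mass media", "mental health"]
abstracts = [
    "Mass media campaign to reduce mental health stigma",
    "Surgical outcomes in rectal cancer",
    "Television and stigma toward mental health",
    "Influenza vaccine in children",
    "Dietary fibre and gut flora",
]
relevant = {0, 2}  # indices a human reviewer judged relevant

scores = [keyword_relevance(a, keywords) for a in abstracts]
ranking = rank_by_score(scores)
saved = labor_saved_at_full_recall(ranking, relevant)
print(ranking, saved)
```

In this toy example both relevant abstracts land in the top two positions, so only two of the five abstracts need screening to reach 100% recall. A fuller version would combine this score with indexed-term and topic relevance; how the framework weights the three metrics is not stated in the abstract.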
Keywords
Systematic Review, Text Mining, Topic Modeling, Keyword Relevance, Indexed-Term Relevance, Topic Relevance, Data Mining
To cite this article
Dingcheng Li, Zhen Wang, Liwei Wang, Sunghwan Sohn, Feichen Shen, Mohammad Hassan Murad, Hongfang Liu, A Text-Mining Framework for Supporting Systematic Reviews, American Journal of Information Management. Vol. 1, No. 1, 2016, pp. 1-9. doi: 10.11648/j.infomgmt.20160101.11
Copyright
Copyright © 2016 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.