A Text-Mining Framework for Supporting Systematic Reviews

Dingcheng Li; Zhen Wang; Liwei Wang; Sunghwan Sohn; Feichen Shen; Mohammad Hassan Murad; Hongfang Liu

doi:10.11648/j.infomgmt.20160101.11

Peer-Reviewed

Published in American Journal of Information Management (Volume 1, Issue 1)

Received: 21 July 2016; Accepted: 3 August 2016; Published: 31 August 2016

Abstract

Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured, reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden of abstract screening in SRs and to provide a high-level summary of the retrieved information. We proposed a text-mining framework for supporting SRs that consists of three semantics-based ranking metrics of our own definition: keyword relevance, indexed-term relevance, and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from an indexing vocabulary developed by domain experts for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of the topics generated by latent Dirichlet allocation, a Bayesian model for discovering topics. We tested the framework on three published SRs covering a variety of topics (Mass Media Interventions, Rectal Cancer, and Influenza Vaccine). Recall reached 100% in all three cases while saving 91.8%, 85.7%, and 49.3% of the abstract screening labor, respectively. Relevant studies identified manually showed strong topic similarity in the topic analysis, which supports the inclusion of topic analysis as a relevance metric. These results demonstrate that advanced text-mining approaches can substantially reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.
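To make the three ranking metrics and the reported "labor saved at full recall" figures more concrete, here is a minimal Python sketch on toy data. It is not the authors' implementation, and the abstract does not specify the exact formulas, so every definition below is an assumption: keyword relevance as the share of search keywords found in an abstract, indexed-term relevance as the Jaccard overlap between an article's MeSH-style index terms and a query term set, topic relevance as the cosine similarity between an abstract's topic distribution and the mean distribution of known-relevant "seed" abstracts, and a hypothetical equal-weight average to combine them. The topic distributions are hard-coded stand-ins for LDA output, and all names (`combined_score`, `recall_and_work_saved`, the document IDs) are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def keyword_relevance(text: str, keywords: Sequence[str]) -> float:
    # Assumed definition: share of search-strategy keywords found in the abstract.
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords) if keywords else 0.0


def indexed_term_relevance(terms: Set[str], query_terms: Set[str]) -> float:
    # Assumed definition: Jaccard overlap of MeSH-style index terms.
    union = terms | query_terms
    return len(terms & query_terms) / len(union) if union else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_relevance(theta: Sequence[float], seed_thetas: List[Sequence[float]]) -> float:
    # Assumed definition: cosine similarity between an abstract's topic
    # distribution and the mean topic distribution of known-relevant seeds.
    k = len(theta)
    centroid = [sum(s[i] for s in seed_thetas) / len(seed_thetas) for i in range(k)]
    return cosine(theta, centroid)


def combined_score(doc: Dict, keywords: Sequence[str], query_terms: Set[str],
                   seed_thetas: List[Sequence[float]]) -> float:
    # Hypothetical equal-weight average of the three metrics.
    return (keyword_relevance(doc["text"], keywords)
            + indexed_term_relevance(doc["terms"], query_terms)
            + topic_relevance(doc["theta"], seed_thetas)) / 3


def recall_and_work_saved(ranked: List[str], relevant: Set[str],
                          cutoff: int) -> Tuple[float, float]:
    # Screen only the top `cutoff` abstracts; the rest are never read.
    screened = set(ranked[:cutoff])
    recall = len(screened & relevant) / len(relevant)
    work_saved = 1 - cutoff / len(ranked)
    return recall, work_saved


# Toy data (hypothetical): abstract text, index terms, and 3-topic
# distributions standing in for the output of an LDA model.
docs = {
    "a1": {"text": "Mass media campaign to reduce mental health stigma",
           "terms": {"Social Stigma", "Mass Media", "Mental Disorders"},
           "theta": [0.8, 0.1, 0.1]},
    "a2": {"text": "Adjuvant chemotherapy after rectal cancer surgery",
           "terms": {"Rectal Neoplasms", "Chemotherapy, Adjuvant"},
           "theta": [0.1, 0.8, 0.1]},
    "a3": {"text": "Television campaign and stigma toward depression",
           "terms": {"Social Stigma", "Television", "Depression"},
           "theta": [0.7, 0.2, 0.1]},
    "a4": {"text": "Influenza vaccine efficacy in healthy children",
           "terms": {"Influenza Vaccines", "Child"},
           "theta": [0.1, 0.1, 0.8]},
}
keywords = ["mass media", "stigma", "campaign"]
query_terms = {"Social Stigma", "Mass Media"}
seed_thetas = [[0.75, 0.15, 0.10]]

# Rank abstracts by combined score, highest first.
ranking = sorted(docs, key=lambda d: combined_score(docs[d], keywords, query_terms, seed_thetas),
                 reverse=True)
recall, saved = recall_and_work_saved(ranking, {"a1", "a3"}, cutoff=2)
print(ranking)  # ['a1', 'a3', 'a2', 'a4']
print(f"recall={recall:.2f}, work saved={saved:.2f}")  # recall=1.00, work saved=0.50
```

In this toy run, stopping after the top two of four abstracts recovers both relevant ones while skipping half of the screening. The percentages reported in the abstract come from the authors' own data and method, which this example does not attempt to reproduce.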

DOI	10.11648/j.infomgmt.20160101.11
Page(s)	1-9
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2016. Published by Science Publishing Group

Keywords

Systematic Review, Text Mining, Topic Modeling, Keyword Relevance, Indexed-Term Relevance, Topic Relevance, Data Mining

Cite This Article

APA Style

Dingcheng Li, Zhen Wang, Liwei Wang, Sunghwan Sohn, Feichen Shen, et al. (2016). A Text-Mining Framework for Supporting Systematic Reviews. American Journal of Information Management, 1(1), 1-9. https://doi.org/10.11648/j.infomgmt.20160101.11

ACS Style

Dingcheng Li; Zhen Wang; Liwei Wang; Sunghwan Sohn; Feichen Shen, et al. A Text-Mining Framework for Supporting Systematic Reviews. Am. J. Inf. Manag. 2016, 1(1), 1-9. doi: 10.11648/j.infomgmt.20160101.11

AMA Style

Dingcheng Li, Zhen Wang, Liwei Wang, Sunghwan Sohn, Feichen Shen, et al. A Text-Mining Framework for Supporting Systematic Reviews. Am J Inf Manag. 2016;1(1):1-9. doi: 10.11648/j.infomgmt.20160101.11

BibTeX

@article{10.11648/j.infomgmt.20160101.11,
  author = {Dingcheng Li and Zhen Wang and Liwei Wang and Sunghwan Sohn and Feichen Shen and Mohammad Hassan Murad and Hongfang Liu},
  title = {A Text-Mining Framework for Supporting Systematic Reviews},
  journal = {American Journal of Information Management},
  volume = {1},
  number = {1},
  pages = {1-9},
  doi = {10.11648/j.infomgmt.20160101.11},
  url = {https://doi.org/10.11648/j.infomgmt.20160101.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.infomgmt.20160101.11},
  abstract = {Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden associated with abstract screening in SRs and provide high-level information summary. A text-mining SR supporting framework consisting of three self-defined semantics-based ranking metrics was proposed, including keyword relevance, indexed-term relevance and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from indexed vocabulary developed by domain experts used for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian-based model for discovering topics. We tested the proposed framework using three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer and Influenza Vaccine). The results showed that when 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved, the recalls were as high as 100% for the three cases; respectively. Relevant studies identified manually showed strong topic similarity through topic analysis, which supported the inclusion of topic analysis as relevance metric. It was demonstrated that advanced text mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.},
  year = {2016}
}

RIS

TY  - JOUR
T1  - A Text-Mining Framework for Supporting Systematic Reviews
AU  - Dingcheng Li
AU  - Zhen Wang
AU  - Liwei Wang
AU  - Sunghwan Sohn
AU  - Feichen Shen
AU  - Mohammad Hassan Murad
AU  - Hongfang Liu
Y1  - 2016/08/31
PY  - 2016
N1  - https://doi.org/10.11648/j.infomgmt.20160101.11
DO  - 10.11648/j.infomgmt.20160101.11
T2  - American Journal of Information Management
JF  - American Journal of Information Management
JO  - American Journal of Information Management
SP  - 1
EP  - 9
PB  - Science Publishing Group
UR  - https://doi.org/10.11648/j.infomgmt.20160101.11
AB  - Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden associated with abstract screening in SRs and provide high-level information summary. A text-mining SR supporting framework consisting of three self-defined semantics-based ranking metrics was proposed, including keyword relevance, indexed-term relevance and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from indexed vocabulary developed by domain experts used for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian-based model for discovering topics. We tested the proposed framework using three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer and Influenza Vaccine). The results showed that when 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved, the recalls were as high as 100% for the three cases; respectively. Relevant studies identified manually showed strong topic similarity through topic analysis, which supported the inclusion of topic analysis as relevance metric. It was demonstrated that advanced text mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.
VL  - 1
IS  - 1
ER  -

Author Information

Dingcheng Li
Department of Health Sciences Research, Mayo Clinic, Rochester, USA

Zhen Wang
Department of Health Sciences Research, Mayo Clinic, Rochester, USA

Liwei Wang
Department of Health Sciences Research, Mayo Clinic, Rochester, USA

Sunghwan Sohn
Department of Health Sciences Research, Mayo Clinic, Rochester, USA

Feichen Shen
Department of Health Sciences Research, Mayo Clinic, Rochester, USA

Mohammad Hassan Murad
Robert D. and Patricia E. Kern Centre for the Science of Health Care Delivery, Mayo Clinic, Rochester, USA

Hongfang Liu
Department of Health Sciences Research, Mayo Clinic, Rochester, USA