The wide range of nouns, verbs, and other words in the Persian language, together with the availability of information in every field as paper documents, books, and Internet content, creates the need for a system that compares texts and evaluates their similarity. This paper presents a system that compares texts and determines the degree of similarity between Persian (Farsi) texts. The system uses the TF-IDF method to weight sentences. In addition, nouns are reduced to their roots, and synonyms and members of the same word family are given identical scores. The implementation results indicate that the proposed system performs with the desired efficiency when comparing short texts.
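The abstract does not specify the exact weighting formula or comparison measure, so the following is only a minimal sketch of the general idea under stated assumptions. It reduces tokens to a shared root, weights terms with TF-IDF, and compares the two weighted vectors with cosine similarity. Cosine similarity and the smoothed IDF variant are assumptions, not necessarily the authors' choices. The `CANONICAL` table and the helper names (`normalize`, `idf_table`, `tfidf`, `cosine`, `similarity`) are hypothetical. The table is a toy stand-in for a real Persian stemmer and synonym dictionary.

```python
import math
from collections import Counter

# Hypothetical lookup table standing in for a Persian stemmer plus a synonym
# dictionary: word-family members and synonyms map to one shared root, so they
# become the same term and receive identical scores.
CANONICAL = {
    "کتاب\u200cها": "کتاب",  # "books" -> "book" (plural written with ZWNJ)
    "کتب": "کتاب",           # Arabic broken plural of "book"
    "دانشجویان": "دانشجو",   # "students" -> "student"
    "نیکو": "خوب",           # synonym: "fine/good" -> "good"
}


def normalize(text):
    """Whitespace-tokenize and replace each token by its canonical root."""
    return [CANONICAL.get(token, token) for token in text.split()]


def idf_table(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, so shared terms keep weight."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(normalize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text, idf):
    """Raw term frequency multiplied by the term's IDF weight."""
    tf = Counter(normalize(text))
    return {term: freq * idf[term] for term, freq in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(a, b, corpus=()):
    """Similarity in [0, 1]; IDF is estimated over the corpus plus both texts."""
    idf = idf_table(list(corpus) + [a, b])
    return cosine(tfidf(a, idf), tfidf(b, idf))
```

For example, `similarity("کتاب خوب", "کتب نیکو")` returns a value of approximately 1.0. Both texts normalize to the same two roots, so the synonym and the word-family variant score as identical terms. The smoothed IDF matters here: with the plain `log(N / df)` over just the two compared texts, every shared term would get weight 0, and the similarity of two matching texts would collapse to zero.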
Text Similarity, TF-IDF, Semantic Similarity, Stemming
Elham Mahdipour, Rahele Shojaeian Razavi, Zahra Gheibi, Software Development for Identifying Persian Text Similarity, International Journal of Intelligent Information Systems. Special Issue: Research and Practices in Information Systems and Technologies in Developing Countries. Vol. 3, No. 6-1, 2014, pp. 61-66. doi: 10.11648/j.ijiis.s.2014030601.21
