User Clustering on Social Media Using Language-Independent Features
Clustering users on social media based on text involves grouping individuals with similar text patterns, language usage, or content interests. This text-based clustering provides insights into user preferences, enables personalized content recommendations, and helps in understanding social networking trends and user engagement. However, traditional text clustering methods rely heavily on language-specific features, which limits their applicability in multilingual media environments where linguistic diversity prevails. This paper addresses the problem of clustering users on social networks based on their text, independently of the language in which the text is written. A practical methodology is presented, outlining an iterative procedure for clustering based solely on language-independent features such as emojis, hashtags, URLs, text length, and punctuation count. The effectiveness of the language-independent clustering approach is compared with that of the usual text-based clustering approach. The comparison shows that, for the dataset used, the proposed clustering method based on language-independent features yields higher-quality results than text clustering.
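To make the listed features concrete, the following is a minimal sketch of how emoji, hashtag, and URL counts, text length, and punctuation count might be extracted from a single post. It is an illustration only and not the implementation used in this paper: the function name `extract_features`, the regular expressions, the simplified emoji code-point ranges, and the choice to count ASCII punctuation outside URLs are all assumptions made for this example.

```python
import re
import string

# Assumed, simplified emoji ranges; not a complete Unicode emoji definition.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# '#' followed by word characters; \w is Unicode-aware, so any script matches.
HASHTAG_RE = re.compile(r"#\w+")
# Assumed URL pattern: http(s) scheme followed by non-whitespace characters.
URL_RE = re.compile(r"https?://\S+")


def extract_features(text):
    """Return language-independent features for one post (hypothetical helper)."""
    urls = URL_RE.findall(text)
    # Remove URLs first so their punctuation and '#' fragments are not counted.
    stripped = URL_RE.sub(" ", text)
    return {
        "emoji_count": len(EMOJI_RE.findall(stripped)),
        "hashtag_count": len(HASHTAG_RE.findall(stripped)),
        "url_count": len(urls),
        # Length of the original post in Unicode code points.
        "text_length": len(text),
        # ASCII punctuation only (an assumption); includes the '#' of hashtags.
        "punctuation_count": sum(ch in string.punctuation for ch in stripped),
    }


if __name__ == "__main__":
    print(extract_features("Great day! 😀 #sun https://example.com/a?b=1"))
```

Per-post dictionaries of this kind could then be aggregated per user, for example by averaging, and scaled before being passed to a clustering algorithm. The aggregation and the clustering procedure itself are not specified here, so they are left out of the sketch.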