Evaluating spammer detection systems for Twitter

Authors
Ho, Trung Minh
Degree
Master of Computing
Grantor
Unitec Institute of Technology
Date
2017
Supervisors
Liesaputra, Veronica
Mohaghegh, Dr Mahsa
Yongchareon, Dr Sira
Type
Masters Thesis
Keyword
Twitter
spam detection
spam drift
optimisation subset of features
evaluation workbench
feature selection
machine learning
Citation
Ho, T. M. (2017). Evaluating spammer detection systems for Twitter. An unpublished thesis submitted for the degree of Master of Computing, Unitec Institute of Technology, Auckland, New Zealand.
Abstract
Twitter is a popular social network service: a web application that serves as both an online social network and a microblogging platform. Users use Twitter to find new friends, share updates about their activities, and communicate with each other by posting tweets. Its popularity attracts many spammers who want to spread advertising or malware. Many systems have been proposed to detect spammers or spam tweets, each using a different subset of features or extracting features from a different number of recent tweets. However, it is not known which proposed system is best, because they differ in their subsets of features, number of recent tweets, classifiers and evaluation metrics. Over time, spammers change their key features to disguise themselves as normal users, a phenomenon called spam drift, and it is not known whether current systems cope well with it. In this research we created a tool called WEST, which stands for "Workbench Evaluation Spammer detection system in Twitter." The tool allows users to investigate their model's performance against the spam drift problem, and it saves future researchers in this field considerable time, for example on feature extraction, which is a time-consuming process. We also carried out a comprehensive investigation of 172 proposed features from existing systems, both content-based and user-based, to find an optimised subset of features that is effective, efficient and resilient at detecting spammers. Based on this investigation, we found a model that we call ASDF, which stands for Anti Spam-Drift Features. It detected spammers with a 91% true positive rate and handled the spam drift problem better than the existing spammer detection systems.
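The true positive rate quoted in the abstract is the fraction of actual spammers that a detector flags as spammers. The sketch below is a minimal Python illustration of that metric, computed separately for two collection periods as a rough way to look for spam drift. It is not taken from WEST or from the thesis: the function name, the label encoding (1 for spammer, 0 for normal user) and the toy labels are all assumptions made for illustration, and the numbers are not results from the thesis.

```python
def true_positive_rate(actual, predicted):
    """Fraction of actual spammers (label 1) that were predicted as spammers."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no actual spammers in the sample")
    return true_pos / total


# Hypothetical (actual, predicted) labels for two collection periods.
# These toy values are invented for illustration only.
period_1 = ([1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 0, 0])
period_2 = ([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 1])

# A falling true positive rate on later data is one possible sign of spam drift.
tpr_by_period = [true_positive_rate(a, p) for a, p in (period_1, period_2)]
print(tpr_by_period)
```

Comparing the same trained model on data from successive periods, rather than on a single shuffled test set, is what makes a drop in detection over time visible.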
Copyright holder
Author
Copyright notice
All rights reserved