Evaluating spammer detection systems for Twitter

Authors

Ho, Trung Minh

Degree

Master of Computing

Grantor

Unitec Institute of Technology

Date

2017

Supervisors

Liesaputra, Veronica
Mohaghegh, Dr Mahsa
Yongchareon, Dr Sira

Type

Masters Thesis

Keyword

Twitter
spam detection
spam drift
optimisation subset of features
evaluation workbench
feature selection
machine learning

Citation

Ho, T. M. (2017). Evaluating spammer detection systems for Twitter. An unpublished thesis submitted for the degree of Master of Computing, Unitec Institute of Technology, Auckland, New Zealand.

Abstract

Twitter is a popular social network service: a web application that combines online social networking with microblogging. Users use Twitter to find new friends, post updates about their activities and communicate with each other through tweets. Its popularity attracts many spammers who want to spread advertising or malware. Many systems have been proposed to detect spammers or spam tweets, each using a different subset of features or extracting features from a different number of recent tweets. However, it is not known which proposed system performs best, because they differ in their feature subsets, the number of recent tweets considered, classifiers and evaluation metrics. Over time, spammers change their key features to disguise themselves as normal users, a phenomenon called spam drift, and it is not known whether current systems cope well with it. In this research we created a tool called WEST, which stands for "Workbench Evaluation Spammer detection system in Twitter". The tool allows users to investigate their model's performance against the spam drift problem, and it saves future researchers considerable time on steps such as feature extraction, which is a time-consuming process. We also carried out a comprehensive study of the 172 features, both content-based and user-based, proposed in existing systems, to find an optimised subset of features that is effective, efficient and resilient at detecting spammers. Based on this study, we found a model, which we call ASDF (Anti Spam-Drift Features), that detected spammers at a 91% true positive rate and handled the spam drift problem better than the existing spammer detection systems.
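As an illustration only, the true positive rate mentioned in the abstract is the fraction of actual spammers that a detector flags, and one simple way to look for spam drift is to compute that rate separately for each time period. The sketch below is not taken from the thesis or from WEST; the function names, the record layout and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def true_positive_rate(labels, predictions):
    """Fraction of actual spammers (label 1) that were predicted as spammers."""
    spammer_preds = [p for y, p in zip(labels, predictions) if y == 1]
    if not spammer_preds:
        return None  # undefined when the period contains no spammers
    return sum(1 for p in spammer_preds if p == 1) / len(spammer_preds)


def tpr_by_period(records):
    """Group (period, label, prediction) tuples by period and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for period, label, prediction in records:
        grouped[period][0].append(label)
        grouped[period][1].append(prediction)
    return {
        period: true_positive_rate(ys, ps)
        for period, (ys, ps) in sorted(grouped.items())
    }


# Hypothetical toy data: 1 = spammer, 0 = normal user.
records = [
    ("2016-01", 1, 1), ("2016-01", 1, 1), ("2016-01", 0, 0),
    ("2016-01", 1, 1), ("2016-01", 1, 0),
    ("2016-06", 1, 1), ("2016-06", 1, 0), ("2016-06", 0, 0),
    ("2016-06", 1, 0), ("2016-06", 1, 1),
]

for period, tpr in tpr_by_period(records).items():
    print(period, tpr)
```

In this made-up data the rate falls from the earlier period to the later one, which is the kind of decline a drift-resilient feature set would be expected to limit.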

Copyright holder

Author

Copyright notice

All rights reserved
