Evaluating spammer detection systems for Twitter

Ho, Trung Minh

Files

MComp_Trung Minh Ho_1396701_Final Research.pdf (5.38 MB)

Authors

Ho, Trung Minh

Degree

Master of Computing

Grantor

Unitec Institute of Technology

Date

2017

Supervisors

Liesaputra, Veronica
Mohaghegh, Dr Mahsa
Yongchareon, Dr Sira

Type

Masters Thesis

Keyword

Twitter
spam detection
spam drift
optimisation subset of features
evaluation workbench
feature selection
machine learning

ANZSRC Field of Research Code (2020)

080303 Computer System Security

080109 Pattern Recognition and Data Mining

Citation

Ho, T. M. (2017). Evaluating spammer detection systems for Twitter. An unpublished thesis submitted for the degree of Master of Computing, Unitec Institute of Technology, Auckland, New Zealand.

Abstract

Twitter is a popular social network service: a web application that serves as both an online social network and a microblogging platform. Users use Twitter to find new friends, post updates about their activities, and communicate with each other by posting tweets. Its popularity attracts many spammers who want to spread advertising or malware. Many systems have been proposed to detect spammers or spam tweets, each using a different subset of features or extracting features from a different number of recent tweets. However, it is not known which proposed system performs best, because they differ in their feature subsets, the number of recent tweets considered, their classifiers and their evaluation metrics. Over time, spammers change their key features to disguise themselves as normal users, a phenomenon called spam drift, and it is not known whether current systems cope well with it. In this research we created a tool called WEST, which stands for "Workbench Evaluation Spammer detection system in Twitter." The tool allows users to investigate their model's performance against the spam drift problem, and it saves future researchers in this field considerable time on tasks such as feature extraction, which is a time-consuming process. We also carried out a comprehensive investigation of 172 features proposed in existing systems, comprising content-based and user-based features, to find an optimised subset of features that is effective, efficient and resilient at detecting spammers. Based on this investigation, we found a model, which we call ASDF (Anti Spam-Drift Features), that detected spammers at a 91% true positive rate and handled the spam drift problem better than the existing spammer detection systems.
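As a reading aid, the true positive rate mentioned in the abstract is the fraction of actual spammers that a detector flags as spammers. The short Python sketch below illustrates how that metric could be compared across two collection periods to expose spam drift. It is not taken from the thesis or from WEST: the function name, the period names and the label lists are all hypothetical, chosen only to show the calculation.

```python
from typing import Sequence


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual spammers (label 1) that were predicted as spammers."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no spammer examples in y_true")
    return tp / (tp + fn)


# Hypothetical (true labels, predicted labels) for two collection periods.
# 1 = spammer, 0 = normal user. The values are invented for illustration.
periods = {
    "period_1": ([1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 0, 0]),
    "period_2": ([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 1]),
}

# A falling true positive rate on later data is one symptom of spam drift.
tpr_by_period = {
    name: true_positive_rate(y_true, y_pred)
    for name, (y_true, y_pred) in periods.items()
}

for name, tpr in tpr_by_period.items():
    print(f"{name}: TPR = {tpr:.2f}")
```

In this invented example, the detector finds every spammer in the first period but only half of them in the second, which is the kind of degradation a drift-resilient feature subset is meant to limit.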

Permanent link

https://hdl.handle.net/10652/4525

Copyright holder

Author

This item appears in:

Computing Dissertations and Theses