Combining Visual and Contextual Information for Fraudulent Online Store Classification

Mostard, Wouter (2019) Combining Visual and Contextual Information for Fraudulent Online Store Classification. Master's Thesis / Essay, Artificial Intelligence.

Abstract

Following the rise of e-commerce, there has been a dramatic increase in online criminal activity targeting online shoppers. This online-oriented crime ranges from selling fake products, to not delivering ordered products, to stealing credit card information during checkout. Because the number of online stores has risen dramatically, checking these stores manually has become intractable, and an automated process is therefore required. We approach this problem by applying machine learning techniques to detect fraudulent online stores. Two sources of information are used to determine the legitimacy of an online store. First, a baseline model is proposed based on meta-features, such as whether an SSL certificate is present. Supervised learning algorithms are applied to obtain the best possible baseline results. Second, visual information, such as the presence of payment method logos, is added to improve on the baseline model. Numerous segmentation methods, pre-trained networks, and learning algorithms are compared to achieve the best possible performance in visual feature extraction of logos on online stores. The random forest proved to be the best-performing baseline learning algorithm. Auto Canny and selective search show similar recall scores for extracting relevant image patches from the online stores, with auto Canny being the fastest method. The ResNet50 network proved to be both the fastest and the best-performing feature vector extractor for the image patches. Using the ResNet50 feature vectors, the support vector machine proved to be the best-performing learning algorithm for logo classification. Combining both sources of information improved the results, in particular for detecting stores in the fraudulent class. This research shows that combining multiple information sources in fraudulent online store classification has a significant positive effect. Furthermore, it shows that reasonable results in logo detection on online stores can be achieved using simple binarization methods and pre-trained convolutional neural networks. Interesting directions for future research include expanding the logo detection algorithm to other visual information, applying multimodal learning instead of simple feature concatenation, and building a scalable, production-ready solution. Implementing these results in a scalable solution could, for example, help law enforcement quickly find potentially fraudulent online stores before too much harm is done.
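
The "simple feature concatenation" mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the feature names below (including `domain_age_days` and the specific logo names) and the helper functions are hypothetical, chosen only to show how a meta-feature vector and a logo-presence vector might be joined into one input for a classifier.

```python
from typing import Dict, List

# Hypothetical feature names, for illustration only. The abstract names
# SSL certificate presence and payment method logos; the rest is assumed.
META_FEATURES = ["has_ssl_certificate", "domain_age_days"]
LOGO_FEATURES = ["logo_visa", "logo_mastercard", "logo_paypal"]


def meta_vector(meta: Dict[str, float]) -> List[float]:
    """Order the contextual (meta) features into a fixed-length vector."""
    return [float(meta.get(name, 0.0)) for name in META_FEATURES]


def logo_vector(detected_logos: List[str]) -> List[float]:
    """Encode which payment logos were detected as a binary vector."""
    found = set(detected_logos)
    return [1.0 if name in found else 0.0 for name in LOGO_FEATURES]


def combined_vector(meta: Dict[str, float], detected_logos: List[str]) -> List[float]:
    """Concatenate the meta-feature vector and the logo-presence vector."""
    return meta_vector(meta) + logo_vector(detected_logos)


store = {"has_ssl_certificate": 1, "domain_age_days": 42}
features = combined_vector(store, ["logo_paypal"])
print(features)  # [1.0, 42.0, 0.0, 0.0, 1.0]
```

The combined vector could then be passed to any supervised learner. Multimodal learning, which the abstract lists as future work, would replace this plain concatenation with a learned joint representation.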

Item Type:	Thesis (Master's Thesis / Essay)
Supervisor name:	Wiering, M.A.
Degree programme:	Artificial Intelligence
Thesis type:	Master's Thesis / Essay
Language:	English
Date Deposited:	15 Jan 2019
Last Modified:	16 Jan 2019 09:24
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/19024