Redis Centric Real-Time Fraud Detection

Image Credit: Redis Labs
  • In this context, real-time means the window for deciding whether an online activity is fraudulent is very small. It can be as short as a few milliseconds, as in the case of ad networks (where the highest amount of fraud is seen).
  • Many signals from events must be considered, and complex analysis has to run within this window, which demands data structures with very low access latency.
  • Further, complex fraud checks require large volumes of data, so the data structures must be memory-efficient as well.

Multi-Model Redis

In addition to being a highly scalable, performant and reliable database, Redis provides structures such as sorted sets and probabilistic data structures such as Bloom filters, Top-K and Count-Min Sketch (CMS), which can store large volumes of information at tunable error rates.
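
To make "tunable error rates" concrete, here is a rough sketch of the textbook Bloom filter sizing formulas, which trade memory against the false-positive rate. The function name bloom_parameters and the figure of one million items are ours, chosen for illustration; the memory layout RedisBloom actually uses may differ.

```python
import math
from typing import Tuple


def bloom_parameters(expected_items: int, error_rate: float) -> Tuple[int, int]:
    """Return (bits, hash_count) for a standard Bloom filter.

    Uses the textbook formulas m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    This is an illustration, not RedisBloom's internal sizing logic.
    """
    if expected_items <= 0 or not 0.0 < error_rate < 1.0:
        raise ValueError("expected_items must be > 0 and 0 < error_rate < 1")
    bits = math.ceil(-expected_items * math.log(error_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / expected_items * math.log(2)))
    return bits, hashes


# Hypothetical workload: one million blacklisted IPs at two error rates.
for rate in (0.01, 0.001):
    bits, hashes = bloom_parameters(1_000_000, rate)
    print(f"error rate {rate}: {bits / 8 / 1024 / 1024:.2f} MiB, {hashes} hash functions")
```

At a 1% error rate this works out to a little under 10 bits per item, regardless of how long each item is, which is where the memory saving over storing full keys comes from.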

Design

The fraud detection module performs three types of fraud checks on each click event.

Fraud types and the Redis structures used to identify them:
  • Stacked Ads: Fraudsters can stack multiple ads so that when a user clicks on one ad, one click is generated for every stacked ad.
    A characteristic of this type of fraud is that the time interval between such clicks is extremely small, often zero. We have used Redis Sorted Sets here, with the timestamp of the event as the score. A range query counting the events from the same source between two timestamps helps us determine whether ads are stacked.
  • Clicks from a blacklisted IP: There can be thousands of IP ranges, leading to millions of IPs that can be blacklisted based on the activity from these IPs. One way to detect whether a source IP is blacklisted is to use Redis as a cache and the ‘exists’ command, but every IP is then stored as a full key, so memory grows as O(n) with a large per-item cost, which might not be acceptable for a huge number of items. We have used the probabilistic data structure Redis Bloom Filter instead: it is populated with the blacklisted IPs and queried on each click. A Bloom filter can return false positives but never false negatives, so a negative lookup guarantees that the IP is not blacklisted.
  • Clicks from a suspicious location: We have used Redis GeoSpatial for this. Once we derive the latitude and longitude from the IP, the data is indexed using GeoSpatial. GeoSpatial provides commands such as GeoRadius, which returns the points within a given radius of a location. This helps us derive insights such as the fraud percentage within, say, a 100 km radius of the source location.
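
The three checks above can be sketched in plain Python to show the logic each Redis structure carries out. This is not the project's code and it does not talk to Redis: is_stacked mirrors a score-range count on a sorted set (ZADD then ZCOUNT), TinyBloom mirrors BF.ADD and BF.EXISTS, and fraud_share_nearby mirrors a GEOADD followed by a radius query. All names, the 5 ms window, the click limit and the sample data are assumptions made for illustration.

```python
import bisect
import hashlib
import math
from typing import Iterable, List, Tuple


def is_stacked(click_times_ms: List[int], now_ms: int,
               window_ms: int = 5, limit: int = 1) -> bool:
    """Count clicks with a timestamp in [now - window, now], like a score-range count.

    click_times_ms must be sorted ascending, as sorted-set scores are.
    """
    lo = bisect.bisect_left(click_times_ms, now_ms - window_ms)
    hi = bisect.bisect_right(click_times_ms, now_ms)
    return (hi - lo) > limit


class TinyBloom:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, bits: int = 1 << 16, hashes: int = 7) -> None:
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8 + 1)

    def _positions(self, item: str) -> Iterable[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.array[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def fraud_share_nearby(events: List[Tuple[float, float, bool]],
                       lat: float, lon: float, radius_km: float = 100.0) -> float:
    """Fraction of past events within radius_km that were flagged as fraud."""
    nearby = [flag for (elat, elon, flag) in events
              if haversine_km(lat, lon, elat, elon) <= radius_km]
    return sum(nearby) / len(nearby) if nearby else 0.0


# Made-up sample data for one click source.
clicks = [1000, 1000, 1001, 5000]              # millisecond timestamps
print(is_stacked(clicks, now_ms=1001))         # several clicks inside the window

blacklist = TinyBloom()
blacklist.add("203.0.113.7")
print(blacklist.might_contain("203.0.113.7"))

history = [(12.97, 77.59, True), (13.00, 77.60, False), (28.61, 77.21, True)]
print(fraud_share_nearby(history, 12.97, 77.59))
```

In the real system Redis does this work server-side, so the application issues one command per check instead of pulling data out and scanning it.
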
Redis Centric Real-Time Fraud Detector
Real-time event rates in Grafana.
Redis Insight Time Series Visualization.
Event Simulator showing the result of the event.

Cache and Beyond.

The structures and modules used are a natural fit for our fraud detection use cases. Writing a custom implementation of these advanced data structures, or plugging in an external one, would have added another layer and complicated the system.

Future Work

This application was built for the hackathon specifically with ad network fraud in mind, but internet fraud follows similar patterns across domains. This makes us think we can build a generic fraud detector and open it up for developers to extend and make more domain-aware.

  • AI plays a huge part in fraud detection, and Redis has extensive support for it. RedisAI is a module for serving tensors and executing deep learning models. Neural Redis implements feed-forward neural networks as a native data type for Redis. RedisML provides several machine learning models as data types.
  • Redis Cell provides rate limiting as a single command, which is a must for publicly exposed services.
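
For a feel of what such a rate limiter does, here is a small in-memory sketch of the generic cell rate algorithm (GCRA), which, as far as we understand, is the algorithm Redis Cell is based on. The class name GcraLimiter and its parameters are hypothetical, and this is not Redis Cell's implementation; the module exposes the behaviour through its CL.THROTTLE command.

```python
from typing import Dict


class GcraLimiter:
    """GCRA sketch: one stored value per key, the theoretical arrival time (TAT)."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.interval = 1.0 / rate_per_sec        # time "cost" of one request
        self.tolerance = self.interval * burst    # how far ahead of schedule a key may run
        self.tat: Dict[str, float] = {}

    def allow(self, key: str, now: float) -> bool:
        # A key that has been idle starts again from the current time.
        tat = max(self.tat.get(key, now), now)
        if tat - now > self.tolerance:
            return False                          # too far ahead of schedule: reject
        self.tat[key] = tat + self.interval       # accept and push the schedule forward
        return True


# Hypothetical policy: 1 request per second with a burst allowance of 2.
limiter = GcraLimiter(rate_per_sec=1.0, burst=2)
print([limiter.allow("203.0.113.7", now=0.0) for _ in range(4)])
```

Because only one number is kept per key, the check and the update fit in a single atomic step, which is what makes a one-command rate limiter practical.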
