Redis Centric Real-Time Fraud Detection

Sachin Kottarathodi
Apr 24, 2020

The following is our entry for the Redis Day “Beyond Cache” hackathon conducted by Redis Labs in Bangalore, which won us the second runner-up spot.

Redis “Beyond Cache” was an interesting theme: we (Ritesh Ghodrao and I) were unaware of how Redis could be used for anything more than a cache or database. We were in for a surprise.

Redis provides multiple advanced data structures and dynamically loadable libraries called “modules”, which extend the feature set of Redis to domains beyond databases. With built-in support for advanced data structures, Redis helps build not just low-latency applications but also loosely coupled microservices. This led us to believe that we could try to build an application to tackle one of the biggest problems in the industry today: Internet fraud.

Image Credit: Redis Labs

Long gone are the days when acting on post-hoc analysis of a transaction would have sufficed. Internet fraud is evolving: the more you try to stop abuse, the more complex the methods fraudsters use to trick you. The need for effective real-time fraud detection is greater than ever.

The idea of a real-time fraud detector is not new, but getting it right is a major challenge.

  • When we say real-time, there is a very small window available within which one has to decide on whether an online activity is fraudulent or not. This time window could even be a few milliseconds as in the case of ad networks (where the highest amount of fraud is seen).
  • A lot of signals from events need to be considered and complex analysis has to be done within this time window, which demands highly time-sensitive data structures.
  • Further, complex fraud checks require tons of data, meaning there is a need for memory-optimized data structures as well.

Multi-Model Redis

Redis, in addition to being a highly scalable, performant and reliable database, provides structures such as sorted sets, as well as probabilistic data structures such as Bloom filters, Top-K and Count-Min Sketch (CMS), which can store large volumes of information at tuneable error rates.
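To give a feel for what a tuneable error rate buys, here is a rough sizing sketch based on the textbook Bloom filter formulas. The item count and error rate are illustrative assumptions, not figures from our prototype, and the memory RedisBloom actually allocates may differ from this estimate.

```python
import math


def bloom_parameters(n_items, error_rate):
    """Textbook optimal Bloom filter size in bits and number of hash functions."""
    # m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-n_items * math.log(error_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to a whole number of hash functions
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


# Hypothetical workload: 10 million blacklisted IPs at a 1% false-positive rate.
bits, hashes = bloom_parameters(10_000_000, 0.01)
print(f"{bits / n_items_per_mb:.1f} MB" if (n_items_per_mb := 8_000_000) else "")
print(f"{bits / 10_000_000:.1f} bits per item, {hashes} hash functions")
```

At a 1% error rate this works out to a little under 10 bits per item, regardless of how long each stored item is. Lowering the error rate costs more bits per item, which is the trade-off being tuned.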

Image Credit: Redis Labs

Redis Streams is a Kafka-like log data type with support for consumer groups, to allow a group of clients to cooperate in consuming a different portion of the same stream of messages.
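As a minimal sketch of how a consumer group is used, the function below reads a batch of entries for one consumer and acknowledges each after handling it. It assumes `r` is a `redis.Redis` client from redis-py; the stream, group and consumer names are hypothetical.

```python
def consume_batch(r, stream, group, consumer, handler, count=10):
    """Read up to `count` new entries for this consumer, handle and acknowledge them."""
    # The special id ">" asks for entries not yet delivered to any consumer in the group.
    response = r.xreadgroup(group, consumer, {stream: ">"}, count=count)
    processed = 0
    for _stream_name, entries in response:
        for entry_id, fields in entries:
            handler(entry_id, fields)
            # XACK removes the entry from the group's pending entries list.
            r.xack(stream, group, entry_id)
            processed += 1
    return processed
```

The group has to exist before reading, for example via `r.xgroup_create(stream, group, id="0", mkstream=True)`. Each consumer in the group that calls this function receives a different portion of the stream.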

RedisGears — to quote —

is a Redis module that adds a serverless engine for multi-model and cluster operations in Redis, supporting both event-driven and batch operations.

With built-in support for consuming from a stream, RedisGears can also be used to build a loosely coupled data transformation and ingestion service.

In the interest of time for the hackathon, we limited the scope to build a working prototype using the above-mentioned modules to identify three different kinds of fraud in the ad network, after which we persist data and visualize the results.

Design

Fraud detection module — does 3 types of fraud checks on the click event.

Fraud types and the Redis structure used to identify fraud.
  • Stacked Ads: Fraudsters can have multiple ads stacked in a way such that when a user clicks on one ad, one click for every stacked ad is generated.
    A characteristic of this type of fraud is the time interval between such clicks, which is extremely short and often even zero. We have used Redis Sorted Sets here, with the timestamp of the event as the score. A range query on the number of events between two timestamps from the same source helps us determine if ads are stacked.
  • Clicks from a blacklisted IP: There can be thousands of IP ranges, leading to millions of IPs that can be blacklisted based on the activity from these IPs. One way to detect whether a source IP is blacklisted is to store every IP as a key in Redis and use the EXISTS command, but the memory used then grows as O(n) with every full key stored, which might not be acceptable for a huge number of items. We have instead used the probabilistic Bloom filter from RedisBloom, which is populated with the blacklisted IPs and looked up for each click. A Bloom filter can say with certainty that an IP is not in the filter, while a positive answer carries a small, tuneable false-positive rate.
  • Clicks from a suspicious location: We have used Redis GeoSpatial for this. Once we derive the latitude and longitude from the IP, the data is indexed using GeoSpatial. GeoSpatial provides commands such as GEORADIUS, which returns the points within a given radius and so helps us derive insights such as the fraud percentage within, say, a 100 km radius of the source location.

Redis Centric Real-Time Fraud Detector
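The three checks described above can be sketched roughly as follows. This is not our hackathon code: it assumes `r` is a `redis.Redis` client from redis-py connected to a server with RedisBloom loaded, and every key name, window and threshold here is hypothetical.

```python
STACK_WINDOW_MS = 50   # hypothetical: clicks this close together look stacked
STACK_THRESHOLD = 2    # hypothetical: count includes the current click


def is_stacked_click(r, source_id, event_id, ts_ms):
    """Record the click in a per-source sorted set and count close neighbours."""
    key = f"clicks:{source_id}"
    # Member = unique event id, score = timestamp, so clicks that share
    # a timestamp are stored as separate members.
    r.zadd(key, {event_id: ts_ms})
    # ZCOUNT returns how many members have a score inside the closed range.
    recent = r.zcount(key, ts_ms - STACK_WINDOW_MS, ts_ms)
    return recent >= STACK_THRESHOLD


def is_blacklisted_ip(r, ip):
    """BF.EXISTS replies 0 for 'definitely absent' and 1 for 'probably present'."""
    return r.execute_command("BF.EXISTS", "blacklist:ips", ip) == 1


def fraud_ratio_nearby(r, lon, lat, radius_km=100):
    """Share of nearby clicks that were flagged, using two hypothetical geo indexes."""
    all_clicks = r.georadius("geo:clicks", lon, lat, radius_km, unit="km")
    fraud_clicks = r.georadius("geo:fraud", lon, lat, radius_km, unit="km")
    return len(fraud_clicks) / len(all_clicks) if all_clicks else 0.0
```

Keeping one sorted set per source keeps each range query small. Keeping two geo indexes (all clicks and flagged clicks) is just one possible way to get a percentage out of radius queries.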

The fraud detection module then publishes data to a Redis Stream, for deriving further insights.
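Publishing a checked click to a stream might look like the sketch below. It assumes `r` is a `redis.Redis` client from redis-py; the stream name and field layout are hypothetical.

```python
def publish_verdict(r, event_id, source_ip, verdict, stream="fraud:results"):
    """Append one checked click to a stream and return the id Redis assigned."""
    fields = {"event_id": event_id, "ip": source_ip, "verdict": verdict}
    # XADD with the default id "*" lets Redis generate a time-based entry id.
    return r.xadd(stream, fields)
```

Because the producer only appends to the stream, it does not need to know which services consume the results downstream.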

RedisGears consumes from the stream and does data transformations before populating RedisTimeSeries.
RedisTimeSeries is used as a cache for aggregated data. Using the Grafana data source for RedisTimeSeries, we powered a Grafana dashboard to visualize events in real time.
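The transformation step can be illustrated with a plain-Python stand-in. This is not RedisGears code; it only shows the kind of mapping from a stream entry to a time series sample. It assumes `r` is a `redis.Redis` client created with `decode_responses=True`, a RedisTimeSeries version that supports `ON_DUPLICATE` (1.4 or later), and hypothetical key and field names.

```python
def to_ts_add(entry_id, fields):
    """Build TS.ADD arguments that count one event under its verdict."""
    # Stream ids have the form "<milliseconds>-<sequence>"; reuse the
    # millisecond part as the sample timestamp.
    ts_ms = int(entry_id.split("-")[0])
    verdict = fields.get("verdict", "unknown")
    # ON_DUPLICATE SUM adds to an existing sample at the same timestamp,
    # so several events in the same millisecond are counted together.
    return ["TS.ADD", f"ts:clicks:{verdict}", ts_ms, 1,
            "ON_DUPLICATE", "SUM", "LABELS", "verdict", verdict]


def record_event(r, entry_id, fields):
    """Send the sample to RedisTimeSeries through the generic command interface."""
    return r.execute_command(*to_ts_add(entry_id, fields))
```

A dashboard can then aggregate each series into time buckets to plot event rates per verdict.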

Real-time event rates in Grafana.

Finally, data is saved to the Redis database for persistence.

RedisInsight Time Series Visualization.

RedisInsight is a handy tool for monitoring and interacting with data in Redis. It has built-in visualization support for time series data as well.

Event Simulator showing the result of the event.

Cache and Beyond.

The data structures and modules used are a natural fit for our fraud detection use cases. Providing a custom implementation of these advanced data structures, or trying to plug in an external implementation, would have added another layer and complicated the system.

We did not have to rely on any other technology to build loosely coupled microservices. Redis Streams reliably stores messages, so consumers that fail to read them can still process them later.

Most importantly, we have reduced a lot of layers to simplify the application using Redis alone. As the scale of the system increases, simplicity will be the key to easily maintain the application. Focus can be directed towards adding more business value rather than worrying about managing operations and building/integrating solutions.

Future Work

This application was built specifically with ad network fraud in mind for the hackathon, but internet fraud across domains has a similar pattern, which makes us think we can build a generic fraud detector and open it up for developers to extend and make it more domain aware.

To support more complex fraud types that do not have easily identifiable patterns in real time, Redis and its modules seem like the way to go.

  • AI plays a huge part in fraud detection and Redis has extensive support for that. RedisAI is a module for serving tensors and executing deep learning models. Neural Redis implements feed-forward neural networks as a native data type for Redis. RedisML provides several machine learning models as data types.
  • Redis Cell provides rate limiting as a single command, which is a must for publicly exposed services.
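As an illustration of the single-command rate limiting mentioned above, here is a minimal sketch around the module's `CL.THROTTLE` command. It assumes `r` is a `redis.Redis` client from redis-py connected to a server with the redis-cell module loaded; the key prefix and limits are hypothetical.

```python
def allow_request(r, client_id, max_burst=15, count=30, period_s=60):
    """Return (allowed, retry_after_seconds) for one request by this client."""
    # CL.THROTTLE <key> <max_burst> <count per period> <period in seconds>
    reply = r.execute_command(
        "CL.THROTTLE", f"rate:{client_id}", max_burst, count, period_s
    )
    # The reply is five integers: limited flag, total limit, remaining,
    # seconds until retry (-1 when allowed), seconds until full reset.
    limited, _limit, _remaining, retry_after, _reset_after = reply
    return limited == 0, retry_after
```

A service would call this before running the fraud checks and reject the request when the first value is false.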

With more advanced and useful modules being developed and added to the existing list, Redis can be the backbone of a huge number of systems requiring high performance with complex computations, while maintaining the simplicity that Redis has always been known for.

Let us know what you think of our submission and the growing Redis ecosystem.
