Data Science, ML and Analytics Engineering

RFM analysis in Python

The problem

Make an RFM analysis. It divides users into segments depending on the prescription (Recency), frequency (Frequency) and the total amount of payments (Monetary).

  • Recency – the difference between the current date and the date of the last payment
  • Frequency — number of transactions
  • Monetary – amount of purchases

These three indicators must be calculated separately for each customer. Then put marks from 1-3 or 1-5. The wider the range, the narrower segments we get.

Points can be set using quantiles. We sort the data according to one of the criteria and divide it into equal groups.

For this task, we use the public dataset: nd the olist_orders_dataset.csv and olist_order_payments_dataset.csv files. You can connect them order_id.

Implementing RFM in Python

import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from datetime import date,timedelta
df_orders = pd.read_csv('data/olist_orders_dataset.csv')
df_payments = pd.read_csv('data/olist_order_payments_dataset.csv')

We import all the necessary libraries and read the files.

We will rely on the date of delivery of the order by the carrier (order_delivered_carrier_date), so we will convert it to datetime64. Let’s create an order_id index and connect our datasets by it.

df_orders['order_delivered_carrier_date'] = pd.to_datetime(df_orders['order_delivered_carrier_date'])
df_payments = df_payments.set_index('order_id')
df_orders = df_orders.set_index('order_id')
order_payment = df_orders.join(df_payments)

Since the dataset is not the most recent, we will use max+1 instead of the current date. To create recency, frequency, and monetary, we’ll group our records by customer_id.

last_date = order_payment['order_delivered_carrier_date'].max() + timedelta(days=1)
rfmTable = order_payment.reset_index().groupby('customer_id').agg({'order_delivered_carrier_date': lambda x: (last_date - x.max()).days,
                                                 'order_id': lambda x: len(x), 
                                                 'payment_value': lambda x: x.sum()})
rfmTable.rename(columns={'order_delivered_carrier_date': 'recency', 
                          'order_id': 'frequency', 
                          'payment_value': 'monetary_value'}, inplace=True)
RFM анализ
RFM анализ

Now we need to set the grades. We will split into a range from 1 to 5. The wider the range, the more accurate our groups, but at the same time, it is more difficult to work with a large number of combinations.

quantiles = rfmTable.quantile(q=[0.20, 0.40, 0.60, 0.80])
quantiles = quantiles.to_dict()
segmented_rfm = rfmTable

def RScore(x,p,d):
     if x <= d[p][0.20]:
         return 1
     elif x <= d[p][0.40]:
         return 2
     elif x <= d[p][0.60]: 
         return 3
     elif x<=d[p][0.80]:
         return 4
         return 5

segmented_rfm['r_quartile'] = segmented_rfm['recency'].apply(RScore, args=('recency',quantiles,))
segmented_rfm['f_quartile'] = segmented_rfm['frequency'].apply(RScore, args=('frequency',quantiles,))
segmented_rfm['m_quartile'] = segmented_rfm['monetary_value'].apply(RScore, args=('monetary_value',quantiles,))
segmented_rfm['RFMScore'] = + +
RFM Score

Separately, we can see the average value of recency for all RFMScore.

fig=plt.figure(figsize=(18, 16), dpi= 80, facecolor='w', edgecolor='k')
 segmented_rfm.groupby('RFMScore').agg('recency').mean().plot(kind='bar', colormap='Blues_r')
Recency Score

While it may not be very informative. Let’s try to understand how to use the analysis results.

How to use the results of RFM analysis

The purpose of RFM analysis is to form segments and, depending on the segment, influence them in a certain way. For example: offer a bonus, benefit, send a push or email notification. It’s important to do it in a targeted way.

The effect of using RFM analysis can be as follows: customer retention, increased income, increased customer loyalty.

Examples of interpretation of RFM analysis segments:

  • R=5, F=5, M=5 — they pay often, a lot and recently. The most loyal and active users.
  • R=1, F=1, M=1 – they pay little, rarely and for a long time. Most likely lost customers. It may not be worth taking action to return them if the cost of attraction is higher than the expected profit.
  • R=1/2, F=4/5, M=4/5 — loyal users on the verge of leaving. We offer them a bonus, a discount and try to return them.
  • R=4/5, F=1, M=1/2/3/4/5 — Users have recently made a payment. We try to encourage them to buy more.


We dismantled a simple and effective way to segment users / customers. RFM analysis will help you choose targeted exposure and thus increase revenue or retain users.

Share it

If you liked the article - subscribe to my channel in the telegram or you can support me Become a Patron!

Other entries in this category: