The problem
Make an RFM analysis. It divides users into segments depending on the prescription (Recency), frequency (Frequency) and the total amount of payments (Monetary).
- Recency – the difference between the current date and the date of the last payment
- Frequency — number of transactions
- Monetary – amount of purchases
These three indicators must be calculated separately for each customer. Then put marks from 1-3 or 1-5. The wider the range, the narrower segments we get.
Points can be set using quantiles. We sort the data according to one of the criteria and divide it into equal groups.
For this task, we use the public dataset: https://www.kaggle.com/olistbr/brazilian-ecommerce nd the olist_orders_dataset.csv and olist_order_payments_dataset.csv files. You can connect them order_id
.
Implementing RFM in Python
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from datetime import date,timedelta
df_orders = pd.read_csv('data/olist_orders_dataset.csv')
df_payments = pd.read_csv('data/olist_order_payments_dataset.csv')
We import all the necessary libraries and read the files.
We will rely on the date of delivery of the order by the carrier (order_delivered_carrier_date
), so we will convert it to datetime64. Let’s create an order_id
index and connect our datasets by it.
df_orders['order_delivered_carrier_date'] = pd.to_datetime(df_orders['order_delivered_carrier_date'])
df_payments = df_payments.set_index('order_id')
df_orders = df_orders.set_index('order_id')
order_payment = df_orders.join(df_payments)
Since the dataset is not the most recent, we will use max+1 instead of the current date. To create recency, frequency, and monetary, we’ll group our records by customer_id
.
last_date = order_payment['order_delivered_carrier_date'].max() + timedelta(days=1)
rfmTable = order_payment.reset_index().groupby('customer_id').agg({'order_delivered_carrier_date': lambda x: (last_date - x.max()).days,
'order_id': lambda x: len(x),
'payment_value': lambda x: x.sum()})
rfmTable.rename(columns={'order_delivered_carrier_date': 'recency',
'order_id': 'frequency',
'payment_value': 'monetary_value'}, inplace=True)
Now we need to set the grades. We will split into a range from 1 to 5. The wider the range, the more accurate our groups, but at the same time, it is more difficult to work with a large number of combinations.
quantiles = rfmTable.quantile(q=[0.20, 0.40, 0.60, 0.80])
quantiles = quantiles.to_dict()
segmented_rfm = rfmTable
def RScore(x,p,d):
if x <= d[p][0.20]:
return 1
elif x <= d[p][0.40]:
return 2
elif x <= d[p][0.60]:
return 3
elif x<=d[p][0.80]:
return 4
else:
return 5
segmented_rfm['r_quartile'] = segmented_rfm['recency'].apply(RScore, args=('recency',quantiles,))
segmented_rfm['f_quartile'] = segmented_rfm['frequency'].apply(RScore, args=('frequency',quantiles,))
segmented_rfm['m_quartile'] = segmented_rfm['monetary_value'].apply(RScore, args=('monetary_value',quantiles,))
segmented_rfm['RFMScore'] = segmented_rfm.r_quartile.map(str) + segmented_rfm.f_quartile.map(str) + segmented_rfm.m_quartile.map(str)
print(segmented_rfm.head())
Separately, we can see the average value of recency for all RFMScore.
fig=plt.figure(figsize=(18, 16), dpi= 80, facecolor='w', edgecolor='k')
segmented_rfm.groupby('RFMScore').agg('recency').mean().plot(kind='bar', colormap='Blues_r')
plt.show()
While it may not be very informative. Let’s try to understand how to use the analysis results.
How to use the results of RFM analysis
The purpose of RFM analysis is to form segments and, depending on the segment, influence them in a certain way. For example: offer a bonus, benefit, send a push or email notification. It’s important to do it in a targeted way.
The effect of using RFM analysis can be as follows: customer retention, increased income, increased customer loyalty.
Examples of interpretation of RFM analysis segments:
- R=5, F=5, M=5 — they pay often, a lot and recently. The most loyal and active users.
- R=1, F=1, M=1 – they pay little, rarely and for a long time. Most likely lost customers. It may not be worth taking action to return them if the cost of attraction is higher than the expected profit.
- R=1/2, F=4/5, M=4/5 — loyal users on the verge of leaving. We offer them a bonus, a discount and try to return them.
- R=4/5, F=1, M=1/2/3/4/5 — Users have recently made a payment. We try to encourage them to buy more.
Conclusion
We dismantled a simple and effective way to segment users / customers. RFM analysis will help you choose targeted exposure and thus increase revenue or retain users.