Renat Alimbekov's personal blog - Data Science, ML and Analytics Engineering

Cohort Analysis in Python

What is cohort analysis?

Cohort analysis studies the characteristics of cohorts (also called vintages or generations): groups united by a common time-based characteristic.

A cohort/vintage/generation is a group formed in a specific way based on time: for example, the month of registration, the month of the first transaction, or the first visit to the site. Cohorts are very similar to segments, with the difference that a cohort is always defined by a period of time, while a segment can be based on any other characteristic.

Why is it valuable?

This kind of analysis can be helpful when it comes to understanding the health of your business and the stickiness of your customers. Stickiness is critical, as it is much cheaper and easier to retain a customer than to acquire a new one. Also, your product evolves over time: features are added and removed, the design changes, and so on. Observing individual groups over time is the starting point for understanding how these changes affect user/group behavior.

The problem

Perform an RFM analysis. It divides users into segments according to how recently they paid (Recency), how often they pay (Frequency) and the total amount of their payments (Monetary).

Recency – the difference between the current date and the date of the last payment
Frequency – number of transactions
Monetary – amount of purchases

These three indicators must be calculated separately for each customer. Each indicator is then scored on a scale of 1 to 3 or 1 to 5. The wider the scale, the narrower the segments we get.

Scores can be assigned using quantiles: we sort the data by one of the criteria and divide it into equal-sized groups.
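As a rough illustration of quantile-based scoring, here is a minimal sketch that uses only the Python standard library. The function name quantile_scores and its parameters are hypothetical and not taken from the original post; the cut points come from statistics.quantiles, and the higher_is_better flag is an assumption that lets Recency (where a smaller value is better) be scored in reverse.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_scores(values, n_bins=5, higher_is_better=True):
    """Assign each value a score from 1 to n_bins using quantile cut points.

    Hypothetical helper for illustration; needs at least two values.
    """
    # quantiles() returns n_bins - 1 cut points that split the sorted data
    # into n_bins groups of roughly equal size.
    cuts = quantiles(values, n=n_bins, method="inclusive")
    scores = []
    for v in values:
        # bisect_left counts the cut points strictly below v,
        # so the bin index runs from 0 to n_bins - 1.
        score = bisect_left(cuts, v) + 1
        if not higher_is_better:
            # Reverse the scale, e.g. for Recency, where fewer days is better.
            score = n_bins + 1 - score
        scores.append(score)
    return scores


# Made-up example values: total payments of six customers, scored 1 to 3.
monetary = [10, 20, 30, 40, 50, 60]
print(quantile_scores(monetary, n_bins=3))
```

With many tied values, the groups will not be exactly equal in size, so it is worth checking the resulting distribution of scores.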

For this task, we use the public dataset https://www.kaggle.com/olistbr/brazilian-ecommerce and its olist_orders_dataset.csv and olist_order_payments_dataset.csv files. You can join them on order_id.
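The join and the three indicators could be sketched with pandas roughly as follows. This is an illustration under stated assumptions, not the post's actual solution: the function name build_rfm is hypothetical, and the column names customer_id, order_purchase_timestamp and payment_value are assumed to match the Kaggle files, so check them against your copy of the data.

```python
import pandas as pd


def build_rfm(orders: pd.DataFrame, payments: pd.DataFrame,
              now: pd.Timestamp) -> pd.DataFrame:
    """Join orders with payments on order_id and compute R, F, M per customer.

    Column names are assumptions about the dataset layout.
    """
    # One order can have several payment rows, so the join may repeat orders.
    df = orders.merge(payments, on="order_id", how="inner")
    df["order_purchase_timestamp"] = pd.to_datetime(df["order_purchase_timestamp"])

    rfm = df.groupby("customer_id").agg(
        last_purchase=("order_purchase_timestamp", "max"),
        frequency=("order_id", "nunique"),   # distinct orders, not payment rows
        monetary=("payment_value", "sum"),   # total amount paid
    )
    # Recency: whole days between the reference date and the last purchase.
    rfm["recency"] = (now - rfm["last_purchase"]).dt.days
    return rfm[["recency", "frequency", "monetary"]]


# Tiny made-up frames standing in for the two CSV files.
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c1", "c2"],
    "order_purchase_timestamp": ["2018-01-01", "2018-01-11", "2018-01-05"],
})
payments = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3"],
    "payment_value": [10.0, 5.0, 20.0, 7.5],
})
print(build_rfm(orders, payments, now=pd.Timestamp("2018-01-21")))
```

With the real files, the two frames would come from pd.read_csv on olist_orders_dataset.csv and olist_order_payments_dataset.csv. As far as I know, customer_id in this dataset may be tied to a single order, in which case Frequency would come out as 1 for everyone unless a stable customer key is brought in from another file; treat that as something to verify rather than a given.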

Medical Image Analysis In Python

The field of medical imaging has become very popular in recent years. I am therefore writing a book in which you will learn the basics of medical image analysis using Python. You will study CT and X-ray scans, segment images, and analyze metadata. Even if you have not worked with medical imaging before, you will have all the necessary skills upon completion of the book.

Simple steps to make your Python code better

Many of you have Git code repositories. In this post I’ll show you how to make your Python code better.

I’ll use this repository as an example: https://github.com/Aykhan-sh/pandaseda

Fork it and try to make the code better.

Improving code readability

Improving the readability of your code is easy. We will use libraries for syntax formatting and validation.

We will create configuration files for flake8, mypy and black in the repository.

Let’s install the tools first:

pip install black flake8 mypy
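The configuration files might look something like the sketch below. The specific values are assumptions chosen for illustration, not the settings from the original post: flake8 and mypy can both read setup.cfg, while black reads pyproject.toml.

```ini
# setup.cfg (illustrative values, adjust to your project)
[flake8]
# Match black's default line length so the two tools do not disagree.
max-line-length = 88
# E203 conflicts with how black formats slices.
extend-ignore = E203

[mypy]
# Do not fail on third-party packages that ship no type hints.
ignore_missing_imports = True
```

```toml
# pyproject.toml (illustrative values)
[tool.black]
line-length = 88
```

With these in place, running black ., flake8 . and mypy . from the repository root would pick up the settings automatically.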

How to prepare for a data science interview

A data science interview is not easy. There is considerable uncertainty about what will be asked. Regardless of what kind of work experience or data science certification you have, the interviewer may throw a series of questions at you that you weren’t expecting. During a data science interview, the interviewer will ask technical questions on a wide range of topics, requiring the interviewee to have both strong knowledge and good communication skills.

In this note, I would like to talk about how to prepare for a data science / machine learning interview. We will go through the categories of questions, and I will share links to frequently asked questions and their answers.

Question categories

Traditionally, data science / machine learning interviews include the following categories of questions:

Statistics
Machine learning algorithms
Programming skills, algorithms and data structures
Knowledge of the domain area
Machine Learning Systems Design
Behavioral
Culture Fit
Problem-Solving