Data Science, ML and Analytics Engineering

Simple steps to make your Python code better

Many of you have Git code repositories. In this post I’ll show you how to make your Python code better.

I’ll use this repository as an example: https://github.com/Aykhan-sh/pandaseda

Fork it and try to make the code better.

Improving code readability

Improving the readability of your code is easy. We will use libraries for syntax formatting and validation.

We will create configuration files for flake8, mypy and black in the repository.

Let’s install the tools first:

pip install black flake8 mypy
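As a rough illustration of what these tools check, here is a small sketch. The function `mean` is a hypothetical example and is not taken from the pandaseda repository. It is laid out the way black formats code, has no unused imports or overlong lines for flake8 to flag, and carries type hints that mypy can verify.

```python
from typing import List


def mean(values: List[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


# mypy would reject a call such as mean("abc"),
# because a str is not a List[float].
print(mean([1.0, 2.0, 3.0]))
```

Without the annotations the same mistake would only show up at run time, which is the main reason to add mypy alongside the two style tools.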

How to prepare for a data science interview

A data science interview is not easy. There is considerable uncertainty about the questions. Regardless of your work experience or data science certifications, the interviewer may ask a series of questions that you weren’t expecting. The technical questions cover a wide range of topics, so the interviewee needs both strong knowledge and good communication skills.

In this note, I would like to talk about how to prepare for a data science / machine learning interview. We will go through the categories of questions, and I will share links to frequently asked questions with answers.

Question categories

Traditionally, data science / machine learning interviews include the following categories of questions:

  1. Statistics
  2. Machine learning algorithms
  3. Programming skills, algorithms and data structures
  4. Knowledge of the domain area
  5. Machine Learning Systems Design
  6. Behavioral
  7. Culture Fit
  8. Problem-Solving

BentoML – Faster Machine Learning Prototype

In this post, I’ll show you how to create a working prototype of a web application with a working machine learning model in 50 lines of Python code. Imagine you have a cool project idea. Now you need to implement an MVP (minimum viable product) and show it to your manager / partner / investor or just show it off to your friends.

We will be using BentoML. It is a flexible, high-performance platform that is ideal for building an MVP.

BentoML features:

  • support for multiple machine learning frameworks, including TensorFlow, PyTorch, Keras, XGBoost and more
  • cloud-native deployment with Docker, Kubernetes, AWS, Azure and many more
  • high-performance online serving via API
  • web dashboards and APIs for managing the model registry and deployment

Python logging HOWTO: logging vs loguru

In this post we will try to choose a library for logging in Python. Logs help to record and understand what went wrong in your service. Informational messages are often written to the logs as well, for example parameters, quality metrics and model training progress. An example of a piece of model training log:

[Figure: an example of a piece of model training log]
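To make the idea of logging training progress concrete, here is a minimal sketch that uses only the standard logging module. The helper `build_logger`, the logger name `training` and the loss values are made up for illustration and do not come from the original post.

```python
import logging
import sys
from typing import TextIO


def build_logger(name: str, stream: TextIO = sys.stdout) -> logging.Logger:
    """Create a logger that writes level, logger name and message to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s | %(name)s | %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False  # keep messages out of the root logger
    return logger


logger = build_logger("training")
# Made-up loss values standing in for a real training loop.
for epoch, loss in enumerate([0.9, 0.5, 0.3], start=1):
    logger.info("epoch=%d loss=%.2f", epoch, loss)
```

The amount of setup needed here (handler, formatter, level) is the kind of boilerplate that is worth weighing when comparing logging with loguru.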

Machine Learning Models in production: Flask and REST API

A trained machine learning model alone will not add value for the business. The model must be integrated into the company’s IT infrastructure. Let’s develop a REST API microservice to classify Iris flowers. The dataset consists of the length and width of two parts of the Iris flower: sepals and petals. The target variable is the Iris variety: 0 – Setosa, 1 – Versicolor, 2 – Virginica.

Saving and loading a model

Before moving on to developing the API, we need to train and save the model. We take a RandomForestClassifier model. Once it is trained, let’s save the model to a file and load it back to make predictions. This can be done with pickle or joblib.

import pickle
filename = 'model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(clf, f)  # clf is the trained classifier

We’ll use pickle.load to load and validate the model.

with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
result = loaded_model.score(X_test, y_test)  # mean accuracy on the test set
print(result)

The code for training, saving and loading the model is available in the repository — link
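For readers without the repository at hand, the whole round trip can be sketched end to end as follows. This is an assumed setup and not the author's exact code: it uses the Iris data bundled with scikit-learn, an arbitrary 70/30 split, arbitrary hyperparameters and a temporary directory instead of a fixed file path.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed setup: the Iris data bundled with scikit-learn and an arbitrary split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

# Save to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "model.pkl")
    with open(filename, "wb") as f:
        pickle.dump(clf, f)
    with open(filename, "rb") as f:
        loaded_model = pickle.load(f)

# The restored model should score exactly like the original one.
original_score = clf.score(X_test, y_test)
loaded_score = loaded_model.score(X_test, y_test)
print(original_score == loaded_score)
```

Comparing the two scores is a cheap sanity check that the file on disk really holds the model that was trained. Note that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.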
