Notes about Machine Learning, Data Science and Analytics Engineering

How to make your CV attractive with a pet project

For junior Data Scientists, a CV usually lists courses taken, education, and perhaps some work experience that is not very relevant. Such a CV hardly stands out from those of most other job seekers.

Working on a pet project is a great opportunity to improve your skills. Adding a completed pet project to your CV makes it more attractive and gives you something concrete to talk about at the interview.

So what is a pet project? It is a project you do for yourself, outside of work, usually on a topic that interests you personally: sports, electronics, cooking, cars, travel, medicine, and so on. Such a project helps you deepen your professional skills and learn new ones that will be useful at work.

Here are some ideas for projects in Data Science that you can get started with:

Classify Parkinson’s Disease

Dataset – https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons

Task – Build a model that determines whether a person has Parkinson’s disease.

What you will learn:

  • Working with tabular data
  • Working with gradient boosting libraries, for example XGBoost
  • Working with platforms for hosting a service online, for example Heroku
  • Creating a website that determines whether a person has Parkinson’s disease based on their answers to questions, that is, deploying the model to production
Dataset for determining the presence of Parkinson’s disease
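
As a starting point, the training step might look like the sketch below. It assumes the dataset file is a CSV with a `name` identifier column and a binary `status` label column (check this against the dataset description before relying on it); the function names and hyperparameter values are illustrative placeholders, not tuned settings. The third-party imports sit inside the training function so that the parsing helper works on its own.

```python
import csv
import io


def parse_parkinsons_csv(text):
    """Split CSV text into feature names, feature rows, and labels.

    Assumes a 'name' id column and a 0/1 'status' label column;
    every other column is treated as a numeric feature.
    """
    reader = csv.DictReader(io.StringIO(text))
    feature_names = [c for c in reader.fieldnames if c not in ("name", "status")]
    X, y = [], []
    for row in reader:
        X.append([float(row[c]) for c in feature_names])
        y.append(int(row["status"]))
    return feature_names, X, y


def train_and_score(X, y, seed=0):
    """Train a gradient boosting classifier and return (model, test accuracy)."""
    # Imported lazily: requires numpy, scikit-learn, and xgboost.
    import numpy as np
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    # Stratify so both classes appear in the train and test parts.
    X_train, X_test, y_train, y_test = train_test_split(
        np.asarray(X), np.asarray(y), test_size=0.2, stratify=y, random_state=seed
    )
    # Placeholder hyperparameters, chosen only for illustration.
    model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))
```

With only a couple of hundred rows or fewer, a single train/test split gives a noisy estimate, so cross-validation is probably worth adding before showing the numbers to anyone.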

Telegram channel analytics system

Dataset – you need to collect it yourself using the Telegram API

The task is to create a service that parses a specified Telegram channel and provides useful analytics.

What you will learn:

  • Working with the Telegram API
  • Working with text
  • Libraries: pandas, nltk, pymorphy2, spacy and others
  • Working with platforms for hosting a service online, for example Heroku
  • Creating a website that provides useful analytics for a given channel name

The code for the parser and analytics can be found here

The plot of the dynamics of posts in the telegram channel
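
The analytics part can start very small. The sketch below assumes the messages have already been fetched and are stored as `(datetime, text)` pairs; that format and the function names are hypothetical, and the Telegram API call itself is not shown. It counts posts per day, which is the data behind a posting dynamics plot, and finds the most frequent words.

```python
import re
from collections import Counter
from datetime import datetime


def posts_per_day(messages):
    """Return {date: number of posts}, sorted by date."""
    counts = Counter(ts.date() for ts, _ in messages)
    return dict(sorted(counts.items()))


def top_words(messages, n=10, min_len=4):
    """Return the n most frequent words with at least min_len characters."""
    words = Counter()
    for _, text in messages:
        for word in re.findall(r"\w+", text.lower()):
            if len(word) >= min_len:
                words[word] += 1
    return words.most_common(n)


if __name__ == "__main__":
    # Made-up messages, just to show the input format.
    demo = [
        (datetime(2020, 1, 1, 10, 0), "Data science news"),
        (datetime(2020, 1, 1, 12, 30), "More data tips"),
        (datetime(2020, 1, 2, 9, 15), "Data again"),
    ]
    print(posts_per_day(demo))
    print(top_words(demo, n=3))
```

For real channel text, the length filter is a crude substitute for stop-word removal and lemmatization, which is where nltk, pymorphy2, or spacy would come in.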

Album cover generation

Dataset – you need to collect it yourself by parsing album covers and descriptions. For example, you can use the Spotify API.

The task is to generate a music album cover for a given genre.

What you will learn:

  • Working with GANs
  • Working with the Spotify API, beautifulsoup, and selenium to collect data
  • Libraries: pytorch / keras / tf, OpenCV and others
  • Working with platforms for hosting a service online, for example Heroku
Comparison of original covers with generated

Paper on this topic – https://ryanmcconville.com/publications/AlbumCoverGenerationFromGenreTags.pdf

Air pollution

Dataset – you need to collect it yourself or use open data. For example, sensor readings are available on the site https://airkaz.org/. For more recent data, contact the site creator.

The task is to perform a useful analysis of the data: build a model, identify trend and seasonality, and add related external data such as the weather.

What you will learn:

  • Working with time series
  • Libraries: pandas, seaborn, folium, fbprophet and others
  • Building an interactive service using, for example, Heroku and https://www.streamlit.io/
Map of Almaty with sensor marks and average measurement of air pollution
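
Before reaching for a forecasting library, it can help to look at trend and seasonality by hand. The sketch below assumes sensor readings are available as `(datetime, value)` pairs, which is a hypothetical format rather than the actual layout of the airkaz.org data. It smooths the series with a trailing moving average as a rough trend estimate and averages readings by hour of day as a rough view of daily seasonality.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def moving_average(values, window):
    """Trailing moving average; the first window - 1 points have no value."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


def hourly_profile(readings):
    """Mean reading for each hour of the day that appears in the data."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts.hour].append(value)
    return {hour: mean(vals) for hour, vals in sorted(buckets.items())}


if __name__ == "__main__":
    # Made-up readings, just to show the input format.
    demo = [
        (datetime(2020, 1, 1, 8), 10.0),
        (datetime(2020, 1, 1, 20), 5.0),
        (datetime(2020, 1, 2, 8), 20.0),
    ]
    print(moving_average([v for _, v in demo], window=2))
    print(hourly_profile(demo))
```

A library such as fbprophet fits trend and seasonality jointly and handles gaps in the data, so this is only a first look, not a replacement for it.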

The code is available – https://github.com/alimbekovKZ/jupyter_notebooks_2/tree/master/airkaz

Action plan

Come up with a project on any topic. An action plan might look something like this:

  • Find data. It can come ready-made from open sources or be collected from the Internet. I recommend collecting the data yourself: this develops parsing skills and familiarity with tools such as selenium and beautifulsoup
  • Label the data if necessary
  • Train machine learning models or build automated analytics
  • Implement a service that serves the machine learning model. I recommend taking a look at https://www.streamlit.io/
  • Host the service on a platform, for example Heroku
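
To give a feel for the data collection step, here is a minimal sketch that pulls links out of an HTML page. It uses the standard library's `html.parser` as a dependency-free stand-in for beautifulsoup, and the class and function names are made up for illustration. Downloading the page and handling JavaScript-rendered content, which is where selenium helps, are left out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/albums/1">First</a> <a href="/albums/2">Second</a></p>'
    print(extract_links(page))
```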

Further reading: an article on pet projects – https://habr.com/ru/company/ods/blog/335998/

Once the project is done, you will have something to show a potential employer, and you will be able to have a substantive conversation about the skills and experience you gained while working on it.

If you liked the article, subscribe to my Telegram channel: https://t.me/renat_alimbekov

