For junior Data Scientists, a CV typically lists courses taken, education, and work experience that may not be very relevant. Such a resume hardly stands out from those of most other job seekers.
Working on a pet project is a great way to improve your skills. If you add a completed pet project to your CV, it immediately becomes more attractive and gives you something to talk about at the interview.
So what is a pet project? A pet project is a project you do for yourself. It is created outside of work and usually grows out of a personal interest, for example sports, electronics, cooking, cars, travel, or medicine. Such a project helps you deepen your professional skills and learn new ones that will be useful at work.
Here are some ideas for projects in Data Science that you can get started with:
Classify Parkinson’s Disease
Dataset – https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons
Task – Build a model that determines the presence of Parkinson’s disease in humans.
What you will learn:
- Working with tabular data
- Working with gradient boosting libraries, for example XGBoost
- Working with platforms for hosting a service online, for example Heroku
- Creating a website that predicts the presence of Parkinson’s disease from a person's answers to a set of questions, that is, taking the model to production
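To make the idea behind gradient boosting libraries more concrete, here is a minimal pure-Python sketch of boosting with one-split trees (stumps). It is not XGBoost's implementation and does not use the UCI dataset: the two-feature table, the labels, the learning rate, and all function names are assumptions made up for illustration. In a real project you would load the Parkinson’s data and call the library instead.

```python
import math
import random

LEARNING_RATE = 0.5  # assumed value, chosen only for this toy example


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the one-feature threshold split that fits the residuals best
    (least squares). Returns (feature, threshold, left_value, right_value)."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, t, lm, rm), err
    return best


def stump_value(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_boosting(X, y, n_rounds=10):
    """Each round fits a stump to the negative gradient of the log loss."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no valid split left
            break
        stumps.append(stump)
        scores = [s + LEARNING_RATE * stump_value(stump, row)
                  for s, row in zip(scores, X)]
    return stumps


def predict(stumps, row):
    score = sum(LEARNING_RATE * stump_value(s, row) for s in stumps)
    return 1 if sigmoid(score) > 0.5 else 0


# Synthetic stand-in for the real table: two features, label depends on the first.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

model = fit_boosting(X_train, y_train)
accuracy = sum(predict(model, r) == t for r, t in zip(X_test, y_test)) / len(X_test)
print(f"test accuracy: {accuracy:.2f}")
```

The train/test split at the end is the part that carries over unchanged to the real project: always measure quality on rows the model has not seen.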
Telegram channel analytics system
Dataset – you need to collect it yourself using the Telegram API
Task – Create a service that parses a specified Telegram channel and provides useful analytics.
What you will learn:
- Working with the Telegram API
- Working with text
- Libraries: pandas, nltk, pymorphy2, spacy and others
- Working with platforms for hosting a service online, for example Heroku
- Creating a website that provides useful analytics for a specified channel name
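As a rough idea of what "useful analytics" could mean, the sketch below counts the most frequent words and the posting hours for a handful of messages. It assumes the messages have already been fetched and stored as dictionaries with `date` and `text` keys; that layout, the sample messages, and the tiny stop-word list are all invented for illustration and are not what the Telegram API returns. A real project would likely use nltk or spacy for tokenization.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical sample; real messages would come from the Telegram API.
messages = [
    {"date": "2021-03-01T09:15:00", "text": "New dataset released for air quality research"},
    {"date": "2021-03-01T18:40:00", "text": "Air quality models: a short overview"},
    {"date": "2021-03-02T09:05:00", "text": "Weekly digest: models, datasets and tools"},
]

# Tiny illustrative stop-word list; nltk ships much fuller ones.
STOP_WORDS = {"a", "and", "for", "the", "of", "new"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def top_words(messages, n=3):
    """Return the n most frequent words across all messages."""
    counts = Counter(w for m in messages for w in tokenize(m["text"]))
    return counts.most_common(n)


def posts_per_hour(messages):
    """Count how many messages were posted in each hour of the day."""
    return Counter(datetime.fromisoformat(m["date"]).hour for m in messages)


print(top_words(messages))
print(posts_per_hour(messages))
```

The same two functions can sit behind a web page that takes a channel name, fetches its messages, and renders the counts as charts.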
The code for the parser and analytics can be found here
Music album cover generation
Dataset – you need to collect it yourself by parsing album covers and descriptions. For example, you can use the Spotify API.
Task – Generate a music album cover from its genre.
What you will learn:
- Working with GANs
- Working with the Spotify API, beautifulsoup, and selenium to parse data
- Libraries: pytorch / keras / tf, OpenCV and others
- Working with platforms for hosting a service online, for example Heroku
Paper on this topic – https://ryanmcconville.com/publications/AlbumCoverGenerationFromGenreTags.pdf
Air pollution
Dataset – you need to collect it yourself or use open data. For example, sensor readings are available from the site https://airkaz.org/. For more recent data, contact the site creator.
Task – Do a useful data analysis, build a model, determine the trend and seasonality, and add related external data, for example the weather.
What you will learn:
- Working with time series
- Libraries: pandas, seaborn, folium, fbprophet and others
- Building an interactive service using, for example, Heroku and https://www.streamlit.io/
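To show what "determine the trend and seasonality" means in practice, here is a minimal sketch of a classical additive decomposition: a centered moving average estimates the trend, and the average deviation from it on each day of the week estimates the seasonal profile. The readings are synthetic (a made-up linear trend plus a made-up weekly pattern), and the weekly period is an assumption. With real sensor data you would more likely reach for pandas and fbprophet.

```python
from statistics import mean

PERIOD = 7  # assumption: daily readings with a weekly cycle


def moving_average(values, window):
    """Centered moving average (window must be odd).
    Positions where the window does not fit are left as None."""
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = mean(values[i - half:i + half + 1])
    return out


def seasonal_profile(values, trend, period):
    """Average deviation from the trend for each position in the cycle,
    shifted so that the profile sums to zero."""
    buckets = [[] for _ in range(period)]
    for i, (value, level) in enumerate(zip(values, trend)):
        if level is not None:
            buckets[i % period].append(value - level)
    profile = [mean(b) for b in buckets]
    center = mean(profile)
    return [p - center for p in profile]


# Synthetic readings: a slow upward trend plus a repeating weekly pattern.
weekly_pattern = [3, 2, 1, 0, -1, -2, -3]
readings = [50 + 0.5 * day + weekly_pattern[day % PERIOD] for day in range(56)]

trend = moving_average(readings, PERIOD)
profile = seasonal_profile(readings, trend, PERIOD)

print("trend on day 10:", trend[10])
print("weekly profile:", [round(p, 2) for p in profile])
```

Because the window length equals the period, the weekly pattern averages out inside each window, which is why the moving average isolates the trend here.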
The code is available – https://github.com/alimbekovKZ/jupyter_notebooks_2/tree/master/airkaz
Action plan
Come up with a project on any topic. An action plan might look something like this:
- Find data. You can either take ready-made data from open sources or collect it from the Internet. I recommend collecting the data yourself: this develops parsing skills and experience with tools such as selenium and beautifulsoup.
- Label data if necessary
- Train machine learning models / do automated analytics
- Implement a service that serves the machine learning model. I recommend taking a look at https://www.streamlit.io/
- Host the service on a platform, for example Heroku
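For the first step of the plan, collecting data yourself usually comes down to pulling structured records out of HTML. The sketch below does this with Python's standard `html.parser` module as a stand-in for beautifulsoup, so it runs without extra dependencies. The page fragment, station names, and values are invented for illustration; a real page would be downloaded first and would have its own markup.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page fragment; a real one would be fetched from a website.
PAGE = """
<table>
  <tr><th>station</th><th>pm25</th></tr>
  <tr><td>Center</td><td>41</td></tr>
  <tr><td>North</td><td>18</td></tr>
</table>
"""

parser = TableParser()
parser.feed(PAGE)
header, *records = parser.rows
readings = [dict(zip(header, record)) for record in records]
print(readings)
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame` for the labeling and modeling steps.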
Further reading: an article about pet projects – https://habr.com/ru/company/ods/blog/335998/
Once the project is finished, you will have something to show a potential employer, and you will be able to have a substantive conversation about the skills and experience you acquired while working on it.