Arun Rathi

Software Engineer (Python) Intern
arunrathi201@gmail.com
Noida · India

🌈 Hi there!! 👋 Glad to see you here.

This is my personal space to track my projects and activities. I love to spend my free time here.
🌱 I’m currently learning about deep learning, AWS and Spanish.
💬 Ask me about Data Science, Machine Learning and Math.
⚡ Fun fact: Data is never clean


Experience

Software Engineer Intern

Agilitix AI, Hyderabad

Main role: building machine learning models
- Worked on a deep learning model to detect anomalies in multivariate time-series data.
- Analyzed univariate time-series data using Prophet (Facebook's open-source forecasting library) and STUMPY.
- Drew on multiple research papers to build a deep learning model for anomaly detection.
- Built a dashboard for a time-series dataset to display trends and seasonality.
- Worked on a Java project using object-oriented programming (OOP).

July - December 2020

Kaggle Contributor

Kaggle

Started my Kaggle journey to learn from the data science community (Kagglers), and looking forward to participating in Kaggle competitions.

January 2021 - Present

Projects

Personalized Cancer Diagnosis - Multi-class Classification Problem

Built a machine learning model to diagnose cancer patients.
- Trained multiple machine learning models: Naive Bayes, K-Nearest Neighbors, Random Forest and XGBoost.
- Found XGBoost to be the best-performing model, using log loss, precision and recall as evaluation metrics.

Taxi demand prediction in New York City - Time-Series Problem (Regression)

Used the publicly available dataset from [NYC Yellow Taxi](https://www1.nyc.gov/site/tlc/index.page).
- The first model is a Moving Averages model, which uses the mean of the previous n values to predict the next value.
- Other baseline models: Weighted Moving Averages and Exponentially Weighted Moving Averages (EWMA); EWMA performed best among these baselines.
- Regression models: Linear Regression, Random Forest Regressor and XGBoost Regressor.
- XGBoost Regressor performed best among the regression models, but its MAPE was only slightly different from that of EWMA.
- Learning: a simple model can perform well.
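The three forecasting baselines described above can be sketched in plain Python. This is a minimal illustration, not the project's code: the demand series, the window size `n = 3`, the linear weights (1 to n) in the weighted version, the EWMA seed, and the smoothing factor `alpha = 0.5` are all assumptions chosen for the example. MAPE, the metric used to compare EWMA with XGBoost, is included so the baselines can be scored on the same target slice.

```python
from typing import List


def moving_average_forecast(series: List[float], n: int) -> List[float]:
    """Predict each value as the plain mean of the previous n values.

    Returns forecasts aligned with series[n:].
    """
    return [sum(series[t - n:t]) / n for t in range(n, len(series))]


def weighted_moving_average_forecast(series: List[float], n: int) -> List[float]:
    """Like the moving average, but newer values get larger weights.

    Assumed weighting: linear, oldest value -> 1, newest -> n.
    Returns forecasts aligned with series[n:].
    """
    weights = list(range(1, n + 1))
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, series[t - n:t])) / total
        for t in range(n, len(series))
    ]


def ewma_forecast(series: List[float], alpha: float) -> List[float]:
    """Exponentially weighted moving average forecast.

    forecast[t] = alpha * series[t - 1] + (1 - alpha) * forecast[t - 1],
    seeded (an assumption) with forecast[1] = series[0].
    Returns forecasts aligned with series[1:].
    """
    preds = [float(series[0])]
    for t in range(2, len(series)):
        preds.append(alpha * series[t - 1] + (1 - alpha) * preds[-1])
    return preds


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error as a fraction (assumes no zero actuals)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    demand = [120, 130, 125, 140, 150, 145, 160, 170]  # hypothetical pickups per time bin
    n = 3
    target = demand[n:]  # score every model on the same slice
    print("MA   MAPE:", mape(target, moving_average_forecast(demand, n)))
    print("WMA  MAPE:", mape(target, weighted_moving_average_forecast(demand, n)))
    # ewma_forecast is aligned with demand[1:], so drop its first n - 1 entries
    print("EWMA MAPE:", mape(target, ewma_forecast(demand, alpha=0.5)[n - 1:]))
```

EWMA keeps only one running value and one parameter, yet it still reacts to recent changes, which makes it a cheap baseline to compare heavier regressors against.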

NLP Problem - Review Polarity Classification of Amazon Fine Food Reviews

Built multiple machine learning models to find the best-performing one.
- Used text vectorization methods: Bag of Words (BoW) and TF-IDF.
- Found XGBoost to be the best-performing model.
- Evaluation metrics: AUC and F1-score.


Education

National Institute of Technology (NIT) Kurukshetra

Master of Technology
Mechanical Engineering - Thermal Engineering

GPA: 7.98

August 2018 - July 2020

Tezpur (Central) University

Bachelor of Technology
Mechanical Engineering - Thermal Engineering

GPA: 8.01

August 2014 - May 2018

Skills

Programming Languages & Tools
  • Python, SQL, Java
  • Jupyter Notebook, Google Colab
  • Data Science Libraries: NumPy, Pandas, SciPy, Scikit-learn, Matplotlib, Seaborn
  • PyCharm, VS Code, IntelliJ IDEA
  • AWS, MySQL, GCP, JIRA, Confluence, Elasticsearch
  • Good knowledge of Machine Learning - Linear Regression, Logistic Regression, K-Nearest Neighbors, Naive Bayes, Decision Tree, Random Forest, XGBoost.
  • Good understanding of Deep Learning models - MLP, CNN, VGG16, LSTM, Autoencoders, Transformers.

Updated Resume