My 66-Day Journey Into The World Of Data Science
| # | Books |
|---|---|
| 1 | An Introduction to Statistical Learning |
| 2 | Hands-On Machine Learning with Scikit-Learn and TensorFlow |

| # | Podcasts |
|---|---|
| 1 | Linear Digressions |
| 2 | Not So Standard Deviations |
| 3 | Data Skeptic |
| 4 | Data Science at Home |
| 5 | O'Reilly Data Show |
| 6 | Data Stories |
| 7 | Talking Machines |

| # | Kaggle Courses |
|---|---|
| 1 | Python |
| 2 | Pandas |
| 3 | Intro to Machine Learning |
| 4 | Data Visualization |
I'm very excited to be a part of this challenge.
- Today I learnt more about GitHub through a freeCodeCamp video: https://youtu.be/RGOj5yH7evk
- I created a repository for this challenge to have a detailed record of my progress.
- I learnt about external libraries in Python: how to work with them and how to inspect what they provide. I learnt this through the Kaggle course: https://www.kaggle.com/colinmorris/working-with-external-libraries
- Today I finished the Kaggle Python course: https://www.kaggle.com/learn/python
- It covered functions, loops, booleans, lists, strings and libraries. It was a really practical course.
- Learnt about Data Ethics and scaling Machine Learning today from the Data Science at Home Podcast. Good episodes.
- Data Ethics are principles that guide people to do good while using data to build anything
- Human-centric: Interests of humans come before commercial gain.
- Equality: Should show the reality of how diverse we are and the impact on different communities.
- Control: Give full control to the individuals you want to help.
- Transparency: Say what you do and do what you say. People should understand clearly what they are buying into.
- Accountable: Be accountable throughout the whole process, from data collection to decision making.
- I also learnt about scaling and handling big data. GPUs do a good job here because they are faster than CPUs, though they have a lot less RAM.
- Tools for scaling include Hadoop, Spark, RAPIDS, Dask.
- I'd been seeing RAPIDS mentioned everywhere, but today I learnt what it actually is. Excited about the days ahead.
Today I started the Kaggle course Intro to SQL and completed 3 modules with practice exercises. I learnt the full meaning of SQL (Structured Query Language); I didn't know this before. Also how to access and examine BigQuery datasets. Learnt about clients, projects, tables, etc. Learnt how to write SQL queries with SELECT, FROM, WHERE, GROUP BY, HAVING and COUNT. Learnt that there are datasets as large as 3TB. Wow!!! This is why you need to set a limit on how much data your queries fetch. I saw that SQL is what lets us really analyse data and ask interesting questions of it.
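The course ran these queries against BigQuery, which needs credentials; as a minimal sketch of the same clauses (SELECT, FROM, WHERE, GROUP BY, HAVING, COUNT), here they are against Python's built-in sqlite3 with a made-up pets table:

```python
import sqlite3

# In-memory database standing in for a BigQuery dataset (invented data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (id INTEGER, name TEXT, animal TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [(1, "Dr. Harris Bonkers", "Rabbit"),
     (2, "Moon", "Dog"),
     (3, "Ripley", "Cat"),
     (4, "Tom", "Cat")],
)

# COUNT the pets of each animal type, keeping only types with more than one
query = """
    SELECT animal, COUNT(id) AS num_pets
    FROM pets
    WHERE id > 0
    GROUP BY animal
    HAVING COUNT(id) > 1
"""
print(conn.execute(query).fetchall())  # [('Cat', 2)]
```

The SQL itself carries over to BigQuery almost unchanged; only the client setup differs.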
Also listened to "Becoming a machine learning practitioner" on the O'Reilly Data Show Podcast. It was a great episode. A good way to learn ML is by building a project and then learning the data science process behind it. The host uses AWS and codes as an Amazon ML developer. I learnt the value of attending lots of technical conferences. I also learnt that ML practitioners have a responsibility to educate their managers on ML use-cases, since managers sit at the decision-making layer.
- Today I completed the Kaggle course: Intro to SQL. Learnt about ORDER BY, AS, WITH, CTEs (Common Table Expressions) and JOINing tables. I just found myself loving SQL. I realized you can ask so many interesting questions and also do analysis across several datasets (and the tables in them). I highly recommend the course.
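Again with sqlite3 and invented tables (the course itself used BigQuery), a sketch of a CTE defined with WITH ... AS, then JOINed and ORDERed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER, name TEXT);
    CREATE TABLE pets (id INTEGER, name TEXT, owner_id INTEGER);
    INSERT INTO owners VALUES (1, 'Aubrey'), (2, 'Chioma');
    INSERT INTO pets VALUES (10, 'Moon', 1), (11, 'Ripley', 2), (12, 'Tom', 2);
""")

# The CTE computes pets per owner; the outer query JOINs owner names back on
query = """
    WITH pet_counts AS (
        SELECT owner_id, COUNT(id) AS n
        FROM pets
        GROUP BY owner_id
    )
    SELECT o.name, p.n
    FROM owners AS o
    JOIN pet_counts AS p ON o.id = p.owner_id
    ORDER BY p.n DESC
"""
print(conn.execute(query).fetchall())  # [('Chioma', 2), ('Aubrey', 1)]
```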
- I also listened to "One Hot 2020" on the Not So Standard Deviations Podcast. I learnt a lot of use-cases of AI (Artificial Intelligence) and ML (Machine Learning). The hosts Roger Peng and Hilary Parker analysed the differences between AI and ML. To mention one: AI systems engage with you like a person, with real-time interaction (e.g. self-driving cars), while ML just uses data from your past activities, like recommender systems, Google Autofill, etc. It was a fun episode.
- Today I continued my "Big Data Analytics with Python - Day 5" course being offered by Utiva which I take every weekend. They have a great instructor. I learnt about Pandas today.
- Specifically, I learnt that pandas is very fundamental for any Data Scientist.
- Also learnt about pandas Series and DataFrames and the many things that can be done with them. A lot of work went into building pandas; it just makes data manipulation easier.
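A tiny sketch of the two core objects, with invented data rather than the class dataset:

```python
import pandas as pd

# A Series is a labelled 1-D array
ages = pd.Series([25, 32, 41], index=["Ada", "Bayo", "Chika"], name="age")
print(ages["Bayo"])  # 32

# A DataFrame is a table whose columns are Series
df = pd.DataFrame({
    "age": [25, 32, 41],
    "city": ["Lagos", "Abuja", "Ibadan"],
}, index=["Ada", "Bayo", "Chika"])

print(df["age"].mean())    # about 32.67
print(df[df["age"] > 30])  # boolean filtering keeps Bayo and Chika
```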
- Today I continued my "Big Data Analytics with Python - Day 6" course being offered by Utiva which I take every weekend. We finished up Pandas today.
- We explored the Groupby function; it's very similar to SQL's GROUP BY. We also looked at merging, joining and concatenation.
- I learnt about visualization. A light intro to the blend of Pandas and matplotlib. Histogram, Bar chart, Scatter, Box & Line plot amongst others.
- Visualization is a tool every Data Scientist must learn. That's how we tell the data story.
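A short sketch of the weekend's Groupby and merge material (invented sales data, not the class exercise):

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "amount": [100, 150, 80, 120],
})
managers = pd.DataFrame({
    "region": ["North", "South"],
    "manager": ["Ngozi", "Tunde"],
})

# groupby works like SQL's GROUP BY: split by key, aggregate, recombine
totals = sales.groupby("region", as_index=False)["amount"].sum()

# merge is pandas' JOIN; concat stacks frames and join matches on the index
report = totals.merge(managers, on="region", how="left")
print(report)
```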
- Today, I started Kaggle course: Pandas. It gave me hands-on learning of the concepts I got exposed to over the weekend.
- I learnt about indexing, selecting and assigning in Pandas. I also learnt about Summary functions and mapping in Pandas. Having a Maths background helped me to understand the mapping process better.
- Something interesting about today's learning experience was the difference between the iloc and loc indexing schemes: iloc uses the Python stdlib scheme (position-based, with the end of a slice excluded), while loc is label-based and includes the end of a slice.
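The difference in a runnable nutshell:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# iloc slices by position and, like plain Python lists, EXCLUDES the stop
print(list(s.iloc[0:2]))     # [10, 20]

# loc slices by label and INCLUDES the stop label
print(list(s.loc["a":"c"]))  # [10, 20, 30]
```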
- Today, I finished the Kaggle course: Pandas. It was a highly practical course.
- I learnt about Groupby, Multi-indexing, Sorting, Checking and Changing Datatypes, Handling missing values, Renaming, and Combining with concat, join and merge. I love playing with pandas; there's so much you can do with it.
- There isn't much to say today as I really got my hands dirty with code.
- Today, I started the Kaggle course: Data Visualization. This was where Kaggle really broke python down. A lot of basic things.
- I learnt about Seaborn specifically; lineplots, bar charts and heatmaps.
- There is a whole lot that visualization can reveal. Looking at tables you may just see numbers but with visualization you can do analysis and make decisions. Heatmaps especially can give an eagle-eye view over everything. It's really beautiful.
- Today, I completed the Kaggle course: Data Visualization. Learnt about Scatter plots, Histograms, Density plots and changing Seaborn styles.
- I learnt how to extract and analyse datasets. Specifically, I analysed the Housing in London dataset. I found some interesting things from the visualization:
- Monthly crime count is positively correlated with the number of houses sold. This actually surprised me, since you'd expect people to avoid buying houses in neighbourhoods with a high crime rate.
- The number of jobs per year is positively correlated with the population size of the city. So bigger cities have more jobs in the United Kingdom? Was shocked to see that. Excited about the days ahead.
| Houses Sold vs. Crime Rate | Available Jobs vs. Population Size |
|---|---|
| ![]() | ![]() |
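The heatmap readings above can be checked numerically with pandas' .corr(). A sketch with invented figures, not the actual Housing in London numbers:

```python
import pandas as pd

# Invented borough-level figures, for illustration only
df = pd.DataFrame({
    "houses_sold": [120, 340, 560, 780, 900],
    "no_of_crimes": [200, 450, 700, 990, 1200],
})

# A Pearson correlation near +1 means the two columns move together
corr = df["houses_sold"].corr(df["no_of_crimes"])
print(round(corr, 3))  # close to 1.0
```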
- Today, I learnt about time series analysis with R. I figured that it's good to know Python and R so I'm learning and applying both.
- There are a lot of things that can be done with time series. Basically, there are 2 purposes of time series:
- To model the stochastic (random) mechanism that gives rise to a series of data.
- To predict future occurrences of the series based on its previous history.
- I also learnt about the Principle of Parsimony: the simplest model that gives us all the information required by the experiment is sufficient. This means a lot for me as a data scientist; I don't have to worry about using all the data I get.
- Today, I continued my "Big Data Analytics with Python - Day 7" course being offered by Utiva which I take every weekend. We learnt about Matplotlib plots.
- Though I completed Kaggle's Data Visualization course during the week, I still learnt a lot from today's class. We covered object-oriented plots; during the week I had done more pyplot-style plotting.
- The OO style gives you more control over your plots, which I see as a very good feature: you can specify axes dimensions and draw a plot within another without using subplots. Also learnt about making adjustments to plots, changing styles, setting limits, etc. We also handled a dataset and compared variables using plots.
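A minimal sketch of that object-oriented style: one axes drawn inside another without subplots. The data is made up, and the Agg backend is only there so the script runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

# OO style: create the Figure, then place every axes explicitly
fig = plt.figure()
main_ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])    # [left, bottom, width, height]
inset_ax = fig.add_axes([0.2, 0.55, 0.3, 0.3])  # a plot within the plot

main_ax.plot(x, y)
main_ax.set_xlim(0, 4)  # setting limits on the axes object itself
main_ax.set_title("OO-style plot with an inset")
inset_ax.plot(x, x)

fig.savefig("oo_plot.png")
```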
- Today, I continued my "Big Data Analytics with Python - Day 8" course being offered by Utiva which I take every weekend. We learnt about Seaborn plots. The class was really extensive.
- I learnt about Relational plots, Distribution plots, Categorical plots, Regression plots, Matrix plots, and Multi-plot grids (Facet, Pair and Joint grids)
- I really like the class because we learn with the API reference of the documentation. Before now, documentation felt out of reach, but I've now learnt how to use it through practice.
- I have learnt a lot about Data Visualization (spending more than 12 hours over 4 days). I'll be switching gears tomorrow.
- Today, I finished the Kaggle course: Intro to Machine Learning. Learnt how to build and validate a model. I also learnt about overfitting and underfitting. You always have to find that sweet spot.
- I learnt 2 models from the scikit-learn library: Decision Trees and Random Forests.
- I also made my first submission to a Kaggle competition today. So happy about this. So many Machine Learning models to build in the future.
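A sketch of the two models on synthetic data (not the competition data), comparing validation MAE; the forest usually lands closer to the sweet spot because it averages many trees:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the competition data
X, y = make_regression(n_samples=500, n_features=8, n_informative=8,
                       noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=1)

maes = {}
for model in (DecisionTreeRegressor(random_state=1),
              RandomForestRegressor(n_estimators=100, random_state=1)):
    model.fit(X_train, y_train)
    maes[type(model).__name__] = mean_absolute_error(y_val, model.predict(X_val))

# Sorted by MAE: the forest should come first (lower validation error)
print(sorted(maes, key=maes.get))
```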
- Today, I read the article "All Machine Learning Models Explained in 6 Minutes".
- Learnt about various Machine Learning models. There aren't as many as I thought there'd be, but I need to learn them in depth.
- Machine Learning models are divided into two groups: Supervised and Unsupervised.
- Supervised: Regression (continuous output) and Classification (discrete output)
- Regression: Linear Regression, Decision Tree, Random Forests and Neural Networks.
- Classification: Logistic Regression, Support Vector Machines, Naive Bayes, Decision Tree, Random Forests and Neural Networks.
- Unsupervised: Clustering and Dimensionality Reduction
- Clustering: k-means clustering, Hierarchical clustering, mean shift clustering and density-based clustering.
- Dimensionality Reduction: Principal Component Analysis (PCA)
- StatQuest: Decision Trees
- StatQuest: Random Forests Part 1
- StatQuest: PCA main ideas in only 5 minutes!!!
- Neural Network In 5 Minutes
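To make the Unsupervised branch above concrete, a small scikit-learn sketch of k-means clustering and PCA on synthetic blobs:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Three well-separated blobs as toy unsupervised data
X, _ = make_blobs(n_samples=150, centers=3, n_features=5, random_state=0)

# Clustering: k-means groups points by distance to learned centroids
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(len(set(labels)))  # 3

# Dimensionality reduction: PCA projects 5 features down to 2 components
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)  # (150, 2)
```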
- I took it really easy today. I started the Intermediate Machine Learning course on Kaggle.
- I learnt about how to handle Missing values and Categorical variables. There are 3 ways to handle each.
- Missing values: Drop columns with missing values, impute a statistic, or impute and add a boolean column marking which entries were imputed. I personally don't see the use of the third yet.
- Categorical variables: Drop the columns involved, Label encoding (Assigning each unique value to a different integer) or One-Hot Encoding (Creating new columns for each categorical value).
- In real life, data is really messy, with lots of missing values and plenty of categorical variables. Knowing how to handle these when building ML models is key.
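A sketch of two of those strategies, mean imputation and one-hot encoding, on an invented frame:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "rooms": [3.0, np.nan, 2.0, 4.0],
    "colour": ["red", "blue", "red", "green"],
})

# Missing values: impute a statistic (the column mean, 3.0, fills the gap)
df["rooms"] = SimpleImputer(strategy="mean").fit_transform(df[["rooms"]]).ravel()

# Categorical variables: one-hot encoding, one new column per category
df = pd.get_dummies(df, columns=["colour"])
print(df.columns.tolist())
```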
- Today, I completed the Kaggle course: Intermediate Machine Learning. There are tools to create better models and I learnt quite a few.
- Learnt about Pipelines, Cross-validation, XGBoost (Gradient Boosting) and Data Leakage.
- Pipelines keep your data preprocessing and modeling code organized. Not every data scientist uses them, but I find them really helpful.
- Cross-validation gives a more accurate measure of model quality, but it multiplies training time, so it's mainly practical for small datasets.
- XGBoost builds trees sequentially, using the gradient of the loss function to improve the model at each step. It's another ensemble method, like Random Forests.
- Data Leakage: These are subtle errors with huge consequences for your model. There are two types: target leakage and train-test contamination.
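A minimal pipeline-plus-cross-validation sketch on synthetic data; keeping the scaler inside the pipeline is also what guards against train-test contamination, since it is refit within each fold:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# One object holds preprocessing and modeling together
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_absolute_error")
print(round(-scores.mean(), 1))  # average MAE across the 5 folds
```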
- Today, I took things slow. I watched 2 videos in Ken Jee's Data Science Project from Scratch YouTube playlist.
- Learnt about the planning process from (https://www.youtube.com/watch?v=MpF9HENQjDo) where I can get ideas through:
- Looking at the available data (on Google or Kaggle datasets) and building a project from what you find.
- Starting with a problem I want to solve, then getting relevant data for it.
- He recommends the latter as being fun but I'll start with the former.
- Also learnt about the data collection process from (https://www.youtube.com/watch?v=GmW4F6MHqqs)
- It's advised you create a Github repo for every project and use web scrapers like BeautifulSoup or Selenium to get relevant data.
- Today, I continued my "Big Data Analytics with Python - Day 9" course being offered by Utiva which I take every weekend. We handled Machine Learning today. It was a really good introduction to the subject matter.
- I learnt that there are 4 kinds of Machine Learning, as opposed to the 2 I learnt earlier: Supervised, Unsupervised, Reinforcement Learning and Deep Learning. We also learnt the basic terminologies in ML. We used visualizations to do better analysis of the data before building the model.
- Going deeper, we did Linear regression and Logistic regression. I learnt about the 4 different assumptions to check for in linear regression models. We also did some Data Cleaning and Feature Engineering.
- I also did some time series analysis with R. I modeled some named stochastic processes: White Noise Process, Random Walk Process and Moving Average Process. I studied their mean, variance, autocovariance and autocorrelation functions.
- Today, I had the last class of "Big Data Analytics with Python - Day 10" course being offered by Utiva which I take every weekend. We went deeper into Machine Learning today. It's been an amazing journey.
- I learnt about KNN, Decision Trees, Random Forests and Support Vector Machines (SVM). I fitted models, made predictions and validated the models. I also improved the models using Grid Search, reaching an accuracy of 94%. I also created client data and got a prediction from the model I developed.
- I'm finding machine learning proper to be more and more fun. Really looking forward to when I'll start doing projects.
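A hedged sketch of Grid Search, using KNN on synthetic data rather than the class dataset (the parameter grid here is an arbitrary choice):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Grid Search tries every parameter combination with cross-validation
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```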
- Today, I watched 2 videos in Ken Jee's Data Science Project from Scratch YouTube playlist. It was almost 2 hours.
- Watched a video on Data Cleaning (https://www.youtube.com/watch?v=fhi4dOhmW-g). The process is really long because data comes really messy. Python can do a whole lot.
- Also watched a video on Exploratory Data Analysis (https://www.youtube.com/watch?v=QWgg4w1SpJ8). He used Jupyter notebook for this as opposed to Spyder IDE used for the previous steps. Visualizations in Jupyter are really informative.
- Also learnt from Ken Jee in another video (https://www.youtube.com/watch?v=uic34RTaI-w) that "The One Thing" in Data Science is doing projects. It is the 20% that yields 80% of the results. I'll focus more on that when I'm done with this Data Science Project from Scratch YouTube playlist.
- Today, I watched the last 3 videos of Ken Jee's Data Science Project from Scratch YouTube playlist. I really learnt a lot.
- Watched a video on Model Building (https://www.youtube.com/watch?v=7O4dpR9QMIM). It was really educative. I learnt that I should build at least 3 models and pick the best using scikit-learn error metrics.
- Watched a video on Putting the Model into Production (https://www.youtube.com/watch?v=nUOh_lDMHOU). He used a Flask API for this step. This was really strange to me. I also feel this step is optional. Not so sure.
- Then watched a video on Documenting your Work (https://www.youtube.com/watch?v=agHKuUoMwvY). He used GitHub for this, and I saw its importance for getting a job. Since I'm hopeful about getting a Data Science role soon, I'll prioritize this process.
- Finally listened to "Analysis Without Data" on the Not So Standard Deviations Podcast. They looked at Data Science from the point of academia and industry. Analyzing the best processes to learn. Great episode.
Really hectic day. Watched 3 videos on getting a data science job
- Python Programmer: https://youtu.be/X_N3zIIJyAk
- Joma Tech: https://youtu.be/MfP-P8EHGBo
- Ken Jee: https://youtu.be/UpaEjBOMNqs
- So much emphasis on doing internship first. Really didn't think I'd have to go through that route but I'm now having a rethink.
- Today, I started the Kaggle course: Data Cleaning. It's a really necessary course for Data Scientists.
- I learnt how to handle missing values. Sometimes it's good to drop them, while other times that's the wrong decision. You could also fill them in through some other means. It's necessary to understand your data before making these decisions.
- Also learnt the difference between scaling (changing the range of the data) and normalization (changing the distribution of the data to a normal distribution). These are really necessary for some ML models, like Support Vector Machines and Gaussian Naive Bayes, amongst others.
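The distinction in code, on invented skewed data; MinMaxScaler only changes the range, while a Box-Cox power transform reshapes the distribution:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, PowerTransformer

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=(100, 1))  # skewed, strictly positive

# Scaling changes the RANGE of the data (here to [0, 1]) but not its shape
scaled = MinMaxScaler().fit_transform(data)
print(scaled.min(), scaled.max())  # 0.0 and 1.0, up to float rounding

# Normalization reshapes the DISTRIBUTION towards a normal (Box-Cox here)
normalized = PowerTransformer(method="box-cox").fit_transform(data)
print(round(abs(normalized.mean()), 6))  # 0.0, i.e. standardized around zero
```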
- Really grateful to Ken Jee and the #66DaysOfData community for the NVIDIA Deep Learning Institute free credits I got today. I don't take it for granted. I'll be sharing my progress on that platform over here too.
- I believe many more of these will come in the near future.
- I'm having an event this week so I'm not pushing my learning as hard as I did previously. That should change by next week.
- Making a late post because it was a really busy day.
- Today, I switched to R and learnt some things about data mining.
- Learnt about data partitioning, various prediction accuracy measures, cross-validation and leave-one-out cross-validation.
- Practiced them on R
- Learnt more about building Machine Learning Pipelines today. I previously did a kaggle course on it but I decided to study further. This article is good: https://www.analyticsvidhya.com/blog/2020/01/build-your-first-machine-learning-pipeline-using-scikit-learn/
- I strongly recommend.
- Today, I continued my Kaggle course on Data Cleaning.
- Learnt about parsing dates today. pandas can read columns containing dates as plain objects. It's important to convert them, and to visualize them, during the data cleaning process.
- Having several date formats within the unparsed column could be a challenge but pandas has a way out though it takes time to do.
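A small parsing sketch with a made-up date column:

```python
import pandas as pd

df = pd.DataFrame({"date": ["3/2/07", "4/3/07", "5/4/07"]})
print(df["date"].dtype)  # object: pandas sees the dates as plain strings

# Tell pandas the exact format and it parses the column into datetimes
df["date_parsed"] = pd.to_datetime(df["date"], format="%m/%d/%y")
print(df["date_parsed"].dt.day.tolist())  # [2, 3, 4]
```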
- Today, I completed the Kaggle course: Data Cleaning. Learnt about Character Encoding and Handling Inconsistent Data Entries.
- Character encodings are the rules that map binary byte strings to readable text. The default encoding for Python code is UTF-8, but some documents/datasets aren't encoded in it. This is a useful tool to have in one's arsenal. It may not be needed often, but it's good to be ready whenever it comes up.
- Data mostly comes messy (just look at how questionnaires and surveys get filled in). Some preprocessing does the trick, but sometimes the fuzzywuzzy package is needed. I personally find the name funny, but it gets the job done most times.
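A tiny demonstration of why the rules matter: the same bytes decoded under the wrong encoding come out garbled:

```python
# Encode a string with a non-ASCII character into raw bytes
text = "cliché"
raw = text.encode("utf-8")
print(raw)  # b'clich\xc3\xa9'

print(raw.decode("utf-8"))    # 'cliché': correct rules, correct text
print(raw.decode("latin-1"))  # 'clichÃ©': wrong rules, garbled text
```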
- Today, I studied stationary stochastic processes using R.
- Stationarity is when the mean, variance and autocovariance functions are all independent of time. Working with stationary processes helps a lot, as it makes the model prediction process much easier.
- I also explored white noise, moving average and random walk processes to check their behaviour and whether they are stationary.
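A Python sketch of the same check I did in R: simulated white noise keeps a roughly constant variance over time, while a random walk's variance grows, so it is not stationary:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 200, 2000

# White noise: independent draws, so mean and variance don't depend on time
noise = rng.normal(0.0, 1.0, size=(reps, n))

# Random walk: cumulative sum of white noise, so variance grows with time
walk = noise.cumsum(axis=1)

print(round(noise[:, 10].var(), 2), round(noise[:, 190].var(), 2))  # both near 1
print(walk[:, 10].var() < walk[:, 190].var())  # True: variance keeps growing
```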
- Took things slow today. Listened to a Data Skeptic Podcast episode on "Automatic Summarization". It was a really informative episode. It explored questions like: what makes a good summary? Is the data provided sufficient? Are the old methods we use to get text summaries really the best? I've found that Data Science is largely about asking the right questions.
- Today, I studied polynomial regression in R.
- Sometimes when you plot a simple linear regression model, the data looks curved in a quadratic way or otherwise. This is where polynomial regression can be really helpful.
- I also learnt about Lift charts. Very useful in the decision-making process. A lift chart sorts your data and separates it into 10 groups, giving some helpful statistics about each. It can show the percentage of your response variable captured by a top grouping in your data.
- Today, I watched 2 StatQuest videos. Learnt about the Confusion Matrix. The matrix can be really confusing... Lol. The Confusion Matrix contains True Positives, True Negatives, False Positives and False Negatives. I also learnt about calculating Sensitivity and Specificity. Sensitivity is the proportion of actual positives correctly predicted, while specificity is the proportion of actual negatives correctly predicted. These concepts are important in ML models.
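Both ratios from a hypothetical confusion matrix:

```python
# A made-up confusion matrix: 40 TP, 10 FN, 5 FP, 45 TN
tp, fn, fp, tn = 40, 10, 5, 45

sensitivity = tp / (tp + fn)  # share of actual positives caught
specificity = tn / (tn + fp)  # share of actual negatives caught

print(sensitivity)  # 0.8
print(specificity)  # 0.9
```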
- Today, I got my hands busy with a project on controlling Employee Attrition.
- Collecting the data in xlsx format, I imported it and made some observations through visualization. Seaborn really brought clarity to the loopholes the company I'm studying has. I was able to deduce 7 causes of the employee attrition from the plots. Going forward, I'd love to explore current employees who also have tendencies to leave and to provide recommendations for the company.
- I also watched a YouTube video by Ken Jee (https://www.youtube.com/watch?v=yukdXV9LR48). It was on the kinds of Data Science projects that land a job. Biggest takeaways from the video: prioritize collecting/scraping data, do good feature engineering, and finish every project I start, even if I get stuck.
- Today, I continued the project on controlling Employee Attrition.
- I visualized the current employees data and made 7 recommendations on how to avoid future employee attrition.
Today, I watched 2 StatQuest videos. One on Cross-validation, which shed more light on the concept. I watched another on K Nearest Neighbours. It's a really good classification algorithm.
Today, I read a tweet by Prasoon Pratham. It is a beginner's guide to getting started with your first machine learning competition on Kaggle. He really broke things down and recommended some YouTube videos for better understanding of some ML concepts. Really great read/tutorial.
Today, I read an article on "How to Identify a Clever Data Scientist". It was a great read. I learnt that Data Scientists need soft skills in addition to technical skills to really stand out. Some soft skills include story telling, really listening and asking the right questions. I need to develop a mixture of both technical and emotional skills.
Today, I watched Machine Learning Foundations by Google Developers (https://www.youtube.com/watch?v=_Z9TRANg4c0). It was a really insightful video. I learnt the difference between software development and Machine learning. I also got exposure to Tensorflow and Neural Networks. There was a Google colab exercise to solidify the knowledge.
Today, I continued the Machine Learning Foundations by Google Developers (https://youtu.be/j-35y1M9rRU). It was an introduction to computer vision. Great and well broken down video.
Today, I practiced some exercises on Machine Learning Foundations by Google Developers: Computer vision. Still understanding how the neural network algorithm works. Resources: https://goo.gle/34cHkDk
Today, I still went deeper into understanding Neural Networks. Watched a StatQuest video: https://youtu.be/CqOfi41LfDw The explanation was well broken down.
Today, I studied time series in R specifically on modeling deterministic trends. I studied straight line regression, polynomial regression, and seasonal means model all using least squares estimation. I also did some analysis on residuals, checking their normality and independence.
Today, I studied variable selection in R. When fitting a model, how many regressors are too many, too few or just right? I explored several stepwise methods for carrying this out.
Today, I started a project studying the taxi-out time of airlines and how fuel consumption can be reduced. I focused on scraping the data today. There is so much to learn through doing projects. I have been studying all along, but projects teach better. Hopeful about the findings I'll make with this.
Today, I focused on learning web scraping with Selenium through some YouTube videos. Yesterday, while extracting the data for the project, I encountered a bug, so I'll continue with the project tomorrow.
Today, I learnt about urllib requests. I combined urllib with Selenium to extract the required data. All this is needed for the project I'm working on now.
Wow, just 18 days to go. It's been an awesome journey so far.
- Today, I finalized the Data collection code and I'm currently extracting the data as I type this. It's going to take a while to be fully extracted because the data required for this project is large.
Today, I watched a StatQuest video on Ridge regression. I feel I'll be applying some regularization regression models in this project so I'm getting equipped for it.
Today, I watched a StatQuest video on Lasso regression. It's a type of regularization regression model. It is very similar to ridge regression, but I find it better because it can eliminate useless variables/parameters from the model. I couldn't do much today because of my upcoming exams.
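A sketch of that difference on synthetic data: ridge shrinks every coefficient, while lasso can set the useless ones to exactly zero (the alpha values here are arbitrary choices):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually matter; the other three are useless
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Ridge shrinks coefficients towards zero; lasso can zero them out entirely
print(int(np.sum(ridge.coef_ == 0.0)))  # 0: ridge keeps every feature
print(int(np.sum(lasso.coef_ == 0.0)))  # 3: lasso drops the useless ones
```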
Today, I watched a StatQuest video on Gradient descent.
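Gradient descent in a few lines, minimizing a toy quadratic; the learning rate and step count are arbitrary:

```python
# Minimize f(w) = (w - 4)^2, whose minimum sits at w = 4
def gradient(w):
    return 2 * (w - 4)

w = 0.0             # starting guess
learning_rate = 0.1
for _ in range(200):
    w -= learning_rate * gradient(w)  # step opposite the slope

print(round(w, 4))  # 4.0
```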
Today, I did some data cleaning on the extracted data for my project. I handled missing columns and converted the time columns to datetime dtypes for easier use. Excited about the days ahead.
Today, I watched a StatQuest video on pruning regression trees using cost complexity pruning. It helps reduce overfitting by penalizing trees with more leaves, depending on the value of the tuning parameter chosen.
Knowing I'll soon be transitioning from Academia to Data Science, I equipped myself with this article by Dave Dale: https://towardsdatascience.com/5-things-academics-need-to-know-when-they-become-data-scientists-591d078e6ef6 My biggest takeaways are:
- I have to drop that pride that academics have. In the industry, people are more interested in usefulness than smartness.
- The best strategy isn't to apply something newly learnt from a research paper; that usually takes a lot of time with no guarantee it'd work out.
- Learn to lean on people. In academia, the focus is usually on one person to publish papers and the likes but industry brings teams of people with diverse strengths to synergise.
Today, I watched 2 YouTube videos by Data Professor on building DS/ML Web apps with Streamlit. I found the library really easy to use and deploy. It's one thing for a Data scientist to build a useful model and it's another thing for it to be made available to people to also use and benefit from. I took a break on my Taxi-time project because of the Exams I wrote this weekend. Should get back to it soon.
Today, I reviewed some Statistics from the StatQuest YouTube channel. The guy really teaches so well. Visualized the difference between Ridge and Lasso Regression: https://youtu.be/Xm2C_gTAl8c
Today, I continued with my Taxi-time project. Did some more data cleaning and EDA. I also tried doing visualizations, but the data was really large, so I plan on segmenting it and doing the visualizations that way.
Today, I reviewed Time Series data in R in preparation for an Exam I have tomorrow.
Today, I watched a video on AutoML on StatQuest's channel. I'm really fascinated by how this tool can help Data Science and Machine Learning novices and experts produce better models. It also frees up time for Data Scientists to focus more on gaining domain knowledge. Here's the link to the video: https://youtu.be/SEwxvjfxxmE
Today, I did some visualizations on my Taxi-out time project. Let me give some background to it first.
Taxi-out time is the time between the actual pushback and wheels-off. You can see it as the total amount of time an aircraft spends moving in an airport.
In this project I analyzed flights departing from San Francisco International Airport, CA for 4 airlines: Delta Airlines, United Airlines, American Airlines and Southwest Airlines over a period of 10 months specifically from January to October 2019.
This research is so important because an aircraft burns a lot of fuel per minute when taxiing. For example, the Boeing 737 burns about 11 kg of fuel per minute when taxiing.
From the visualizations, we can see the 5 destination airports with the highest taxi-out times from the origin airport for each airline. The question behind this visualization: does aircraft activity at the destination airport affect taxi-out time at the origin airport? Too many arriving and departing flights could make an aircraft taxi longer at the origin airport. We can observe that:
- For Delta Airlines, BOS, LAX and JFK have an average taxi-out time of over 21 minutes (about 231 kg of fuel burnt).
- For United Airlines, BHM and GSP have an average taxi-out time of over 40 minutes (about 440 kg of fuel burnt).
- For American Airlines, JFK, LAX, PHL, MIA and CLT have an average taxi-out time of over 21 minutes (about 231 kg of fuel burnt).
- For Southwest Airlines, STL, LAX and MKE have an average taxi-out time of over 18 minutes (about 198 kg of fuel burnt).

Today, I dug deeper into K Nearest Neighbours, applying it to both regression and classification problems. I did this in R.
The algorithm is really a lazy learner: it doesn't build a model up front, it just uses the stored data at prediction time.
Today, I reviewed some Stats from the StatQuest YouTube channel. Saw a video on Support Vector Machines. Really educative: https://youtu.be/efR1C6CvhmE
I watched some videos on Data Professor's YouTube channel on Virtual Internships. Watched that of KPMG, Deloitte and General Electric. I'll boost my experience with them soon. Link to the playlist: https://youtube.com/playlist?list=PLtqF5YXg7GLlJjIM8EhbDhMHtSYcPnVvH
Today, I watched a video on Data Professor's YouTube channel on whether a PhD is required to be a Data Scientist or not. Link to the video here: https://youtu.be/ulRPiEJRyFQ
Today, I researched and chose 6 virtual internships I will do as I transition into the next phase of my learning experience.
So happy to have followed the journey through for the past 66 days.
It has really been a great learning experience and I have learnt a lot.
More importantly, I have developed the habit of learning Data Science every day no matter how small.








