You can find many interesting projects on the “Projects” page of my website JourneyofAnalytics. I’ve also listed 50+ sources for free datasets in this blogpost. In this post though, I am classifying projects based on skill level, along with sample ideas for DIY projects that you can attempt on your own. On that note, if you are already looking for a job, or about to do so, do take a look at my book “DataScience Jobs”, available on Amazon. This book will help you reduce your job search time and quickly start a career in analytics.

Since I prefer R over Python, all the project lists in this post will be coded in R. However, feel free to implement these ideas in Python, too!

If you are just starting out and are not yet comfortable with the syntax, your main aim is to learn how to code along with data science concepts. At this stage, just try to write simple scripts in R that can pull data, clean it up, calculate the mean/median, and create basic exploratory graphs. Pick up any competition dataset and look at the highest-voted EDA script. One excellent example is the Zillow EDA by Philipp Spachtholz. Try to recreate it on your own; read through and understand the hows and whys of the code. This will not only teach you the code syntax, but also how to approach a new dataset and slice/dice it to identify meaningful patterns before any analysis can begin.

Once you are comfortable, you can move on to machine learning algorithms. Rather than Titanic, I actually prefer the Housing Prices dataset. Initially, run the sample submission to establish a baseline score on the leaderboard. Then apply every algorithm you can look up and see how it works on the dataset. Next, look at the kernels with a decent leaderboard score and replicate them. This is the fastest way to understand why some algorithms work on numerical target variables versus categorical versus time series. If you applied those algorithms but did not get the same result, check why there was a mismatch. Sometimes simple decision trees work better than complex Bayesian logic or XGBoost, and experimenting will help you figure out why. I prefer competition datasets since you can easily see how your score moves up or down.

Survey analysis: Pick up a survey dataset like the Stack Overflow developer survey and complete a thorough EDA – men vs. women, age and salary correlation, cities with the highest salaries after factoring in currency differences and cost of living. Can your insights also be converted into an eye-catching infographic? Can you recreate this?

Simple predictions: Apply any algorithms you know on the Google Analytics revenue predictor dataset. How do you compare against the baseline sample submission? Against the leaderboard?

Automated reporting: Go for end-to-end reporting. Can you automate a simple report, or create a formatted Excel or PDF chart using only R programming? Sample code here.

At this stage, simple competitions should be easy for you. You don’t need to be in the top 1%; even being in the top 30-40% is good enough. Although, if you can win a competition, even better! Now you can start looking at non-tabular data like NLP sentiment analysis, image classification, API data pulls, and even dataset mashups. This is also the stage when you probably feel comfortable enough to start applying for roles, so building unique projects is key. For sentiment analysis, nothing beats Twitter data, so get the API keys and start pulling data on a topic of interest. You might be limited by the daily pull limits on the free tier, so check if you need two accounts and aggregate data over a couple of days or even a week. A starter example is the sentiment analysis I did during the Rio Olympics supporting Team USA. You should also start dabbling in RShiny and automated reports, as these will help you in actual jobs where you need to present idea mockups and standardize weekly/daily reports.
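The beginner step of pulling data, cleaning it, calculating the mean/median, and making exploratory graphs fits in a few lines of base R. Here is a minimal sketch using the built-in airquality dataset as a stand-in for a downloaded competition CSV (in a real project you would start with something like read.csv("train.csv")):

```r
# Stand-in for a downloaded CSV; airquality ships with base R
df <- airquality

# Clean: drop rows with missing Ozone readings
df <- df[!is.na(df$Ozone), ]

# Basic summary statistics
mean_ozone   <- mean(df$Ozone)
median_ozone <- median(df$Ozone)
cat("Mean ozone:", round(mean_ozone, 1),
    "| Median ozone:", median_ozone, "\n")

# Basic exploratory graphs
hist(df$Ozone, main = "Ozone distribution", xlab = "Ozone (ppb)")
boxplot(Ozone ~ Month, data = df, main = "Ozone by month")
```

Even a script this small already covers the full beginner loop: load, clean, summarize, plot.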
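The baseline-then-improve loop described for the Housing Prices competition can be sketched offline too. The snippet below uses MASS::Boston (a housing dataset shipped with standard R installs) as a stand-in for the competition data, mimics a "sample submission" baseline by predicting the overall mean, and then checks whether a plain linear model beats it:

```r
library(MASS)  # recommended package bundled with R; provides the Boston data

set.seed(42)
idx   <- sample(nrow(Boston), floor(0.8 * nrow(Boston)))
train <- Boston[idx, ]
test  <- Boston[-idx, ]

rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))

# "Sample submission" style baseline: predict the training mean everywhere
baseline_pred <- rep(mean(train$medv), nrow(test))
cat("Baseline RMSE:", rmse(test$medv, baseline_pred), "\n")

# First real model: plain linear regression on all features
fit  <- lm(medv ~ ., data = train)
pred <- predict(fit, newdata = test)
cat("Linear model RMSE:", rmse(test$medv, pred), "\n")
```

On a real competition you would submit predictions and read the score off the leaderboard instead of computing RMSE locally, but the logic of "establish a baseline, then try to beat it" is identical.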
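The survey-analysis idea (gender comparisons, age/salary correlation, city-level salaries) boils down to grouped summaries and a correlation. A sketch on a tiny synthetic "developer survey" is below; the column names are made up for illustration, and the real Stack Overflow survey CSV has many more fields:

```r
# Synthetic stand-in for a developer survey; real data would come from read.csv()
set.seed(1)
survey <- data.frame(
  gender = sample(c("Man", "Woman"), 200, replace = TRUE),
  age    = round(runif(200, 20, 60)),
  city   = sample(c("NYC", "Berlin", "Bangalore"), 200, replace = TRUE)
)
# Salary loosely increasing with age, plus noise
survey$salary <- 30000 + 1500 * survey$age + rnorm(200, sd = 10000)

# Men vs. women: median salary by gender
print(aggregate(salary ~ gender, data = survey, FUN = median))

# Age vs. salary correlation
cat("Age-salary correlation:", cor(survey$age, survey$salary), "\n")

# Average salary by city (before any cost-of-living adjustment)
print(aggregate(salary ~ city, data = survey, FUN = mean))
```

Swapping in the real survey file, currency conversion, and cost-of-living factors turns this skeleton into the full project.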
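For the automated-reporting idea, base R alone can already produce a formatted PDF chart with no manual steps; a package such as openxlsx could be swapped in for Excel output. This is a minimal end-to-end sketch with made-up data and a made-up file name:

```r
# Minimal "automated weekly report": compute a summary, write a PDF chart.
report_file <- "weekly_report.pdf"

# Pretend this is this week's pulled data
sales <- data.frame(day     = c("Mon", "Tue", "Wed", "Thu", "Fri"),
                    revenue = c(120, 150, 90, 200, 170))

pdf(report_file, width = 7, height = 5)
barplot(sales$revenue, names.arg = sales$day, col = "steelblue",
        main = paste("Weekly revenue -", Sys.Date()),
        ylab = "Revenue (USD)")
dev.off()

cat("Report written to", report_file, "\n")
```

Scheduling a script like this (cron, Windows Task Scheduler, or an RMarkdown render) is what turns a one-off chart into true end-to-end reporting.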
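Once the tweets are pulled (e.g. with a package such as rtweet), the sentiment-scoring half of the Twitter project can be prototyped in base R. The toy lexicon below is purely illustrative; a real project would use a full lexicon from a package like syuzhet or tidytext:

```r
# Toy lexicon-based sentiment scorer; the word lists are made up for this demo
pos_words <- c("win", "gold", "great", "proud", "amazing")
neg_words <- c("lose", "sad", "terrible", "injury", "upset")

score_sentiment <- function(text) {
  # Lowercase, strip punctuation, split into words
  words <- strsplit(gsub("[^a-z ]", "", tolower(text)), " +")[[1]]
  # Net score: positive hits minus negative hits
  sum(words %in% pos_words) - sum(words %in% neg_words)
}

tweets <- c("Proud of the amazing gold medal win!",
            "Sad upset after that terrible injury")
sapply(tweets, score_sentiment)
```

Aggregating these scores per day or per hashtag is what produces the kind of tracking chart used in the Rio Olympics example.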