Web Scraping and Data Analysis via Python

I recently completed a project analyzing movie revenues against factors such as genre, release date, and viewer rating. It was a good project for a beginner data scientist, and I enjoyed it. But as I was gathering the data, I grew more curious, and that curiosity developed into a project of its own. I learned a ton along the way, especially about web scraping, so I’m hoping somebody else may learn by reading about it.

What other factors could affect a movie’s reception?

The problem with answering such a vague question is the need to gather data. Even deciding what data to use is daunting, let alone gathering it. After spending too long reading about movie data, and pausing this idea to complete the original project, I hit upon something interesting: https://en.wikipedia.org/wiki/Category:Films_set_in_the_United_States_by_state. …
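To give a feel for the kind of scraping involved, here is a minimal sketch of pulling per-state category links out of a page like that one. It is not the code from the project. The sample HTML, the class name CategoryLinkParser, the function extract_category_links, and the link prefix are all assumptions made for illustration, and the real page markup may differ. The sketch uses only the standard library and parses an inline string rather than downloading anything.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a downloaded category page; the real markup may differ.
SAMPLE_HTML = """
<div id="mw-subcategories">
  <ul>
    <li><a href="/wiki/Category:Films_set_in_Alabama">Films set in Alabama</a></li>
    <li><a href="/wiki/Category:Films_set_in_Alaska">Films set in Alaska</a></li>
    <li><a href="/wiki/Help:Categories">Help</a></li>
  </ul>
</div>
"""


class CategoryLinkParser(HTMLParser):
    """Collect (href, link text) pairs for anchors whose href starts with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []
        self._current_href = None  # href of the matching anchor we are inside, if any
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(self.prefix):
                self._current_href = href
                self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching anchor.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_category_links(html, prefix="/wiki/Category:Films_set_in_"):
    """Return every (href, text) pair whose href begins with the assumed prefix."""
    parser = CategoryLinkParser(prefix)
    parser.feed(html)
    return parser.links


state_links = extract_category_links(SAMPLE_HTML)
for href, title in state_links:
    print(href, "->", title)
```

Filtering on a link prefix keeps navigation and help links out of the result. A real scraper would also need to fetch the page and follow each subcategory.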


https://github.com/MullerAC/tanzanian-water-well-classification

Overview

Tanzania is a developing country that struggles to get clean water to its population of 57 million. As part of an ongoing competition at DrivenData, this project’s goal was to predict the functionality of water wells given data provided by Taarifa and the Tanzanian Ministry of Water.

This is a ternary classification problem: all points are either functional, nonfunctional, or functional and in need of repair. The data has 39 independent variables relating to the management, use, and location of the pumps. …
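With three classes, a useful first check is how well you would do by always guessing the most common class. Below is a minimal sketch of that majority-class baseline. It is not the model from the project: the labels are made up for illustration, the label strings are my own shorthand for the three classes, and the function names are hypothetical.

```python
from collections import Counter

# Made-up labels for illustration only; not taken from the competition data.
train_labels = [
    "functional", "functional", "nonfunctional",
    "functional", "needs repair", "nonfunctional",
]
test_labels = ["functional", "nonfunctional", "functional", "needs repair"]


def majority_class(labels):
    """Return the most frequent label in the training set."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


baseline_label = majority_class(train_labels)
baseline_preds = [baseline_label] * len(test_labels)
baseline_accuracy = accuracy(test_labels, baseline_preds)

print(baseline_label, baseline_accuracy)
```

Any real classifier for the three classes should beat this number. If it does not, the features are not adding anything.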


For my course in Data Science, I fit a linear regression model to house sales in King County. Data for sales in 2014 and 2015 was provided, and I used Python for the modeling. This is a rundown of my first linear regression modeling project. For the full code, see the GitHub repo: https://github.com/MullerAC/king-county-house-sales
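For readers new to the technique, here is a minimal sketch of ordinary least squares with a single feature, written with only the standard library. It is not the model from the repo, which may use more features and a modeling library. The numbers are made up, and using square footage to predict price is an assumption chosen for illustration.

```python
# Made-up data for illustration only; not taken from the King County dataset.
sqft = [1000, 1500, 2000, 2500]
price = [200000, 300000, 400000, 500000]


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_value, slope, intercept):
    """Predict y for a single x using the fitted line."""
    return intercept + slope * x_value


slope, intercept = fit_simple_ols(sqft, price)
print(slope, intercept, predict(1800, slope, intercept))
```

Here the slope reads as the change in predicted price per extra square foot. With several features, the same idea carries over, but each coefficient is interpreted with the other features held fixed.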

Business Case

It is important to establish a business case when starting a project like this. A business case sets a goal and helps the work move towards a meaningful conclusion. My last blog post’s project didn’t have any real goal, so it just stopped when I no longer felt curious. It’s better to start with a firm goal in mind, and build towards that. …

About

Andrew Muller

Student of Data Science
