New Year's Data Science Resolution Update
It's been two weeks since our first New Year's Data Science Resolution event, and the three teams are well on their way to improving the world through data science. Team Avivo, which is working to improve a chemical dependency treatment program, has gathered data about the program, including admission and discharge counts, and has also incorporated demographic data. Team Thunder Lizards, which is working to improve fundraising at the Science Museum of Minnesota, is still working on understanding the data dictionary and the exact questions that would help the museum. Tonight's first presentation was from Team Real Estate, which is building predictive models of Twin Cities home prices in collaboration with a local real estate agent.
John Hogue, a Lead Data Scientist at General Mills, presented on behalf of Team Real Estate. He showed his general process for getting started on a data science project, using real MLS data in Python with pandas. His presentation included:
Exploratory analysis: using the pandas-profiling package to get a simple overview of the data to find potential problems like null values and collinearity
Data cleaning using pandas commands: transforming features to be closer to normally distributed, splitting categories into one-hot columns, and binning values to limit the effect of outliers
Supplementing data using open APIs and HTML scraping
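For readers who were not in the room, the cleaning steps listed above can be sketched in a few lines of pandas. This is only an illustration and not code from John's notebook: the `homes` table, its column names, its values, and the bin edges are all made up, and the null and correlation checks are a hand-rolled stand-in for what a profiling report would surface.

```python
import numpy as np
import pandas as pd

# Hypothetical listings; column names and values are invented for illustration.
homes = pd.DataFrame({
    "price": [210000, 325000, 189000, 1250000, 410000, 275000],
    "sqft": [1100, 1850, 950, 5200, 2300, None],
    "style": ["rambler", "two-story", "rambler",
              "two-story", "split-level", "rambler"],
})

# Quick overview: missing values per column and correlation between
# the numeric columns (computed before any cleaning).
null_counts = homes.isna().sum()
correlations = homes[["price", "sqft"]].corr()

# Fill the missing square footage with the column median.
homes["sqft"] = homes["sqft"].fillna(homes["sqft"].median())

# Log-transform the right-skewed price so it sits closer to a normal shape.
homes["log_price"] = np.log1p(homes["price"])

# Split the categorical column into one-hot indicator columns.
homes = pd.get_dummies(homes, columns=["style"], prefix="style")

# Bin square footage so that very large homes share one open-ended bucket.
# right=False makes each bin include its lower edge and exclude its upper edge.
homes["sqft_bin"] = pd.cut(
    homes["sqft"],
    bins=[0, 1000, 2000, 3000, np.inf],
    labels=["<1000", "1000-1999", "2000-2999", "3000+"],
    right=False,
)

print(null_counts)
print(correlations)
print(homes)
```

Whether a log transform, a median fill, or these particular bin edges are appropriate depends on the real data. The point is only that each step in the list maps to one or two pandas calls.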
John’s full presentation is viewable as a Jupyter notebook here.
Next, Abhishek Roy, a data science consultant from Slalom, presented on behalf of Team Avivo. Abhishek used many of the same techniques, but in R rather than Python. We’ll hear more from Team Avivo next time.
As always, please join our Slack group to participate in the project. (Email email@example.com for an invitation.)