Data science touches all of our lives, whether you're a practitioner or the unwitting recipient of data scientists' efforts. Assuming you're in the former group, or at least aspire to be, here are 11 blogs that will help you revel in all things data science. Most of them are written by, and for, those who are well into their data science careers. But there are some chestnuts here for the newbie as well, explaining what all the fuss is about in layman's terms.
Kaggle Winner's Blog
Kaggle Winner's Blog, also known as No Free Hunch, might be the most fun you can have with a data science blog. The blog and the site are run by Kaggle, which offers a customizable Jupyter Notebooks environment, including free access to GPUs and a repository of community-published data and code.
Kaggle also hosts various competitions for data scientists. In these competitions, data scientists must create the best model for a given data set. The challenges are put forth by organizations such as Microsoft and the National Football League and offer cash prizes.
One blog post features Sanghoon Kim (also known as Limerobot), the third-place winner of Booz Allen Hamilton's 2019 Data Science Bowl, a competition that focuses on social good. The South Korean data scientist won for his identification of factors to help measure how young children learn. The competition has attracted more than 50,000 competitors.
With more than nine years under its belt and 338,000 followers, r/DataScience, or Data Science Reddit, is considered one of the most popular data science communities. Members post questions such as, "Am I shooting myself in the foot for listing that my master's degree has a specialization in data science?" which drew 70 responses in the first 14 hours after member u/beepboopdata posted it. Another question, "Why is R so valuable to some employers if you can literally do all of the same things in Python?" drew more than 300 responses in the six days after reader u/willcostiganjr asked it.
Based in The Hague, Netherlands, Datafloq is as much a conduit to data science resources as anything else. Founder Mark Van Rijmenam developed a content rating tool called Mavin, which rates the member-submitted blogs. Van Rijmenam launched Mavin with Thomas Modeneis, Patrick Joore and George Visniuc.
The blog includes entries, such as:
Microsoft hosts Revolutions, a blog that covers news and information of interest to R community members. The blog originated at Revolution Analytics, which Microsoft acquired in 2015. As of December 2020, David Smith, the R community leader at Microsoft, is the blog editor.
The blog posts are grouped into about 25 categories, such as "applications," which includes posts showcasing interesting applications of R to real-world problems; "predictive analytics," which includes posts about predictive analytics, data mining and machine learning; and "developer tips," with information for package authors and developers of R.
Aylien uses AI to help organizations and developers to collect, analyze and understand large amounts of human-generated content. The heart of its services is the use of natural language processing (NLP) to sift through thousands of news sites and parse them according to your needs.
Topics include "sentiment analysis," which is the use of machine learning and NLP to analyze text. In one blog post, Eoin Kilbride, a product specialist with Aylien, looks at how sentiment analysis can assess whether a topic is "being discussed in a positive, neutral or negative light."
The KDnuggets site explores AI, analytics, big data, data mining, data science and machine learning. Matthew Mayo and Gregory Piatetsky-Shapiro edit the site. Mayo is a machine learning researcher and Piatetsky-Shapiro is a co-founder of the Knowledge Discovery and Data Mining conference and a co-founder and past chair of ACM SIGKDD, a professional association for data mining and data science.
KDnuggets is one of the better-known data science sites and includes links to data sets, tutorials and webinars. Blog contributors include Nicole Janeway Bills, a data scientist at Atlas Research and federal government consultant, who guides her readers past five common mistakes in planning a data science project, and Frank Fineis, lead data scientist at Avatria, who writes about the business reasons for deep learning models.
From recurrent neural networks to feature engineering, Subconscious Musings is a data science blog from vendor SAS that features the perspectives of SAS data scientists as they share the technical methods used to solve many of the challenging problems facing organizations today. The blog is detailed and is popular among those looking for a deeper dive into NLP, neural networks, AI and other related topics.
Blog authors include Brandon Reese, senior machine learning developer in Scientific Computing R&D, who demonstrates how to represent data as a network, run standard network science algorithms and interpret the results. Another author is Susan Kahler, a global product marketing manager for AI at SAS, who holds a Ph.D. in human factors and ergonomics. She writes about having used analytics to quantify and compare mental models of how humans learn complex operations.
Data Science Central
Data Science Central, part of the TechTarget network, is the industry's online resource for big data practitioners. From analytics to data integration to visualization, Data Science Central provides a community experience. The site offers a wealth of information, from webinars to free books to forums. The site covers topics such as analytics, business intelligence and Hadoop.
The blog portion of the site offers about 24 posts per week. Authors include Vincent Granville, executive data scientist and co-founder of Data Science Central. Granville was among the finalists at the Wharton School Business Plan Competition and at the Belgian Mathematical Olympiads, and he has published 40 papers in statistical journals. He also created the first IoT platform to automate growth and content generation for digital publishers, using a system of APIs for machine-to-machine communications involving Hootsuite, Twitter and Google Analytics.
Data Plus Science
Jeffrey A. Shaffer, chief operating officer and vice president of information technology and analytics at Unifund and Recovery Decision Science, runs the Data Plus Science blog. If you're interested in data visualization, Tableau or, more generally, data mining, Data Plus Science is worth a look.
While at Unifund, Shaffer helped create and develop its BI platform. He holds a master's degree in management from the University of Cincinnati and an MBA from Xavier University.
Data Science 101
For those who are considering a career in data science or simply want to know more about the subject, Data Science 101 is a great place to start. One of the oldest data science blogs, it was created by Ryan Swanstrom, who earned his doctorate in computational science and statistics and his master's in computer security.
Blog entries include:
- "How Deep Neural Networks Work," by Brandon Rohrer, an expert in neural networks and deep learning
- "A New Approach to Drug Discovery," by Daphne Koller, former Stanford Professor, co-founder of Coursera, and founder of insitro
- "The Future of AI and Machine Learning," by Hilary Mason, founder of Fast Forward Labs
Consider Algobeans the prerequisite course before moving on to Data Science 101. Written by Annalyn Ng, a senior data scientist at Amazon Web Services, and Kenneth Soo, who holds a master's in statistics from Stanford University, it offers concise explanations of core concepts without dragging mathematics into it. Even if you're light-years beyond the basics, it's still a very useful site to help explain to others (i.e., laymen) what exactly you do for a living. It's also an important site for everyday citizens, given that their lives are so influenced by big data.
For example, one post shows how kernel density plots can help make sense of the shape and distribution of data in a two-dimensional space. The idea is illustrated by applying a random forest predictor to all the shots made by Liverpool Football Club in the 2017-18 English Premier League season.
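For readers who want to see the kernel density idea in action, here is a minimal sketch in Python. It assumes a Gaussian kernel and a hand-picked bandwidth. The function name `gaussian_kde_2d`, the bandwidth value and the shot coordinates are all hypothetical, made up for illustration; none of them come from the Algobeans post or from real match data.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Build a density estimator from 2-D sample points (Gaussian kernel)."""
    n = len(points)
    # Normalizing constant for an isotropic 2-D Gaussian kernel
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical shot locations (made-up coordinates, not real match data)
shots = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (4.0, 3.0)]
density = gaussian_kde_2d(shots, bandwidth=0.5)

near_cluster = density(1.0, 1.0)    # three shots sit close to this spot
lone_shot = density(4.0, 3.0)       # only one shot sits here
far_from_shots = density(8.0, 8.0)  # no shots nearby

print(near_cluster > lone_shot > far_from_shots)
```

Evaluating `density` over a grid of pitch coordinates and shading each cell by its value would give the kind of heat map a kernel density plot shows. A larger bandwidth smooths the picture, while a smaller one makes individual shots stand out.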