Starting a Data Project

Data Analytics is all the rage. Managers read in the popular press about machine learning, deep learning, artificial intelligence, etc. and think, “we need to do that, too, we’re missing out” in a kind of “keep up with the Joneses” way. So a project is launched because “we must have some data we can extract value from”. But there are some key steps to starting such a project that will prevent a lot of wasted effort and time.

Know the Who and the Why

First, get clear on who you are leveraging the data for. Is it for internal customers or external customers? Who will make use of the insights gained from analyzing the data: C-suite managers or plant operators? If you come up with a fancy visualization or algorithm for the users, will they use it? Will they maintain it? Probably only if there is value for them, since it will still require effort on their part to make decisions or take other actions based on the knowledge gained from the data.

What is the problem to be solved for these people? Is it to eliminate current pain points in their work, or is it to create opportunities for new business or brand new ways of doing work? Do they need an answer today, next week, next month or next year? What is the business potential of the outcome if you are successful, and how does this project compare to other savings or business growth projects? Laying this groundwork will let you know how much to spend on the project in terms of collecting the data, analyzing it, and deploying a solution based on the data and algorithms.

What Data Is Really Needed Versus What You Have

Once you know the problem to be solved, it’s time to evaluate what data you have available and how that compares to what you will really need. If you are analyzing a manufacturing process, do you have sensors for all the critical process parameters that affect the operation in terms of process reliability and product quality? Are these sensors collecting data at time intervals that will be useful for understanding the physical phenomena of interest? If you are studying glaciers, maybe collecting data at a yearly interval is fine, but if you are collecting data on chemical reactions, maybe millisecond or microsecond intervals are necessary. Are the data accessible to data scientists and engineers who can make sense of them?
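
As a rough illustration of the sampling-interval question, here is a minimal Python sketch that compares how often a sensor actually logged data against the interval the problem calls for. The function names, the example timestamps, and the required intervals are all hypothetical; a real check would use your own historian or log data and a requirement agreed with the subject matter experts.

```python
from datetime import datetime, timedelta
from statistics import median


def median_sampling_interval(timestamps):
    """Return the median gap between consecutive timestamps."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return timedelta(seconds=median(gaps))


def is_sampled_fast_enough(timestamps, required_interval):
    """True if the typical gap between readings is no longer than required."""
    return median_sampling_interval(timestamps) <= required_interval


# Hypothetical sensor that logs one reading per second.
start = datetime(2024, 1, 1)
readings = [start + timedelta(seconds=i) for i in range(10)]

# Assumed requirements: 10 ms for a fast reaction, 5 s for a slow process.
print(is_sampled_fast_enough(readings, timedelta(milliseconds=10)))
print(is_sampled_fast_enough(readings, timedelta(seconds=5)))
```

The median is used rather than the mean so that a few long gaps, such as a sensor outage, do not hide the typical logging rate. Those gaps are worth looking at separately.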

Do you have the right people to make sense of the data?

Having only subject matter experts (SMEs) with no data science background is usually less than optimal because they may be biased by their expertise. Having only data scientists with no subject matter experts, on the other hand, may result in nonsensical conclusions because they lack knowledge or context about the data: how it was collected, what the tags mean, and what was going on while it was being collected. Pairing up SMEs with data scientists and statisticians often results in a better outcome.

Do you have the right tools and is the organization aligned to using them?

There is a plethora of data analysis and visualization tools out there, from open source to commercial data analytics software. In large organizations there may be “camps” of people who favor one tool over another. When embarking on a data project, do you know the landscape of these tools and their proponents? It can be disappointing to develop a solution using one tool only to find out that the receiving customer isn’t aligned to using that tool. Discussing this ahead of time with the client can help ensure that what gets co-developed will live on.