In my own workplace the spreadsheet jockeys outnumber the scientists by at least a 10 to 1 margin. I have little doubt that other technical businesses have similar or even much larger ratios. What I call spreadsheet jockeys they call business analysts, data “scientists”, financial analysts, economists, and business managers. Throughout my career, across multiple companies and businesses, I have seen this ratio continue to grow ever larger, and there seems to be no end in sight. The compensation paid to persons in these positions has also grown apace with the expansion in numbers. Meanwhile I have seen the ranks of my fellow scientists continue to decline and our compensation remain stagnant or decline. Those facts are depressing enough, but the fact that the spreadsheet jocks typically spend 4–10 fewer years in college/graduate school angers and worries me in equal measure.
What worries me about the current situation has nothing to do with the quality of education or the people pursuing careers in these fields. I cannot begrudge anyone following a path different from the one I chose, especially when I look around at my shitty apartment or write yet another big check to continue paying down my remaining pile of debt from school. Rather, what worries me is the focus of the jobs that these persons fill. In one way or another they are all professional data analysts. They take existing data sets and do something with them. The data they select and the things they do with that data are wildly different, but in the end they all boil down to essentially that one activity. On its own there is nothing wrong with that. It is a high-value activity, which is one of the reasons people who are good at it are so highly paid. Moreover, it seems as if the amount of data available to analyze is endless and ever-growing. In a sense, it is. However, in another sense it is limited. It will always be limited by what we are able to collect and what we are able to generate through experimentation.
Data generation and data collection (experimentation) are part of the scientific method, along with hypothesis generation and data analysis. Data analysis is, however, only a very small, really very tiny, part of the scientific method. It could be argued it is in fact the smallest, least consequential part. Of course this is highly debatable; after all, one cannot generate hypotheses without some analysis of data with which to hypothesize. However, it is inarguable that it is only a part of the scientific method, not the whole. This is one reason why data “science” is not science and its practitioners are not scientists, as I have written many times. However, it is also a potentially very large problem. If current trends continue, the number of people able to generate and collect analyzable data (able to design rigorous and repeatable experiments) may someday shrink to an unsustainable level. Where then will the data come from for the ever-growing population of data analytics professionals to analyze? “Nature will provide” is the answer you might offer. People will always do the things people do, and thus by their very existence in the world they will generate a near-infinite amount of data. That may be true, but without some of those people having the capability to design the correct experiments to capture it, all that data is about as valuable and useful as a liter of air. To a drowning man a liter of air may seem a fortune, but for everyone else it is worthless.