If you are a business person with responsibilities for management and technology and think that a “Poisson distribution” is delivering fish in Paris, “Bayesian inference” means talking about aspirin without mentioning it explicitly, and MANOVA should be followed by “board,” you may be a bit challenged when it comes to selecting the right tools for performing analytics within your organization. Nonetheless, statistical methods and data analytics are an important part of managerial decision making; they are enjoying a resurgence in the global auditing community and attracting growing interest in other fields.
Whether your goal is evaluating what has already happened or forecasting what might be, the right software in the hands of the right people can be a vital tool. This column is the first of two on the world of analytics, with a focus on two free software products, pandas and R, that are quickly displacing, or at least carving out a major place among, the older commercial statistical packages.
Data analytics is big. In the audit world, which serves as a proxy for the broader business market, the American Institute of Certified Public Accountants, CPA Canada and the International Auditing and Assurance Standards Board have all recently and separately established groups to consider the importance of data analytics in auditing and accounting. The academic community, represented by the American Accounting Association, has begun an in-depth project to consider the role of Big Data in accounting and audit.
Those of us with a fiduciary responsibility are being called to be good stewards of the newly available data and analytical tools. And why not? Statistics can support conclusions and arguments, especially for points of view developed through instinct or other methods. Analytical results can be useful in finding connections and relationships between variables, and in understanding whether a relationship is one of “cause and effect” or the variables are just linked in some other way.
These approaches can be used to promote quality; there’s that old adage, “You can’t manage what you can’t measure.” (Although one might point out that, when it comes to people, measurement alone often makes a poor management strategy.) And statistics and analytics can help you see the big picture where you might otherwise get lost in the detail … or where there is so much detail, you simply can’t see the forest for the trees. Visualization is very powerful, but sometimes your senses can take in only so much.
The battle between managing by gut and instinct and analytical decision-making is a long-standing one, and it was once described as the difference in approach between institutions as lofty as the business schools of Harvard and the University of Chicago. (Our own University of Rochester’s Simon School, my alma mater, at the time fit squarely in the University of Chicago camp; the “instinct” proponents quipped that U of R grads thought “every problem could be solved with a complex differential equation.”) The application of mathematical and statistical methods to sports, finance, enterprise business intelligence and other areas of opportunity and risk isn’t new, but the practice continues to move into the mainstream.
Leveraging the expanding mass of open data (the freely available content published by governments and others) and Big Data (everything within reach, no matter how complex to use or how it is structured, largely considered for predictive analytics) calls for analytical tools that can extract the content and act on it through data modelling and statistical analysis.
I have seldom, over the last 26 years of writing this column, been so challenged in discussing a topic that is so intimately connected to business management and yet so difficult to discuss without technical terms like “linear regression” or “Bayesian inference” or “multivariate analysis.” There is so little to tie it to that is familiar; there’s no Microsoft Office module named “Stats.”
Microsoft Excel has statistical power (and, certainly, Excel is used by analysts), but it falls short of the more dedicated products for a number of reasons: the size of the data it can easily work with, automation, expandability, verifiability, and the ability to share ideas and analytics with others. It has been likened to an analytical “screwdriver” next to the “power drill” of true analytical tools, or a Honda Accord next to a Formula 1 race car. Microsoft and other developers have been working to bolt on additional power (Microsoft offers Power Query, Power Pivot, Power View and Power Map), and yet issues like verifiability and reproducibility go largely unanswered.
The best-known products in this space are familiar to research professors and Ph.D. students everywhere. For years, commercial products like SAS (sas.com) and SPSS (purchased by IBM in 2009) were the mainstays of those doing analytical work and statistical analysis. The products were available to professors and students for very little money or for free; however, once students graduated, knowing the product well and wishing to carry those skills into business, they found that the business license costs were much higher. MATLAB (Mathworks.com) and Stata (Stata.com) are also major players in the commercial space.
Recently, analysts have been looking to two free tools: pandas (http://pandas.pydata.org/) and R (https://www.r-project.org/). These products are expandable, heavily supported and powerful, and the user base for R is now believed to match or exceed that of SPSS. In my next column, I will discuss pandas and R and how they can be used for business.
So, if you think “MANCOVA” is a male sanctuary, the last bastion of masculinity, judging whether R or pandas is right for your organization may be difficult. But good decision making, better support for bargaining and better quality may be the result.
Eric E. Cohen, CPA, of PwC, is spending his time reinventing how accounting information is shared, with XBRL International.
8/28/15 (c) 2015 Rochester Business Journal. To obtain permission to reprint this article, call 585-546-8303 or email email@example.com.