The company I was working with had a number of units of equipment, each with an expected service life of 120K hours. Over that service life, a particular wear behaviour was expected to increase in frequency and severity - as illustrated by the exponential curve above - to the point where regular maintenance no longer made sense and the units would need to be replaced at a capital outlay of approximately $36M. That replacement time was approaching.
Years before, the engineers had recommended customizations - some of which were implemented - to curb that wear behaviour. However, with no tools to test the hypothesis, they could not empirically confirm that the wear behaviour had been mitigated and, in the absence of that clarity, the appropriate action was to go ahead and order replacement units.
Before proceeding, they asked me to take another look at the data they had (stored not only in functionally limited spreadsheets, but also in PDFs and Word documents) and, at minimum, confirm that the wear behaviour was progressing as expected, that time had indeed accelerated its frequency and severity, and that replacement was the best option.
Did I mention that most of the data was in documents?
The Method - Applying Fundamentals
I took their poorly laid out spreadsheets, optimized the layout and design, built some summary and analysis functionality and, with help from an engineering co-op student, back-filled some five years of inspection data into the tool and broke down the results. All of this was done without any additional VBA programming muscle - just good old Microsoft Excel, albeit gussied up a bit.
When I say “optimized the layout/design”, the first matter was to get away from the idea that data should be entered into a form that is also the informational output. In other words, the old spreadsheet was built like a report, and the inspectors entered data they had already processed in their heads. The simpler approach is to enter raw data into a form optimized for automated analysis - the computer should do the processing. The tools for analysing the data, and for formatting it for sharing, are kept separate from data entry.
Another matter, with the data itself, was to ensure that each inspection record was a row, and each field of data in an inspection was a column. This "relational model" does a few important things for us:
First, it makes it a whole lot easier for the tool to do more of the work for us, faster and more accurately. Instead of inspectors doing the math in their heads and entering the result onto a report, the spreadsheet can automatically do things like tally inspections by period of time, by unit, by wear type, and so on. This is because laying out the data this way imposes structure on it; that is, each "field" of data is consistent. We human beings tend to pack a lot of different bits of data into one coherent thought string, but computers are not good at parsing many ideas out of a single piece of data. In this layout, a column with a date is always only a date (stored as a number), or empty; a column with a unit ID is always just a unit ID; and so on.
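To make the idea concrete, here is a minimal sketch in Python of the same principle. The field names and values (`unit_id`, `wear_type`, `severity`) are hypothetical illustrations, not the client's actual schema: once each inspection is one record with consistent fields, the tallies the inspectors used to do in their heads become mechanical.

```python
from collections import Counter
from datetime import date

# Hypothetical inspection records laid out relationally:
# one row per inspection, one consistent field per column.
inspections = [
    {"date": date(2014, 3, 2), "unit_id": "U-07", "wear_type": "pitting",  "severity": 2},
    {"date": date(2014, 3, 9), "unit_id": "U-03", "wear_type": "cracking", "severity": 4},
    {"date": date(2014, 4, 1), "unit_id": "U-07", "wear_type": "pitting",  "severity": 3},
]

# Because each field is consistent, tallies require no human judgment:
by_wear_type = Counter(r["wear_type"] for r in inspections)
by_month = Counter(r["date"].strftime("%Y-%m") for r in inspections)

# by_wear_type["pitting"] == 2; by_month["2014-03"] == 2
```

The same two lines of counting logic work whether the list holds three records or three hundred thousand - that scale-invariance is the point of the structure.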
There are also several advantages to the increased scalability of the data in this approach:
consistency in processing, as the sheer number of records accumulates over time. The analysis tools that, for example, count the number of inspections by month can run through 243 rows, 2,438 rows, 24,387 rows or 243,875 rows equally well, with no change to the formulas required;
if the data sets need to be cross-referenced against tables from other departments, navigating mismatched data structures is resource-intensive (it takes more time and money). Well-structured fields make it much easier, for example, to associate an inspection record with shift personnel data - by date, by location, by unit, by shop, and so on;
reporting is also easier, as, again, the tools can do most of the work of creating report tables, charts and graphs;
and, if/when it is time to migrate that data to other platforms, for example, updating to a regional server database, the structured data is ready to be handled by any enterprise database platform.
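The cross-referencing point above can also be sketched briefly. In this hypothetical example (the table and field names are illustrative, not from the actual project), matching each inspection to the crew on duty is a simple keyed lookup - possible only because both tables share consistently structured date and unit fields:

```python
# Hypothetical inspection records and shift roster, sharing
# consistently structured "date" and "unit_id" fields.
inspections = [
    {"date": "2014-03-02", "unit_id": "U-07", "wear_type": "pitting"},
    {"date": "2014-03-09", "unit_id": "U-03", "wear_type": "cracking"},
]
shifts = {
    ("2014-03-02", "U-07"): "Crew A",
    ("2014-03-09", "U-03"): "Crew B",
}

# Cross-reference: attach the on-duty crew to each inspection record.
joined = [
    {**r, "crew": shifts.get((r["date"], r["unit_id"]), "unknown")}
    for r in inspections
]
```

In Excel terms this is a lookup against another sheet; in an enterprise database it is a join - the structured layout is what makes either one trivial.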
Oh, and as for all that data trapped in PDF and Word documents: sure, they could have been scanned and the data scraped using optical character recognition. But the scope, resources, budget, timeline and technical constraints ruled that approach out. Beyond the objective data, the subjective nature of assessing a report - especially one that includes images - required human expertise and judgment. Our engineering co-op student would get real value from his work term by combing through the document records, immersing himself in the data and getting a feel for the material sciences and properties involved. Yes, it took some time to go through five years of records and manually enter the data into the spreadsheet, but the results would prove that time exponentially worthwhile.
What the data showed was actually the opposite: a logarithmic curve. The modifications, along with the maintenance approach, had indeed effectively mitigated time as a factor accelerating the wear behaviour.
The engineers couldn’t believe it. Of course, we turned our data over for their keen scrutiny and, once they were satisfied the results were sound, they were empowered by this data-driven clarity to build a business case to extend the service life by 25% (another 30K hours) and defer the $36M replacement outlay from the then-current budget by foregoing replacements they did not need to make.
This initiative realized the full value of decisions they’d made years earlier - decisions that were supposed to pay off, but whose value they almost missed the opportunity to capture.
Creating and curating quality data assets requires investment.
Cutting costs by poorly curating data costs more money and increases risk.
Optimally curated data gives you a better chance to make more money.
Companies in every industry are sitting on data assets they paid good money to collect (in employee expertise; in machines, devices and equipment to capture and record; and so on) but from which they are not realizing the value those assets represent.
It's a shame, because the cost of developing better curation tools is small compared to the improvement in company performance that can be realized. Put another way: over the six months it took to scope, develop, test, deploy, back-fill with historical data and then analyse, our fee didn't amount to even 1% of the costs saved. The return is worth many times the investment required to go and get it.