You can imagine the boardroom conversation that took place on that temperate February morning in Sunnyvale, California, as Yahoo's executives gathered around the table to discuss the takeover deal with Verizon.
Unfortunately, no transcript of this frank exchange of views exists, but you can bet the phrases "reduced offer", "$350mn less than before", and "colossal data breach" cropped up on more than one occasion.
The recent news that Verizon has substantially lowered its offer for the pioneering tech firm following two large-scale data breaches has reinforced how vital robust data security is in the modern business landscape. Data-orientation is great, and the more data your business can store and wield, the better, but leave that data vulnerable, and there is a heavy cost to pay.
But hacks and breaches are not the only risk attached to Big Data. As we plough ever onwards through the masses of new data we generate each day, we encounter new hazards at the same rate as new benefits. These hazards must be managed, and this means altering the best practice rulebook.
Staying On Guard against Cyber Criminals
Let’s start at the beginning: cyber crime. Cyber criminals are no longer the stuff of science fiction novels, nor are they the stereotypical bored teenage Robin Hoods in their bedrooms, siphoning money from inflated bank accounts. In real life, cyber crime can come from many sources, including organized criminal gangs and even government-backed intelligence programs.
One of the most terrifying features of cyber crime is how rapidly it evolves and develops. To be a professional hacker, you have to be smart – and I mean smart: you have to understand the protocols you come up against and know how to defeat them.
This means that those on the defensive side of things – that’s us, by the way – need to shift the goalposts a bit. It is no longer enough to employ teams of analysts keeping a constant watch on defensive infrastructure; these efforts must be supported by something more robust.
I’ve discussed the shift towards artificial intelligence in terms of cyber security – user and entity behavior analytics, in particular – but next-gen defenses will be even more sophisticated. We can expect that AI will play its part as firms look to strengthen their existing security protocols.
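To make the idea of behavior analytics concrete, here is a minimal sketch of the kind of baseline-and-deviation scoring such tools perform. It is a deliberately simplified illustration: real UEBA products model many signals (geography, time of day, resources accessed), while this hypothetical example scores a single login count against a user's history with a z-score.

```python
from statistics import mean, stdev

def is_anomalous(baseline, todays_count, z_threshold=3.0):
    """Return True if today's activity deviates sharply from the baseline.

    `baseline` is a list of historical daily counts for one user;
    a z-score above `z_threshold` marks the day as worth investigating.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:  # perfectly flat history: any change is a deviation
        return todays_count != mu
    return abs(todays_count - mu) / sigma > z_threshold

# A quiet account suddenly logging in 60 times in one day:
print(is_anomalous([4, 5, 3, 6, 4, 5], 60))  # → True
```

The point is not the statistics but the principle: instead of hand-watching dashboards, the system learns what "normal" looks like for each user and entity, and surfaces only the departures from it.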
Keeping data safe is one thing, but using it effectively is another. Big Data, it appears, has a weight problem; and no amount of exercise or healthy eating is going to have an impact on its expanding waistline. This is great, of course, as there is no such thing as too much data, but it causes problems of its own.
Siloing data and ring-fencing it for use within specific departments was formerly the accepted way to manage vast amounts of information, but this approach is now out of date. Partitioning and compartmentalizing data in this way just makes it difficult to wield. For example, if one department needs to cross-reference its data with that of another division within the organization, it cannot do so quickly. By the time the two datasets are connected and analyzed side by side, the moment has passed.
Increasingly developed cloud storage structures go some way to remedying this issue. These structures provide fluid, remote access to data, enabling different departments to work collaboratively across a broad range of projects.
With the best data available, and with the very best intentions, it is still possible to make mistakes. In fact, it’s not only possible; it’s easy. Incomplete data, skewed surveys or capture methods, unreliable sources, outmoded datasets: all of these factors can create wild discrepancies and inaccuracies when it comes down to the final reports.
And, as you well know, a little slip up here and there in the data, extrapolated out over hundreds or even thousands of reports, only ends up getting magnified. What began as a tiny error can quickly become catastrophic.
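A quick back-of-the-envelope illustration shows how fast this compounding works. The figures below are hypothetical: assume a modest 1% systematic bias that is re-introduced at every stage where one report builds on the last.

```python
def compounded_error(per_step_error, steps):
    """Relative error after `steps` stages that each re-apply the bias."""
    return (1 + per_step_error) ** steps - 1

for n in (1, 10, 50):
    print(f"after {n} steps, a 1% bias grows to {compounded_error(0.01, n):.0%}")
```

After fifty derived reports, that "tiny" 1% slip has swollen to an error of roughly 64% – which is exactly why catching discrepancies at the source matters so much.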
So what’s the answer? The knee-jerk response is to return to artificial intelligence, eliminating any human error from the process. However, remember that AI is not born from nowhere; it must be developed. Get the algorithm wrong or make an error in the programming, and the whole system comes crashing down.
The team at SimplyStatistics.org highlighted this back in 2014. They used the example of Google’s forecasting of influenza outbreaks, based on search results from specific locations. Unfortunately, the forecasts tended to be a little off. Without getting too deep into the applications of statistical theory, the reason for these wild results was found to be an error during the modeling process, which led to the erroneous visualization and interpretation of otherwise sound data.
In this case, the algorithm did not fail, but the example shows how vulnerable such interpretive structures are to failure, and how much of a risk this poses to business. Future developments in best practice will need to combat this, recognizing and defeating data fidelity hazards before they can strike.
Playing Data Capture at its Own Game
The secret is out about Big Data. It is no longer a well-guarded piece of savvy insider knowledge within the business community that organizations use data insight to make big decisions; the whole world knows it, and this is a risk.
What’s the harm in this? I hear you ask. There is no harm in it at all – the more people who join the data party the merrier – but it does leave datasets open to manipulation. For example, when a customer volunteers information to an insurance company, they know that the insurer will use this data to gain a better understanding of its clients, and that this understanding will affect premium costs and provisions in the future. So, do they tell the truth? Not always; many submit the information they believe the organization wants to hear.
This is natural human behavior, so how can we defend against it? While this sort of ‘gaming of the system’, as Project Syndicate describes it, can never be wholly stamped out, data analysts can take measures to improve their response to it.
Protocols which can recognize possibly fraudulent entries in datasets, and either query or discount them, will also be vital in the fight against corrupted, useless data. We will also see organizations walking very dubious moral tightropes as they seek to mine as much data as possible from unsuspecting sources.
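What might such a protocol look like? Here is one minimal, hypothetical heuristic, sticking with the insurance example above: self-reported figures that cluster just below a known pricing cutoff are a classic sign of gaming, so the system queries them rather than accepting them at face value. The threshold, band, and field names are all assumptions for illustration.

```python
def flag_suspect_declarations(values, threshold, band=0.05):
    """Flag self-reported figures sitting just below a pricing threshold.

    Entries within `band` (default 5%) below the cutoff are returned
    for manual review rather than fed straight into the dataset.
    """
    lower = threshold * (1 - band)
    return [v for v in values if lower <= v < threshold]

# Annual-mileage declarations; assume premiums jump at 10,000 miles.
declared = [8200, 9990, 9950, 12400, 9999, 7100]
print(flag_suspect_declarations(declared, threshold=10_000))
# → [9990, 9950, 9999]
```

A real system would combine many such rules with statistical checks, but even this crude filter shows the principle: don't treat volunteered data as ground truth.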
The landscape of Big Data best practice just got a lot more diverse; it is up to us to make sure it stays on the right side of the ethical line.
Image via Pixabay