
Too much info: you may just have all the data you need

"This was not a failure to collect intelligence, it was a failure to integrate and understand the intelligence that we already had." NYTimes quoting President Obama after his meeting with national security advisers about a terror plot to bring down a commercial jetliner on Christmas Day. (Jan 6th 2010)

Going to the movies with friends from the intelligence community is never a cheerful experience. Spend two hours in a conspiracy movie with people who, on seeing a (seemingly) absurdly powerful data collection device, sometimes say “ah, I know this system”, and you will become a firm believer in conspiracy theories, or at least a more paranoid individual. But even the most tech-savvy and well-informed of those people talk like Pres. Obama in the quote above – it’s not lack of data, it’s our inability to process it that limits us. Maybe project ECHELON really stores all of our communication – but what supercomputer and what sophisticated algorithms can process and identify all of the world’s pictures, the plethora of dialects in written natural languages, and voice calls? You know what? If you know the answer, I’m not sure I want to know.

Estimates of intelligence units’ capabilities aside, your average merchant or payment service is much more limited (and, to be fair, faced with a less complicated, or should I say less critical, problem). Between your transactions, industry blacklists, account history, mailing lists with bad actor data and various tools offered in the open market, there’s a good chance you will lose the ability to reconcile it all without a dedicated, expert team of analysts and developers who understand automation. But being able to automate isn’t the only challenge with data. Trying to know “everything”, you’re bound to trip over some problems.

Common pitfalls in data source acquisition

First, you have to get the data. Many raw data sources out there on the web are pretty hard to acquire: some are not priced correctly for scale, some require data sharing as a prerequisite (growing their database, but giving away your customers’ data), and some just won’t pass legal because they were obtained in shady ways. Because of the above, it often becomes extremely difficult to justify the purchase of a new data source. It takes very complex analysis to show how a data source can move your revenue dial and that its ROI is worth the risk. All in all, data source bizdev is a potential nightmare unless you are airtight on what you need, when you need it and what it’s worth to you.
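
To make the ROI argument a little more concrete, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the `DataSourceOffer` fields, the per-query pricing model and the example figures are hypothetical, and a real analysis would need your own loss and approval numbers.

```python
from dataclasses import dataclass


@dataclass
class DataSourceOffer:
    """Hypothetical monthly inputs for evaluating one data source."""
    price_per_query: float         # vendor price per lookup
    queries_per_month: int         # how many transactions you would look up
    fixed_monthly_cost: float      # integration, storage, compliance overhead
    fraud_loss_prevented: float    # losses you expect the source to stop
    good_revenue_recovered: float  # margin from good orders no longer declined


def monthly_cost(offer: DataSourceOffer) -> float:
    """Total monthly cost: usage-based fees plus fixed overhead."""
    return offer.price_per_query * offer.queries_per_month + offer.fixed_monthly_cost


def monthly_net_value(offer: DataSourceOffer) -> float:
    """Expected monthly benefit minus monthly cost."""
    benefit = offer.fraud_loss_prevented + offer.good_revenue_recovered
    return benefit - monthly_cost(offer)


def roi(offer: DataSourceOffer) -> float:
    """Net value divided by cost; 0.5 means 50% return on spend."""
    cost = monthly_cost(offer)
    if cost == 0:
        raise ValueError("cost is zero; ROI is undefined")
    return monthly_net_value(offer) / cost


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real vendor.
    offer = DataSourceOffer(
        price_per_query=0.05,
        queries_per_month=100_000,
        fixed_monthly_cost=2_000.0,
        fraud_loss_prevented=9_000.0,
        good_revenue_recovered=1_500.0,
    )
    print(f"cost={monthly_cost(offer):.2f} "
          f"net={monthly_net_value(offer):.2f} roi={roi(offer):.2f}")
```

The hard part, of course, is not the arithmetic but getting defensible estimates for the two benefit fields, which is where the complex analysis comes in.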

After you get the data, you need to store it somewhere, and storage space and security are yet another challenge. There’s a limit to the volume of data you can save on your servers, and scaling such a system is no simple or cheap business. “So what”, you say, “I’ll put it all in the cloud” (very hip these days to put stuff in the cloud). Wait – isn’t that exactly the type of reckless use of Personally Identifiable Information (PII) that gets you data breaches? To deal with sensitive data in the payments space we have compliance and information security standards. Are you going to be PCI compliant, for example? That is a good question, and it must be answered. Right now the answer is no: clouds are public, shared systems that are hard to secure properly against fraudsters and hackers; if you want your cloud-based system to be compliant, you need to give up your PII by receiving payments through a cloud-based payments system – which basically means losing data (having someone else collect your customers’ payment info), not gaining it. Once the field settles, in a couple of years, cloud computing for vast payment data volumes will start to be a possible route.

Finally, once you’ve acquired and stored your data, or can otherwise access it, you have to use it. The challenges here range from database architecture to modeling methodology; if you don’t build the correct architecture and have a proper DS and modeling methodology, integrating new data will be a nightmare. Almost no single data source has 100% coverage across all countries, has homogeneous data quality, is 100% available (given that you don’t store it on your system) and adheres to a tight SLA, all at the same time. So on top of what we noted, you also need models that can cope with partial, sometimes corrupt data and still make the right decision – far from easy.
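
As a rough illustration of what “coping with partial, sometimes corrupt data” can mean in practice, here is a minimal sketch. It is not a recommended model: the source names, weights and thresholds are all hypothetical, and the technique (a weighted average over whichever signals are usable, with a fallback to manual review when too little is available) is just one simple option among many.

```python
# Hypothetical sources and their relative weights (must sum to 1.0).
WEIGHTS = {"ip_geo": 0.5, "email_age": 0.3, "phone_type": 0.2}


def clean(value):
    """Return a risk signal in [0, 1] as a float, or None if missing or corrupt."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None  # missing (None), wrong type, or unparsed text
    if not 0.0 <= value <= 1.0:
        return None  # out of range; this check also rejects NaN
    return float(value)


def decide(signals, min_coverage=0.5, decline_above=0.7):
    """Return 'approve', 'decline' or 'review' from whatever signals are usable."""
    usable = {}
    for name in WEIGHTS:
        value = clean(signals.get(name))
        if value is not None:
            usable[name] = value

    # Coverage is the share of total weight we actually have data for.
    coverage = sum(WEIGHTS[name] for name in usable)
    if coverage < min_coverage:
        return "review"  # too little data to decide automatically

    # Re-normalize so that missing sources do not drag the score toward zero.
    score = sum(WEIGHTS[name] * value for name, value in usable.items()) / coverage
    return "decline" if score > decline_above else "approve"


if __name__ == "__main__":
    print(decide({"ip_geo": 0.9, "email_age": 0.8, "phone_type": None}))
    print(decide({"ip_geo": None, "email_age": 5.0, "phone_type": 0.1}))
```

The design choice worth noticing is that a missing source is treated as “unknown” rather than “safe”: the score is re-normalized over what is available, and the decision is escalated when coverage drops too low.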

So what do I do?

I know what you’re thinking. “I don’t need all of this”, you say, “I bought risk scores and tools from various vendors with a proven track record in risk management. I’m all set”. Let me tell you why I’m not fond of this as a general approach: giving scores as a result instead of raw data obfuscates vital components, and severely reduces your ability to understand why a decision was made or why a specific score was given. When you don’t know the underlying reason, your ability to effectively combine scores, or to simulate changes made to them and their effect on your system and bottom line, is zero. You’re left with a few business rules and a false feeling of control that may result in serious losses or simply lost business.

So what should you do? If you’re a small business without a risk management function, you’re stranded. I suggest that you settle for the few scores and tools that take on at least part of the liability – professionals should be able to put their money where their mouth is (and there are quite a few professionals out there). But being in this situation is not what I’d advise for anyone looking to really grow their business – you need to keep your eye on the ball in risk and fraud. Develop the capability to understand what’s happening in your system, what caused losses and why (easier said than done). If you at least have that, even through the mere intuition that comes from being in the business and seeing a lot of fraud, you can start putting a price tag on new scores you’re being offered (I, of course, support hiring and training domain experts). But what you’re really looking for is a data source acquisition methodology.

You need to understand what a new data source does for you. Do not get confused by terminology and flashy names; a common confusion, for example, is between products that verify a person’s identity (i.e. make sure that the name, address etc. belong to a real person in the real world) and products that authenticate it (i.e. prove that the current user is indeed who they claim to be) – those are not the same. Another common mistake is signing pricey SaaS contracts (say – for phone number type) when similar capabilities can be found and acquired with a bit of Google research. Don’t be tempted by big promises – always make sure you properly simulate the performance on your own system, and fully understand the impact you’d expect to get.
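
One simple way to simulate a candidate source on your own system is to replay it against past transactions whose outcome you already know. The sketch below assumes a lot: the `PastTransaction` fields are hypothetical, the vendor is assumed to return a 0 to 1 risk score, and the only rule tested is “decline when the score is at or above a threshold”. Treat it as a starting point, not a methodology.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class PastTransaction:
    """Hypothetical record of one historical transaction."""
    amount: float
    was_fraud: bool         # known outcome, e.g. from chargebacks
    approved: bool          # what your current system decided at the time
    candidate_score: float  # what the new source would have returned (0 to 1)


def backtest(history: Iterable[PastTransaction], threshold: float) -> Dict[str, float]:
    """Replay history with an extra rule: decline if candidate_score >= threshold."""
    result = {"fraud_caught": 0.0, "good_sales_lost": 0.0, "fraud_missed": 0.0}
    for tx in history:
        if not tx.approved:
            continue  # already declined; the new source changes nothing here
        flagged = tx.candidate_score >= threshold
        if flagged and tx.was_fraud:
            result["fraud_caught"] += tx.amount
        elif flagged:
            result["good_sales_lost"] += tx.amount
        elif tx.was_fraud:
            result["fraud_missed"] += tx.amount
    return result


if __name__ == "__main__":
    # Tiny made-up history, for illustration only.
    history = [
        PastTransaction(100.0, True, True, 0.90),
        PastTransaction(40.0, False, True, 0.20),
        PastTransaction(250.0, False, True, 0.85),
        PastTransaction(60.0, True, False, 0.95),
        PastTransaction(80.0, True, True, 0.30),
    ]
    for threshold in (0.5, 0.8, 0.95):
        print(threshold, backtest(history, threshold))
```

Running it over a few thresholds shows the trade-off directly: fraud stopped on one side, good sales turned away on the other. That is the impact figure you should have before signing anything.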

Making sense out of all of this requires expertise, but is definitely worth the price. This is not to say, by the way, that there are no effective tools, scores and services out there. On the contrary – there are sometimes too many, and it’s the job of the risk manager in the organization (a lot of times the owners themselves) to make sure they are using the best ones for their needs. It’s no simple task.

How do you engage in data source acquisition? Do you think that there’s no such thing as too much data? Comment away!



This post is from a series of posts in the group:

Data Management 101

A community blog about data and how to manage it
