The Data Quality Imperative

by Ted McConnell, Featured Contributor, October 13, 2016

Earlier this month, OMD’s Julie Fleischer and Neustar’s Steven Wolfe Pereira poked at data integrity issues during a Chicago conference, creating some timely swirl by highlighting a simple truth: We lack data integrity standards. Advertisers are becoming skeptical of claims about data. We owe them some accountability, but they owe themselves some diligence as well.

The enemy: oversimplification

People glom on to the difference between “deterministic” and “probabilistic” as some sort of magical line in the sand, but the false sense of comfort is dangerous. Yes, facts exist (deterministic data), but riddle me this: If my credit card shows I shop at Whole Foods, am I rich? Any inference made from deterministic data sends us right back down a statistical rat hole.

Another schism divides planning and activation. I might know for sure that certain cookies clicked on a hotel ad. Planners may decide they want to contact that group. For television, there is no choice but to reduce that audience to a demographic. For online, the exact people can be activated. This is one scenario that taps the value of tight integration between a data-management platform (DMP) and a demand-side platform (DSP).

Grim reality

The CEO of a large agency recently said to me, “Everyone tells me their data is great. How would I know?” Indeed, how would he? Even high-quality data can be ruined by inappropriate use. Online contact usually depends on data, and the quality of the audience (as data) is just as important as the quality of the context. Data quality might be half of effectiveness.

It’s time for advertisers to hold themselves and their suppliers accountable for the quality of data and the conclusions derived from it. The risk of not getting this right will be the commoditization of data (and ergo, consumers). That stands, in my opinion, high on the list of strategic risks for the online ad industry.

So, by inferred popular demand, I present here a listicle of quality attributes for advertising data.

Recency

If you are buying, say, “beauty category buyers,” it’s pretty safe to say that they are still interested in beauty after six months. But, if you are buying six-month-old “Auto Intenders,” there might be a pretty good chance they no longer need a car.
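To make the recency point concrete, here is a minimal sketch of a per-category freshness check. The category names and shelf-life thresholds are made-up assumptions for illustration, not industry standards; the only idea taken from the text is that six-month-old data may be fine for one category and stale for another.

```python
from datetime import date, timedelta

# Hypothetical shelf lives, in days. These numbers are illustrative
# assumptions; a real buyer would set them per category from experience.
MAX_AGE_DAYS = {
    "beauty_buyer": 365,   # slow-changing interest
    "auto_intender": 60,   # short purchase window
}

def is_fresh(category: str, observed_on: date, today: date) -> bool:
    """Return True if the observation is still within the category's shelf life."""
    return (today - observed_on) <= timedelta(days=MAX_AGE_DAYS[category])

if __name__ == "__main__":
    today = date(2016, 10, 13)
    six_months_ago = today - timedelta(days=180)
    for category in MAX_AGE_DAYS:
        print(category, is_fresh(category, six_months_ago, today))
```

The useful question for a supplier is the one this sketch forces: what is the age of the oldest observation in the segment, and who decided that was recent enough?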

Veracity of the inference

How is the purported meaning related to the data? For example, if I went to a page that mentions the word “skin,” am I interested in skin cream?

Observation vs. declaration

Some data is derived by observing what people did (for example, via cookies or panels). Other data is derived from what people said. Third-party data sites, which collect observations (see: http://www.bluekai.com/registry/), are pretty good at nailing interests from Web site behaviors; demographics, not so much.

Conformance with actual intent

Say you bought a segment of people interested in adhesives (glue intenders!) for a $2.50 CPM, and lo and behold, it's 100,000,000 browsers. To validate, ask some of them. What will they say? I like glue? I studied principles of adhesion in engineering school?

Proximity of use to source

This is segment “telephone” in the data supply chain: From data collector, to data aggregator, to DMP, to DSP, and maybe a Boolean “and,” on-ramping, de-duping, and domain-space resolution. Organically grown soybeans can end up as Cheez Whiz™.

Likelihood of actual reach

There are several reasons a cookie may never create reach. The user may never show up in the footprint, or may simply have deleted the cookie. You could buy 50 million users and only find half, or a tenth, of them. Time helps, but this is a serious impairment.
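The arithmetic is simple, but it is worth writing down before signing anything. A minimal sketch, using the 50 million figure from the example above; the function name and the idea of a single "reach rate" are simplifying assumptions, since in practice the rate depends on the footprint, the flight length, and cookie churn.

```python
def reachable_users(segment_size: int, reach_rate: float) -> int:
    """Estimate how many purchased users can actually be found.

    reach_rate is the assumed fraction (0.0 to 1.0) of the segment that
    ever shows up in the buyer's footprint with the cookie intact.
    """
    if not 0.0 <= reach_rate <= 1.0:
        raise ValueError("reach_rate must be between 0 and 1")
    return round(segment_size * reach_rate)

if __name__ == "__main__":
    purchased = 50_000_000
    for rate in (0.5, 0.1):
        print(f"reach rate {rate:.0%}: {reachable_users(purchased, rate):,} users")
```

Asking a supplier for an observed reach rate on a comparable past campaign is one way to replace the assumption with a number.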

Fit with actual prospect density

You might buy a segment of 20 million new pet owners, but are there really that many out there? Inflated estimates of prospect density are the first symptom of naïve hope.
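A quick sanity check is to divide the segment size by the largest number of real prospects you can defend. In the sketch below, the market ceiling is a placeholder invented purely for illustration; the buyer has to supply a real estimate from a source they trust.

```python
def density_check(segment_size: int, plausible_market: int) -> tuple[float, bool]:
    """Compare a segment's size with a defensible ceiling on real prospects.

    Returns (ratio, inflated), where inflated is True when the segment is
    larger than the market it claims to describe.
    """
    ratio = segment_size / plausible_market
    return ratio, ratio > 1.0

if __name__ == "__main__":
    # 20 million comes from the example above; 5 million is a made-up
    # ceiling used only to show the calculation.
    ratio, inflated = density_check(20_000_000, 5_000_000)
    print(f"segment is {ratio:.1f}x the plausible market; inflated={inflated}")
```

A ratio well above 1 does not prove the data is bad, but it does mean most of the segment cannot be what the label says.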

Census vs. sample

Sampling is the lovable, wonky heart of statistics. It’s all a gamble unless you are counting cards, in which case you have a census (i.e., not a sample). If your uncertainty lasts for over four hours, please call your data scientist.
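For anyone who wants to size the gamble, the textbook margin of error for a sampled proportion is a reasonable starting point. This is a minimal sketch assuming a simple random sample and the normal approximation; real panels are rarely that clean, so treat the result as a floor on the uncertainty, not a guarantee.

```python
import math

def margin_of_error(proportion: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion from a simple random sample.

    z = 1.96 corresponds to roughly 95% confidence under the normal
    approximation. A census has no sampling error, so this applies only
    to samples.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)

if __name__ == "__main__":
    # Hypothetical panel sizes, chosen only to show how the error shrinks.
    for n in (100, 1_000, 10_000):
        print(f"n={n:>6}: +/- {margin_of_error(0.5, n):.1%}")
```

The takeaway is the square root: quadrupling the sample only halves the error, which is why small panels stretched over large segments deserve a second look.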

Noise vs. signal

Data is dirty. We call it noise. It runs from 0 to 99%. It’s best to know.

So, there’s a thought starter: Data is not magic. There is good data and horrible data. Much depends on how it is applied. It’s not all that esoteric. A little common sense can go a long way.

MediaPost.com: Search Marketing Daily
