
5 Topics to Consider when Shopping for Online Data


The holiday season is here! Thanksgiving, Black Friday, and Cyber Monday are the biggest shopping days of the year and are literally just a few days away.

Since we’re on the topic of shopping: when you shop for data, always keep a lookout. Why? More and more providers are striving to feed the growing demand for deep audience insights, but the quality of the data being offered can drop off dramatically. It’s no secret that selling data is a volume-based business: the higher the volume, the more money data providers can earn. As a result, quality control standards are often lacking, and providers without them should be avoided. Unfortunately, even good-quality data can lead you astray if it is misinterpreted or used inappropriately.

So for your safety, we put together 5 topics to consider when shopping for data.

Data Decay

Nearly 3% of your customer data is changing every single month. That adds up to roughly a third of your customer data in only a year. You know why? People just can’t stop doing stuff. They keep moving, switching email accounts, changing phone numbers, and their interests change over time. All of this complicates the integrity of the linkages used to obtain device IDs, as well as the marketing signals used within a campaign.
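Here is a quick back-of-the-envelope check of that yearly figure. It is only a sketch: it assumes a flat 3% monthly change rate, and the function names are ours, made up for illustration. Simply adding up twelve months gives one number; allowing for the fact that the same record can change more than once gives a slightly lower one.

```python
def simple_annual_change(monthly_rate: float, months: int = 12) -> float:
    """Add the monthly rate up, ignoring records that change more than once."""
    return monthly_rate * months


def compounded_annual_change(monthly_rate: float, months: int = 12) -> float:
    """Share of records changed at least once, assuming each month is independent."""
    return 1 - (1 - monthly_rate) ** months


rate = 0.03  # assumption: the "nearly 3%" monthly figure, held constant all year
print(f"simple:     {simple_annual_change(rate):.1%}")
print(f"compounded: {compounded_annual_change(rate):.1%}")
```

Either way you land in the neighborhood of a third of your file going stale within twelve months.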

This means your data suppliers should explain their refresh cycles. How often is PII being corroborated and updated, how often are attributes being phased in and out, and how often is a full replacement of both the PII and the attributes sent to onboarders and DMPs? A common issue is that buyers focus on only one of these update schedules, or ignore the value of full replacement over delta changes, which leaves campaigns exposed to stale links and signals. Be sure to demand fresh and accurate data from your provider.
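To make the full-replacement point concrete, here is a minimal sketch. It assumes a delta feed that carries only additions and changes, not deletions (some real feeds do carry deletions, so ask your supplier). The record IDs, email addresses, and function names are all hypothetical.

```python
def apply_delta(current: dict, delta: dict) -> dict:
    """Upsert changed records; anything the supplier retired silently stays."""
    merged = dict(current)
    merged.update(delta)
    return merged


def apply_full_replacement(current: dict, snapshot: dict) -> dict:
    """Swap in the supplier's latest snapshot; retired records disappear."""
    return dict(snapshot)


# Hypothetical audience keyed by a made-up record ID.
audience = {"id1": "old@example.com", "id2": "gone@example.com"}
delta = {"id1": "new@example.com"}      # only the change is sent
snapshot = {"id1": "new@example.com"}   # id2 was retired by the supplier

after_delta = apply_delta(audience, delta)
after_full = apply_full_replacement(audience, snapshot)

# Records that linger after a delta but would be cleared by a full replacement.
stale = set(after_delta) - set(after_full)
print(stale)
```

In this toy case the delta leaves `id2` in place even though the supplier no longer stands behind it, which is exactly the kind of stale link the questions above are meant to surface.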

Deterministic vs. Probabilistic

Providers collect two different types of data: deterministic and probabilistic. Deterministic data is based on self-declared attributes, behaviors, and desires reported directly by the individual. Probabilistic data uses modeling based on assumptions or inferences to create audiences. Neither approach is technically “better” than the other; each has its strengths and weaknesses, and the decision comes down to application.

Probabilistic systems typically assign a percentage (75 percent, for example) indicating the probability of a match. While these systems pinpoint variations and nuances to a much finer degree than a deterministic approach, they are better suited for businesses that have complex data systems with multiple databases.

Deterministic systems, on the other hand, are best suited for applications where the number of records is relatively small (typically in the millions), there are fewer data attributes, and there is no great consequence of error (non-medical uses, for example). A perfect application for deterministic data is mailing list processing. If the system matches a name to an incorrect email address, the email would be sent to the wrong person, resulting in decreased deliverability and other potentially negative consequences. The result is an either/or outcome: either records match the requirements of the business rule or they don’t.
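The difference between the two matching styles can be sketched in a few lines. This is an illustration under our own assumptions, not how any particular provider does it: the deterministic rule is an exact match on a normalized email, and the probabilistic score is plain string similarity from Python's standard `difflib`, standing in for a real statistical model. The sample records and function names are invented.

```python
from difflib import SequenceMatcher


def deterministic_match(a: dict, b: dict, keys=("email",)) -> bool:
    """Either/or rule: every key must be identical after trimming and lowercasing."""
    return all(a[k].strip().lower() == b[k].strip().lower() for k in keys)


def probabilistic_match(a: dict, b: dict, fields=("name", "city")) -> float:
    """Average string similarity (0 to 1) across fields, a stand-in for a modeled score."""
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in fields
    ]
    return sum(scores) / len(scores)


# Invented sample records.
crm = {"name": "Jon Smith", "city": "Philadelphia", "email": "Jon.Smith@example.com"}
vendor = {"name": "John Smith", "city": "Philadelphia", "email": "jon.smith@example.com "}
other = {"name": "Mary Jones", "city": "Denver", "email": "mjones@example.com"}

print(deterministic_match(crm, vendor))            # same email once normalized
print(deterministic_match(crm, other))             # different email
print(f"{probabilistic_match(crm, vendor):.0%}")   # high score, but still a likelihood
print(f"{probabilistic_match(crm, other):.0%}")    # low score
```

The deterministic rule gives a yes or a no. The probabilistic score gives a likelihood, and it is up to you (or your provider) to decide where the cutoff sits, such as the 75 percent mentioned above.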

Combating Fraudsters and Robots

You should know what your data provider is doing to combat fraudulent and robotically generated records. Data breaches, hacks, and scams are so common today that you need to know what specific actions your data provider is taking to purge harmful data from their ecosystem. After all, it can be counter-intuitive for some data providers to intentionally cut their own supply. Their answers to questions about mitigating fraud will tell you whether volume or quality takes priority.

Does Search Traffic Equal Intent?

Hard-to-reach audiences? Search traffic is a great source, but like all data, it needs to be questioned to determine whether the assumptions the data provider is making about intent are logical. For example, if an individual searches online for “Tesla 2018 Model S,” could you assume that they are in the market to buy a new vehicle? Or would it make more sense that a portion of the “Tesla 2018 Model S” searches come from enthusiastic fans looking to check out the new Tesla car? In fact, research has shown that only a third of all automotive search traffic accurately conveys intent to buy. Don’t take this as a sign that you should abandon search traffic data providers, but you should understand how and why assigning the coveted intent signal to a search makes logical sense.

Consider the Source and Scale

Believe it or not, the data world has many components to it. Let’s take a look:

  • Offline Data Aggregator
  • Online Data Aggregator
  • Data Onboarding
  • Data Management Platform
  • Demand Side Platform
  • Customer Data Platform

Is this good or bad? We’ll leave you with this: be cautious of what goes in one end of the supply chain, because it may be unrecognizable when it pops out of the other end. For example, deterministic data can be given to an onboarder and yet the output is a probabilistic match. That can cause unforeseen difficulties when it comes time to rely on those results. In the end, the closer a data provider can bring you to a specific individual, the better your campaign will be. This isn’t a game of horseshoes; close isn’t good enough.


The need for quality data is growing every day, and more and more data providers are appearing out of the weeds as a result. To ensure your campaign is using the best data possible, be sure to ask strategic questions and discuss the topics we’ve mentioned above to help you compare data providers.

Be sure to start your journey with Webbula. Webbula offers only self-reported, deterministic, and individually linked data. We keep brands safe each and every day from fraud, robots, and scammers by removing those hazardous records from our audiences with our Webbula cloudHygiene. Contact us today to see how you can drive campaign ROAS (Return on Ad Spend) with the highest quality data in the industry.


Check out these helpful Intelligence Reports:

What You Need to Know about Data Quality

How to use 3rd Party Deterministic Data for Cross-Channel Promotions

Deterministic and Probabilistic Face Off in the Ultimate Data Challenge
