By Manny Wald, Published November 14, 2014
A lead generation form on your website is a great way to get new business. But if you allow spambots, or automated programs, to fill out your forms, you’ll have a stack of useless leads to sift through.
Spambots are bad for everyone involved with online lead generation. They adversely affect the reputation of lead sellers. For lead buyers, forms filled out by automated programs have financial consequences. Not only that, but the very infrastructure of a publisher or lead generator’s website could crack under the weight of all that spam.
Spam also has a negative effect on analytics. Traffic measurements and data segmentation may not account for the portion of traffic generated by bots, which can skew predictive models built on this data for real-time decision making.
Let’s look at specific strategies you can use to block form bots.
- Simple block. If you discover that a high volume of transactions is coming from a single IP address, block that address. If you see multiple related IP addresses, block the range. Do so in multiple places: in the app, on the content delivery network, at the load balancer, and on the web server. The downside of this simple strategy is that it doesn’t account for pooled IP addresses, where multiple users come from a single IP. If you block that address because one person is up to no good, you may end up punishing others who are innocent. Pooling is common because IPv4 is still the dominant addressing protocol and its address space is limited, so many networks put multiple users behind one shared address. IPv6 helps, but adoption of that protocol is far from universal.
- Two-pronged approach. To avoid punishing those who just happen to share an IP address with bad actors, you can block on a combination of IP address and user agent. A user agent is the string a browser sends to identify itself to the remote server, and it can be surprisingly distinctive. It identifies the operating system, the device type, the frameworks installed, the browser, and so forth. The user agent is helpful when you have multiple people behind one IP address and one user is sending out spam while the others are not. The problem with this solution is that it’s reactive: you don’t know there’s a problem until after it occurs. It’s also imperfect, even if it is better than blocking on IP alone. You can make some distinctions based on user agent, but those decisions are not precise to the device level, so users who share both an address and a user agent with a bad actor can still be blocked.
- Just Add Location. Use IP address geolocation services to proactively block traffic from certain countries of origin. If you are offering a product or service that’s only relevant to the United States, you don’t want anyone from Nigeria submitting lead data. Some IP geolocation services are more accurate than others, and you need to watch for the latency the lookup adds. Both depend on the service you choose and the type of integration. Looking up data in a local copy is faster, but the accuracy isn’t as good. An external API call costs more and takes a little longer, but overall it is more accurate. The issue with this approach is that a country like Nigeria gets a bad rap. It’s the 7th most populous country in the world. Lots of folks are dual US/Nigerian citizens, and they get upset and frustrated when they can’t do basic operations on US websites. While adding geolocation to the mix improves the chances of screening out spambots, it’s still kind of a blunt tool.
- Piercing Tool. There are often multiple layers of IP address information when proxy servers come into play. The source IP address can be obfuscated by transparent, anonymous, or botnet proxy servers. A transparent proxy server is the easiest to work with. It identifies the original IP address, so you can see both the proxy’s IP and the IP of origin. Anonymous proxy servers do not expose the original IP address and sometimes do not identify themselves as proxies at all. Finally, there are botnets: internet-connected networks built from compromised machines. If the traffic looks like it’s coming from a regular user, that’s because it is. In any of these three scenarios, IP blocking and geolocation based on the source IP are unreliable if you are not getting the true IP address. Use third-party services that can pierce proxies to get to the actual user address.
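As a rough illustration of the first two strategies in the list above, here is a minimal Python sketch that checks a request against a list of blocked addresses and ranges, and against a list of IP and user agent pairs. The blocklist contents, the `BadBot/1.0` user agent, and the function name `is_blocked` are all hypothetical; the addresses come from ranges reserved for documentation. A real deployment would load these lists from wherever your team maintains them.

```python
import ipaddress

# Hypothetical blocklist of single addresses and ranges (documentation-reserved IPs).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),   # one abusive address
    ipaddress.ip_network("198.51.100.0/24"),  # a range of related addresses
]

# Hypothetical (IP, user agent) pairs: blocks one client behind a shared address
# without blocking everyone else who uses that address.
BLOCKED_IP_AGENT_PAIRS = {
    ("192.0.2.10", "BadBot/1.0"),
}


def is_blocked(ip: str, user_agent: str) -> bool:
    """Return True if the request matches either blocklist."""
    address = ipaddress.ip_address(ip)
    # Strategy 1: simple block on an address or a range.
    if any(address in network for network in BLOCKED_NETWORKS):
        return True
    # Strategy 2: block only the exact IP and user agent combination.
    return (ip, user_agent) in BLOCKED_IP_AGENT_PAIRS
```

With these sample lists, `is_blocked("192.0.2.10", "BadBot/1.0")` is true, while a different user agent on the same shared address, such as `is_blocked("192.0.2.10", "Mozilla/5.0")`, is not blocked.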
Blocking generally means taking an action; for example, presenting a challenge such as a CAPTCHA. Unfortunately, there’s always potential for false positives. If you put a hurdle in front of a real human, it will hurt the user experience. A false positive in the form of a challenge, however, is better than preventing a page from loading or a form from being filled out. The more precise your technique, the better positioned you are to take the right action.
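One way to match the action to the precision of the signal is sketched below in Python. The policy itself is an assumption for illustration, not a rule from this article: a precise match on IP plus user agent refuses the submission, while a coarse IP-only match, which may be a shared address, gets a challenge instead. The function name `choose_action` and the action labels are hypothetical.

```python
def choose_action(ip_matched: bool, ip_and_agent_matched: bool) -> str:
    """Pick a response based on how precise the matching signal is (assumed policy)."""
    if ip_and_agent_matched:
        return "block"      # precise match: refuse the submission
    if ip_matched:
        return "challenge"  # coarse match: possibly a shared IP, so show a CAPTCHA
    return "allow"          # no match: let the form through
```

Under this policy, a real person caught by a coarse IP match sees a challenge instead of a page that fails to load.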