Director of Avenga Labs
You’ve created a great digital product for your customers and they like the new user experience and interface. It wasn’t a walk in the park and the project took a lot of effort, but finally the product is here and it’s working great.
The performance is very good as well. You are seeing increased traffic and that makes you happy, and it’s good for the business.
But . . . let’s imagine that a lot of this traffic is coming not from your customers but from bots: automated digital worms that are eating your web application alive. They consume precious computing resources like CPU power and network bandwidth, and unfortunately it is you who pays the cloud provider so that these bots can scan your content and applications faster. The evil bots violate privacy while you remain unaware of it all. Worse, they overuse your web application’s resources and create performance bottlenecks, so the application cannot respond effectively to requests from your real users, who notice the lower performance and responsiveness.
It is a digital nightmare that has come true. So, what can you do?
Our wao.io Web Performance Optimization Team has been obsessed with web performance for years (in a good way). They have helped hundreds of customers optimize their web experiences with our consulting services and products (wao.io, couper.io).
To do that ever more effectively and to respond to the ever-changing digital landscape, our team is constantly researching new ideas and technologies.
One important research project is about analyzing suspicious web traffic and acting on it.
Our team analyzes web traffic constantly. They combine rules created by experts with automation that takes advantage of the latest machine learning technologies, including advanced neural networks and random forests.
The traffic is then classified into normal traffic and suspicious traffic. It’s a very complex process, requiring lots of expertise in this area, as well as tons of experimenting and trying to find the right . . . questions and classification decision criteria.
Bots routinely use an older version of HTTP (1.1), repeatedly send single requests, and always ask for the same type of content (MIME type). But . . . it is not always HTTP 1.1 and not always single requests, and sometimes the User-Agent string is even empty. For traffic to be considered bot traffic, a similar behavior pattern has to repeat, so that the classification engine can make a reliable suggestion. We wish the rules were simpler, but they are not.
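To make these signals a bit more tangible, here is a minimal Python sketch of a rule-based check. It is not the wao.io classification engine, which combines expert rules with machine learning; the `Request` fields, the signal names, and the thresholds (`0.9`, `min_requests`, `min_signals`) are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    client: str        # hypothetical client key, e.g. an IP address
    http_version: str  # "1.1", "2", ...
    mime_type: str     # type of content that was requested
    user_agent: str    # raw User-Agent header, may be empty


def suspicion_signals(requests):
    """Return the heuristic signals that fired for one client's requests."""
    signals = []
    if not requests:
        return signals
    n = len(requests)
    # Signal 1: almost everything arrives over HTTP 1.1 (assumed threshold: 90%).
    if sum(r.http_version == "1.1" for r in requests) / n >= 0.9:
        signals.append("mostly_http_1_1")
    # Signal 2: the client keeps asking for the same MIME type.
    top_mime_count = Counter(r.mime_type for r in requests).most_common(1)[0][1]
    if top_mime_count / n >= 0.9:
        signals.append("single_mime_type")
    # Signal 3: at least one request came with an empty User-Agent string.
    if any(not r.user_agent.strip() for r in requests):
        signals.append("empty_user_agent")
    return signals


def looks_like_bot(requests, min_requests=20, min_signals=2):
    """Only judge clients whose behavior repeated often enough to be reliable."""
    if len(requests) < min_requests:
        return False
    return len(suspicion_signals(requests)) >= min_signals
```

No single signal decides the outcome here, which mirrors the point above: the pattern has to repeat, and several weak hints have to agree, before the traffic is treated as suspicious.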
Different variables and parameters are taken into account to be able to process all this information properly and find which ones matter.
And it’s not just an occasional blip (like a glitch in the Matrix movie); there is much more of it. Sometimes as much as 90% of the traffic is unwanted traffic coming from bots!
Our web performance optimization team shared some examples to help us visualize the impact.
This is how bots (the red part of the bars in the upper chart) affect a critical web application performance metric called Time To First Byte (percentiles in the lower chart). Bots are invisible to most user tracking tools. Ironically, the bots that are supposed to help with search ranking can actually hurt your rankings, because the lowered performance results in a penalty from Google.
Even worse, and adding to the difficulty, some bots are acceptable for particular business clients and unwanted by others.
But if you hurt what's mine, I'll sure as hell retaliate…
So now we’ve detected the activity of the bot and we know which bot it is.
What can you do now?
It seems obvious that you should block all these bots and get rid of them once and for all.
As always, digital reality is much more complex than it seems, for the majority of us.
There are different bots. There’s no simple binary one or zero, good or bad, and the classification depends on the context. Let’s analyze the three types.
Unwanted bots come from suspicious sources, arrive without identifying headers, or originate from countries your business wants nothing to do with or from organizations you don’t trust; for example, SEO tools that sell advertising insights gathered from your site to your competitors.
They should not be confused with DoS attacks and other malicious (i.e. hacking) activities; unwanted bots are ‘simply’ the bots that you don’t want impairing your web performance.
In these cases the needed action is to block the bot traffic.
I don’t think any monopoly can ever be called friendly, so I’ve decided to call these bots useful, mainly because e-commerce and proper content promotion require crawlers from Google and other search engines (like Bing) to access your web product pages.
Image: we see that the traffic is evenly distributed at an acceptable level, no spikes.
These bots usually behave well. They are well written and don’t cause major traffic, so they should be allowed to access your web digital solution without any problems.
Unfortunately, their behavior is sometimes unacceptable too. They can generate traffic spikes for hard-to-explain reasons, but at least the effect of their work, such as making it easier for customers to find your products, is beneficial.
One of the key parameters to observe is the portion of server time spent on bot-generated traffic compared to the time required to serve regular web users. Below 10% is usually considered OK, but, as noted above, the bot share can even reach 90%!
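As a rough illustration of this parameter, the sketch below computes the bot share from log entries. It assumes the share is measured against total server time and that each entry is already labeled as bot or not; the function names and the `(is_bot, server_seconds)` entry format are hypothetical.

```python
def bot_time_share(entries):
    """Share of total server time spent on bot traffic.

    entries: iterable of (is_bot, server_seconds) pairs (assumed log format).
    """
    bot_time = 0.0
    total_time = 0.0
    for is_bot, server_seconds in entries:
        total_time += server_seconds
        if is_bot:
            bot_time += server_seconds
    return bot_time / total_time if total_time else 0.0


def is_acceptable(entries, threshold=0.10):
    """Apply the 'below 10% is usually OK' rule of thumb from the text."""
    return bot_time_share(entries) < threshold
```

For example, `is_acceptable([(True, 0.5), (False, 9.5)])` is true (a 5% bot share), while a log where bots take 9 out of every 10 seconds of server time fails the check by a wide margin.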
In all countries there are local bots focused on analyzing product offers and local search engines. They are in Germany, Switzerland, France, Poland, UK, US, etc . . . everywhere the digitalization maturity level is high.
Regrettably, these bots usually do not behave so well either, and their usefulness is therefore questionable. With traditional traffic management solutions, you have to decide whether to allow them or block them.
This is an example of a bot generating spikes in the traffic which makes it an unwanted bot, from the performance perspective. But from a business perspective it may be a necessary evil, as you still want it to access your web pages.
Fortunately, there’s a better way. Our wao.io team has found an excellent solution which addresses this middle ground in a very smart way.
Let’s say the given bot is classified as a ‘necessary evil’ bot. It does not behave well and generates spikes of traffic that affect the performance of your web application, but there are business benefits so it must be let through.
Our wao.io team created a smart solution for this problem. You can think of it as a leash for poorly behaving bots, allowing them access to your digital product but only in a civilized manner.
Pattern 1 (occasional): These are the good bots. The frequency is steady but with pause intervals. This is how you want your bot to crawl. One of these alone will not be of any harm.
Pattern 2 (serialized): This type of bot sends its requests in a serialized way, one after another, generating steady traffic without delays, so there is no room for “breathing”. A good bot would leave a pause between requests, maybe even adhering to a time-bound crawl budget.
Pattern 3 (parallel): Some bots aggressively flood your web servers with lots of traffic in a short period of time, sending their requests in parallel, which is the worst-case scenario we want to deal with.
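The three patterns can be told apart from request timing alone. The following Python sketch is only an illustration of that idea, not the wao.io detection logic: it assumes we know the start and end time of each request from one bot, and the `min_pause` threshold of one second is an arbitrary assumption.

```python
from statistics import median


def classify_pattern(intervals, min_pause=1.0):
    """Classify one bot's requests as 'occasional', 'serialized' or 'parallel'.

    intervals: list of (start, end) times in seconds, one pair per request.
    """
    gaps = []
    latest_end = None
    for start, end in sorted(intervals):
        if latest_end is not None:
            # A request that starts before an earlier one finished overlaps it.
            if start < latest_end:
                return "parallel"
            gaps.append(start - latest_end)
        latest_end = end if latest_end is None else max(latest_end, end)
    # Typical pause between requests decides between patterns 1 and 2.
    if not gaps or median(gaps) >= min_pause:
        return "occasional"
    return "serialized"
```

A bot that requests a page every five seconds comes out as `occasional`, one that fires the next request the moment the previous one ends comes out as `serialized`, and any overlap between requests is reported as `parallel`.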
All this traffic is automatically serialized and slowed down: requests are handled one after another, with pauses in between, to make the traffic civilized and to stop it from hurting your web app performance. All the requests go into a pool, a custom queue that effectively deals with unnecessary parallelism and randomness.
In other words, our tool can flatten the curve of the traffic spike, which makes it tolerable for your web application performance without losing the benefits which these bots provide for your business.
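The effect of such a queue can be shown with a small simulation. This is a sketch of the general idea (serve one request at a time, then pause), not the actual wao.io queue; the function name and the `service_time` and `pause` parameters are assumptions, and a real implementation would sit in the proxy in front of your servers rather than work on a list of timestamps.

```python
def schedule_serialized(arrivals, service_time, pause):
    """Return the time at which each queued request is actually served.

    arrivals: arrival times in seconds of one bot's requests.
    service_time: assumed time the server needs per request.
    pause: assumed breathing room enforced after each request.
    """
    starts = []
    next_free = float("-inf")  # earliest moment the next request may start
    for arrival in sorted(arrivals):
        # A request waits in the queue until the previous one is done
        # and the pause has passed; it never starts before it arrived.
        start = max(arrival, next_free)
        starts.append(start)
        next_free = start + service_time + pause
    return starts
```

With `service_time=0.5` and `pause=0.5`, a burst of four parallel requests arriving at second 0 is served at seconds 0, 1, 2 and 3: the spike is flattened into a steady trickle. A well-behaved bot that already leaves long pauses is not delayed at all.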
With so much sophistication in the automation of bot detection and traffic throttling, there’s still a critical human factor.
As I wrote before, the same bot may be acceptable for one wao.io client and unacceptable for another (local search engines, SEO tools).
wao.io is about providing a choice for users, allowing them to select which bots they want to block, which to allow without limitations, and which should be handled by our sophisticated bot traffic management system.
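One way to picture this choice is as a small per-client policy table. The sketch below is purely illustrative and does not show the wao.io configuration format; the bot names, the two example clients, and the default action for unlisted bots are all made up.

```python
from enum import Enum


class BotAction(Enum):
    BLOCK = "block"        # unwanted bot: reject its traffic
    ALLOW = "allow"        # useful bot: let it through without limits
    THROTTLE = "throttle"  # necessary evil: serialize and slow it down


def action_for(bot_name, policy, default=BotAction.THROTTLE):
    """Look up what a client wants done with a bot (default is an assumption)."""
    return policy.get(bot_name.lower(), default)


# Hypothetical policies: the same SEO tool is treated differently per client.
shop_a_policy = {
    "example-seo-tool": BotAction.BLOCK,
    "example-search-crawler": BotAction.ALLOW,
}
shop_b_policy = {
    "example-seo-tool": BotAction.THROTTLE,
    "example-search-crawler": BotAction.ALLOW,
}
```

Here `action_for("example-seo-tool", shop_a_policy)` returns `BotAction.BLOCK`, while the same lookup against `shop_b_policy` returns `BotAction.THROTTLE`, which is exactly the kind of human decision the automation cannot make on its own.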
The bots used to be considered “just” a privacy problem and now they have also become a performance problem, which is another important reason to deal with them in the right way. Our solution will lower your CPU consumption, bandwidth, and costs as well as enable a better app performance, all of which have a direct impact on your digital business.
Web performance and security are tough, and they are getting even tougher with the increased load of bots, trade wars between major market players, and new web technologies.
What we encourage you to do is . . . not to do it alone.
It’s a very interesting journey which never ends, and it takes a lot of knowledge and even more practical experience. We at Avenga are here to travel the road with you in a practical and effective way, as consultants for web optimization and with the ready-to-use wao.io product in a convenient SaaS model.