Nov 2, 2020

This Page is Tracking You

Introduction

Well, not really, but every service you use, offline or online, keeps a bit of information about you. Whether you visit a place, make a purchase or just go for a stroll in a public area, there are systems in place that create a digital or physical trace of your presence. That is not a bad thing if you think about it: it is natural to keep records of events, and sometimes it is necessary to keep the service running. It is also common practice to disclose such tracking: stores declare that they have cameras on the premises, cashiers ask for your phone number in exchange for a loyalty program, businesses let you choose your mode of payment, and a new startup asks for your feedback. Sometimes it is compliance and sometimes it is just a business trying to improve its products and services, but it is always on-demand or necessary, and you always have the option to decline.

Even this site uses Google Analytics for user analytics and AdSense for monetization.

What should be considered tracking?

I believe that data collection becomes tracking or invasion of privacy when it happens without even telling the user that it is happening, when the collected data is used for influence instead of product improvement, and when the collected data is even sold to others.

Services do try to explain that everything happens with the user's consent and that everything is spelled out in their Terms of Service and Privacy Policy documents, but in reality those documents exist only to make sure that services comply with local government regulations. It is in no way a two-way negotiation, since services don't even bother to honour the Do Not Track (DNT) setting implemented in most browsers. In fact, if services had to ask for data collection consent in plain words, and without offering an incentive, hardly anyone would agree to it.

Advertising is the industry that runs the internet, or so they say: a half-trillion-dollar market that drives digital marketing for businesses. It covers everything from collecting consumer data and behaviour, to creating segments, to selling that information to the highest bidder, who then places marketing material in front of those consumers. The practice is not new at all; it has been around since long before the internet was a thing. But the scale and depth of data collection have increased significantly.

Sure, there are a few positives to targeted advertisements as well, like reducing marketing budgets while making them more effective, cutting down on spam and junk mail, and subsidizing ad-supported services offline and online. The concept is not new and the pros are in no way bad, but the scale on which it now runs is what makes it unethical.

Products have been created just to gather users and their personal information and to provide an attention-grabbing platform, which in turn becomes a medium for serving advertisements. Services are going out of their way to collect as much information as possible, whether or not a user is actually consuming the service. These products are not innovative and are not making an impact on the world; instead they focus on how far they can push without violating any existing law. These practices are definitely new and came into the picture in recent years with the rise of the internet. Our outdated business regulations are in no way capable of curbing them.

Data collection is inevitable

As mentioned before, data collection is inevitable and is actually required for online services to function properly. Services nowadays run on a variety of clients, and without knowing some information about them, such as the type of browser, screen size, operating system and much more, they would not be able to perform the function they are supposed to. This was never treated as a privacy concern, since browsers do provide a very secure sandbox. The problem is that the whole app ecosystem has developed around this access, so changing it now would break a lot of things.

Here is a small example of how much information a browser simply lets an application collect. For some data user consent is required.
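As a rough illustration of the point above, the sketch below gathers a handful of values that a page script can read without any permission prompt. The `collectClientInfo` function and the `NavigatorLike` and `ScreenLike` types are names made up for this example; the fields mirror real `navigator` and `screen` properties. Data such as geolocation or camera access is deliberately left out, because the browser asks for consent before handing it over.

```typescript
// Hypothetical structural types that mirror a few real `navigator` and
// `screen` properties, so the sketch also type-checks outside a browser.
interface NavigatorLike {
  userAgent: string;
  language: string;
  hardwareConcurrency?: number;
}

interface ScreenLike {
  width: number;
  height: number;
  colorDepth: number;
}

interface ClientInfo {
  userAgent: string;
  language: string;
  cpuCores: number | null;
  screenSize: string;
  colorDepth: number;
  timeZone: string;
}

// Collects values that need no permission prompt. `collectClientInfo` is an
// illustrative helper, not part of any browser API.
function collectClientInfo(nav: NavigatorLike, scr: ScreenLike): ClientInfo {
  return {
    userAgent: nav.userAgent,
    language: nav.language,
    cpuCores: nav.hardwareConcurrency ?? null,
    screenSize: `${scr.width}x${scr.height}`,
    colorDepth: scr.colorDepth,
    // The time zone comes from the standard Intl API, again without a prompt.
    timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
  };
}
```

In a browser, calling `collectClientInfo(navigator, screen)` should be enough to fill in every field, which is roughly what an analytics script can do on page load.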

Taking the control back

Well, it seems like there is no way out of it. Laws are not changing anytime soon, and even if they do, the regulatory authorities can't be too radical, so all of this is on its way to becoming the status quo. As consumers we do have rights and freedom of choice, and the most effective way to take back control is to use intelligent clients to consume services in this ecosystem. Using a privacy-aware browser along with helper plugins and add-ons is one way to keep this invasive practice in check.

For this discussion I'm taking Firefox and uBlock Origin as the example and explaining how they work together. The architecture diagram is divided into two columns: the rightmost column denotes the phases, and the rest of the diagram represents the browser and plugin internals and how they handle an external service's code.

A post with a more detailed architecture review and code walkthrough of uBlock Origin is coming soon.

[Figure: uBlock Origin architecture diagram]

Phase A: This is where the consumer expresses the intent to use a service and provides its address, which is purely consensual, like visiting a store. Not quite, though: redirects and blank-target links might do it on behalf of the user. That's where Phase B comes in.

Phase B: The browser prepares itself to provide resources for this new request in the form of a dedicated tab and checks for any potential risks, such as known bad addresses and insecure locations. Besides this, the background scripts of your plugin check whether the address is something that you actively want to avoid. If everything checks out and both the browser and the plugin give their go-ahead, the process moves on to Phase C.
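The check in this phase can be sketched roughly as follows. This is a simplified model, not uBlock Origin's or Firefox's actual code: `checkNavigation`, `hostMatches`, the `Verdict` type and the example host names are all assumptions made for illustration, and real blockers use far richer filter syntax than a plain set of host names.

```typescript
type Verdict = "allow" | "warn-insecure" | "block";

// Returns true when the hostname or any parent domain is on the list, so
// "cdn.tracker.example" matches an entry for "tracker.example".
function hostMatches(hostname: string, blocked: ReadonlySet<string>): boolean {
  const labels = hostname.toLowerCase().split(".");
  // Stop before the bare top-level label ("com"), but still test
  // single-label hosts such as "localhost".
  const last = Math.max(1, labels.length - 1);
  for (let i = 0; i < last; i++) {
    if (blocked.has(labels.slice(i).join("."))) return true;
  }
  return false;
}

// Hypothetical pre-navigation check: the user's blocklist wins first, then
// plain HTTP is flagged as insecure, and everything else goes ahead.
function checkNavigation(
  protocol: string,
  hostname: string,
  blocked: ReadonlySet<string>
): Verdict {
  if (hostMatches(hostname, blocked)) return "block";
  if (protocol === "http:") return "warn-insecure";
  return "allow";
}
```

In a browser the `protocol` and `hostname` arguments would typically come from `new URL(address)`; they are passed in as plain strings here to keep the sketch free of environment-specific types.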

Phase C: This is the most chaotic one. The initial (root) URL starts loading its content, which includes client-side scripts and further network requests to fetch more content. Browser settings, the sandbox and web standards protect you and your system from potentially risky behaviour; it is then up to the plugin's content script to selectively block or allow requests based on your preferences. To protect users from unnecessary analytics, the plugin substitutes its own neutered version of the SDK, served through web accessible resources, which renders the original ineffective.
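A minimal sketch of that per-request decision is shown below, assuming a simple host-based rule set. The names `decideRequest`, `FilterRules`, `neuteredAnalytics` and the stub file name `noop-analytics.js` are hypothetical and are not taken from uBlock Origin's source; they only illustrate the idea of swapping a tracking script for a do-nothing replacement instead of blocking it outright.

```typescript
type Decision =
  | { action: "allow" }
  | { action: "block" }
  | { action: "redirect"; resource: string };

interface FilterRules {
  // Hosts whose requests are dropped entirely.
  blockedHosts: ReadonlySet<string>;
  // Maps a host to the name of a bundled do-nothing replacement script.
  neuteredStubs: ReadonlyMap<string, string>;
}

// Decides what to do with one outgoing request. A stub takes priority over a
// plain block so that pages depending on the SDK keep working.
function decideRequest(requestHost: string, rules: FilterRules): Decision {
  const host = requestHost.toLowerCase();
  const stub = rules.neuteredStubs.get(host);
  if (stub !== undefined) return { action: "redirect", resource: stub };
  if (rules.blockedHosts.has(host)) return { action: "block" };
  return { action: "allow" };
}

// What a neutered SDK might look like: it keeps the method names that page
// code expects, but every call is a no-op and nothing leaves the browser.
const neuteredAnalytics = {
  track(_event: string, _props?: Record<string, unknown>): void {},
  identify(_userId: string): void {},
};
```

In a real WebExtension the redirect target would be a file shipped with the add-on and exposed through the `web_accessible_resources` manifest key, which is why the page can load it in place of the original script.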

Once the page load finishes and the tab is ready, the state moves on to Phase D. The user starts actually using the service, interacting with it and generating all kinds of events. At any point, an interaction might send a portion of the service back to Phase A or start a new page load altogether, at which point the whole process kicks in again.

In short these measures act like a firewall and protect the user from unwanted application behavior as instructed by the user.

Some Extreme Measures

Ad Gathering and Clicking

Some plugins collect all the advertising links on a page and then click them after stripping out the user's information.

Event Spoofing

Analytics SDKs can be spoofed to add noise to the signal that the data collector is trying to process.

Sending False Information

Instead of blocking requests, the plugin sends scrambled or falsified information.

These measures are called extreme because the intent changes from protecting the user to hurting the service providers. Offensive measures like these impersonate user behaviour, which could be considered unethical since, as discussed earlier, data collection is sometimes necessary and not intended to be evil.

Watching the Watchmen

If you end up using a plugin or add-on along with your browser, do make sure that it is not itself causing any harm. More information on how to monitor an add-on can be found here

or open this link (Firefox) in a new tab and poke around, specifically in the Network and Storage information.

about:debugging#/runtime/this-firefox

Conclusion

The purpose of this document is to talk about what we as consumers can do to protect ourselves. Regulations, laws and corporate ethics will take their time, but there are a lot of measures that can start making an impact right away.

./make all