Study Finds Hundreds Of Top Sites Record User Keystrokes And Personally Identifiable Information

A new study has found something disturbing on some of the world's top websites. Researchers at Princeton's Center for Information Technology Policy (CITP) found that over 400 of the top 50,000 websites use "session replay scripts" to track user behavior. The practice itself is common; the real problem, according to the researchers, is that the sites often fail to strip personally identifiable information from the data the scripts collect.


The researchers are concerned that this session data is a potential treasure trove for hackers because it sometimes includes information such as passwords. Researchers Steven Englehardt, Gunes Acar, and Arvind Narayanan looked at seven of the top session replay companies that provide the scripts used on many of these websites: Clicktale, FullStory, Hotjar, SessionCam, Smartlook, UserReplay, and Yandex. They set up test pages with session replay scripts from six of those companies to see what data was recorded.

"Collection of page content by third-party replay scripts may cause sensitive information such as medical conditions, credit card details, and other personal information displayed on a page to leak to the third-party as part of the recording. This may expose users to identity theft, online scams, and other unwanted behavior. The same is true for the collection of user inputs during checkout and registration processes," the CITP researchers explain.

The researchers note that some of the session replay scripts, namely SessionCam and UserReplay, don’t collect user data at all. Most of the scripts rely on automatic and manual redaction tools to remove personally identifiable information. The catch is that the sheer volume of user data makes manual scrubbing hard to do, so some personal data still ends up being collected.
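
To make the redaction problem concrete, here is a minimal sketch in Python of how automatic scrubbing of recorded input might work, and why it can miss things. It is not taken from any of the companies' actual scripts: the function name redact_event, the SENSITIVE_FIELDS list, and the card-number pattern are all assumptions made for illustration.

```python
import re

# Hypothetical list of field names treated as sensitive; a real
# deployment would have to maintain its own list by hand.
SENSITIVE_FIELDS = {"password", "card_number", "ssn"}

# Loose pattern for 13 to 16 digit card-like numbers, with optional
# spaces or dashes between digits. An assumption, not a standard.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def redact_event(field_name, value):
    """Return the text that would be stored for one captured input."""
    if field_name.lower() in SENSITIVE_FIELDS:
        # Manually flagged field: mask every character.
        return "*" * len(value)
    # Automatic pass: mask anything that looks like a card number.
    return CARD_PATTERN.sub(lambda m: "*" * len(m.group()), value)


print(redact_event("password", "hunter2"))
print(redact_event("comment", "my card is 4111 1111 1111 1111 ok"))
# Nobody flagged this field and no pattern matches, so the text is
# recorded as typed.
print(redact_event("notes", "my password is hunter2"))
```

In this sketch the last call stores the text unchanged, because the field was never flagged and no pattern recognizes its contents. That is the kind of gap the researchers describe: scrubbing only catches what someone anticipated.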

One of the websites using these scripts belongs to the pharmacy chain Walgreens. Because manual and automated scrubbing of personally identifiable data isn't 100% effective, protected medical data such as names, medical conditions, and prescriptions is sometimes recorded. The researchers also found that while many of the websites use HTTPS to secure data, the session replay dashboards are often served only over unencrypted HTTP, which leaves the recordings vulnerable to interception.

Yandex told Motherboard, "HTTP is used intentionally, as session recordings load websites using iframe. Unfortunately, loading HTTP content from HTTPS websites is prohibited on the browser level so HTTP player is required to support HTTP websites for this feature."

One of the websites, Bonobos, has already responded to the researchers' report by ending data sharing with FullStory, and says it is reviewing its protocols to better protect user data. Walgreens has followed suit and stopped sharing data with FullStory while it investigates the researchers' claims. Walgreens may run afoul of HIPAA regulations if private, identifiable patient data was given to third parties.