LinkedIn has become a major player in professional networking and job searching. According to Datareportal, LinkedIn has over 700 million users, making it one of the largest social media platforms!
This is an astonishing number of users, especially considering that the site was once dubbed the “Facebook of professionals.” However, there’s a less pleasant fact attached to that number. Back in June 2021, a hacker on an underground forum claimed to have scraped the site.
The hacker claimed to hold data on 700 million LinkedIn users, including names, genders, email addresses, job titles, industries, and more.
These personal details were put up for sale on the web, but there were no buyers, so the data was later leaked and made freely available online.
Notably, the hacker used a method called “scraping,” which is against LinkedIn’s terms of service.
So, what do UK businesses need to know about the leak and data scraping? How can you keep this from happening?
Details of the Great LinkedIn Data Scrape of 2021
The event took place on June 22, 2021, when a user going by the name TomLiner posted on RaidForums, a site notorious for sharing leaked information. TomLiner claimed to possess 700 million records containing the personal information of LinkedIn users.
The collection of personal data included:
- Full names
- LinkedIn IDs
- Date of birth
- Workplace address
- Facebook & Twitter IDs
- Job titles
- GPS data
This is not the full list of breached data; however, it does give you an idea of the breadth of information taken by the hacker.
Interestingly, when TomLiner tried to sell the personal data, no one showed any interest. The full dataset was then leaked in September, making it available for anyone to access.
It’s also worth noting that the leaked data was technically already available to the public. So this was not an actual breach of LinkedIn’s network; it was a scrape of publicly visible information.
What is Data Scraping?
Data scraping is the process of extracting large amounts of data from a website very quickly. It relies on automated programs, often called scraper bots, that work through a site’s pages systematically, from top to bottom, collecting the information on each one.
The collected data is then saved in another location. The result is a large copy of the website’s data that the scraper can store, analyse, or sell.
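To make the extraction step concrete, here is a minimal sketch in Python using only the standard library’s `html.parser`. It parses an in-memory sample page rather than fetching anything over the network, and the markup and class names (`card`, `name`, `title`) are hypothetical. A real scraper bot would wrap something like this in a loop that downloads page after page.

```python
from html.parser import HTMLParser

# Hypothetical sample page: each profile "card" labels its fields with CSS classes.
SAMPLE_PAGE = """
<div class="card"><span class="name">Alice Smith</span><span class="title">Engineer</span></div>
<div class="card"><span class="name">Bob Jones</span><span class="title">Analyst</span></div>
"""


class ProfileScraper(HTMLParser):
    """Collects one record per card, keyed by the class names in FIELDS."""

    FIELDS = {"name", "title"}

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = {}
        self._field = None  # the field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "card" in classes:
            self._current = {}  # a new profile starts here
        for cls in classes:
            if cls in self.FIELDS:
                self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if self._field:
            self._field = None  # leaving a field element
        elif tag == "div" and self._current:
            self.records.append(self._current)  # card finished: save the record
            self._current = {}


def scrape(html):
    parser = ProfileScraper()
    parser.feed(html)
    parser.close()
    return parser.records


print(scrape(SAMPLE_PAGE))
# [{'name': 'Alice Smith', 'title': 'Engineer'}, {'name': 'Bob Jones', 'title': 'Analyst'}]
```

Notice how completely the parser depends on the page’s markup: if the class names change, it quietly stops finding data, which is why regularly changing your HTML, one of the defences covered later in this post, can work against scrapers.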
Who Uses Web Scrapers?
Hackers want to steal valuable data, such as the information scraped in the LinkedIn incident. They use scraper bots to collect it and can then turn around and sell it.
Scraping is also a common, legitimate practice. Online retailers and other types of businesses hire professionals who specialise in scraping data from websites, either with web scraping tools or by gathering the information by hand. The company can then use this data to sharpen its competitive edge and build strategies that help the business grow.
Hackers, however, often disguise their scraping bots to look like these legitimate scrapers.
Is Data Scraping Legal?
Generally, yes, because the method targets public-facing information. The difference is scale: scraping gathers data at a level no human could manage alone. However, there are some instances where data scraping is illegal. For instance, if a hacker scrapes data containing registered trademarks and tries to sell the information, the data owner could seek a legal remedy.
Is Scraping a Cyberattack?
Not exactly. Scraping isn’t technically hacking in the way that brute-force attacks and other data breaches are. The method simply gathers data that’s already publicly accessible and compiles it. How that information is then used could, however, amount to a cyberattack.
The act of scraping itself is not inherently illegal and is not considered a true cyberattack.
How Can You Protect Yourself Against Data Scraping?
There are a few things an IT support team and individuals can do to prevent data scrapes like the one at LinkedIn.
Here are some ideas to help you protect your data:
Use authenticated content gating: if your company publicly exposes customer or user information, requiring visitors to sign in before they can see it helps keep that data safe from scraping.
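A minimal sketch of content gating in Python, assuming a hypothetical `Profile` record and a session lookup that yields the signed-in user’s ID (or `None` for anonymous visitors). Anonymous requests receive only a teaser, so a bot that never signs in has little to collect; which fields count as public is an assumption you would adjust for your own site.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Profile:
    # Hypothetical profile record for illustration.
    name: str
    job_title: str
    email: str


# Assumption: only the name is shown to visitors who are not signed in.
PUBLIC_FIELDS = ("name",)


def render_profile(profile: Profile, session_user: Optional[str]) -> dict:
    """Return the full profile to signed-in users and a gated teaser to everyone else."""
    data = asdict(profile)
    if session_user is None:
        teaser = {field: data[field] for field in PUBLIC_FIELDS}
        teaser["login_required"] = True
        return teaser
    return data


alice = Profile("Alice Smith", "Engineer", "alice@example.com")
print(render_profile(alice, None))
# {'name': 'Alice Smith', 'login_required': True}
print(render_profile(alice, "user-42"))
# {'name': 'Alice Smith', 'job_title': 'Engineer', 'email': 'alice@example.com'}
```

Gating raises the bar rather than removing the risk: a scraper can still register accounts, so it pairs naturally with per-account request limits.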
Limit user activity: scrapers can continuously search or request data from a site. One remedy is to limit the number of requests an individual user can make in a given period of time. This can slow down or stop a scraping attempt.
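One common way to implement this is a sliding-window rate limiter. The sketch below is a minimal in-memory Python version, assuming each request can be tied to a client ID such as an account or IP address. The limit of 3 requests per 60 seconds is purely illustrative, and a production site would usually keep these counters in a shared store rather than in a single process’s memory.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the example can simulate time passing
        self._hits = defaultdict(deque)  # client_id -> timestamps of allowed requests

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen outside the current window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: reject (or delay) this request
        hits.append(now)
        return True


fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=3, window=60, clock=lambda: fake_now[0])
results = [limiter.allow("user-42") for _ in range(5)]
print(results)  # [True, True, True, False, False]
fake_now[0] = 61.0  # a minute later, the earlier requests have aged out
print(limiter.allow("user-42"))  # True
```

Rejected requests are not recorded, so a client that keeps hammering the site is simply held at the limit until its earlier requests age out of the window.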
Monitor website traffic: data scraping leaves a large footprint on a website, which is no surprise given that every page of data has to be sent from the server to the scraper. Monitoring your site traffic for unusually high levels of activity can catch a scrape before it’s finished, or at least limit the risk of being scraped.
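As a rough sketch of what such monitoring might look like, the Python function below counts requests per client IP over a monitoring window (for example, parsed from a web server’s access log) and flags clients far above the typical count. The thresholds `factor` and `min_requests` are hypothetical tuning values, and the IP addresses come from documentation-only ranges.

```python
import statistics
from collections import Counter


def flag_unusual_clients(client_ips, factor=10, min_requests=100):
    """Flag clients whose request count is far above the median client's.

    client_ips: one entry per request seen during the monitoring window.
    factor, min_requests: hypothetical thresholds to tune for your own traffic.
    """
    counts = Counter(client_ips)
    if not counts:
        return []
    typical = statistics.median(counts.values())
    cutoff = max(min_requests, factor * typical)
    return sorted(ip for ip, n in counts.items() if n > cutoff)


# Simulated window: 50 ordinary visitors making 5 requests each, plus one client making 500.
requests = [f"203.0.113.{i}" for i in range(1, 51) for _ in range(5)]
requests += ["198.51.100.7"] * 500
print(flag_unusual_clients(requests))  # ['198.51.100.7']
```

Comparing against the median rather than a fixed number lets the cutoff adapt to normal traffic levels, while `min_requests` stops quiet periods from flagging ordinary visitors.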
Use a captcha: scrapers are often hosted on particular web or cloud services, so showing a captcha to visitors coming from those services can help stop automated programs from accessing the site.
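The check that decides who sees a captcha can be sketched with Python’s standard `ipaddress` module. The two networks below are documentation-only placeholder ranges (RFC 5737) standing in for whatever cloud or hosting ranges you choose to challenge; many large cloud providers publish their address ranges, which is where a real list would come from. The `verified` flag is an assumption meaning the visitor has already passed a challenge.

```python
import ipaddress

# Placeholder ranges for illustration only; replace with the hosting ranges you want to challenge.
CLOUD_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def needs_captcha(client_ip: str, verified: bool = False) -> bool:
    """Challenge unverified visitors whose address falls inside a listed cloud range."""
    if verified:
        return False
    addr = ipaddress.ip_address(client_ip)
    # Membership is simply False when IPv4 and IPv6 are mixed, so no extra check is needed.
    return any(addr in net for net in CLOUD_RANGES)


print(needs_captcha("203.0.113.25"))                 # True
print(needs_captcha("192.0.2.10"))                   # False
print(needs_captcha("203.0.113.25", verified=True))  # False
```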
Change your HTML regularly: scrapers look for patterns in a site’s HTML markup, which their bots use as clues to locate the right data. Regularly changing your site’s markup breaks those patterns and makes the data harder to scrape.
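One way to automate this, sketched here in Python under the assumption that your templates and stylesheets pass through a build step, is to map each stable internal class name to an opaque name derived from a per-release salt. The same mapping must be applied to both the HTML and the CSS so the site still renders; `rotated_class`, `class_map`, and the salt values are hypothetical.

```python
import hashlib


def rotated_class(name: str, release_salt: str) -> str:
    """Derive an opaque CSS class name that changes whenever the salt changes."""
    digest = hashlib.sha256(f"{release_salt}:{name}".encode("utf-8")).hexdigest()
    # Prefix with a letter: a CSS class selector cannot start with an unescaped digit.
    return "c" + digest[:8]


def class_map(names, release_salt):
    """Build the lookup a build step would apply to templates and stylesheets."""
    return {name: rotated_class(name, release_salt) for name in names}


february = class_map(["name", "title"], "release-2024-02")
march = class_map(["name", "title"], "release-2024-03")
# Each release gives the same fields different opaque class names.
print(february)
print(march)
```

This raises the cost of scraping rather than stopping it outright: a determined scraper can re-learn the new patterns, and rotation can also break your own tests or any integrations that rely on the markup.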
Create “honey pot pages”: honey pot pages are pages a human would never visit, but a robot intent on scraping data might. The link to them can be hidden with display: none in the site’s CSS or made to blend in with the page’s background, so it is only picked up by web crawlers. If someone visits a honey pot page, it’s a strong sign they’re not human, and you can start blocking requests from them.
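A minimal honey pot guard might look like the Python sketch below. The trap path and hidden link are hypothetical, and the blocklist lives in memory for simplicity. Listing the trap path under `Disallow` in `robots.txt` is a sensible extra step, so well-behaved search-engine crawlers that honour it are not blocked by mistake.

```python
# Hypothetical trap URL, linked only from an anchor hidden with CSS.
TRAP_PATH = "/internal/directory-export"
# Markup to place on ordinary pages; humans never see it, naive crawlers follow it.
HIDDEN_LINK = f'<a href="{TRAP_PATH}" style="display:none" rel="nofollow">export</a>'


class HoneypotGuard:
    """Blocks any client that requests the trap path."""

    def __init__(self, trap_path: str):
        self.trap_path = trap_path
        self.blocked = set()

    def handle(self, client_ip: str, path: str) -> int:
        """Return the HTTP status code to send for this request."""
        if client_ip in self.blocked:
            return 403
        if path == self.trap_path:
            self.blocked.add(client_ip)  # only a bot should ever follow the hidden link
            return 403
        return 200


guard = HoneypotGuard(TRAP_PATH)
print(guard.handle("198.51.100.7", "/about"))   # 200
print(guard.handle("198.51.100.7", TRAP_PATH))  # 403: trap triggered
print(guard.handle("198.51.100.7", "/about"))   # 403: now blocked
print(guard.handle("203.0.113.9", "/about"))    # 200: other visitors unaffected
```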
It’s important to take steps to limit the data scrapers can take. Businesses need to stay vigilant and continuously monitor their sites and other resources for hacker activity.
The solutions in this post can help mitigate scraping attacks. It may even be necessary to hire an IT support company to help keep your data safe.