How to Stop Web Scraping Bots from Stealing Your Site’s Content
- 19 July 2021
- Posted by: hastingsio
- Category: Uncategorized
You’ve worked hard creating great content for your site, and now bots are scraping it and republishing it on spammy sites. That scraping costs you bandwidth, distorts your analytics, and, worst of all, can hurt your search ranking. Your website and its content are valuable assets. Today, we’re talking about strategies to protect your work from scraping bots.
5:09 What is content scraping?
6:20 Legitimate (good) scrapers and spiders
7:30 Bad scrapers and why they scrape content
8:45 Who do malicious scrapers target, and why?
9:10 What scrapers do and how they work
10:40 Types of scrapers and spiders
12:45 Do spiders and scrapers follow robots.txt instructions?
15:50 What are the harms of content scraping?
17:07 How can you tell if your content is being scraped?
18:00 What do scrapers look like in logs?
18:55 Why do content scrapers steal content?
19:33 How to defeat content scrapers
21:39 Wordfence Live Traffic: identifying bot traffic
23:00 What about .htaccess files?
25:29 The Wordfence Real-Time Blocklist
28:31 Using the X-Frame-Options SAMEORIGIN header to protect content
30:50 Rate limiting with the Wordfence plugin
35:09 XML-RPC and the REST API
36:30 RSS feeds
37:40 DMCA takedowns
40:10 Overprotecting your content can be risky
41:00 Why one page view can generate numerous requests
43:00 What about rel=canonical?
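The 12:45 chapter asks whether spiders and scrapers follow robots.txt. Compliance is voluntary: a well-behaved crawler reads the file and checks each URL before fetching it, while a malicious scraper can simply skip that step. As a minimal sketch of what the polite side looks like, the Python below uses the standard library's `urllib.robotparser`. The rules, bot names, and URLs are hypothetical examples, and the rules are parsed from a string so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot entirely, and keep
# every other bot out of /wp-admin/.
ROBOTS_TXT = """\
User-agent: BadScraperBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_can_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    # A compliant crawler calls this before every request. Nothing enforces
    # it: a scraper that never calls it is not stopped by robots.txt at all.
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    print(polite_can_fetch(rules, "BadScraperBot", "https://example.com/blog/post/"))  # False
    print(polite_can_fetch(rules, "FriendlyBot", "https://example.com/blog/post/"))    # True
    print(polite_can_fetch(rules, "FriendlyBot", "https://example.com/wp-admin/"))     # False
```

Because the check lives entirely in the crawler's own code, robots.txt is useful for guiding legitimate bots but is not an access control.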
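The 17:07 and 18:00 chapters cover spotting scrapers in your logs. One common sign is a single client requesting many pages in a short period, often with a scripted user agent rather than a browser. The sketch below assumes your server writes the "combined" access-log format that both Apache and Nginx define. The sample lines, the IP addresses (taken from the reserved documentation ranges), the user agents, and the threshold are all hypothetical. Scrapers can spoof user agents and rotate IP addresses, so treat the output as a list of clients to review, not as proof of scraping.

```python
import re
from collections import Counter, defaultdict

# Combined log format:
# IP ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical sample lines; the IPs come from the documentation ranges.
SAMPLE_LOG = [
    '203.0.113.7 - - [19/Jul/2021:10:00:01 +0000] "GET /blog/post-1/ HTTP/1.1" 200 5120 "-" "python-requests/2.25.1"',
    '203.0.113.7 - - [19/Jul/2021:10:00:01 +0000] "GET /blog/post-2/ HTTP/1.1" 200 4988 "-" "python-requests/2.25.1"',
    '203.0.113.7 - - [19/Jul/2021:10:00:02 +0000] "GET /blog/post-3/ HTTP/1.1" 200 5301 "-" "python-requests/2.25.1"',
    '198.51.100.4 - - [19/Jul/2021:10:00:05 +0000] "GET /about/ HTTP/1.1" 200 3100 "https://example.com/" "Mozilla/5.0"',
]


def suspicious_clients(lines, threshold):
    """Return (ip, request_count, user_agents) for clients with >= threshold requests."""
    counts = Counter()
    agents = defaultdict(set)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ip = match.group("ip")
        counts[ip] += 1
        agents[ip].add(match.group("agent"))
    # most_common() orders clients by request count, highest first.
    return [
        (ip, count, sorted(agents[ip]))
        for ip, count in counts.most_common()
        if count >= threshold
    ]


if __name__ == "__main__":
    for ip, count, uas in suspicious_clients(SAMPLE_LOG, threshold=3):
        print(ip, count, uas)
    # prints: 203.0.113.7 3 ['python-requests/2.25.1']
```

A real check would also bucket requests by time (per minute or per hour), since a high total spread over a whole day can be an ordinary reader.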
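The 30:50 chapter covers rate limiting in the Wordfence plugin. The underlying idea is to cap how many requests one client may make within a time window and to refuse or slow down the rest. The sketch below is a generic sliding-window limiter in Python, not Wordfence's implementation; the class name, the limits, and keying clients by IP address are assumptions for illustration. The clock is injectable so the behavior can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client in any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        # client id (e.g. an IP address) -> timestamps of its recent allowed requests.
        # A long-running version would also evict clients that have gone idle.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Forget requests that have slid out of the current window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the caller decides whether to block or throttle
        hits.append(now)
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=60,
                                       clock=lambda: fake_now[0])
    print([limiter.allow("203.0.113.7") for _ in range(4)])  # [True, True, True, False]
    fake_now[0] = 60.0  # one full window later
    print(limiter.allow("203.0.113.7"))  # True
```

Only allowed requests are recorded, so a client that keeps hammering the site while blocked does not push its own window further out. What happens when `allow` returns `False` (an HTTP 429 Too Many Requests response, a temporary block, or a challenge) is a policy decision, and the 40:10 and 41:00 chapters point to the risk: a single page view can trigger many requests, so limits set too low can catch real visitors.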