Finding AI Bots in Your Web Server Log Files

The following are additional ways to process your web server log files, expanding on our blog post “How to Protect Your Content from AI… and Should You?” (Coming Soon). These methods let you quickly evaluate how often your website is being accessed by AI Scrapers, AI Crawlers, and Assistants.

Bots accounted for 47.4% of all internet traffic in 2022.

Extract AI Bot Lines from Server Logs with grep

Below, you'll learn how to use the grep command to filter and extract entries related to specific bots from an Apache log file. This method helps you focus on the traffic that AI Scrapers generate on your website, which might violate your AI robots.txt rules.

Prerequisites

  • Access to a Unix-like operating system (Linux, macOS)
  • An Apache log file, typically named apache.log

Instructions

  1. Open Terminal: Start by opening your terminal application.
  2. Navigate to the Log Directory: Use the cd command to change to the directory containing your Apache log file. Replace /path/to/apache/logs with the actual path to your Apache log files.
  3. Execute the grep Command: Use the command shown after these steps to filter entries related to specific bots from your Apache log file. This command searches apache.log for lines containing any of the listed AI Bot names and saves the matching lines to filtered_apache.log.
  4. Understanding the Command:
    • grep -E: Invokes grep with the -E flag to enable extended regular expression matching.
    • The long string of bot names separated by | is the pattern grep will search for in the log file. The | symbol acts as an OR operator, meaning any line containing at least one of these names will be matched.
    • apache.log: The name of the log file you are searching through.
    • >: Redirects the output of grep to a file instead of displaying it on the screen.
    • filtered_apache.log: The file where the matched lines will be saved.
  5. Review the Results: After running the command, filtered_apache.log will contain only the log entries that match the specified bot names related to AI Scraping. You can view this file using a text editor or the cat command, as shown in the sketch below.
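Putting steps 2, 3, and 5 together, a minimal sketch looks like this. The path and the bot list are illustrative placeholders; substitute your actual log location and the AI user agents from your own robots.txt rules:

```bash
cd /path/to/apache/logs

# Illustrative bot list; add or remove names to match your robots.txt rules
grep -E "GPTBot|ClaudeBot|CCBot|Google-Extended|anthropic-ai|Bytespider|PerplexityBot" apache.log > filtered_apache.log

# Review the results (step 5)
cat filtered_apache.log
```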

By following these steps, you can efficiently extract and review the activities of specific bots within your Apache logs. This process is valuable for analyzing bot behavior and ensuring that AI Scrapers interact with your site as expected.

How to Count AI Scraping Visits in Apache Logs with a Shell Script

Below is the process for creating a shell script that counts visits from various bots in an Apache log file and outputs the results to a CSV file. This method is valuable for web admins looking to analyze AI bot traffic.

Prerequisites

  • A Unix-like operating system (Linux, macOS)
  • An Apache log file (commonly named apache.log)
  • Access to a text editor, either graphical (like TextEdit on macOS, Notepad++ on Windows, or gedit on Linux) or command-line (like nano)

Creating the Script

Using a Graphical Text Editor:
  1. Open your text editor: Launch your graphical text editor.
  2. Create a new file: Start a new document.
  3. Write the script: Copy and paste the script shown after these steps into your document.
  4. Save the file: Save your script with a .sh extension, e.g., bot_counter.sh.
Or using Nano (Command-Line Text Editor):
  1. Open Terminal: Access your terminal application.
  2. Create and edit the script file: Type nano bot_counter.sh to create and open the file in nano.
  3. Write the script: Copy and paste the same script (shown after these steps) into the nano editor.
  4. Save the file: Press Ctrl + O, then Enter to save, followed by Ctrl + X to exit nano.
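
A minimal sketch of such a script, assuming your Apache log is named apache.log and sits in the same directory; the bot list is illustrative, so edit it to match the user agents in your robots.txt rules:

```bash
#!/bin/bash
# Count AI bot visits in apache.log and write the totals to bot_counts.csv.
# The bot list below is illustrative; edit it to match your robots.txt rules.

LOG_FILE="apache.log"
OUTPUT_FILE="bot_counts.csv"

BOTS="GPTBot ClaudeBot CCBot Google-Extended anthropic-ai Bytespider PerplexityBot"

echo "Bot,Count" > "$OUTPUT_FILE"

for bot in $BOTS; do
    # grep -c prints the number of matching lines; -i ignores case
    count=$(grep -ci "$bot" "$LOG_FILE")
    echo "$bot,$count" >> "$OUTPUT_FILE"
done

echo "Done: results written to $OUTPUT_FILE"
```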

Making the Script Executable

  • In the terminal, navigate to the directory containing your script file.
  • Run the command chmod +x bot_counter.sh to make it executable. Replace bot_counter.sh with your script’s filename.

Running the Script

  • Execute the script by typing ./bot_counter.sh in the terminal. Ensure you are in the same directory as the script and Apache log file.

The script will process apache.log and produce bot_counts.csv, which lists each bot and the number of times it accessed your site.

How to Count AI Scraping Visits in Apache Logs with an awk Command

An alternative to using a bash script is the awk command; below is the process for using awk to count visits from various bots in an Apache log file. This method is convenient for web admins looking to analyze AI Scraping traffic using a single command. We found this method effective but slower to process than the bash script.

Prerequisites

  • A Unix-like operating system (Linux, macOS)
  • An Apache log file (commonly named apache.log)
  • Basic knowledge of using the terminal

Running the awk Command

Open your terminal and run the following awk command:
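
The command below is a minimal sketch, assuming the same apache.log and the same illustrative bot list as the script above:

```bash
awk 'BEGIN {
    # 1. Split the bot list into an array (edit the list to match your robots.txt rules)
    n = split("GPTBot ClaudeBot CCBot Google-Extended anthropic-ai Bytespider PerplexityBot", bots, " ")
    # 2. Initialize counts for each bot
    for (i = 1; i <= n; i++) counts[bots[i]] = 0
}
{
    # 3. Check each line for bot names and update the counts
    for (i = 1; i <= n; i++) if (index($0, bots[i]) > 0) counts[bots[i]]++
}
END {
    # 4. Write the results to bot_counts.log
    for (i = 1; i <= n; i++) print bots[i] ": " counts[bots[i]] > "bot_counts.log"
}' apache.log
```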

This awk command will:

  1. Split the bot list into an array.
  2. Initialize counts for each bot.
  3. Check each line in the log file for bot names and update the counts.
  4. Write the results to bot_counts.log.

The command will process apache.log and produce bot_counts.log, which lists each bot and the number of times it accessed your site.

What is grep?

grep is a powerful command-line utility used in Unix-like operating systems for searching text using patterns. When used with regular expressions (-E flag), grep becomes even more versatile, allowing you to match complex patterns.
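
For example, the | alternation in an extended regular expression matches lines containing any of several names in a single pass (an illustrative pattern):

```bash
# Alternation with | matches lines containing either name
grep -E 'GPTBot|CCBot' apache.log
```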

What is awk?

awk is a powerful command-line utility used in Unix-like operating systems for pattern scanning and processing. It allows you to search, filter, and manipulate text based on defined patterns. awk is particularly useful for processing structured data, such as log files or CSV files, and can perform complex text transformations and reporting.
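
For example, this generic one-liner (separate from the bot counting above) tallies requests per client IP, the first field in a standard Apache log line:

```bash
# Print field 1 (the client IP in common/combined log format),
# then count and rank the unique values
awk '{print $1}' apache.log | sort | uniq -c | sort -rn | head
```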

What are User Agents?

In the robots.txt file, user agents identify specific web crawlers or bots, allowing site administrators to tailor access permissions for each one. By specifying user agents, you can selectively restrict or grant access to different parts of a website, ensuring that only desired bots can index or interact with specific content.
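
For example, a robots.txt rule that blocks an AI crawler from the entire site looks like this (using GPTBot, OpenAI's crawler, as an illustrative user agent):

```
User-agent: GPTBot
Disallow: /
```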

Will AI User Agent rules block Search Engines or Social Sharing?

We have specifically selected AI user agents that are unrelated to search or social sharing. For example, Google-Extended is Google's AI model bot, whereas Googlebot is used for general search. This may change in the future, but we will update our AI Model code snippet accordingly.

(Last Updated: June 13, 2024)

Need help figuring out whether AI is scraping your website?

hi@tenacity.io