Link checker

Introduction

The Link Checker script helps you identify and manage broken links and images on a website. It automates checking the URLs within a specified domain and provides detailed reports on the status of each link and image.

Script location

Location: extension/Resources/Public/scripts
Script name: web_crawler.py

Usage

Configuration

Before using the Link Checker script, you need a configuration file, "conf.json", with the following fields:

  • "startUrl": The URL where the link checking will begin.
  • "login_url": The URL for logging in, if required. If empty, "startUrl" is used instead.
  • "username" and "password": Login credentials. Important: if you don't have login credentials, leave these fields empty.
  • "max_depth": The maximum depth to crawl links.
  • "target_path": The path to restrict link checking (e.g., /blog).
  • "target_string": A unique string to look for on each page.
  • "blacklist": URLs to exclude from checking.
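Putting these fields together, a minimal conf.json could be generated like this. Every value below is a placeholder for illustration, not a default shipped with the script:

```python
import json

# Example configuration; all values are placeholders, not script defaults.
config = {
    "startUrl": "https://www.example.com/",
    "login_url": "",       # falls back to startUrl when empty
    "username": "",        # leave empty if no login is required
    "password": "",
    "max_depth": 3,        # how many link levels to crawl
    "target_path": "/blog",  # restrict checking to this path
    "target_string": "",   # optional unique string to look for
    "blacklist": ["https://www.example.com/archive"],
}

# Write the configuration next to the script.
with open("conf.json", "w") as f:
    json.dump(config, f, indent=2)
```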

Ignore CSS class

The script also ignores links marked with the CSS class "link-checker-skip".
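The skip behaviour can be sketched with the standard-library HTML parser: anchors carrying the class are simply not collected. The class name comes from the script's documentation; the parser class and its names here are illustrative, not the script's actual implementation:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values, skipping anchors with class 'link-checker-skip'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # Honour the skip class: do not record this link at all.
        if "link-checker-skip" in attrs.get("class", "").split():
            return
        href = attrs.get("href")
        if href:
            self.links.append(href)

parser = LinkCollector()
parser.feed('<a class="link-checker-skip" href="/ignored">x</a>'
            '<a href="/kept">y</a>')
```

Only `/kept` ends up in `parser.links`; the anchor with the skip class is dropped before any check would run.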

Running the Script

You can run the Link Checker script using the following command:

python web_crawler.py conf.json "all or <index>"

Result/Output

It generates detailed reports, which include:
  • Broken links and images with response codes.
  • Denied links with 403 Forbidden errors.
  • Redirects to the home page.
  • Successfully checked links.
The results are saved in two log files (detail.log and summary.log) and a CSV file containing the broken links.
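For post-processing, the broken-links CSV can be read with the standard csv module. The column names below ("url", "status") are assumptions for illustration; the actual header written by web_crawler.py may differ:

```python
import csv

# Write a small sample file so the reader below has something to parse;
# the real columns produced by web_crawler.py may differ.
with open("broken_links.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "status"])
    writer.writerow(["https://www.example.com/missing", "404"])

def load_broken_links(path):
    """Return each CSV row as a dict keyed by the header line."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

rows = load_broken_links("broken_links.csv")
```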

Summary log:

[Example screenshots: one run with 0 errors and one run with 1 or more errors]

Detail log:

[Example screenshot]

Updated by Zhoujie Li 3 months ago · 11 revisions