Link checker » History » Revision 7
Revision 6 (Zhoujie Li, 05.09.2023 14:28) → Revision 7/11 (Zhoujie Li, 30.10.2023 15:43)
h1. Link checker
h2. Introduction
The Link Checker script helps you identify and manage broken links and images on a website. It automates checking URLs within a specified domain and produces detailed reports on the status of each link and image.
h2. Script location

Script location: extension/Resources/Public/scripts
Script name: web_crawler.py
h2. Usage
h3. Configuration
Before using the Link Checker script, you *need* a configuration file. It looks like this:
!clipboard-202309040834-symyc.png!
* "startUrl": The URL where the link checking will begin.
* "login_url": URL for logging in if required. If empty it will use the "startUrl" instead.
* "username and password": Login credentials. If you don't have login credentials, leave this field *empty* it's *important !*
* "max_depth": The maximum depth to crawl links.
* "target_path": The path to restrict link checking (e.g., /blog).
* "target_string": Looking for a unique string.
* "blacklist": URLs to exclude from checking.
h3. Ignored CSS class

The script also skips any link that carries the CSS class "link-checker-skip".
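For example, a link excluded from checking could be marked up like this (an illustrative sketch; the exact matching logic lives in web_crawler.py):

<pre>
<a href="/legacy-page" class="link-checker-skip">Old page (not checked)</a>
</pre>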
h3. Running the Script

You can run the Link Checker script using the following command, where the second argument is either @all@ or an @<index>@:
!clipboard-202309051340-pj0ak.png!
<pre>
python web_crawler.py conj.json "all or <index>"
</pre>
h2. Result/Output

The script generates detailed reports. These include:
* broken links and images with their response codes,
* denied links with 403 Forbidden errors,
* redirects to the home page,
* successfully checked links.

The results are saved in log files (detail.log and summary.log) and a CSV file containing the broken links.
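The report categories above can be sketched as a small classification helper. This is a hypothetical illustration of how a response might be sorted into those categories, not the actual code in web_crawler.py:

```python
def classify(status_code: int, final_url: str, home_url: str) -> str:
    """Map an HTTP response to one of the report categories."""
    if status_code == 403:
        return "denied"            # 403 Forbidden links are reported separately
    if status_code >= 400:
        return "broken"            # broken links/images with their response codes
    if final_url.rstrip("/") == home_url.rstrip("/"):
        return "redirect-to-home"  # the request ended up back on the home page
    return "ok"                    # successfully checked link


if __name__ == "__main__":
    print(classify(404, "https://example.com/missing", "https://example.com"))
    # A 404 is reported as a broken link.
```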
h3. Summary log

*0 errors*
!clipboard-202309051417-p5lf8.png!
*1 or more errors*
!clipboard-202309051415-bmmau.png!
h3. Detail log
!clipboard-202309051423-xa8x4.png!