
Revision 9 (Zhoujie Li, 30.10.2023 15:46) → Revision 10/11 (Zhoujie Li, 22.01.2024 16:34)

h1. Link checker 

 h2. Introduction 

 The Link Checker script helps you identify and manage broken links and images on a website. It automates checking the URLs within a specified domain and produces detailed reports on the status of each link and image. 

 h2. Script destination 

 Script location: extension/Resources/Public/scripts 
 Script name: web_crawler.py 

 h2. Usage 

 h3. Configuration 

 Before using the Link Checker script, you *need* a configuration file "conf.json". 
 It looks like this: 
 !clipboard-202309040834-symyc.png! 

 * "startUrl": The URL where the link checking will begin. 
 * "login_url": URL for logging in, if required. If empty, the "startUrl" is used instead. 
 * "username" and "password": Login credentials. If you have no login credentials, it is *important* to leave these fields *empty!* 
 * "max_depth": The maximum depth to crawl links. 
 * "target_path": The path to restrict link checking (e.g., /blog). 
 * "target_string": A unique string to search for on each page. 
 * "blacklist": URLs to exclude from checking. 
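
 For reference, a complete "conf.json" might look like the following. The keys match the list above; all values are illustrative placeholders, not taken from the screenshot:

```json
{
  "startUrl": "https://www.example.com/",
  "login_url": "",
  "username": "",
  "password": "",
  "max_depth": 3,
  "target_path": "/blog",
  "target_string": "",
  "blacklist": ["https://www.example.com/admin"]
}
```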

 h3. Ignore CSS class 

 This script also ignores elements with the CSS class "link-checker-skip", so links marked with it are excluded from checking. 
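
 For example, a link marked up like this (a hypothetical snippet) would not be checked:

```html
<a href="https://www.example.com/ignored" class="link-checker-skip">Ignored link</a>
```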

 h3. Running the Script 

 You can run the Link Checker script using the following command: 
 !clipboard-202309051340-pj0ak.png! 
 <pre> 
 python web_crawler.py conf.json "all or <index>" 
 </pre> 
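
 The second argument selects what to check: "all" or a single index. A minimal sketch of how such arguments could be parsed (illustrative only; the function name and the exact semantics of "<index>" are assumptions, not the script's actual code):

```python
import sys

def parse_args(argv):
    # Expected call: python web_crawler.py conf.json <all|index>
    if len(argv) != 3:
        raise SystemExit("usage: python web_crawler.py conf.json <all|index>")
    config_path, mode = argv[1], argv[2]
    # "all" checks everything; a number is assumed to select one entry.
    target = None if mode == "all" else int(mode)
    return config_path, target
```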

 h2. Result/Output 

 The script generates detailed reports. These include: 
 * Broken links and images, with their response codes. 
 * Denied links (403 Forbidden errors). 
 * Redirects to the home page. 
 * Successfully checked links. 

 The results are saved in log files (detail.log and summary.log) and in a CSV file containing the broken links. 
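
 The report categories above can be sketched as follows (a simplified illustration of the classification, not the actual code from web_crawler.py; the function name and the home-page check are assumptions):

```python
def classify(status_code, final_url, home_url="https://www.example.com/"):
    """Map one checked URL to a report category (illustrative sketch)."""
    if status_code == 403:
        return "denied"          # 403 Forbidden -> denied-links report
    if status_code >= 400:
        return "broken"          # other 4xx/5xx -> broken links/images
    if final_url == home_url:
        return "redirect_home"   # request ended up on the home page
    return "ok"                  # successfully checked
```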

 h3. Summary log: 

 *0 errors* 

 !clipboard-202309051417-p5lf8.png! 


 *1 or more errors* 

 !clipboard-202309051415-bmmau.png! 


 h3. Detail log: 

 !clipboard-202309051423-xa8x4.png!