Revision 7 (Zhoujie Li, 30.10.2023 15:43) → Revision 8/11 (Zhoujie Li, 30.10.2023 15:45)
h1. Link checker

h2. Introduction

The Link Checker script helps you identify and manage broken links and images on a website. It automates checking the URLs within a specified domain and produces detailed reports on the status of each link and image.

h2. Script location

Script location: extension/Resources/Public/scripts
Script name: web_crawler.py

h2. Usage

h3. Configuration

Before using the Link Checker script, you *need* a configuration file. It looks like this:

!clipboard-202309040834-symyc.png!

* "startUrl": the URL where link checking begins.
* "login_url": the URL used for logging in, if required. If empty, "startUrl" is used instead.
* "username" and "password": login credentials. If you have no login credentials, it is *important* to leave these fields *empty*.
* "max_depth": the maximum depth to crawl links.
* "target_path": a path to restrict link checking to (e.g., /blog).
* "target_string": a unique string to look for on each page.
* "blacklist": URLs to exclude from checking.

h3. Ignored CSS class

The script also skips links that carry the CSS class "link-checker-skip".

h3. Running the script

You can run the Link Checker script using the following command:

!clipboard-202309051340-pj0ak.png!

<pre>
python web_crawler.py conj.json "all or <index>"
</pre>

h2. Result/Output

The script generates detailed reports. These include:

* broken links and images with their response codes,
* denied links with 403 Forbidden errors,
* redirects to the home page,
* successfully checked links.

The results are saved in log files (detail.log and summary.log) and in a CSV file containing the broken links.

h3. Summary log

*0 errors*

!clipboard-202309051417-p5lf8.png!

*1 or more errors*

!clipboard-202309051415-bmmau.png!

h3. Detail log

!clipboard-202309051423-xa8x4.png!
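Based on the option list in the Configuration section, a configuration file could look roughly like the following. All values here are illustrative assumptions; the screenshot above shows the actual format.

<pre>
{
  "startUrl": "https://www.example.com",
  "login_url": "",
  "username": "",
  "password": "",
  "max_depth": 3,
  "target_path": "/blog",
  "target_string": "",
  "blacklist": ["https://www.example.com/logout"]
}
</pre>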
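The report categories listed under Result/Output could be produced by a classification step along these lines. This is a minimal sketch only; the function and category names are illustrative and are not taken from web_crawler.py.

<pre>
# Sketch of mapping an HTTP response to the report categories
# described above (broken, denied, redirect-to-home, ok).
# NOT the actual web_crawler.py implementation.

def classify_status(status_code, final_url, home_url):
    """Classify a checked URL by its HTTP status and final location."""
    if status_code == 403:
        return "denied"            # 403 Forbidden
    if status_code >= 400:
        return "broken"            # broken link or image
    if final_url.rstrip("/") == home_url.rstrip("/"):
        return "redirect_home"     # redirected to the home page
    return "ok"                    # successfully checked
</pre>

For example, a 404 response would be reported as "broken", while a 200 response whose final URL equals the home page would be reported as a redirect to the home page.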