Link checker » History » Version 5
Zhoujie Li, 05.09.2023 14:27
1 | 1 | Zhoujie Li | h1. Link checker |
---|---|---|---|
2 | 2 | Zhoujie Li | |
3 | 3 | Zhoujie Li | h2. Introduction |
4 | |||
5 | The Link Checker script helps you identify and manage broken links and images on a website. It automates the checking of URLs within a specified domain and provides detailed reports on the status of each link and image. |
||
6 | |||
7 | 4 | Zhoujie Li | h2. Script destination |
8 | 3 | Zhoujie Li | |
9 | 2 | Zhoujie Li | Script location: extension/Resources/Public/scripts |
10 | Script name: web_crawler.py |
||
11 | 3 | Zhoujie Li | |
12 | h2. Usage |
||
13 | |||
14 | h3. Configuration |
||
15 | |||
16 | Before using the Link Checker script, you *need* a configuration file. |
||
17 | It looks like this: |
||
18 | !clipboard-202309040834-symyc.png! |
||
19 | |||
20 | * "startUrl": The URL where the link checking will begin. |
||
21 | * "login_url": The URL of the login page, if a login is required. If empty, the "startUrl" is used instead. |
||
22 | * "username and password": Login credentials. If you don't have login credentials, leave them *empty*. This is *important!* |
||
23 | * "max_depth": The maximum depth to crawl links. |
||
24 | * "target_path": The path to which link checking is restricted (e.g., /blog). |
||
25 | * "target_string": A unique string to search for. |
||
26 | 1 | Zhoujie Li | * "blacklist": URLs to exclude from checking. |
27 | |||
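As a rough illustration of the options listed above, the following sketch loads a configuration and applies the documented "login_url" fallback. It assumes the file is JSON and that "username" and "password" are separate keys; the actual layout is the one in the screenshot and may differ. All values are placeholders, and @load_config@ and @needs_login@ are hypothetical names, not part of web_crawler.py.

```python
import json

# Hypothetical configuration. The JSON layout, the separate "username" and
# "password" keys, and every value here are assumptions for illustration.
EXAMPLE_CONFIG = """
{
  "startUrl": "https://example.com/",
  "login_url": "",
  "username": "",
  "password": "",
  "max_depth": 2,
  "target_path": "/blog",
  "target_string": "",
  "blacklist": ["https://example.com/logout"]
}
"""


def load_config(text):
    """Parse the configuration and apply the documented defaults."""
    config = json.loads(text)
    # An empty "login_url" falls back to "startUrl".
    if not config.get("login_url"):
        config["login_url"] = config["startUrl"]
    # Empty credentials mean that no login is attempted (hypothetical helper flag).
    config["needs_login"] = bool(config.get("username") and config.get("password"))
    return config


config = load_config(EXAMPLE_CONFIG)
print(config["login_url"], config["needs_login"])
```

Leaving the credentials empty, as the list above requires when no login exists, is what lets a script of this kind decide to skip the login step.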
28 | 5 | Zhoujie Li | h3. Ignore CSS class |
29 | The script also skips links marked with the CSS class "link-checker-skip"
||
30 | |||
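How such a skip rule might work can be sketched with Python's standard @html.parser@ module. This is not the code from web_crawler.py: it assumes the class is set directly on the @a@ or @img@ element, and @LinkCollector@ and @collect_urls@ are hypothetical names.

```python
from html.parser import HTMLParser

SKIP_CLASS = "link-checker-skip"


class LinkCollector(HTMLParser):
    """Collect link and image URLs, skipping elements marked with SKIP_CLASS."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The class attribute may be missing or empty.
        classes = (attributes.get("class") or "").split()
        if SKIP_CLASS in classes:
            return  # element is explicitly excluded from checking
        if tag == "a" and attributes.get("href"):
            self.urls.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.urls.append(attributes["src"])


def collect_urls(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.urls


sample = (
    '<a href="/a">A</a>'
    '<a class="btn link-checker-skip" href="/b">B</a>'
    '<img src="/c.png">'
)
print(collect_urls(sample))
```

In this sketch the second link is left out because its class list contains "link-checker-skip", while the first link and the image are kept.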
31 | 3 | Zhoujie Li | h3. Running the Script |
32 | 1 | Zhoujie Li | |
33 | You can run the Link Checker script using the following command: |
||
34 | 5 | Zhoujie Li | !clipboard-202309051340-pj0ak.png! |
35 | |||
36 | h2. Result/Output |
||
37 | |||
38 | The script generates detailed reports. These reports include: |
||
39 | * Broken links and images with response codes. |
||
40 | * Denied links with 403 Forbidden errors. |
||
41 | * Redirects to the home page. |
||
42 | * Successfully checked links. |
||
43 | * The results are saved in two log files (detail.log and summary.log) and in a CSV file containing the broken links. |
||
44 | |||
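The report categories above could be derived from each response roughly as follows. This is a minimal sketch under stated assumptions, not the script's actual logic: the @classify@ function, the category names, the rule that any status of 400 or higher other than 403 counts as broken, and the CSV column names are all hypothetical.

```python
import csv
import io

HOME_URL = "https://example.com/"  # placeholder home page


def classify(status_code, requested_url, final_url, home_url):
    """Map one checked URL to a report category (assumed rules)."""
    if status_code == 403:
        return "denied"
    if status_code >= 400:
        return "broken"
    landed_on_home = final_url.rstrip("/") == home_url.rstrip("/")
    asked_for_home = requested_url.rstrip("/") == home_url.rstrip("/")
    if landed_on_home and not asked_for_home:
        return "redirect_to_home"
    return "ok"


# (requested URL, status code, final URL after redirects); sample data only
checked = [
    ("https://example.com/about", 200, "https://example.com/about"),
    ("https://example.com/missing", 404, "https://example.com/missing"),
    ("https://example.com/admin", 403, "https://example.com/admin"),
    ("https://example.com/old", 200, "https://example.com/"),
]

categories = {
    url: classify(status, url, final, HOME_URL) for url, status, final in checked
}

# Write only the broken links to CSV (in memory here instead of a file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["url", "status_code"])
for url, status, _final in checked:
    if categories[url] == "broken":
        writer.writerow([url, status])
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping 403 responses separate from other errors mirrors the list above, where denied links are reported apart from broken ones.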
45 | h3. Summary log: |
||
46 | |||
47 | *0 errors* |
||
48 | !clipboard-202309051417-p5lf8.png! |
||
49 | |||
50 | *1 or more errors* |
||
51 | !clipboard-202309051415-bmmau.png! |
||
52 | |||
53 | h3. Detail log: |
||
54 | |||
55 | !clipboard-202309051423-xa8x4.png! |
||
56 |