Link checker » History » Version 8
Zhoujie Li, 30.10.2023 15:45
1 | 1 | Zhoujie Li | h1. Link checker |
---|---|---|---|
2 | 2 | Zhoujie Li | |
3 | 3 | Zhoujie Li | h2. Introduction |
4 | |||
5 | The Link Checker script helps you identify and manage broken links and images on a website. It automates checking URLs within a specified domain and reports the status of each link and image. |
||
6 | |||
7 | 4 | Zhoujie Li | h2. Script destination |
8 | 3 | Zhoujie Li | |
9 | 2 | Zhoujie Li | Script location: extension/Resources/Public/scripts |
10 | Script name: web_crawler.py |
||
11 | 3 | Zhoujie Li | |
12 | h2. Usage |
||
13 | |||
14 | h3. Configuration |
||
15 | |||
16 | Before using the Link Checker script, you *need* a configuration file. |
||
17 | It looks like this: |
||
18 | !clipboard-202309040834-symyc.png! |
||
19 | |||
20 | * "startUrl": The URL where the link checking will begin. |
||
21 | * "login_url": The URL of the login page, if a login is required. If empty, the "startUrl" is used instead. |
||
22 | * "username and password": Login credentials. If you have no login credentials, leave this field *empty*. This is *important!* |
||
23 | * "max_depth": The maximum depth to crawl links. |
||
24 | * "target_path": The path to which link checking is restricted (e.g., /blog). |
||
25 | * "target_string": A unique string to look for. |
||
26 | 1 | Zhoujie Li | * "blacklist": URLs to exclude from checking. |
27 | |||
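The fields above can be sketched as a small validation step. This is only an illustration, not the actual code of web_crawler.py: the configuration screenshot is not reproduced here, so the layout is an assumption. In particular, "username" and "password" are assumed to be two separate keys, "blacklist" is assumed to be a list of URLs, and the names EXAMPLE_CONFIG and validate_config are hypothetical.

```python
import json

# Hypothetical example entry. The field names follow the list above,
# but the values and the exact file layout are assumptions.
EXAMPLE_CONFIG = {
    "startUrl": "https://example.com/",
    "login_url": "",
    "username": "",
    "password": "",
    "max_depth": 2,
    "target_path": "/blog",
    "target_string": "",
    "blacklist": ["https://example.com/logout"],
}

# Assumed minimum set of keys; the real script may require more.
REQUIRED_KEYS = ("startUrl", "max_depth")


def validate_config(config):
    """Return a normalized copy of config, raising ValueError on obvious mistakes."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    if not isinstance(config["max_depth"], int) or config["max_depth"] < 0:
        raise ValueError("max_depth must be a non-negative integer")
    normalized = dict(config)
    # An empty login_url falls back to startUrl, as described above.
    if not normalized.get("login_url"):
        normalized["login_url"] = normalized["startUrl"]
    # Empty credentials are treated as "no login needed".
    normalized["needs_login"] = bool(
        normalized.get("username") and normalized.get("password")
    )
    return normalized


if __name__ == "__main__":
    print(json.dumps(validate_config(EXAMPLE_CONFIG), indent=2))
```

Checking the configuration before the crawl starts makes mistakes such as a missing "startUrl" visible immediately instead of partway through a run.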
28 | 5 | Zhoujie Li | h3. Ignore CSS class |
29 | 6 | Zhoujie Li | |
30 | 5 | Zhoujie Li | The script also ignores elements with the CSS class "link-checker-skip". |
31 | |||
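One way such a skip rule could work is sketched below with Python's standard html.parser module. This is an assumption about the behaviour, not the script's real implementation: the sketch assumes the class is set directly on the link or image tag, and the names LinkCollector and collect_links are hypothetical.

```python
from html.parser import HTMLParser

SKIP_CLASS = "link-checker-skip"


class LinkCollector(HTMLParser):
    """Collect href/src targets, skipping tags that carry SKIP_CLASS."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        if SKIP_CLASS in classes:
            return
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.links.append(attributes["src"])


def collect_links(html):
    """Return all link and image targets in html that are not marked to skip."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links
```

With this approach, an editor can exclude a single link from checking by adding the class to it, for example <a class="link-checker-skip" href="...">.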
32 | 3 | Zhoujie Li | h3. Running the Script |
33 | 1 | Zhoujie Li | |
34 | You can run the Link Checker script using the following command: |
||
35 | 5 | Zhoujie Li | !clipboard-202309051340-pj0ak.png! |
36 | 7 | Zhoujie Li | <pre> |
37 | python web_crawler.py conj.json "all or <index>" |
||
38 | </pre> |
||
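The second argument selects which configuration entries to run. A minimal sketch of that selection follows. It assumes the configuration file holds a JSON list of entries and that the index is zero-based; neither is confirmed by this page, and the function name select_configs is hypothetical.

```python
import json
import sys


def select_configs(configs, selector):
    """Return the entries chosen by selector: "all" or an integer index (assumed zero-based)."""
    if selector == "all":
        return list(configs)
    try:
        index = int(selector)
    except ValueError:
        raise ValueError('selector must be "all" or an integer index') from None
    if not 0 <= index < len(configs):
        raise ValueError(f"index {index} is out of range")
    return [configs[index]]


def main(argv):
    # Expected call shape: python web_crawler.py <config file> <all or index>
    config_path, selector = argv[1], argv[2]
    with open(config_path, encoding="utf-8") as handle:
        configs = json.load(handle)
    for config in select_configs(configs, selector):
        print("would check:", config.get("startUrl"))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```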
39 | 5 | Zhoujie Li | |
40 | h2. Result/Output |
||
41 | |||
42 | The script generates detailed reports. These reports include: |
||
43 | * Broken links and images with response codes. |
||
44 | * Denied links with 403 Forbidden errors. |
||
45 | * Redirects to the home page. |
||
46 | * Successfully checked links. |
||
47 | * The results are saved in log files (detail.log and summary.log) and in a CSV file containing the broken links. |
||
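The report categories above could be derived from each response roughly as follows. This is a sketch under assumptions, not the script's actual logic: the category names, the CSV columns, and the function names classify and broken_links_csv are all hypothetical, and a status of None stands for a request that failed without any response.

```python
import csv
import io
from urllib.parse import urlsplit


def _same_page(first, second):
    """Compare two URLs by host and path, ignoring a trailing slash."""
    a, b = urlsplit(first), urlsplit(second)
    return (a.netloc, a.path.rstrip("/")) == (b.netloc, b.path.rstrip("/"))


def classify(url, status, final_url, home_url):
    """Map one checked URL to a report category (category names are illustrative)."""
    if status == 403:
        return "denied"
    if status is None or status >= 400:
        return "broken"
    # A redirect that ends on the home page often hides a missing page.
    if final_url != url and _same_page(final_url, home_url):
        return "redirect_to_home"
    return "ok"


def broken_links_csv(results):
    """Build CSV text from (url, status, category) tuples, keeping only broken links."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "status"])
    for url, status, category in results:
        if category == "broken":
            writer.writerow([url, status])
    return buffer.getvalue()
```

Treating 403 separately from other errors keeps pages that merely require a login out of the broken-links CSV.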
48 | |||
49 | h3. Summary log: |
||
50 | |||
51 | *0 errors* |
||
52 | 8 | Zhoujie Li | <pre> |
53 | 5 | Zhoujie Li | !clipboard-202309051417-p5lf8.png! |
54 | 8 | Zhoujie Li | </pre> |
55 | 5 | Zhoujie Li | |
56 | 8 | Zhoujie Li | |
57 | 5 | Zhoujie Li | *1 or more errors* |
58 | 8 | Zhoujie Li | <pre> |
59 | 5 | Zhoujie Li | !clipboard-202309051415-bmmau.png! |
60 | 8 | Zhoujie Li | </pre> |
61 | 5 | Zhoujie Li | |
62 | h3. Detail log: |
||
63 | |||
64 | !clipboard-202309051423-xa8x4.png! |
||
65 |