Link checker » History » Version 4
Zhoujie Li, 04.09.2023 09:18
h1. Link checker

h2. Introduction

The Link Checker script helps you identify and manage broken links and images on a website. It automates checking the URLs within a specified domain and produces a detailed report on the status of each link and image.
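
As a rough illustration of what the script automates, here is a minimal sketch of the two core steps, collecting link and image URLs from a page and classifying HTTP status codes. It uses only the Python standard library; the class and function names are hypothetical and are not taken from @web_crawler.py@.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects link (href) and image (src) URLs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

def is_broken(status_code: int) -> bool:
    """Report a URL as broken on client or server errors (4xx/5xx)."""
    return status_code >= 400

html = '<a href="/blog">Blog</a><img src="/logo.png"><a href="/about">About</a>'
collector = LinkCollector()
collector.feed(html)
print(collector.links)   # ['/blog', '/about']
print(collector.images)  # ['/logo.png']
```

The real script additionally fetches each collected URL over HTTP and recurses into pages up to the configured depth; this sketch only shows the extraction and classification steps.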
h2. Script destination

Script location: extension/Resources/Public/scripts
Script names: web_crawler.py and web_crawler_dev.py

h2. Usage

h3. Configuration

Before using the Link Checker script, you *need* a configuration file. It looks like this:

!clipboard-202309040834-symyc.png!

* "startUrl": The URL where link checking begins.
* "login_url": The URL used for logging in, if required. If left empty, "startUrl" is used instead.
* "username" and "password": Login credentials. If you have no login credentials, it is *important* to leave these fields *empty*.
* "max_depth": The maximum depth to which links are crawled.
* "target_path": A path that restricts link checking (e.g., /blog).
* "target_string": A unique string to look for.
* "blacklist": URLs to exclude from checking.
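
Putting the keys above together, a configuration file might look like the following. All values are placeholders for illustration, not the actual configuration shown in the screenshot:

```json
{
  "startUrl": "https://www.example.com",
  "login_url": "",
  "username": "",
  "password": "",
  "max_depth": 3,
  "target_path": "/blog",
  "target_string": "",
  "blacklist": ["https://www.example.com/logout"]
}
```

With "login_url", "username", and "password" left empty, the crawler starts directly at "startUrl" without attempting a login.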
h3. Running the Script

You can run the Link Checker script using the following command: