Unredir
This script scans the pages of a website, checks each URL, and replaces any redirected URL with its final address.
It is also suitable for sites that switch from HTTP to HTTPS: it updates links both to the site itself and to all other linked sites.
It also reports broken links, so for static sites it can replace a link-checking tool such as Link Checker.
Code
The program uses the PHP DOMDocument class to find links in <a> tags and images. It also uses the file_get_contents() function to load the file as raw text.
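As an illustration of the link-finding step, here is a minimal sketch that collects the href and src values of <a> and <img> tags with DOMDocument. The function name extractLinks is hypothetical and is not taken from the script itself; the actual program may collect links differently.

```php
<?php
// Hypothetical helper (not part of the original script):
// returns the URLs found in <a href> and <img src> attributes.
function extractLinks(string $html): array
{
    // DOMDocument::loadHTML rejects an empty source in PHP 8
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $node) {
        $href = $node->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    foreach ($doc->getElementsByTagName('img') as $node) {
        $src = $node->getAttribute('src');
        if ($src !== '') {
            $links[] = $src;
        }
    }
    // Remove duplicates and reindex the array
    return array_values(array_unique($links));
}
```

In practice the markup would come from the raw file, for example extractLinks(file_get_contents('index.html')), where index.html is a placeholder file name.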
The program uses cURL to check whether a link is redirected and, if so, to find the final address of the redirection.
The str_replace function, rather than setAttribute, is used to replace redirected URLs. The content is then saved with file_put_contents().
Using these functions avoids the saveHTMLFile method, which attempts to repair the HTML content before saving the file and, in doing so, adds tags that may already be present in the PHP file.
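The raw-text approach described above can be sketched as follows. This is only an illustration: the function name updateFile, the $dryRun parameter, and the choice to match URLs only when enclosed in double quotes are assumptions, not details confirmed by the script.

```php
<?php
// Hypothetical helper (not part of the original script):
// replaces URLs in the raw text of a file and returns the number of replacements.
function updateFile(string $path, array $replacements, bool $dryRun = false): int
{
    $content = file_get_contents($path);
    if ($content === false) {
        return 0;
    }

    $total = 0;
    foreach ($replacements as $old => $new) {
        // Assumption: match the URL with its surrounding double quotes,
        // so that a longer URL sharing the same prefix is left untouched.
        $content = str_replace('"' . $old . '"', '"' . $new . '"', $content, $count);
        $total += $count;
    }

    // Write the file back only when something changed and this is not a dry run
    if ($total > 0 && !$dryRun) {
        file_put_contents($path, $content);
    }
    return $total;
}
```

Because the file is handled as plain text, any PHP code it contains is written back unchanged, which is the point of not going through saveHTMLFile.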
Redirection test PHP code:
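// Returns the final address of a permanently redirected URL,
// or an empty string if the URL is not redirected or the target is not reachable.
function redirected($url)
{
    $hcurl = curl_init();
    curl_setopt($hcurl, CURLOPT_CONNECTTIMEOUT, 300);
    curl_setopt($hcurl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($hcurl, CURLOPT_VERBOSE, false);
    curl_setopt($hcurl, CURLOPT_URL, $url);
    curl_setopt($hcurl, CURLOPT_HEADER, true);
    // Request the headers only, not the body
    curl_setopt($hcurl, CURLOPT_NOBODY, true);
    // First pass: do not follow the redirect, so that its status code can be read
    curl_setopt($hcurl, CURLOPT_FOLLOWLOCATION, false);
    // Note: certificate verification is disabled here
    curl_setopt($hcurl, CURLOPT_SSL_VERIFYPEER, false);
    $headers = curl_exec($hcurl);
    $code = curl_getinfo($hcurl, CURLINFO_HTTP_CODE);
    // Only permanent redirects (301) are handled
    if ($code != 301)
    {
        curl_close($hcurl);
        return "";
    }
    // Second pass: follow the redirect chain to its final address
    curl_setopt($hcurl, CURLOPT_FOLLOWLOCATION, true);
    $headers = curl_exec($hcurl);
    $newurl = curl_getinfo($hcurl, CURLINFO_EFFECTIVE_URL);
    $code = curl_getinfo($hcurl, CURLINFO_HTTP_CODE);
    curl_close($hcurl);
    // Ignore the redirect if the final page does not answer with 200
    if ($code != 200)
    {
        return "";
    }
    return $newurl;
}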
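To show how the result of such a test could feed the replacement step, here is a minimal sketch that builds a table of old and new addresses. The function name buildRedirectMap is hypothetical, and the checker is passed as a callable so the sketch stands alone; this is not necessarily how the script organizes its work.

```php
<?php
// Hypothetical helper (not part of the original script):
// $check is any function that takes a URL and returns the new address,
// or an empty string when the URL is not redirected.
function buildRedirectMap(array $urls, callable $check): array
{
    $map = [];
    foreach (array_unique($urls) as $url) {
        $target = $check($url);
        // Keep only URLs that really lead to a different address
        if ($target !== '' && $target !== $url) {
            $map[$url] = $target;
        }
    }
    return $map;
}
```

With the redirected() function defined, the call would be buildRedirectMap($links, 'redirected'), where $links is the list of URLs found in a page.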
Operation manual
Open a command-line console and navigate to the directory containing the site pages to be updated. Enter:
php c:/unredir/unredir.php [options]
In the command, replace the directory above with the one where you installed unredir.
There are two options:
-t: test mode, shows the result without changing the files.
-v: verbose mode, displays all scanned pages.
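The two options could be read from the command line along these lines. This is a sketch only: the function name parseOptions and the returned array keys are hypothetical, and the script's actual option handling is not shown in this page.

```php
<?php
// Hypothetical helper (not part of the original script):
// $args has the same layout as $argv, with the script name first.
function parseOptions(array $args): array
{
    $options = ['test' => false, 'verbose' => false];
    foreach (array_slice($args, 1) as $arg) {
        if ($arg === '-t') {
            $options['test'] = true;      // check only, do not change the files
        } elseif ($arg === '-v') {
            $options['verbose'] = true;   // display every scanned page
        }
    }
    return $options;
}
```

In a command-line script, the call would be parseOptions($argv).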
Download
Versions:
- March 24, 2021: Broken link count added.
See also...
Converting HTTP to HTTPS. This script replaces http with https for a specific domain. It complements the present script, as it also changes references in the text, but it only considers redirects for the specified domain.