Sitewide search/replace to change root-relative links to document-relative links


BBunny

May 29, 2011, 3:06:40 AM
to bbe...@googlegroups.com
Can anyone suggest a grep search/replace or other technique for changing the root-relative links of a Web site to document-relative ones? The site root is identified in BBEdit's preferences as a Web site. There's a utility written to do this for a very early version of Dreamweaver (it will also change all document-relative links to root-relative, if that's desired, which it isn't), and it works, but only on the open document. The sitewide feature doesn't work. My site has nearly 3,000 HTML files, so opening each document and running the script would be extremely time-consuming, and the author is not going to update the script. I do not know JavaScript and I'm not that savvy with regular expressions either, so I would appreciate any suggestions for running a script like this sitewide in BBEdit.

Thanks.

For reference, the Dreamweaver utility is supposed to use this regex to select the files sitewide:

.+\.html?$
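
As a quick check of what that pattern selects, here is a minimal sketch in plain JavaScript. The file names are made up for illustration and are not taken from the site in question. Note that the pattern is case-sensitive and is tested against the bare file name, not the full path.

```javascript
// The file-selection pattern quoted above, written as a JavaScript RegExp.
const filesRegex = /.+\.html?$/;

// Hypothetical file names, for illustration only.
const names = ['index.html', 'about.htm', 'style.css', 'page.html.bak', '.html'];

// Keep only the names the pattern accepts: at least one character,
// then ".htm" or ".html" at the very end of the name.
const matched = names.filter((name) => filesRegex.test(name));

console.log(matched);
```

With these sample names only `index.html` and `about.htm` pass; a backup copy such as `page.html.bak` is skipped because the name does not end in `.htm` or `.html`.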


And it uses this script (which works, but only in the open document):


/*******************************************/

/* Change links to document/root relative. */
/* Version 1.5.1 (28 Dec 2005) */
/* Jason Dalgarno <ja...@e7x.com> */
/*******************************************/
/*
* Copyright (C) 2005 Jason Dalgarno <ja...@e7x.com>
*
* This library is free software; you can redistribute it and/or modify it under
* the terms of the GNU Lesser General Public License as published by the Free
* Software Foundation; either version 2.1 of the License, or (at your option)
* any later version.
*
* This library is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
* FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
* details. http://www.gnu.org/licenses/lgpl.txt
*/
function canAcceptCommand()
{
var sr, doc;
sr = dreamweaver.getSiteRoot();
doc = dreamweaver.getDocumentDOM('document');
/*
* If a site is defined, the document has been saved, the document is saved
* in a defined site and the document is HTML.
*/
return (sr && doc && doc.URL && doc.URL.indexOf(sr) == 0 && fileIsHTML(doc) && !hasBaseHref(doc)) ? true : false;
}
function commandButtons()
{
return Array('Document relative', 'initRelativity("document", "document")',
'Root relative', 'initRelativity("document", "root")',
'Absolute (http://...)', 'initRelativity("document", "absolute")',
'Sitewide - Document relative', 'initRelativity("site", "document")',
'Sitewide - Root relative', 'initRelativity("site", "root")',
'Get latest version', 'fetchLatest()',
'Cancel', 'window.close()',
'Help', 'relativityHelp()');
}
function relativityHelp()
{
dreamweaver.browseDocument(dreamweaver.getConfigurationPath() + '/Shared/JSN/relativity/help.htm');
}
/**
* Check if the current document/file is an HTML document
*
* @param object Document
* @return boolean True if file is an HTML document, false if not.
*/
function fileIsHTML(doc)
{
/*
* .search(/<html/i) is to allow for <HTML> in uppercase.
*
* Relativity cannot work on a partial document that is included elsewhere
* by SSI or PHP (for example) as it will not be able to resolve the base
* URL correctly.
*/
return doc.getParseMode() == 'html' && doc.documentElement.outerHTML.search(/<html/i) != -1;
}
/**
* If a file has a <base href="..." /> do not modify links
*
* @param object Document
* @return boolean True if file has base href, false if we can safely modify it
*/
function hasBaseHref(doc)
{
var base, i, ret = false;
base = doc.getElementsByTagName('BASE');
for (i = 0; i < base.length; ++i) {
if (base[i].href) {
ret = true;
break;
}
}
return ret;
}
/**
* Read updater file from remote server and check version numbers
*/
function fetchLatest()
{
/*
* remote file goes like this:
* version=1.2.3
* url=http://www.example.com/dwextension.mxp
* filename=filename_for_local_file.ext
* title=Text for download dialog title
* beforemessage=Some message to show before downloading.
* donemessage=Message to show after file has been downloaded.
*/
var currentVersion, updaterUrl, versionFile, latest, pairs, i, params, confirmMsg, gotUpdate;
currentVersion = '1.5.1';
updaterUrl = 'http://www.microwaved.plus.com/dw/relativityupdate.txt';
versionFile = MMHttp.getText(updaterUrl);
if (versionFile.statusCode == 200) {
latest = new Array;
// the g flag normalizes every line ending, not just the first one
pairs = versionFile.data.replace(/\r\n?/g, '\n').split('\n');
for (i = 0; i < pairs.length; ++i) {
params = pairs[i].split('=');
if (params.length == 2) {
latest[params[0]] = params[1].replace(/\s+$/, '');
}
}
if (latest.version && latest.url && latest.filename && latest.title
&& latest.beforemessage && latest.donemessage) {
latest.beforemessage = latest.beforemessage.replace(/\\n/g, '\n');
latest.donemessage = latest.donemessage.replace(/\\n/g, '\n');
if (latest.version == currentVersion) {
alert('No update available');
} else {
confirmMsg = 'Update available\n\n';
confirmMsg += 'Latest version is ' + latest.version;
confirmMsg += ' (currently using ' + currentVersion + ')\n\n';
confirmMsg += latest.beforemessage + '\n\n';
confirmMsg += 'Download the update?';
if (confirm(confirmMsg)
&& (gotUpdate = MMHttp.getFile(latest.url, true, 'file:///C|/' + latest.filename,
'Download update - ' + latest.title)) && gotUpdate.statusCode) {
if (gotUpdate.statusCode == 200) {
alert('Download complete.\n\n' + latest.donemessage);
} else {
alert('Error retrieving file.');
}
}
}
} else {
alert('Updater file does not follow expected format. Got this instead:\n\n' + pairs.join('\n'));
}
} else {
// 404, host not found, not online etc
alert('Updater file not found');
}
}
/**
* Change link relativity
*
* @param string Relativity direction, 'document' or 'root'
*/
function Relativity(relativeTo)
{
var doc, ignore, ignore2, tagAttrs, siteRoot, escapedSiteRoot, siteUrl;
/**
* Read tag name and attributes file
*/
function readTagAttributes()
{ // Read tag name and attributes file
var fileContents, i, tagAttribs;
fileContents = DWfile.read(dreamweaver.getConfigurationPath()
+ '/Shared/jsn/relativity/uris.txt').replace(/\r\n?/g, '\n').split(/\n/);
for (i = 0; i < fileContents.length; ++i) {
tagAttribs = fileContents[i].split(' ');
tagAttrs[tagAttribs[0]] = tagAttribs.slice(1);
}
}
/**
* getElementsByTagName without translated elements
*
* There are two problems with the regular getElementsByTagName
* implementation:
* 1) Translated elements are included, but DW will choke if you try to
* modify them.
* 2) Elements with a name or id attribute beginning with a number are
* inserted a second time in the array with its name as the key,
* causing the array to be longer than it should be with empty elements
* along the way. For example, if the only <a> tag in a document is
* <a name="100"></a>, doc.getElementsByTagName('A').length is 101.
*
* @param object Partial document
* @param string Tag name
* @return array Non-translated elements
*/
function getElements(docPart, tag)
{
var eles, cleaned, i, offsets, hasTranslatedContent;
eles = docPart.getElementsByTagName(tag);
hasTranslatedContent = docPart.getElementsByTagName('MM:BEGINLOCK').length ? true : false;
cleaned = new Array;
for (i = 0; i < eles.length && eles[i]; ++i) {
if (hasTranslatedContent) {
/*
* translated element offsets refer to translated source,
* offsetsToNode refers to actual source, if they don't match
* the element is translated
*/
offsets = doc.nodeToOffsets(eles[i]);
if (doc.offsetsToNode(offsets[0], offsets[1]) == eles[i]) {
cleaned.push(eles[i]);
}
} else {
cleaned.push(eles[i]);
}
}
return cleaned;
}
/**
* Get root relative path
*
* @param string Document relative path
* @return string Root relative path
*/
function toRoot(path)
{
var trailSlash, dotSlash;
if (path.substr(0, 2) == './') { // clean ./ paths
if (path.length > 2) {
path = path.substr(2);
} else {
path = 'PATHWASDOTSLASH'; // placeholder if path was just ./
dotSlash = true;
}
}
// remember if original path had trailing slash
trailSlash = (RegExp('/$').test(path)) ? '/' : '';
// dreamweaver.relativeToAbsoluteURL does not work as documented
path = dreamweaver.relativeToAbsoluteURL(doc.URL, siteRoot, path);
// remove everything before site root, -1 to retain leading slash
path = path.substr(siteRoot.length - 1);
path += trailSlash; // replace trailing slash if needed
if (dotSlash) {
// remove ./ placeholder
path = path.replace(/PATHWASDOTSLASH$/, '');
}
return path;
}
/**
* Get document relative path
*
* @param string Root relative path
* @return string Document relative path
*/
function toDoc(path)
{
var up, pathbits, filebits, s, i;
// build arrays of path and document url folder structure
pathbits = path.substr(1).split('/');
// remove file:// path to site root
filebits = doc.URL.substr(siteRoot.length).split('/');
// number of levels to go up to get back to site root
up = filebits.length - 1;
s = 0; // path slice counter
for (i = 0; i < filebits.length; ++i) {
if (up == 0 && filebits[i] == pathbits[i]) {
path = pathbits[0];
} else if (filebits[i] == pathbits[i]) {
/*
* if filebit is the same as pathbit it can be discarded
* increment s to remove bit from the path decrement up as it's
* one less level to go back up the directory tree
*/
++s;
--up;
} else {
/*
* taking a different path, add the required number of up ../ to
* the beginning of the path then take the remaining bits of the
* path that are different and join them back together.
*/
path = '';
for (i = 0; i < up; ++i) {
path += '../';
}
path += pathbits.slice(s).join('/');
break; // stop checking
}
}
if (!path) {
/*
* original path was to the current directory, filebits and pathbits
* all matched no bits remain, no levels to go up
*/
path = './';
}
return path;
}
/**
* Run command on document
*
* @param string Source document ('document' or file:// path)
* @param boolean Save document (true if doing sitewide, false otherwise)
* @return string ok - file was modified
* savefailed - unable to save
* nothtml - the file was not a HTML document.
* basehref - file has base href, not modified
*/
this.useDoc = function(sourceDoc, save)
{
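// ret, editable, i and path are declared here so they do not leak as globals
var tag, attrs, eles, i, j, k, attr, path, editable, ret;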
doc = dreamweaver.getDocumentDOM(sourceDoc);
if (!fileIsHTML(doc)) {
ret = 'nothtml';
} else if (hasBaseHref(doc)) {
ret = 'basehref';
} else {
if (doc.getAttachedTemplate()) {
editable = doc.getEditableRegionList();
} else {
editable = Array(doc.documentElement);
}
for (i = 0; i < editable.length; ++i) {
for (tag in tagAttrs) {
attrs = tagAttrs[tag];
eles = getElements(editable[i], tag);
for (j = 0; j < eles.length; ++j) {
for (k = 0; k < attrs.length; ++k) {
attr = attrs[k];
path = eles[j].getAttribute(attr);
if (path) {
if (siteUrl && relativeTo != 'absolute'
&& path.substr(0, siteUrl.length) == siteUrl) {
/*
* If it is an absolute http:// URL to the
* current site reduce it to a root relative
* path.
*/
path = path.substr(siteUrl.length);
if (path == '') {
path = '/';
}
if (relativeTo == 'root') {
eles[j].setAttribute(attr, path);
}
} else if (path.substr(0, 7) == 'file://') {
/*
* If we have a file:/// URL pointing to
* local files reduce it to a root relative
* path.
*/
if (path.substr(0, siteRoot.length) == siteRoot) {
path = path.substr(siteRoot.length);
} else if (path.substr(0, escapedSiteRoot.length) == escapedSiteRoot) {
path = path.substr(escapedSiteRoot.length);
}
if (path == '') {
path = '/';
}
eles[j].setAttribute(attr, path);
}
switch (relativeTo) {
case 'document':
if (path.charAt(0) == '/') {
eles[j].setAttribute(attr, toDoc(path));
}
break;
case 'root':
if (!ignore.test(path)) {
eles[j].setAttribute(attr, toRoot(path));
}
break;
case 'absolute':
if (!ignore2.test(path)) {
if (path.charAt(0) != '/') {
path = toRoot(path);
}
eles[j].setAttribute(attr, siteUrl + path);
}
break;
}
}
}
}
}
}
if (!save) {
ret = 'ok';
window.close();
} else if (DWfile.write(doc.URL, doc.documentElement.outerHTML)) {
ret = 'ok';
} else {
ret = 'savefailed';
}
}
if (save) {
dreamweaver.releaseDocument(doc);
}
return ret;
}
/**
* Pass all files matching regex to function
*
* @param object Regular expression
*/
this.sitewide = function(filesRegex)
{
var logFile, dirs, dir, files, i, filename, rv, subDirs;
// debug - uncomment the following line to log what happens file://localhost/Users/frances_cherman/Desktop
// logFile = 'file:///~/Desktop/relativitylog.txt';
logFile = 'file:///localhost/Users/frances_cherman/Desktop';
dirs = Array(siteRoot);
while (dirs.length) {
dir = dirs.pop();
if (logFile) {
DWfile.write(logFile, 'Directory: ' + dir + '\n', 'append');
}
files = DWfile.listFolder(dir + '/*', 'files'); // list all files
for (i = 0; i < files.length; ++i) {
filename = dir + '/' + files[i];
if (filesRegex.test(files[i])) {
if (logFile) {
DWfile.write(logFile, 'Starting: ' + filename + '\n', 'append');
}
rv = this.useDoc(filename, true);
if (logFile) {
if (rv == 'ok') {
DWfile.write(logFile, 'Done: ' + filename + '\n', 'append');
} else if (rv == 'nothtml') {
DWfile.write(logFile, 'Not HTML: ' + filename + '\n', 'append');
} else if (rv == 'basehref') {
DWfile.write(logFile, 'Has base href: ' + filename + '\n', 'append');
} else if (rv == 'savefailed') {
DWfile.write(logFile, 'Save failed: ' + filename + '\n', 'append');
}
}
} else if (logFile) {
DWfile.write(logFile, 'No match: ' + filename + '\n', 'append');
}
}
subDirs = DWfile.listFolder(dir, 'directories');
for (i = 0; i < subDirs.length; ++i) {
dirs.push(dir + '/' + subDirs[i]);
}
}
}
/**
* @return string file:// path to site root
*/
this.getSiteRoot = function()
{
return siteRoot;
}
// Regular expression of links to ignore
ignore = /^(#|\/|[a-z][a-z\d\+\-\.]*:|<)/i;
ignore2 = /^(#|[a-z][a-z\d\+\-\.]*:|<)/i;
/*
* Don't modify if link begins with either
* # anchor
* / root relative, ie doesn't need changing
* [a-z][a-z\d\+\-\.]*: RFC 2396 scheme, external link
* < server markup, only ignore if at beginning of link, otherwise what goes
* before is path, eg don't touch <?php echo $dir;?>file.html but do
* change file.html?foo=<?php echo $foo; ?>
*
* ignore2 for absolute, as above but without /
*/
tagAttrs = new Array(); // Tags and attributes to check
siteRoot = dreamweaver.getSiteRoot(); // Site root directory
escapedSiteRoot = escape(siteRoot);
siteUrl = document.forms[0].siteurl.value;
readTagAttributes();
}
/**
* Start here
*
* @param string 'document' - Change current document only
* 'site' - Change entire site
* @param string Relativity direction, 'document', 'root' or 'absolute'
*/
function initRelativity(mode, relativeTo)
{
var siteUrl, validUrl, cfgFile, rel;
siteUrl = document.forms[0].siteurl.value;
validUrl = /^https?:\/\/[a-zA-Z0-9.-]+(:\d{1,5})?$/;
if (siteUrl != '' && !validUrl.test(siteUrl)) {
document.forms[0].siteurl.value = '';
alert('Ignoring invalid site URL ' + siteUrl);
}
// remember files regex and site url for next time
cfgFile = MMNotes.open(dreamweaver.getConfigurationPath()
+ '/Shared/jsn/relativity/relativity.htm', true);
MMNotes.set(cfgFile, 'siteurl', document.forms[0].siteurl.value);
MMNotes.set(cfgFile, 'filesregex', document.forms[0].filesregex.value);
MMNotes.close(cfgFile);
rel = new Relativity(relativeTo);
if (mode == 'document') {
rel.useDoc('document', false);
} else {
var siteroot, filesRegex, confirmStr;
// remove trailing slash
siteroot = rel.getSiteRoot().replace(/\/$/, '');
filesRegex = RegExp(document.forms[0].filesregex.value);
confirmStr = 'Are you sure you want to change links to ' + relativeTo + ' relative in files in\n';
confirmStr += siteroot + '\n';
confirmStr += 'Matching the regular expression ' + filesRegex + '\n\n';
confirmStr += 'All open files should be saved before continuing.\n';
confirmStr += 'If you do not have a backup of your entire site hit cancel.\n\n';
confirmStr += 'This may take some time.';
if (siteroot && DWfile.exists(siteroot) && filesRegex && confirm(confirmStr)) {
MM.setBusyCursor();
rel.sitewide(filesRegex);
MM.clearBusyCursor();
alert('Change link relativity finished');
/*
* You don't get a list of files on the alert anymore (as of 1.1.1),
* it was beyond useless for all but the smallest of sites anyway
* and as no one has ever complained I can only assume no one cares.
* It is safe to assume the files were modified if all of the
* following conditions are true:
* - File is in the site
* - File name matches the regex
* - File is HTML
* - There wasn't a problem saving (such as a readonly file)
*/
window.close();
} else {
alert('Sitewide cancelled');
}
}
}
/**
* Recall files regex and site url settings for form
*/
function recallSettings()
{
var cfgFile, filesRegex;
cfgFile = MMNotes.open(dreamweaver.getConfigurationPath()
+ '/Shared/jsn/relativity/relativity.htm', true);
document.forms[0].siteurl.value = MMNotes.get(cfgFile, 'siteurl');
filesRegex = MMNotes.get(cfgFile, 'filesregex');
if (!filesRegex) {
filesRegex = '.+\\.html?$';
}
document.forms[0].filesregex.value = filesRegex;
MMNotes.close(cfgFile);
}



BBunny

Jun 16, 2011, 2:05:55 PM
to BBEdit Talk
Still hoping for a response...

Simdude

Jun 16, 2011, 3:51:33 PM
to BBEdit Talk
A shell script might work for you with a little sed. Here's a quick
hack I use. I would test on some sample areas before doing anything
important though. Assume you have a bunch of html files and want to
change the word "old" to "new". You would do this:

sar "old" "new" "*.html"

The quotes are needed so that the wildcard reaches the script unexpanded. Here's the script:

<---------------- begin cut ------------------------->

#!/bin/ksh
#----------------------------------------------------------------------
# File Name: sar
# Description: This file allows the user to search and replace a string
#              in a bunch of files.
#
# when using wildcards for the filename, place in quotes
#
# example: sar "old" "new" "*.c"
#
#----------------------------------------------------------------------

search=$1
replace=$2
files=$3

# $files is left unquoted here on purpose so the shell expands the wildcard
for file in $files
do
    # keep a backup copy, then write the edited text back over the original
    cp "$file" "$file.bak"
    sed "s/$search/$replace/g" "$file" > tempsar
    mv tempsar "$file"
done


<---------------- end cut ------------------------->

The script backs the file up first. You would save this to a text
file, "sar" and make it executable (chmod +x sar). This is obviously
EXTREMELY simple with no error checking but I thought I would throw it
out here as a starting point as there were no replies yet.

Mark

BBunny

Jun 17, 2011, 5:11:07 PM
to BBEdit Talk
Simdude, thanks for replying. I don't understand how this would work.
I'm not just doing a simple sitewide find/replace. I'm trying to
replace root-relative paths with document-relative paths. That means
the script has to take into account the relative position of the
document being targeted to the document being edited. I don't see how
this script would do that. Am I missing something? (Also, forgive me,
but what does "sed" mean?)
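
To make the point concrete: the replacement text depends on where each document sits, so a single fixed search/replace pair cannot cover the whole site. A rough sketch in JavaScript follows; the file paths are hypothetical and `upPrefix` is a helper name invented for this example.

```javascript
// Number of "../" steps needed to climb from a document up to the site root.
// docPath is the document's location relative to the site root,
// e.g. "products/widgets/index.html" (a hypothetical path).
function upPrefix(docPath) {
  const depth = docPath.split('/').length - 1; // folders above the file
  return '../'.repeat(depth);
}

// The same root-relative link, seen from three hypothetical documents.
const link = '/images/logo.gif';
const docs = ['index.html', 'products/index.html', 'products/widgets/index.html'];
for (const doc of docs) {
  // Drop the leading "/" and climb to the root first.
  console.log(doc + ' -> ' + upPrefix(doc) + link.slice(1));
}
```

The one link `/images/logo.gif` becomes `images/logo.gif`, `../images/logo.gif` or `../../images/logo.gif` depending on the document, which is why the document's own location has to be an input to the replacement.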

Rick

Jun 25, 2011, 12:10:42 AM
to bbe...@googlegroups.com
http://www.perlmonks.org/?node=Change%20Absolute%20to%20Relative%20li...

'nuff sed,
Rick Bychowski

BBunny

Jun 26, 2011, 3:27:19 PM
to BBEdit Talk
Thanks, Rick. I wish it were "'nuff said." After investigating how to
run a perl script, I ran it from the terminal. It responded (and I
quote):

Can't locate Getopt/Declare.pm in @INC (@INC contains: /Library/Perl/
Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /
System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-
multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-
thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/
Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /
System/Library/Perl/Extras/5.10.0 .) at /Users/BBunny/Library/
Application Support/BBEdit/Scripts/
change_absolute_to_relative_links.pl line 4.
BEGIN failed--compilation aborted at /Users/BBunny/Library/Application
Support/BBEdit/Scripts/change_absolute_to_relative_links.pl line 4.

Any clues what the problem might be? Also, it supposedly will change
absolute links (http://) to relative ones, but it says nothing about
changing root-relative links to document-relative ones, which is what
I'm trying to do.

On Jun 24, 9:10 pm, Rick <wrink...@gmail.com> wrote:
> http://www.perlmonks.org/?node=Change%20Absolute%20to%20Relative%20li...
>
> 'nuff sed,
> Rick Bychowski

BBunny

Jun 26, 2011, 3:59:54 PM
to BBEdit Talk
What I need is a script that will do the following:

1. Note the location of file 'A' relative to the site root.

2. Find in file 'A' the first root-relative or absolute link to
another file (file 'B') on the site. (Root-relative:
/my_root_directory/path/file; Absolute: http://my_root_directory/path/file).

3. Based on the previously noted location of file 'A' relative to the
site root, and the path in the link, which specifies the path from the
site root to file 'B', calculate the direct path from file 'A' to file
'B'.

4. Change the link to the direct path.

5. Do this for every link in the file.

6. Do this for all files in this directory, recursively.

It seems as if these actions wouldn't be difficult if I knew
scripting...but I don't. Anyone want to give it a try and make me
ecstatic?
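
The path calculation in steps 1 to 5 can be sketched roughly as below. This is only an illustration in plain JavaScript, not a tested sitewide tool: `toDocRelative` and `rewriteLinks` are names invented here, the sample paths are hypothetical, links are found with a regular expression rather than a real HTML parser, and both the recursive walk over the site's folders and absolute http:// links are left out.

```javascript
// Steps 1-4: turn a root-relative link (e.g. "/a/b/page.html") into a path
// relative to the document that contains it. docPath is that document's own
// location relative to the site root (e.g. "a/c/index.html").
function toDocRelative(docPath, rootLink) {
  // Set any ?query or #fragment aside so its slashes are not treated as folders.
  const [, pathPart, suffix] = rootLink.match(/^([^?#]*)([\s\S]*)$/);
  const docDirs = docPath.split('/').slice(0, -1); // folders holding file A
  const target = pathPart.slice(1).split('/');     // folders + file name of B
  // Skip the leading folders that A and B have in common.
  let common = 0;
  while (common < docDirs.length && common < target.length - 1 &&
         docDirs[common] === target[common]) {
    common++;
  }
  const up = '../'.repeat(docDirs.length - common); // climb out of A's folders
  const down = target.slice(common).join('/');      // then descend to B
  return ((up + down) || './') + suffix;
}

// Step 5: rewrite every quoted root-relative href/src value in one file.
// Values starting with "//" (protocol-relative URLs) are left alone.
function rewriteLinks(html, docPath) {
  const attr = /\b(href|src)=(["'])(\/(?!\/)[^"']*)\2/gi;
  return html.replace(attr, (whole, name, quote, link) =>
    name + '=' + quote + toDocRelative(docPath, link) + quote);
}

// Hypothetical page stored at about/history/index.html under the site root.
const sample = '<a href="/about/team.html">Team</a> <img src="/images/logo.gif">';
console.log(rewriteLinks(sample, 'about/history/index.html'));
```

For the sample page this prints `<a href="../team.html">Team</a> <img src="../../images/logo.gif">`. Anything run across a whole site should be tried on a copy first.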

On Jun 24, 9:10 pm, Rick <wrink...@gmail.com> wrote:
> http://www.perlmonks.org/?node=Change%20Absolute%20to%20Relative%20li...
>
> 'nuff sed,
> Rick Bychowski

Rick

Jun 29, 2011, 1:50:55 AM
to bbe...@googlegroups.com
Each of the "use" statements means that a module is required. Some may already be installed with your system, but for the others you need to install a cpan client. I recommend cpanminus. That requires a command-line installation, but it is only one command:
curl -L http://cpanmin.us | perl - --sudo App::cpanminus
Then proceed to installing the required perl modules (also a terminal command):

$ cpanm Getopt::Declare
$ cpanm HTML::TreeBuilder

etc


Setting up a Perl environment may take an hour, but then you'll have access to literally thousands of useful scripts and modules.


Rick Bychowski
