Fix duplicate divs caused by old Adobe GoLive bug?


James Lee

Feb 20, 2017, 12:32:16 PM
to BBEdit Talk
I still have some code created by Adobe GoLive. There seems to have been a flaw that caused duplicate divs and extra spaces, so the pages have become almost impossible to edit.  Here is an example:
-----
<div align="left">
<p><font size="4" face="Verdana,arial,helvetica,sans-serif"><font size="3"><strong>"One of the greatest discoveries a man makes, one of his great surprises, is to find he can do what he was afraid he couldn't do." - Henry Ford</strong></font></font></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
-----
This index file is not very wide (feet, not inches).  I am hoping BBEdit has some tricks I have yet to find to fix this problem. Or maybe there is a way to go back to Adobe GoLive and fix them.  Any ideas would be appreciated.

jgill

Feb 21, 2017, 4:37:09 AM
to BBEdit Talk
That is a problem with all WYSIWYG editors. If you don't get it right first time, you get silly situations like those nested <font> tags and nested divs.

There are two ways to tackle this. Manually: using BBEdit's syntax checker to find errors in the markup and fixing them individually. Automatically: using a grep search and replace. Unfortunately, the nested font tags will not be seen as an error, they are just clumsy and inefficient. There is no way to tell from your example if all those ending divs also have opening divs, making them valid technically, but again inefficient.
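
Whether those closing divs have matching opening divs can also be checked outside BBEdit. Here is a minimal Python sketch using the standard library's html.parser; the class and function names are made up for illustration, and it only counts <div> tags rather than validating the whole page:

```python
from html.parser import HTMLParser


class DivCounter(HTMLParser):
    """Track <div> nesting and record closing tags that have no opener."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # number of <div> elements currently open
        self.stray_closes = []  # line numbers of </div> tags with nothing open

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            if self.depth == 0:
                # A </div> arrived while no <div> was open: record its line.
                self.stray_closes.append(self.getpos()[0])
            else:
                self.depth -= 1


def check_divs(html):
    """Return (unclosed_div_count, stray_close_line_numbers) for an HTML string."""
    counter = DivCounter()
    counter.feed(html)
    counter.close()
    return counter.depth, counter.stray_closes


# Shortened stand-in for the posted example: one opener, three closers.
sample = '<div align="left">\n<p>Quote</p>\n</div>\n</div>\n</div>\n'
print(check_divs(sample))
```

A first value above zero means some divs are never closed; a non-empty list gives the lines of closing divs that have no opener.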

Bill Kochman

Feb 21, 2017, 6:26:53 AM
to bbe...@googlegroups.com
One way to avoid a lot of the <body> clutter is by converting your website over to the AMP (Accelerated Mobile Pages) standard, which is what I recently did with my primary website.

It was a lot of work — and very challenging and frustrating at times due to my own lack of knowledge — but I believe I made a wise choice.

The thing about Google’s AMP specifications is that while most of it is standard HTML tags that we are familiar with — with a few new ones, or replacement tags thrown in the mix — it is also VERY strict, and a lot of standard HTML elements are firmly disallowed. For example:

JavaScript and Java applets
background images
gradients
external stylesheets
inline font styles
table summaries
image borders
etc.

So how does this relate to your case?

Well, for one thing, you cannot even use the <font> tag in the body of your AMP-compliant HTML document. All styling MUST be done in the head section of each HTML. The end result is that you have a MUCH cleaner and LESS complicated body section, where you will primarily use classes instead.

Not only will your pages be cleaner by converting to AMP, but they will load faster as well, which offers its own array of benefits for a webmaster. For example, with Google’s PageSpeed Insights tool, all of my pages average 87 to 95 on a scale of 100, depending on whether I am looking at the mobile page score, or the desktop page score.

Regarding that long string of closing </div> tags you are experiencing, my personal approach is to close a <div> tag just as soon as it is no longer needed, thus avoiding as much as possible, what you are now experiencing.

I don’t know if any of the above will help you or not, but there it is for what it is worth.

If you are interested in AMP,  you can get a start here:


Oh, I also highly recommend that you look into Jim Derry’s “Balthisar Tidy for Work” in the App Store. It is only $8, and is a very wise investment, in my view.

Kind regards,

Bill K.



--
This is the BBEdit Talk public discussion group. If you have a
feature request or would like to report a problem, please email
"sup...@barebones.com" rather than posting to the group.
Follow @bbedit on Twitter: <http://www.twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To post to this group, send email to bbe...@googlegroups.com.
Visit this group at https://groups.google.com/group/bbedit.

Greg Raven

Feb 21, 2017, 9:57:18 AM
to BBEdit Talk
If the only issue standing between you and a simple search-and-replace is the varying spaces from page to page, create a TextFactory that Optimizes your entire site (HTML pages, that is), and then perform the search-and-replace. Afterward, you can use a TextFactory to Format each of your HTML pages.
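
To illustrate why normalizing first helps, here is a rough Python sketch. It is only a stand-in for the idea and not what BBEdit's Optimize command actually does; the function name is invented, and it would also flatten whitespace inside <pre> blocks, so treat it as a demonstration only:

```python
import re


def collapse_whitespace(html):
    """Drop whitespace between tags and squeeze other runs to one space."""
    html = re.sub(r">\s+<", "><", html)
    return re.sub(r"\s+", " ", html).strip()


# Two pages with the same markup but different spacing.
page_a = "</div>\n   </div>\n\t</div>"
page_b = "</div> </div>\n\n</div>"

# After normalizing, one literal search string matches both pages.
print(collapse_whitespace(page_a) == collapse_whitespace(page_b))
```

Once every page has the same spacing, a plain literal search-and-replace is enough and no pattern has to account for stray tabs or newlines.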

Greg Raven

Feb 21, 2017, 10:00:41 AM
to BBEdit Talk
AMP seems better suited to programmatically-generated web pages, not manually maintained static pages. The main problem I have with AMP for static pages is that there is no way to validate AMP code ... each page will generate a bunch of errors due to AMP's non-standard HTML. Some of my static sites have tens of thousands of pages, and there's no way I'm 1) purposely creating erroneous code on each page, or 2) wading through a super-massive syntax error report hoping to winnow out the few non-AMP errors. YMMV.

Fletcher Sandbeck

Feb 21, 2017, 10:40:49 AM
to bbe...@googlegroups.com
I've worked on converting some large sites to AMP and it has been good for the HTML quality on the sites. My sites are generally dynamic, but contain user-submitted articles and comments, so it's been a two-step process: first ensuring that the template and navigation of the site are AMP-friendly, and then working through errors which are flagged in user content. It's been a lot of work, but it has also found a lot of errors which have been on the sites for years, in some cases because they would sometimes pass less strict HTML validation.

I recommend the Chrome Extension which can be found in the menu here: https://validator.ampproject.org/  For AMP pages it performs an automatic syntax check. It also makes it easy to jump to the AMP equivalent of a non-AMP page.

[fletcher]

Fletcher Sandbeck

Feb 21, 2017, 10:58:46 AM
to bbe...@googlegroups.com
The trick is to figure out what the problems are and then apply those fixes to all your pages using BBEdit's multi-file search/replace. However, finding the errors is easier than fixing them. 

This pattern "<font.*?>\s*<font" will find a nested font tag. However, to fix it you need to merge the parameters. In your example you'd want to end up with <font size="3" face="Verdana,...">. If this same nested pair occurs a lot then you can search/replace it. There probably are only so many combinations throughout your site.
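
The merge step can be sketched in Python. This is only an illustration under a few assumptions: the pattern uses [^>]* so a match cannot run past the end of the first tag, attribute values are double-quoted, the closing tags sit next to each other as in the posted example, and the function names are invented:

```python
import re

# Assumption: [^>]* keeps each match inside a single tag.
NESTED_FONT = re.compile(r"<font([^>]*)>\s*<font([^>]*)>", re.IGNORECASE)
ATTR = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def merge_font_pair(match):
    """Merge the attributes of two directly nested <font> tags.
    The inner tag wins when both set the same attribute."""
    attrs = dict(ATTR.findall(match.group(1)))
    attrs.update(ATTR.findall(match.group(2)))
    merged = " ".join(f'{name}="{value}"' for name, value in attrs.items())
    return f"<font {merged}>"


def merge_nested_fonts(html):
    """Merge nested opening pairs, then drop one closer per merged pair."""
    html, count = NESTED_FONT.subn(merge_font_pair, html)
    if count == 0:
        return html
    # Only safe when the two closers are adjacent, as in the example.
    return re.sub(r"</font>\s*</font>", "</font>", html,
                  count=count, flags=re.IGNORECASE)


src = ('<p><font size="4" face="Verdana,arial,helvetica,sans-serif">'
       '<font size="3"><strong>Quote</strong></font></font></p>')
print(merge_nested_fonts(src))
```

Because the inner size="3" overrides the outer size="4", the result keeps the size the browser was actually rendering. On a real site it would be wise to run this on copies and compare the rendered pages before and after.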

The pattern "<([a-zA-Z0-9]+ .*)>\s*<\1>" will find an instance of one tag followed immediately by the same tag with the same parameters. The internal tag is unnecessary, but automatically deleting the tag without affecting the nesting is difficult. You can also make this <div>-specific by hardcoding the first part of the pattern: "<(div .*)>\s*<\1>"
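
Since deleting automatically is risky, a report of where the duplicates occur may be more useful. A small Python sketch along those lines, assuming a variant of the pattern that uses [^>]* in place of .* so the capture stays inside one tag (the function name is invented):

```python
import re

# Assumption: [^>]* instead of .* so the captured text is a single tag.
DUPLICATE_OPEN = re.compile(r"<([a-zA-Z0-9]+ [^>]*)>\s*<\1>")


def find_duplicate_opens(html):
    """Return (line_number, tag_text) for each tag that is immediately
    followed by an identical copy of itself."""
    hits = []
    for match in DUPLICATE_OPEN.finditer(html):
        line = html.count("\n", 0, match.start()) + 1
        hits.append((line, match.group(1)))
    return hits


sample = ('<body>\n<div align="left">\n  <div align="left">\n'
          '<p>text</p>\n</div></div>')
print(find_duplicate_opens(sample))
```

Matches do not overlap, so a run of three identical tags is reported once; rerunning after each fix will catch the rest.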

As someone else mentioned, what you might consider is modernizing the code generally. If you search for that face="Verdana..." string you'll probably find that it has been inserted many times in your code. Instead you can create a class in your CSS and work through the site applying that style to a new <span> or even to the surrounding <div> and cleaning out the old code. That will make a more maintainable site moving forward. 
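
As a rough sketch of that cleanup in Python: the class name "quote" is hypothetical, the matching CSS rule would have to be added to the site's stylesheet separately, and the pattern assumes the Verdana <font> elements are not nested inside one another (so nested pairs would need merging first):

```python
import re

# Hypothetical class name; the matching rule would live in the site's CSS, e.g.
#   .quote { font-family: Verdana, Arial, Helvetica, sans-serif; }
CLASS_NAME = "quote"

FONT_BLOCK = re.compile(
    r'<font\b[^>]*\bface="Verdana[^"]*"[^>]*>(.*?)</font>',
    re.IGNORECASE | re.DOTALL,
)


def font_to_span(html):
    """Swap each Verdana <font> element for a classed <span>.
    Assumes these <font> elements are not nested inside one another."""
    return FONT_BLOCK.sub(rf'<span class="{CLASS_NAME}">\1</span>', html)


src = '<p><font size="3" face="Verdana,arial"><strong>Hi</strong></font></p>'
print(font_to_span(src))
```

Note that this drops the size attribute along with the face, so the CSS rule for the class would need to set a font size as well if the old size mattered.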

[fletcher]


Bill Kochman

Feb 21, 2017, 12:14:27 PM
to bbe...@googlegroups.com
On Feb 22, 2017, at 1:00 AM, Greg Raven <greg...@gmail.com> wrote:

AMP seems better suited to programmatically-generated web pages, not manually maintained static pages. The main problem I have with AMP for static pages is that there is no way to validate AMP code ... each page will generate a bunch of errors due to AMP's non-standard HTML. Some of my static sites have tens of thousands of pages, and there's no way I'm 1) purposely creating erroneous code on each page, or 2) wading through a super-massive syntax error report hoping to winnow out the few non-AMP errors. YMMV.

Hello Greg. I am not exactly sure what you mean by the above, but all of the pages on my website are static pages which I do in fact manually maintain via some AppleScripts  that Jim Derry wrote, which allow me to run Balthisar Tidy for Work from within BBEdit. It is a very nice combination of text and HTML editing tools.

Not only that, but I have been in close communication with Jim, and he has been making certain improvements to his app, so that it can handle AMP’s peculiarities, such as its proprietary tags, and AMP’s insistence on keeping all of Google’s scripting codes on a single line. Otherwise, the AMP validator will not pass an AMP HTML document. You will see errors like “text [CDATA]” which Balthisar Tidy for Work can now correct by putting the scripts all on one line, including the closing “</script>” tag.

For the record, in case you were not aware of it, you can add AMP’s non-standard tags to Balthisar Tidy for Work, so that it overlooks them.

In short, I tidy ALL of my AMP documents from within BBEdit itself, simply by choosing the Balthisar Tidy for Work scripts under BBEdit’s scripts menu. It works great.

There are actually several ways to easily validate AMP code.

The first is the AMP Project’s own online validation tool, which you can find here:


I have used it many, many times to validate my docs, and now all of my HTML pass with flying colors.

Google’s Chrome browser also has the AMP validator built into it.

Likewise, there is a plugin for Firefox as well (I am not sure if it is still actually being developed or not, but I have it installed), which allows you to validate via its browser console. However, I prefer the AMP Project’s online validator. It is very quick. Just give it a URL, and seconds later, it tells you if your doc passed, or else what is wrong with it.

I hope the above helps. I too was rather frustrated by the AMP validation errors, until I finally understood why they were occurring.

Kind regards,

Bill K.

Bill Kochman

Feb 21, 2017, 12:21:01 PM
to bbe...@googlegroups.com

On Feb 22, 2017, at 1:40 AM, Fletcher Sandbeck <flet...@cumuli.com> wrote:

I've worked on converting some large sites to AMP and it has been good for the HTML quality on the sites. My sites are generally dynamic, but contain user submitted articles and comments so it's been a two step process. Ensuring that the template and navigation of the site is AMP-friendly. And then working through errors which are flagged in user content. It's been a lot of work, but it has also found a lot of errors which have been on the sites for years in some cases since they would sometimes pass less strict HTML validation.

I recommend the Chrome Extension which can be found in the menu here: https://validator.ampproject.org/  For AMP pages it performs an automatic syntax check. It also makes it easy to jump to the AMP equivalent of a non-AMP page.

[fletcher]

Hello Fletcher. I agree. My website is rather small (just over 6,000 HTML documents), but after my initial struggles with learning and understanding AMP, I am very pleased with the results. My site is a lot simpler now, without all of the overhead clutter that I used to implement, and it is a lot cleaner, crisper, sharper and more refined, which makes it snappier and speedier as well. That is exactly what Google wants for mobile devices.

So between implementing AMP, SSL/TLS and the HSTS preload list over the past few weeks, my site has seen some major improvements. It required adopting some new HTML coding habits on my part, but it has been worth it, at least for me.

Kind regards,

Bill K.


Bill Kochman

Feb 21, 2017, 12:42:40 PM
to bbe...@googlegroups.com
On Feb 22, 2017, at 1:58 AM, Fletcher Sandbeck <flet...@cumuli.com> wrote:

> The trick is to figure out what the problems are and then apply those fixes to all your pages using BBEdit's multi-file search/replace. However, finding the errors is easier than fixing them.

Exactly. I have used BBEdit’s multi-file search and replace option countless times over the past few weeks as I have updated to the AMP and SSL/TLS standards.

I have also learned a few important lessons along the way as well.

One thing I would tell anyone who has a mature, aged website, and who is thinking of converting to the AMP standard and specifications, is to follow what I personally refer to as consistency, continuity and standardization.

In my case, my website is twenty years old. Over the years, as my HTML coding skills have improved, I have changed styles and techniques many times. As a result, until I recently fixed everything -- by standardizing my code across the entire site using BBEdit’s multi-file find and replace — my HTML code was very uneven.

Because of the uneven code, there have been a number of times where performing a certain global find and replace would correct something, but break something else, thus doubling my own work.

For example, having absolute URLs in some places, but relative URLs in other places, can wreak havoc on your site, if you are not careful when you perform global find and replaces in BBEdit. After conducting a lot of online research, and seeing what a divided camp exists, I finally just decided to use absolute URLs everywhere. That way, I know for sure where things are pointing to, and Googlebot will know as well.
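
Converting relative links to absolute ones can also be scripted. Here is a minimal Python sketch using urljoin from the standard library; the page URL is a made-up example, the function name is invented, and the pattern only handles double-quoted href and src attributes:

```python
import re
from urllib.parse import urljoin

# Hypothetical location of the page being rewritten, for illustration only.
PAGE_URL = "https://www.example.com/articles/index.html"

LINK_ATTR = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)


def absolutize(html, page_url):
    """Rewrite href/src values as absolute URLs resolved against page_url."""
    def fix(match):
        return f'{match.group(1)}="{urljoin(page_url, match.group(2))}"'
    return LINK_ATTR.sub(fix, html)


src = '<a href="../about.html"><img src="pic.png"></a>'
print(absolutize(src, PAGE_URL))
```

Links that are already absolute pass through unchanged, which makes it safe to run on pages with a mix of both styles. Each file needs its own page URL, which fits the directory-by-directory approach described next.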

What I soon realized is that it is a lot wiser to work on individual directories — as opposed to the entire website at once — and make very specific, targeted changes incrementally.

In short, before you even consider converting your site to meet AMP standards, take the time to standardize your HTML code, particularly if it is a very old site with a lot of different coding styles and approaches. It will be a lot easier to convert to AMP specifications if you do that first.

This is precisely where using BBEdit and Balthisar Tidy for Work together has saved me so much time and effort. They are both invaluable tools for the webmaster.

> As someone else mentioned what you might consider is modernizing the code generally. If you search for that face="Verdana..." string you'll probably find that it has been inserted many times in your code. Instead you can create a class in your CSS and work through the site applying that style to a new <span> or even to the surrounding <div> and cleaning out the old code. That will make a more maintainable site moving forward.
>
> [fletcher]

I agree, Fletcher. Modernizing and standardizing all of your HTML code is definitely the way to go. Doing so will indeed save you a lot of headaches and frustration in the future.

Kind regards,

Bill K.



Greg Raven

Feb 21, 2017, 1:08:23 PM
to BBEdit Talk
What I mean is that if your pages are generated by WordPress (for example), then it's marginally easier to insert all the special AMP coding. When the pages are generated programmatically, then each page should be pretty much the same for any given template, so you can sample-validate a few pages, and know that the rest of the pages should validate as well. There is no way I am going to go through the external validation process required for manually-constructed AMP pages ... especially when one click within BBEdit checks my entire site.

Glad you're having a good experience with AMP. I've taken a couple runs at it without success. I serve AMP versions on my WordPress sites, but the Google plug-in generates non-valid AMP code, and the AMP pages themselves don't have the navigation and other page elements I like to have.

Bill Kochman

Feb 21, 2017, 7:36:10 PM
to bbe...@googlegroups.com
Greg, I am glad you brought up that point regarding WordPress, because this is something I have been wondering about since last night.

As you already know, AMP specifications outright prohibit certain standard HTML tags -- such as the font tag — and inline styling in HTML document bodies. Thus, you have to rely upon div classes instead, which are permitted.

My problem is this:

In my WP blog posts — it is a self-installed blog, not on wordpress.com -- I do a very high amount of copying and pasting from hundreds of documents which I create and maintain from within BBEdit. In said BBEdit documents, I employ the font color tag thousands of times to colorize verse references.

Now, on my actual website, it is very easy to replace those AMP-prohibited font tags with div classes instead — which I have done by the thousands via BBEdit’s multi-file find and replace option — because the actual CSS code which regulates those div classes is contained in the head of the very same HTML document, as per Google’s and AMP’s requirements.

But how do I transfer that over to my actual WP blog posts?

In other words, if I surround a verse reference with a div styling tag like this . . .

<div class="blue font">John 3:16</div>

. . . it is going to be lost in translation, because there are no CSS instructions telling WP what that div means.

Worse yet, I syndicate to eight social networks. Some of them respect and properly parse standard HTML tags, such as the font color tag, for example. But I doubt that they know what to do with AMP tags and divs.

Facebook doesn’t even respect standard HTML tags. Even though it is the social network where I most heavily participate, it is the worst culprit when it comes to parsing my WP posts. It strips out font colors, and even line and paragraph formatting.

If there is some way to import the exact same CSS head information into my WP blog posts, that is contained in my BBEdit HTML documents — that is, tell WP, “Here. Use this CSS style sheet for all of my posts.” -- that would be great. At least then I could use those same div classes -- which the AMP specifications accept as valid — instead of font tags in my WP blog posts, which will not pass the AMP validator’s muster.

While there are some WP plugins to make WP posts HTTPS/SSL compliant, I wonder if Automattic and gang have done anything to automatically convert standard HTML tags over to AMP-compliant tags and divs before post syndication occurs.

Anyway, if you are aware of any easy solutions, please let me know. For now, I continue to use font color tags in my WP posts, which WP automatically converts to span styles instead. But I bet even those fail AMP standards.

Kind regards,

Bill K.

Greg Raven

Feb 21, 2017, 7:40:47 PM
to Bare Bones
Bill,

Rather than continue to hijack the topic, I’m responding privately.

The easy way is to install Jetpack (which you should be using anyway), and then avail yourself of the “custom CSS” capability. Then you can define and use custom classes, and apply these via divs or spans.

I don’t know, however, if AMP picks up the custom CSS — come to think of it, it might not because it replaces a lot of things with its own processes.

In that case, you’d have to rely on the “style” attribute to your divs and/or spans.

Yeah, it’s a hassle.

One more reason why I rely so heavily on Bootstrap.

Hope this helps.

Greg Raven
20258 US Hwy 18 Ste 430-513
Apple Valley, CA 92307-6197


Bill Kochman

Feb 21, 2017, 8:20:33 PM
to bbe...@googlegroups.com
Hello Greg,

I too was a little concerned that Rich and Patrick might think I was getting a little too far off-topic. :)

At any rate, it seems that I may have been worrying about nothing.

You see, a while back I installed Automattic’s “AMP” plugin, but then I forgot about it.

I just researched it again at https://wordpress.org/plugins/amp/, and it seems that it automatically makes the necessary conversions for AMP compliance, without me doing anything special, and without me even implementing any custom CSS via Jetpack, which I do have installed, by the way.

I just looked at the AMP-generated version of a few of my blog posts by appending “/amp/” to the URL, and then I looked at the page source in Firefox, in order to see how the AMP plugin was handling my colorized verse references. It seems that the plugin generates its own class names, like this:

<span class="amp-wp-inline-ad7cff81ab4cfac2fda8135582b2c73c">Psalms 127:3, KJV</span>

<span class="amp-wp-inline-ad7cff81ab4cfac2fda8135582b2c73c">1 Corinthians 6:19-20, KJV</span>

So I am not even going to worry about it. After all, my entire website is now AMP-compliant and locked up with SSL/TLS.

Of course, as I said before, Facebook is an entirely different issue. Those folks don’t even respect standard HTML code and strip it out of syndicated blog posts.

Kind regards,

Bill K.