
Teaser


Ivan S

Dec 24, 2009, 5:45:37 AM

How can I create text teaser from HTML that contains "n" text (not
HTML!) characters?

i.

rf

Dec 24, 2009, 5:58:03 AM

"Ivan S" <ivan....@gmail.com> wrote in message
news:a6dc62ca-bc9e-47cd...@26g2000yqo.googlegroups.com...

> How can I create text teaser

What is a "teaser"?

> from HTML

You don't create things "from HTML".

> that contains "n" text

What is "n" text?

> (not
> HTML!) characters?

What are "HTML characters"?

And what does this have to do with PHP?


C. (http://symcbean.blogspot.com/)

Dec 24, 2009, 7:32:06 AM

substr(strip_tags($src_html),0,200);

C.
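C.'s one-liner can be tried as-is. Below is a minimal sketch that assumes UTF-8 input. `$src_html` is only an example string, and `mb_substr` (mbstring extension) replaces `substr` so that a multibyte character is never cut in half. As Ivan points out in his reply, every tag is lost:

```php
<?php
// Example input (an assumption, not from the thread).
$src_html = '<p>Testing <a href="http://www.example.com">a link</a></p>';

// strip_tags() removes every tag, so the link markup is gone, not just hidden.
$plain = strip_tags($src_html);

// mb_substr() counts characters rather than bytes (UTF-8 assumed).
$teaser = mb_substr($plain, 0, 200, 'UTF-8');

echo $teaser, "\n"; // Testing a link
```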

Ivan S

Dec 24, 2009, 8:07:28 AM

On Dec 24, 11:58 am, "rf" <r...@z.invalid> wrote:
> What is a "teaser"?

Hm, a short preview.

> You don't create things "from HTML".

Ok, from a string, which happens to contain HTML code. :)

> What is "n" text?

n, as in a non-specific number.

> What are "HTML characters"?

HTML tag characters.

For example:

<p>Testing</p>

n = 3

I want to get:

<p>Tes</p>

and not

Testing</p>

> And what does this have to do with PHP?

It's a general problem, and a solution can be achieved in various
languages, but I prefer PHP.
Where do you think I should ask? An HTML group? I think this is a
server-side issue.

Ivan S

Dec 24, 2009, 8:19:18 AM

On Dec 24, 1:32 pm, "C. (http://symcbean.blogspot.com/)"
<colin.mckin...@gmail.com> wrote:
> substr(strip_tags($src_html),0,200);

I tried this solution, of course, but the problem with it is that the
strip_tags call removes all the HTML.

In my particular case, losing most HTML tags is not a problem, but
anchors (a tags), for example, are stripped too.
OK, I can exclude anchor tags in the strip_tags call, but then the
problem is the same as before: how do I cut only the text characters? :)
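The sketch below shows why keeping anchors does not solve the problem. The second argument of `strip_tags()` lists tags to leave in place, but `substr()` then counts the tag characters as well. It reuses the example sentence Ivan gives later in the thread:

```php
<?php
$html = 'Test <a href="www.example.com">click me</a> now. And now comes loooooong text.';

// Keep <a> tags: the second argument lists the tags strip_tags() leaves alone.
$kept = strip_tags($html, '<a>');

// substr() counts every byte, markup included, so the cut lands inside the tag.
$teaser = substr($kept, 0, 17);

echo $teaser, "\n"; // Test <a href="www
```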

Message has been deleted

rf

Dec 24, 2009, 8:42:54 AM

I think it's time to rephrase your question.


Ivan S

Dec 24, 2009, 9:23:12 AM

On Dec 24, 2:20 pm, houghi <hou...@houghi.org.invalid> wrote:
> That is much more clear. Where is the data coming from? Does it contain
> something like:
> <p><h1>Test</h1></p>
> n=3
> And should the outcome then be:
> <p><h1>Tes</h1></p>
>
> Or should it be <p>Tes</p>
>
> What if you have <p>Test 1</p><p>Test 2</p> and n=8. Should it become
> <p>Test 1</p> or <p>Test 1</p><p>Te</p> or even something else.
>
> Just imagine that nobody here knows what you want or what you already
> have and formulate the answer with that knowledge.

I'm sorry if I was unclear.


So... the data is coming from a WYSIWYG editor. It's well-formed HTML (or
XHTML, as I define it).
I have to get a substring of the text the user enters (only visible
characters). So, for example, the user enters:

* Test

And that's no problem. :)

The problem is when he adds some styles or an anchor, for example...

* Test <a href="www.example.com">click me</a> now. And now comes
loooooong text.

If I want to get, for example, the first 17 characters, the result would
be:

* Test <a href="www.example.com">click me</a> now

So, only the text characters between the HTML tags are counted, not the tags themselves.

In your example:

<p>Test 1</p><p>Test 2</p> and n=8

<p> - not counted, because this is an HTML tag
Test 1 - counted, 6 characters (the space counts too)
</p><p> - not counted
Test 2 - counted, but only 2 more characters fit within n=8, so it gets cut

After the substring:
<p>Test 1</p><p>Te
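The counting rule described here is: copy tags without counting them, count text until n is reached, then stop. It can be sketched in PHP. This is a minimal sketch and not the phpclasses.org class recommended later in the thread. `html_teaser` is a hypothetical name. The sketch assumes well-formed editor output with no literal `<` or `>` in text or attribute values and no comments or scripts. Entities such as `&amp;` are copied whole and counted as one character. Open tags are deliberately left unclosed, as Ivan suggests:

```php
<?php
// Hypothetical helper: keep every tag, count only visible text, stop after $n characters.
// Assumes well-formed input (no literal "<" or ">" in text or attribute values,
// no comments or <script> blocks). Tags left open are NOT closed here.
function html_teaser(string $html, int $n): string
{
    // Split into text, tag and entity tokens; the capture group keeps tags and entities.
    $tokens = preg_split(
        '/(<[^>]*>|&#?[a-zA-Z0-9]+;)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $out = '';
    $count = 0; // visible characters emitted so far
    foreach ($tokens as $token) {
        if ($count >= $n) {
            break; // budget used up: drop everything from here on
        }
        if ($token[0] === '<') {
            $out .= $token; // tag: copied verbatim, not counted
        } elseif (preg_match('/^&#?[a-zA-Z0-9]+;$/', $token) === 1) {
            $out .= $token; // entity: copied whole, counted as one character
            $count++;
        } else {
            // Plain text: take only as many characters as the budget allows.
            $take = min(mb_strlen($token, 'UTF-8'), $n - $count);
            $out .= mb_substr($token, 0, $take, 'UTF-8');
            $count += $take;
        }
    }
    return $out;
}

echo html_teaser('<p>Testing</p>', 3), "\n";             // <p>Tes
echo html_teaser('<p>Test 1</p><p>Test 2</p>', 8), "\n"; // <p>Test 1</p><p>Te
echo html_teaser('Test <a href="www.example.com">click me</a> now. And now comes loooooong text.', 17), "\n";
// Test <a href="www.example.com">click me</a> now
```

"Test 1" is six characters including the space, so with n = 8 the cut lands after "Te".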


As you can see, there is no closing p tag. But that's not a problem;
there are scripts that can tidy up broken HTML code.
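If no tidy-up library is at hand, the closing tags can be appended with a small stack-based pass. This minimal sketch assumes the truncated string is a prefix of well-formed markup, as with the WYSIWYG output described above. `close_open_tags` is a hypothetical name, and the sketch does not replace a real HTML tidier:

```php
<?php
// Hypothetical helper: append closing tags for elements left open by truncation.
// Assumes the input is a prefix of well-formed (X)HTML, so tags are properly nested.
function close_open_tags(string $html): string
{
    // Void elements never take a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'];

    // Groups: "/" of a closing tag, the tag name, "/" of a self-closing tag.
    preg_match_all('/<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>/', $html, $matches, PREG_SET_ORDER);

    $stack = [];
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[3] === '/' || in_array($name, $void, true)) {
            continue; // <br/>, <img ...> and similar need no closing tag
        }
        if ($m[1] === '/') {
            // In nested markup a closing tag matches the innermost open element.
            if (end($stack) === $name) {
                array_pop($stack);
            }
        } else {
            $stack[] = $name;
        }
    }

    // Close whatever is still open, innermost first.
    while ($stack) {
        $html .= '</' . array_pop($stack) . '>';
    }
    return $html;
}

echo close_open_tags('<p>Test 1</p><p>Te'), "\n";   // <p>Test 1</p><p>Te</p>
echo close_open_tags('<p><i>Some <b>bold'), "\n";   // <p><i>Some <b>bold</b></i></p>
```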

I hope this clears things up. :)

Ivan S

Dec 24, 2009, 9:25:55 AM

On Dec 24, 2:42 pm, "rf" <r...@z.invalid> wrote:
> I think it's time to rephrase your question.

And how should I ask?

matt

Dec 24, 2009, 9:36:06 AM

I'm not sure what you mean by stripping only HTML tag text
characters? To me that means this:

<a href="http://www.google.com">Google</a>

becomes

<="://..">Google</>

Which, I'm pretty sure is NOT what you're after. You may need to
rephrase that one for us as well :)

I take it you want to preserve some tags and strip others? There are a
couple of issues with this...

1. Say you want to allow <span> tags and have a 200 character limit on
the text. The span tag alone with a style attribute could easily take
up that many characters, so you need a method of only counting the
displayed characters.

2. Whenever you allow tags and then truncate the output, you have to
take special care to close any open tags. For example:

<i>205 characters of text</i>

Even though you're going to chop off the last 5 characters of the
text, you need to close the italics tag.

3. HTML special chars can add up fast:

&quot;War &amp; Peace &mdash; A really long book&quot;

Is that 54 characters, or 34?

Message has been deleted
Message has been deleted

Ivan S

Dec 24, 2009, 10:09:22 AM

On Dec 24, 3:36 pm, matt <matthew.leonha...@gmail.com> wrote:
> I'm not sure what you mean by stripping only HTML tag text
> characters?  To me that means this:
>
> <a href="http://www.google.com">Google</a>
>
> becomes
>
> <="://..">Google</>
>
> Which, I'm pretty sure is NOT what you're after.  You may need to
> rephrase that one for us as well :)

Sorry again. :)

<a href="http://www.google.com">Google</a>

becomes

<a href="http://www.google.com">Goo

So, that would be inner text characters (hopefully :) ).

> I take it you want to preserve some tags and strip others?  

It would be nice to preserve all tags.

> 1. Say you want to allow <span> tags and have a 200 character limit on
> the text.  The span tag alone with a style attribute could easily take
> up that many characters, so you need a method of only counting the
> displayed characters.

Yes! That's the point!! :)

> 2. Whenever you allow tags and then truncate the output, you have to
> take special care to close any open tags.  For example:
>
>    <i>205 characters of text</i>
>
> Even though you're going to chop off the last 5 characters of the
> text, you need to close the italics tag.

I'm aware of that.
That's not a problem; there are scripts that can sanitize broken HTML
code.

> 3. HTML special chars can add up fast:
>
>    &quot;War &amp; Peace &mdash; A really long book&quot;
>
> Is that 54 characters, or 34?

It should be 34. But that's not a real problem either; special chars can
easily be transformed into regular characters and counted properly.
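Ivan's answer to point 3 is easy to check. The sketch uses matt's string with the `&quot;` entity spelled correctly, decodes the entities, and then counts characters. It assumes UTF-8 output and the mbstring extension for `mb_strlen`:

```php
<?php
$raw = '&quot;War &amp; Peace &mdash; A really long book&quot;';

// Decode entities; ENT_QUOTES covers &quot;, ENT_HTML5 covers the full named-entity table.
$text = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo strlen($raw), "\n";              // 54: the raw markup, entities included
echo mb_strlen($text, 'UTF-8'), "\n"; // 34: the characters a reader sees
echo strlen($text), "\n";             // 36: bytes, since the em dash is 3 bytes in UTF-8
```

Counting with `strlen()` after decoding would still be off by two, so a character-aware function is needed for the count.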

Luuk

Dec 24, 2009, 1:52:28 PM

On 24-12-2009 16:09, Ivan S wrote:

After reading this, the best solution was given by C., who said:
substr(strip_tags($src_html),0,200);

But you have to write some script to apply the HTML tags again (and
close them correctly!)

--
Luuk

mlemos

Dec 25, 2009, 9:58:21 PM
to Ivan S

Hello,

on 12/24/2009 08:45 AM Ivan S said the following:


> How can I create text teaser from HTML that contains "n" text (not
> HTML!) characters?

This PHP class does exactly what you ask:

http://www.phpclasses.org/cut-html-string

--

Regards,
Manuel Lemos



Ivan S

Dec 26, 2009, 5:41:40 AM

On Dec 26, 3:58 am, mlemos <mle...@acm.org> wrote:
> Hello,
>
> on 12/24/2009 08:45 AM Ivan S said the following:
>
> > How can I create text teaser from HTML that contains "n" text (not
> > HTML!) characters?
>
> This PHP class does exactly what you ask:
>
> http://www.phpclasses.org/cut-html-string

Thank you Manuel, that's what I need. :)


i.
