search n replace for weird characters

Disco Octopus

Oct 13, 2005, 11:28:45 PM

Hi,

In one of our web server pages, we have a textarea entry to a database
field.

In this particular field, some of our users make their data changes in
MS Word, then they paste the contents into this textarea field.

Along with the copied text come some odd MS Word characters, such as
the ones used for bullet points, double quotes, and long dashes.

I would like to hear how people have gone about searching for these
characters and replacing them with 'normal' ones.

Thanks

--
beef jerky - good with mates : http://www.choicebeefjerky.com.au
shopping is NOT a sport


Ulrich Merkel

Oct 14, 2005, 3:28:09 AM

Hi Disco,

Just write a filter procedure that walks through the string character by character.
Keep a string of ALLOWED characters and drop all others:

entry FILTER_ALLOWED
params
  string v_instring  : in
  string v_outstring : out
endparams
variables
  string v_char
endvariables
  v_outstring = ""
  ; consume v_instring one character at a time
  while (v_instring != "")
    v_char = v_instring[1:1]
    v_instring = v_instring[2]
    ; $result > 0 means v_char occurs in the list of allowed characters
    SCAN "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.,- %%^",v_char
    if ($result > 0)
      v_outstring = "%%v_outstring%%%%%v_char%%%"
    endif
  endwhile
  return(0)
end ; FILTER_ALLOWED
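
For readers who want to try the allow-list idea outside Uniface, the same logic can be sketched in Python. This is only an illustration, not Proc code: the names ALLOWED and filter_allowed are made up, and the allow-list assumes that "%%^" in the Proc string above stands for a line break.

```python
import string

# Assumption: mirrors the allow-list in the Proc code above
# (digits, ASCII letters, ". , -", space, and a line break for "%%^").
ALLOWED = frozenset(string.digits + string.ascii_letters + ".,- \n")


def filter_allowed(text: str, allowed: frozenset = ALLOWED) -> str:
    """Return text with every character outside `allowed` dropped."""
    # Collect the kept characters and join once, instead of
    # growing the result string one character at a time.
    return "".join(ch for ch in text if ch in allowed)


# Curly quotes and the ellipsis character are not in the allow-list.
print(filter_allowed("\u201cHello\u201d, world\u2026"))  # prints: Hello, world
```

Dropping characters is the simplest policy, but it loses information (a removed bullet or dash leaves no trace), which is why a replacement table may be preferable.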

Success, Uli


Ingo Stiller

Oct 14, 2005, 10:26:53 AM

Hi Uli

That is a little time- and memory-consuming :-)
Why?
Because every time you truncate or concatenate a string, UnifAce uses
extra space for handling the data. [UnifAce is not a 3GL :-) ]
So move as little data as possible.

Here is another version:

entry FILTER_ALLOWED
params
  string v_instring  : in
  string v_outstring : out
endparams
variables
  string v_char,v_CHAR2
  numeric v_POS,v_LEN
  numeric v_POS1,v_POS2
endvariables
  $STRING_ALLOWED$="abc..."
  v_outstring = v_instring
  length v_outstring
  v_LEN=$result
  v_POS=0
  while (v_POS<v_LEN)
    v_POS=v_POS+1
    ; read from v_outstring so that v_POS stays valid after a replacement
    v_char = v_outstring[v_POS:v_POS]
    SCAN $STRING_ALLOWED$,v_char
    if ($result <= 0)
      SELECTCASE v_char
      CASE "Ü"
        v_CHAR2="Ue"
      ...
      ; Use CTRL-J to enter composed characters
      ELSECASE
        v_CHAR2="?" ; any valid character or empty
      ENDSELECTCASE
      v_POS1=v_POS-1
      v_POS2=v_POS+1
      v_outstring = "%%v_outstring[1:v_POS1]%%v_CHAR2%%v_outstring[v_POS2]"
    endif
  endwhile
  return(0)
end ; FILTER_ALLOWED
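
The replace-or-fall-back logic can also be sketched in Python for comparison. This is an illustration under assumptions, not a translation of the Proc entry: ALLOWED, REPLACEMENTS, FALLBACK and replace_disallowed are hypothetical names, the allow-list is invented, and only the "Ue" mapping comes from the code above.

```python
import string

# Hypothetical allow-list, for illustration only.
ALLOWED = frozenset(string.ascii_letters + string.digits + ".,- \n")

# Only this mapping appears in the Proc code above; extend as needed.
REPLACEMENTS = {
    "\u00dc": "Ue",  # capital U with umlaut
}

FALLBACK = "?"  # any valid character, or "" to drop the character


def replace_disallowed(text: str) -> str:
    """Keep allowed characters; substitute a replacement or FALLBACK for the rest."""
    parts = []
    for ch in text:
        if ch in ALLOWED:
            parts.append(ch)
        else:
            # A replacement may be longer than one character.
            parts.append(REPLACEMENTS.get(ch, FALLBACK))
    # Join once at the end, so no position or length bookkeeping is needed.
    return "".join(parts)


# U-umlaut has a mapping; the euro sign does not and becomes the fallback.
print(replace_disallowed("\u00dcber 100 \u20ac"))  # prints: Ueber 100 ?
```

Building the output from pieces avoids rewriting the string in place, so a replacement that is longer or shorter than one character cannot shift the loop position.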

BTW Disco: set the (non-database) 'textarea' field to "special string" with
"full character set". Otherwise you get an error when leaving the field
if it contains "special characters".

Ingo Stiller

Oct 14, 2005, 12:16:34 PM

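Oops, there was an error in the code.
Here is the fix. It goes inside the if block, after v_outstring has been
rebuilt, so that v_POS and v_LEN stay correct when the replacement is not
exactly one character long:

  length v_CHAR2
  IF ($result != 1)
    v_POS=v_POS-1+$result
    v_LEN=v_LEN-1+$result
  ENDIF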

Geuzebroek, Kristiaan

Oct 14, 2005, 9:54:02 AM

Disco,

If you know exactly which characters you want to filter, you can use the Uniface Proc function $replace.

Greetings Kris Geusebroek


Ingo Stiller

Oct 16, 2005, 6:51:29 AM

Hi Kristiaan

Do we know whether Disco is using UnifAce 8? :-)

Ingo


Disco Octopus

Oct 17, 2005, 2:12:30 AM

Hi,

Disco uses Uniface 8.4.03.01

I am curious what other people have used, because I would like to
believe that there is a definitive list of MS Word characters and the
plain characters they should be converted to, such as:

MS Word                  --->  ASCII
<three dots char> …      --->  ...
<copyright> ©            --->  (c)
<double quote open> “    --->  "
<double quote close> ”   --->  "
etc.

Does anyone have a list like this?

Thanks


--
a beef jerky web site : http://www.choicebeefjerky.com.au
not a beef jerky web site : http://mycoolfish.com/vote.cmks
dont pick your nose if it is sore

Ingo Stiller

unread,

Oct 17, 2005, 3:03:19 AM

Hi Disco

Look in Word itself; the keyword is "AutoCorrect" :-)

Ingo
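
A short list of the usual suspects can be sketched as a lookup table. The sketch is in Python rather than Uniface Proc, since the mapping itself is language-independent. The names WORD_TO_ASCII and word_to_ascii are made up for illustration, and both the selection of characters and the ASCII stand-ins are a judgment call, not a definitive list.

```python
# Typographic characters commonly found in text pasted from Word,
# keyed by Unicode code point. Illustrative, not exhaustive.
WORD_TO_ASCII = {
    0x2018: "'",     # left single quotation mark
    0x2019: "'",     # right single quotation mark / apostrophe
    0x201C: '"',     # left double quotation mark
    0x201D: '"',     # right double quotation mark
    0x2013: "-",     # en dash
    0x2014: "--",    # em dash
    0x2026: "...",   # horizontal ellipsis
    0x2022: "*",     # bullet
    0x00A9: "(c)",   # copyright sign
    0x00AE: "(r)",   # registered sign
    0x2122: "(tm)",  # trade mark sign
}


def word_to_ascii(text: str) -> str:
    """Replace the mapped characters; leave everything else untouched."""
    # str.translate accepts a dict mapping code points to replacement strings.
    return text.translate(WORD_TO_ASCII)


sample = "\u201cWait\u2026\u201d \u2014 \u00a9 2005"
print(word_to_ascii(sample))  # prints: "Wait..." -- (c) 2005
```

In Uniface the same table could drive one $replace call per entry, provided the version in use supports that function.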
