i.e.
?- split_string("this is a test",Out).
Out=["this","is","a","test"]
I'm using swipl, and I've been tracing it, and see that it does what I
want on the way into the functions, but on the way out, it throws away the
strings I was trying to collect. Can anyone school me on this?
split_string(String,Collected_strings):-
string_to_list(String,Charlist),
char_code(' ',Space),
collect_strings(Charlist,Space,Collected_strings).
collect_strings([],_,[]):-!.
collect_strings(Charlist,Last,[String|Collected_strings]):-
collect_chars(Charlist,Nextlist,Last,Collected_chars),
string_to_list(String,Collected_chars),
collect_strings(Nextlist,Last,Collected_strings).
collect_chars([32|Charlist],Charlist,Last,[]):-
Last\==32,!.
collect_chars([32|Charlist],_,32,Collected_chars):-
collect_chars(Charlist,_,32,Collected_chars),!.
collect_chars([Code|Charlist],Nextlist,_,[Code|Collected_chars]):-
Code\==32,
Last1=Code,
collect_chars(Charlist,Nextlist,Last1,Collected_chars),!.
--
Dustin Kick
http://homepage.mac.com/mac_vieuxnez
Your specification is nice and clear, but I can't follow your code.
If I wrote this it would include a line something like
append( FirstString, [32|Rest], ListOfChars )
Nick
--
Nick Wedd ni...@maproom.co.uk
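For reference, a minimal sketch of the append/3 idea, assuming the input is already a list of character codes. The name split_on_space/2 is made up for illustration and is not from anyone's post. It cuts at every single space, so a run of spaces gives empty pieces instead of being munched.

```prolog
% split_on_space(+Codes, -Pieces): hypothetical helper, not from the thread.
% Cuts Codes at each space (code 32); Pieces is a list of code lists.
split_on_space(Codes, [First|Rest]) :-
    append(First, [32|Tail], Codes),   % First = everything before the first space
    !,
    split_on_space(Tail, Rest).
split_on_space(Codes, [Codes]).        % no space left: the remainder is the last piece
```

With two spaces between the words, e.g. ?- atom_codes('test  this', Cs), split_on_space(Cs, Ps)., the middle piece comes back as [], which is the behaviour a space-munching version has to avoid.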
I think I saw that solution, it would yield:
?-split_string("test this",Out).
Out=["test"," "," "," ","this"]
Well, collect_chars does that part well (I forgot to mention I'm
munching spaces, so I don't get strings returned with a couple of spaces
at the end (or front)), and as I trace it, it shows the strings getting
collected in Collected_strings while moving deeper into the call, but it
throws them away again on the way out.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%
% startsWith/3
% startsWith(OldString,Pattern,Rest)
%
% Examples: startsWith(" aaa"," ",Rest)
% Rest = "aaa"
startsWith(OldString,[],OldString):-
!.
startsWith([H|TOldString],[H|T],Rest):-
!,
startsWith(TOldString,T,Rest).
?-split("this is a test"," ", Out).
Out=["this","is","a"," test"]
I really think the code I'm working on is on the right track for what I
want to do, but I can't figure out why it throws the strings out of the
lists on the way up from the depths I've been tracing it, and at the
bottom, after having parsed all the characters in a list, the
Collected_strings at the top will be ["this","is","a",_GXXX|_GYYY], and "test" will
be stored in String, then on the way out of each call it throws all the
strings in the list away one by one. Hmm... I guess I know where to
look, but can't seem to find it (probably a head smacker, a "Can't
believe I didn't get that!" sort of thing).
--
Dustin Kick
http://homepage.mac.com/mac_vieuxnez
collect_strings([],_,[]):-!.
collect_strings(Charlist,Last,[String|Collected_strings]):-
collect_chars(Charlist,Nextlist,Last,Collected_chars),
string_to_list(String,Collected_chars),
collect_strings(Nextlist,Last,Collected_strings).
collect_chars([],_,_,[]):-!.
collect_chars([32|Charlist],Charlist,Last,[]):-
Last\==32,!.
collect_chars([32|Charlist],_,32,Collected_chars):-
collect_chars(Charlist,_,32,Collected_chars),!.
collect_chars([Code|Charlist],Nextlist,_,[Code|Collected_chars]):-
Code\==32,
Last1=Code,
collect_chars(Charlist,Nextlist,Last1,Collected_chars),!.
collect_strings([],_,[]):-!.
collect_strings(Charlist,Last,[String|Collected_strings]):-
collect_chars(Charlist,Nextlist,Last,Collected_chars),
string_to_list(String,Collected_chars),
collect_strings(Nextlist,Last,Collected_strings).
collect_chars([],_,_,[]):-!.
collect_chars([32|Charlist],Charlist,Last,[]):-
Last\==32,!.
collect_chars([32|Charlist],Nextlist,32,Collected_chars):- % <---- I wasn't passing the Nextlist through before.
collect_chars(Charlist,Nextlist,32,Collected_chars),!.
collect_chars([Code|Charlist],Nextlist,_,[Code|Collected_chars]):-
Code\==32,
Last1=Code,
collect_chars(Charlist,Nextlist,Last1,Collected_chars),!.
Usually, input analysis in Prolog is better done using DCGs (Definite Clause
Grammars), a standard extension that enables very expressive parsing.
It's also efficient. Using SWI-Prolog, search for DCG in the help:
you'll find a concise (but complete) example.
Using that, I propose:
% driver: should be the only public member of your module split_string
%
split_string(S, L) :- phrase(split_str(L), S).
% scan a list of words separated by spaces
%
split_str([H|T]) --> blanks, inwords(H), blanks, split_str(T).
split_str([]) --> [].
% a word is a sequence of (at least one!) non-blanks
%
inwords([C|Cs]) --> [C], { ok(C) }, inwords(Cs).
inwords([C]) --> [C], { ok(C) }. %bug: inwords([]) --> [].
% skip blanks (test and lose...)
%
blanks --> [C], { ko(C) }, blanks.
blanks --> [].
ok(C) :- \+ ko(C).
ko(C) :- code_type(C, space).
Bye Carlo
> I had to change one thing to make the munching work, this is
> functional just the way I wanted
Consider DCGs for convenience - for example:
string_tokens(Cs, Ts) :- phrase(tokens(Cs, []), Ts).
tokens([], Ts) --> token(Ts).
tokens([C|Cs], Ts) -->
( { C == 0' } -> token(Ts), tokens(Cs, [])
; tokens(Cs, [C|Ts])
).
token([]) --> [].
token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].
Yielding:
?- string_tokens("this is a test ", ["this", "is", "a", "test"]).
%@ true.
--
comp.lang.prolog FAQ: http://www.logic.at/prolog/faq/
"Carlo" <cc...@tin.it> wrote in message
news:RRjzj.259973$%k.38...@twister2.libero.it...
How about:
A list is a tokenization of a character sequence by a separator string
if the tokens occur in order within the sequence and every token except
the last is followed by the separator.
%tokenized(string, token list, separator).
tokenized([], [[]], _).
tokenized([C|Cs], [[C|TCs]|Ts], [S|Ss]) :-
C \= S,
tokenized(Cs, [TCs|Ts], [S|Ss]), !.
tokenized([C|Cs], [[]|Ts], [C|Ss]) :-
separated([C|Cs], Ts, [C|Ss], [C|Ss]).
%separated(string, token list, separator, separator).
separated([C|Cs], Ts, [], TempSs) :-
tokenized([C|Cs], Ts, TempSs).
separated([S|Cs], Ts, [S|Ss], TempSs) :-
separated(Cs, Ts, Ss, TempSs).
?- tokenized("test ; string", TokenList, " ; "),
maplist(name, TextList, TokenList).
TokenList = [[116, 101, 115, 116], [115, 116, 114, 105, 110, 103]],
TextList = [test, string]
?- tokenized(String, ["test", "string"], " ; "), name(Text, String).
String = [116, 101, 115, 116, 32, 59, 32, 115, 116|...],
Text = 'test ; string'
Regards
Stephan
> Dustin Kick<mac_vi...@mac.com> writes:
>
> > I had to change one thing to make the munching work, this is
> > functional just the way I wanted
>
> Consider DCGs for convenience - for example:
>
> string_tokens(Cs, Ts) :- phrase(tokens(Cs, []), Ts).
>
> tokens([], Ts) --> token(Ts).
> tokens([C|Cs], Ts) -->
> ( { C == 0' } -> token(Ts), tokens(Cs, [])
> ; tokens(Cs, [C|Ts])
> ).
>
> token([]) --> [].
> token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].
>
> Yielding:
>
> ?- string_tokens("this is a test ", ["this", "is", "a", "test"]).
> %@ true.
string_tokens(Cs, StpS, Ts) :- phrase(tokens(Cs, StpS, []), Ts).
tokens([], _, Ts) --> token(Ts).
tokens([C|Cs], StpS, Ts) -->
% ( { C == 0' } -> token(Ts), tokens(Cs, StpS, [])
( { memberchk(C,StpS) } -> token(Ts), tokens(Cs, StpS, [])
; tokens(Cs, StpS, [C|Ts])
).
token([]) --> [].
token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].
Slight mods ...
Dhu
> Dustin Kick<mac_vi...@mac.com> writes:
>
> > I had to change one thing to make the munching work, this is
> > functional just the way I wanted
>
> Consider DCGs for convenience - for example:
>
> string_tokens(Cs, Ts) :- phrase(tokens(Cs, []), Ts).
>
> tokens([], Ts) --> token(Ts).
> tokens([C|Cs], Ts) -->
Just as a matter of interest, what's this C == 0' notation?
Why does 0' evaluate to 32 (space)?
Dhu
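For what it's worth, 0'c is ordinary Prolog syntax for "the character code of c", so 0' followed by a space is simply the code of the space character. A small sketch, with made-up predicate names; the value 32 assumes an ASCII or Unicode based system such as SWI-Prolog:

```prolog
% The reader turns 0'a into a plain integer before the clause is stored,
% so nothing of the 0' notation survives in the loaded program.
% space_code/1 and letter_code/1 are hypothetical names for illustration.
space_code(C)  :- char_code(' ', C).   % same value as 0' followed by a space
letter_code(C) :- C = 0'a.             % same as C = 97
```

Calling space_code(C) should bind C to 32 on such a system, which is why a test like C == 32 and a test against 0' with a space behave the same.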
> Definite Clause Grammar, of course, it just makes sense.
> If anyone has any ideas how to work difference lists into this, which
I think that in most Prolog systems, DCGs get translated into Prolog
code with difference lists, see:
http://xsb.sourceforge.net/manual1/node155.html
A DCG rule such as:
p(X) --> q(X).
will be translated (expanded) into:
p(X, Li, Lo) :- q(X, Li, Lo).
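To make the difference-list reading concrete, here is a hand-written sketch of what such an expansion amounts to for a tiny grammar. The names ab and ab_rest are invented for the example, and a real translator may name the variables or place the terminal unification differently.

```prolog
% DCG source (shown as a comment only):
%   ab      --> [a], ab_rest.
%   ab_rest --> [b].
%
% Hand-expanded form: each nonterminal gets two extra arguments, the list
% before and the list after the part it consumes (a difference list).
ab(S0, S)      :- S0 = [a|S1], ab_rest(S1, S).
ab_rest(S0, S) :- S0 = [b|S].
```

A query such as ?- ab([a,b,c], Rest). then leaves Rest = [c], i.e. the input minus what the grammar consumed, and no append/3 call is needed to get there.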
> I'm hoping will make it more efficient, and give me a chance to put
> difference lists into practice, I'd appreciate them.
> --
>
> Dustin Kick http://homepage.mac.com/mac_vieuxnez
Many years ago I wrote an interpreter, and implemented DCG translation with this code:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Definite Clause Grammar translator
% from Clocksin, Mellish
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
:- op(251, fx, { ).
:- op(250, xf, } ).
:- op(255, xfx, -->).
translate((P0 --> Q0), (P :- Q)) :-
left_hand_side(P0, S0, S, P),
right_hand_side(Q0, S0, S, Q1),
flatten(Q1, Q), !.
left_hand_side((NT, Ts), S0, _S, P) :- !,
nonvar(NT),
islist(Ts),
tag(NT, S0, S1, P),
append(Ts, S0, S1).
left_hand_side(NT, S0, S, P) :-
nonvar(NT),
tag(NT, S0, S, P).
right_hand_side((X1, X2), S0, S, P) :- !,
right_hand_side(X1, S0, S1, P1),
right_hand_side(X2, S1, S, P2),
and(P1, P2, P).
right_hand_side((X1 ; X2), S0, S, (P1 ; P2)) :-
or(X1, S0, S, P1),
or(X2, S0, S, P2).
right_hand_side({P}, S, S, P) :- !.
right_hand_side(!, S, S, !) :- !.
right_hand_side(Ts, S0, S, true) :-
islist(Ts),
!, append(Ts, S, S0).
right_hand_side(X, S0, S, P) :-
tag(X, S0, S, P).
or(X, S0, S, P) :-
right_hand_side(X, S0a, S, Pa),
( var(S0a), S0a \== S, !, S0 = S0a, P = Pa;
P = (S0 = S0a, Pa) ).
tag(X, S0, S, P) :-
X =.. [F | A],
append(A, [S0, S], AX),
P =.. [F | AX].
and(true, P, P) :- !.
and(P, true, P) :- !.
and(P, Q, (P, Q)).
flatten(A, A) :-
var(A), !.
flatten((A, B), C) :- !,
flatten1(A, C, R),
flatten(B, R).
flatten(A, A).
flatten1(A, (A, R), R) :-
var(A), !.
flatten1((A, B), C, R) :- !,
flatten1(A, C, R1),
flatten1(B, R1, R).
flatten1(A, (A, R), R).
islist([]) :- !.
islist([_|_]).
append([A|B], C, [A|D]) :- append(B, C, D).
append([], X, X).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% convert DCG rules to clauses
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
d2c :-
clause((H-->T),true),
translate((H-->T), Clause),
assert(Clause),
display(Clause), nl,
fail.
d2c.
It's not so simple... and indeed some time later I read a simpler approach
in Sterling and Shapiro's 'The Art of Prolog', maybe the one used in the
SICStus implementation.
Bye Carlo
> Is there a goal I can run DCGs through to see the expanded code?
Use clause/2 to access its term representation. Also try listing/[01]:
?- listing(tokens).
%@ tokens([], A, B, C) :-
%@ token(A, B, C).
%@ tokens([A|E], C, B, G) :-
%@ ( A==32,
%@ D=B
%@ -> token(C, D, F),
%@ tokens(E, [], F, G)
%@ ; tokens(E, [A|C], B, G)
%@ ).
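Another way to look at the expansion without loading the grammar first is expand_term/2, which SWI-Prolog and most other systems provide. A sketch, where show_expansion/1 and the greeting rule are made-up examples; the exact variable names and where the terminal [hello] ends up vary between systems.

```prolog
% show_expansion(-Clause): hypothetical helper for illustration.
% Asks the system to translate one DCG rule and returns the resulting clause.
show_expansion(Clause) :-
    expand_term((greeting --> [hello], who), Clause).
```

Running ?- show_expansion(C), portray_clause(C). should print an ordinary clause for greeting/2 whose body calls who/2.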
> Dustin Kick<mac_vi...@mac.com> writes:
>
>> Is there a goal I can run DCGs through to see the expanded code?
>
> Use clause/2 to access its term representation. Also try listing/[01]:
>
> ?- listing(tokens).
> %@ tokens([], A, B, C) :-
> %@ token(A, B, C).
> %@ tokens([A|E], C, B, G) :-
> %@ ( A==32,
> %@ D=B
> %@ -> token(C, D, F),
> %@ tokens(E, [], F, G)
> %@ ; tokens(E, [A|C], B, G)
> %@ ).
> %@ true.
There are two things I can't grok:
1) the %@ : when I do ?- listing(tokens). those weird symbols don't show
up. We are using the same SWI, or not ?
2) why is there 32 in the output, while the original program had 0' ?
is this unavoidable, an SWI bug or an ISO Prolog inconsistency ?
Cheers
Bart Demoen
I leave that to Markus
> 2) why is there 32 in the output, while the original program had 0' ?
> is this unavoidable, an SWI bug or an ISO Prolog inconsistency ?
You know the answer: as it stands in ISO, it is unavoidable. The
tokeniser must translate 0' into the character code of the space. In
general that is even undefined but SWI-Prolog is internally Unicode,
so it is defined as 32, regardless of the locale. Character codes,
however, are not a special type and therefore cannot be distinguished from
integers. I'm not sure whether ISO would allow for a subtype of
integer that represents character codes. Possibly.
Same for [32] and " ", etc. To a certain extent this can be remedied
using ?- set_prolog_flag(double_quotes, chars). It doesn't fix all
issues though, and a global flag that introduces such big
incompatibilities causes more trouble than it solves. I never touch
that flag for any real programming task.
I once raised a similar issue about [] == [ ] == [/*empty list*/] == '[]'
It is fine for the first three to be equal, but I still have doubts about the
last one. Same for {}, though this causes less confusion in practice.
I don't think there is an easy fix to these issues without introducing
serious compatibility issues.
Cheers --- Jan
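A small sketch of what the double_quotes flag changes, assuming SWI-Prolog and a file loaded with consult. The predicate names as_codes/1 and as_chars/1 are made up for the example; each directive affects how the clauses read after it are parsed.

```prolog
% With double_quotes = codes, "a b" is read as a list of character codes.
:- set_prolog_flag(double_quotes, codes).
as_codes(X) :- X = "a b".

% With double_quotes = chars, the same text is read as one-character atoms.
:- set_prolog_flag(double_quotes, chars).
as_chars(X) :- X = "a b".
```

Here as_codes(X) gives X = [97,32,98] and as_chars(X) gives X = [a,' ',b]. With chars, the space is at least recognisable as a character rather than the bare integer 32, which is the partial remedy mentioned above.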
> 1) the %@ : when I do ?- listing(tokens).
I'm using the default value of ediprolog-prefix ("%@ "), and this is
what gets inserted when I evaluate a query. More information:
http://www.logic.at/prolog/ediprolog/ediprolog.html
An idiosyncratic symbol combination is used to make inserted output
automatically detectable. C-3 F10 flushes answers that were previously
inserted in the buffer, assuming ediprolog-dwim is bound to F10.
% driver: should be the only public member of your module split_string
%
split_string(S, L) :- phrase(split_str(L), S).
% scan a list of words separated by spaces
%
split_str([H|T]) --> blanks, inwords(H), blanks, !, split_str(T).
split_str([], _, _).
% a word is a sequence of (at least one!) non-blanks
%
inwords([C|Cs]) --> [C], { ok(C) }, inwords(Cs).
inwords([C]) --> [C], { ok(C) }.
% skip blanks (test and lose...)
%
blanks --> [C], { ko(C) }, blanks.
blanks --> [].
ok(C) :- \+ ko(C).
ko(C) :- code_type(C, space).
Bye Carlo
"Carlo Capelli" <carlo....@rdbos.it> wrote in message
news:iOrzj.6359$q53....@tornado.fastwebnet.it...
% Correction
split_str([H|T]) --> blanks, inwords(H), blanks, !, split_str(T).
split_str([], _, _).