
newbe question about removing items from one file to another file


Eric_...@msn.com

Aug 27, 2006, 5:35:34 PM

def simplecsdtoorc(filename):
    file = open(filename,"r")
    alllines = file.read_until("</CsInstruments>")
    pattern1 = re.compile("</")
    orcfilename = filename[-3:] + "orc"
    for line in alllines:
        if not pattern1
            print >>orcfilename, line

I am pretty sure my code isn't close to what I want. I need to be able
to skip HTML-like commands from <defined> to <undefined>, and to key on
another word in addition to </CsInstruments> to end the routine.

I was also looking at SE 2.2 beta but didn't see any easy way to use it
for this or, for that matter, for search and replace where I could just
add it as a menu item and not worry about it.

Thanks in advance for any help.

PetDragon

Aug 27, 2006, 6:51:14 PM

Sounds like you need to use an HTML parser; check it out in the
documentation.

Eric_...@msn.com

Aug 27, 2006, 8:29:55 PM

I will look into that a little bit, since the format is so HTML-like. Maybe
some of the examples can lead me in the right direction on a lot of it.

http://www.dexrow.com

Simon Forman

Aug 28, 2006, 2:03:29 AM

If you're dealing with html or html-like files, do check out
beautifulsoup. I had reason to use it the other day and man is it ever
useful!

Meantime, there are a few minor points about the code you posted:

1) open() defaults to 'r', you can leave it out when you call open() to
read a file.

2) 'file' is a builtin type (it's the type of file objects returned by
open()) so you shouldn't use it as a variable name.

3) file objects don't have a read_until() method. You could say
something like:

f = open(filename)
lines = []
for line in f:
    lines.append(line)
    if '</CsInstruments>' in line:
        break

4) filename[-3:] will give you the last 3 chars in filename. I'm
guessing that you want all but the last 3 chars, that's filename[:-3],
but see the os.path.splitext() function, and indeed the other
functions in os.path too:
http://docs.python.org/lib/module-os.path.html

5) the regular expression objects returned by re.compile() will always
evaluate True, so you want to call their search() method on the data to
search:

if not pattern1.search(line):

But, 6) using re for a pattern as simple as "</" is way overkill. Just
use 'in' or the find() method of strings:

if "</" not in line:

or:

pos = line.find("</")
if pos == -1:
    print >>orcfilename, line
else:
    print >>orcfilename, line[:pos]

7) the "print >> file" usage requires a file (or file-like object,
anything with a write() method I think) not a string. You need to use
it like this:

orcfile = open(orcfilename, 'w')
#...
print >> orcfile, line

8) If you have a list of lines anyway, you can use the writelines()
method of files to write them in one go:

open(orcfilename, 'w').writelines(lines)

of course stripping out your unwanted data from that last line using
find() as shown above.
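
Pulling points 1 through 8 together, a minimal sketch might look like the
following. It is written for Python 3, so the "print >>" form becomes a
write() call. The function name csd_to_orc and the choice to drop the whole
line that contains a closing tag are assumptions for illustration, not
something from the original post:

```python
import os


def csd_to_orc(filename):
    """Copy the lines of a .csd file up to </CsInstruments> into a .orc file.

    Lines containing a closing tag ("</") are skipped, as in the original
    attempt. Returns the name of the file that was written.
    """
    # splitext() splits "tone.csd" into ("tone", ".csd").
    base, _ext = os.path.splitext(filename)
    orcfilename = base + ".orc"

    with open(filename) as src, open(orcfilename, "w") as dst:
        for line in src:
            if "</CsInstruments>" in line:
                break  # stop at the end of the instruments section
            if "</" not in line:
                dst.write(line)
    return orcfilename
```

Opening tags such as <CsOptions> are still copied here; filtering those out
would need one more test inside the loop.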

I hope this helps.

Check out the docs on file objects:
http://docs.python.org/lib/bltin-file-objects.html, but like I said,
if you're dealing with html or html-like files, be sure to check out
beautifulsoup. Also, there's the elementtree package for parsing XML
that could help here too.

~Simon

Anthra Norell

Aug 28, 2006, 2:35:25 AM
to Python SIG
Eric,
Having played around with problems of this kind for quite some time, I find them challenging even if I don't really have time to
get sidetracked. Your description of the problem makes it all the more challenging, because its 'expressionist' quality adds the
challenge of guessing what you mean.
I'd like to take a look at your data, if you would post a segment on which to operate and the same data the way it should look in
the end. In most cases this is pretty self-explanatory. Explain the points that might not be obvious to a reader who knows nothing
about your mission.

Frederic

> --
> http://mail.python.org/mailman/listinfo/python-list

Eric_...@msn.com

Aug 28, 2006, 4:48:01 AM

Sorry about that. This is a link to a description of the format:
http://kevindumpscore.com/docs/csound-manual/commandunifile.html
It is possible to have more than one instr defined in a .csd file, so I
would need to look for that string also if I want to separate the
instruments out.

http://www.dexrow.com

Gabriel Genellina

Aug 28, 2006, 8:56:55 PM
to Eric_...@msn.com, pytho...@python.org
At Sunday 27/8/2006 18:35, Eric_...@msn.com wrote:

(This code doesn't even compile...!)

>def simplecsdtoorc(filename):
> file = open(filename,"r")

file is not a good name - hides the builtin type of the same name.
Same for dict, list...

> alllines = file.read_until("</CsInstruments>")

read_until???

> pattern1 = re.compile("</")
> orcfilename = filename[-3:] + "orc"

perhaps you want filename[:-3]+"orc"?

> for line in alllines:
> if not pattern1

if not pattern1.search(line):

> print >>orcfilename, line

Open the output file before the loop, and use its write() method here

>I am pretty sure my code isn't close to what I want. I need to be able
>to skip HTML-like commands from <defined> to <undefined>, and to key on
>another word in addition to </CsInstruments> to end the routine.

Good job for Beautiful Soup: http://www.crummy.com/software/BeautifulSoup/

Gabriel Genellina
Softlab SRL






Anthra Norell

Aug 29, 2006, 10:00:08 AM
to Python SIG
Dexter,

I looked at the format specification. It contains an example:

-----------------------------------------------

<CsoundSynthesizer>;
; test.csd - a Csound structured data file

<CsOptions>
-W -d -o tone.wav
</CsOptions>

<CsVersion> ;optional section
Before 4.10 ;these two statements check for
After 4.08 ; Csound version 4.09
</CsVersion>

<CsInstruments>
; originally tone.orc
sr = 44100
kr = 4410
ksmps = 10
nchnls = 1
instr 1
a1 oscil p4, p5, 1 ; simple oscillator
out a1
endin
</CsInstruments>

<CsScore>
; originally tone.sco
f1 0 8192 10 1
i1 0 1 20000 1000 ;play one second of one kHz tone
e
</CsScore>

</CsoundSynthesizer>

-------------------------------------

If I understand correctly, you want to write the instruments block (from <CsInstruments> to </CsInstruments>) to a file. Right? Or
each block to its own file, in case there are several? Do you want your code to generate the file names? Can you confirm this or
explain it differently?

Regards

Frederic


Eric_...@msn.com

Aug 29, 2006, 7:51:45 PM

I need to take only what is between the blocks, and I also need to make
sure I take only one instrument, defined in this example by the code
instr 1. I also need the code

<CsInstruments>
; originally tone.orc
sr = 44100
kr = 4410
ksmps = 10
nchnls = 1

regardless of which instrument I take. The function would have to
accept the instrument number as an argument.

http://www.dexrow.com

Simon Forman

Aug 29, 2006, 8:32:32 PM

Using BeautifulSoup and the interactive interpreter, I figured out the
following script in about 15 minutes:

# s is a string containing the example file from above.

from BeautifulSoup import BeautifulStoneSoup

soup = BeautifulStoneSoup(s)
csin = soup.contents[0].contents[5]
lines = csin.string.splitlines()

print csin.string

It prints:

; originally tone.orc
sr = 44100
kr = 4410
ksmps = 10
nchnls = 1
instr 1
a1 oscil p4, p5, 1 ; simple oscillator
out a1
endin


and of course you could say "lines = csin.string.splitlines()" to get a
list of the lines. That doesn't take you all the way, but it's
something.
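
For comparison, here is a sketch that finds the section by tag name instead
of by position, using ElementTree from the Python 3 standard library rather
than BeautifulSoup. It assumes the file happens to be well-formed XML, which
is true of the (shortened) example below but not guaranteed: orchestra code
containing "<" or "&" would make fromstring() raise a ParseError.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example file from the format description.
s = """<CsoundSynthesizer>;
; test.csd - a Csound structured data file

<CsOptions>
-W -d -o tone.wav
</CsOptions>

<CsInstruments>
; originally tone.orc
sr = 44100
instr 1
a1 oscil p4, p5, 1 ; simple oscillator
out a1
endin
</CsInstruments>

</CsoundSynthesizer>
"""

root = ET.fromstring(s)               # root is <CsoundSynthesizer>
csin = root.find("CsInstruments")     # look the section up by tag name
lines = csin.text.strip().splitlines()

print(lines[0])
```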

Hope that helps,
Peace,
~Simon

Anthra Norell

Aug 30, 2006, 5:41:20 AM
to Python SIG
Dexter,

Here's a function that screens out all instrument blocks and puts them into a dictionary keyed on the instrument number:

--------------------------------------------

def get_instruments (file_name):

    INSIDE = 1
    OUTSIDE = 0

    f = open (file_name, 'r')
    state = OUTSIDE
    instruments = {}
    instrument_segment = ''

    for line in f:
        if state == OUTSIDE:
            # wait for the opening tag of an instruments block
            if line.startswith ('<CsInstruments'):
                state = INSIDE
                instrument_segment += line
        else:
            instrument_segment += line
            if line.lstrip ().startswith ('instr'):
                # remember the number following 'instr'
                instrument_number = line.split () [1]
            elif line.startswith ('</CsInstruments'):
                # block complete: file it under the last number seen
                instruments [instrument_number] = instrument_segment
                instrument_segment = ''
                state = OUTSIDE

    f.close ()
    return instruments

------------------------------------------------

You have received good advice on using parsers: "beautiful soup" or "pyparsing". These are powerful tools capable of doing complicated
extractions. Yours is not a complicated extraction. Simon tried it with "beautiful soup". That seems simple enough, though he finds
the data by index, leaving open where he gets the index from. There's surely a way to get the data by name.
Unlike a parser, the function will miss tags that take liberties with upper- and lower-case letters, which the
specification probably allows. A regular expression might have to be used if they do.
From your description I haven't been able to infer what the final format of your data is supposed to be, so I cannot tell you
how to go on from here. You'll find out. If not, just keep asking.
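
As an illustration of the regular-expression route, a case-insensitive lookup
of a section by tag name might be sketched as follows (Python 3; the helper
name get_section is made up for this example, and nested sections of the same
name are not handled):

```python
import re


def get_section(csd_text, tag="CsInstruments"):
    """Return the text between <tag> and </tag>, ignoring letter case.

    Returns None when the section is missing.
    """
    pattern = re.compile(
        r"<%s\b[^>]*>(.*?)</%s\s*>" % (re.escape(tag), re.escape(tag)),
        re.IGNORECASE | re.DOTALL,  # DOTALL lets "." match newlines too
    )
    match = pattern.search(csd_text)
    return match.group(1) if match else None
```

The non-greedy (.*?) stops at the first closing tag, so a file with several
such sections would need finditer() instead of search().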

The SE solution, which you said you couldn't work out, would be the following. It makes the same dictionary the function makes, and it is
case-insensitive:

------------------------------------------------

>>> Instrument_Segment_Filter = SE.SE ('<EAT> "~(?i)<CsInstruments>(.|\n)*?</CsInstruments>~==\n\n" ')
>>> instrument_segments= Instrument_Segment_Filter ('file_name', '')
>>> print instrument_segments
(... see all instrument segments ...)
>>> Instrument_Number = SE.SE ('<EAT> ~instr.*~==\n')
>>> instruments ={}
>>> for segment in instrument_segments.split ('\n\n'):
...     if segment:
...         instr_line = Instrument_Number (segment)
...         instrument_number = instr_line.split ()[1]
...         instruments [instrument_number] = segment

--------------------------------------------------

(If you're on Windows and the CRs bother you, take them out with an additional definition when you make your
Instrument_Segment_Filter: (13)= or "\r=")


Regards

Frederic



Eric_...@msn.com

Aug 30, 2006, 8:18:09 PM

Thanks for the help. I can't wait to try it out (it has to wait for the
weekend; three days off, finally).

http://www.dexrow.com

Eric_...@msn.com

Sep 3, 2006, 10:50:19 PM

I seem to be having problems getting the code to work. It seems to crash
my whole project, and I don't know if I am missing an import file or what
(I had to go back to an older version on my hard drive). I have uploaded
what I have to SourceForge:

https://sourceforge.net/project/showfiles.php?group_id=156455&package_id=201306&release_id=444362
http://www.dexrow.com

Thanks for the help.

Eric_...@msn.com

Sep 3, 2006, 10:55:45 PM

Sorry, I responded to the wrong post. I was having trouble figuring
out the Beautiful Soup download.

Eric_...@msn.com

Sep 3, 2006, 10:58:39 PM

I cut and pasted this. It seems to be crashing my program. I am not
sure that I have all the right imports. It seems to be fine when I go to
an older version of the file. I uploaded it to SourceForge.

https://sourceforge.net/project/showfiles.php?group_id=156455&package_id=201306&release_id=444362
http://www.dexrow.com

Anthra Norell

Sep 4, 2006, 3:28:40 PM
to Python SIG


Eric (Eric or Dexter?)
This thread seems to have split. So let me reiterate: please copy the output when you cut, paste and run. If you have an
import problem, it must be on the other side of your interface with SE, because I don't import anything and SE imports what it needs.

Frederic


Eric_...@msn.com

Sep 4, 2006, 4:52:02 PM

I have to be able to distribute SE with the project in order to use
it. I started with import se, but I did not use the setup command.
When I comment out import se the program works, and when
I use import se everything connected to the library crashes on the
import line.

Anthra Norell

Sep 5, 2006, 12:24:07 PM
to Python SIG
You don't need the setup command. Just place SE.py and SEL.py into a path where the import can find them. Also make sure SE.py and
SEL.py are spelled exactly like this. Linux requires the extension to be lower case, as I was myself made aware of by an alert
person who was also experiencing import problems. I must confess that my first uploads were upper case (SE.PY). I immediately
replaced the upload with the corrected spelling and apologize for the trouble the mistake may be causing. Fortunately, correcting it is a
small matter.
Have you tried to run the function at all? It produces the same result. Made case-insensitive (if need be), I'd prefer the
function. It is more economical, since it doesn't require an extra import. It surely runs faster too (if that matters).
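
One way to check whether a module file is in "a path where the import can
find it" is sketched below. It uses importlib, which belongs to Python 3, not
to the Python of this thread, and the helper name check_import and its
extra_dir parameter are invented for illustration:

```python
import importlib.util
import sys


def check_import(module_name, extra_dir=None):
    """Report whether `module_name` can be found on the import path.

    If `extra_dir` is given it is put at the front of sys.path first,
    which is one way to ship a module alongside a project. Returns the
    path of the file that would be imported, or None if nothing is found.
    """
    if extra_dir and extra_dir not in sys.path:
        sys.path.insert(0, extra_dir)
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None
```

Because the lookup is by exact name, check_import("se") and
check_import("SE") can give different answers on a case-sensitive file
system.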

Frederic

