fgets function returning values as python string?


Francisco Lahuerta Calahorra

Feb 10, 2015, 5:25:37 AM
to cython...@googlegroups.com
Hi, I am trying to build a function that reads a file and finds the position of a marker.

I would like to do it in Cython with the C functions fopen and fgets,

but I do not know how to convert the string that I get from fgets into a Python string.

In the following example the file is read both with Python and with the C functions. When it is read with Python I find the marker, but when I read it with the C functions I cannot find the marker.

How can this be done?


    def _read_seq1(self, filename, MARKER, n_frame, BLOCK_SIZE):

        filename_byte_string = filename.encode("UTF-8")

        cdef char* fname = filename_byte_string
        cdef FILE* cfl
        cfl = fopen(fname, "rb")
        fl = open(filename, 'rb')

        if cfl == NULL:
            return -1, ""

        cdef size_t _block_size = BLOCK_SIZE
        string_val = "x" * (BLOCK_SIZE + 1)
        cdef char* block = string_val

        current = ""
        frame_data = ""
        ii_frame = 0
        count = 0

        current1 = ""
        block1 = ""

        while True:

            if fgets(block, _block_size, cfl) == NULL: break

            block1 = fl.read(BLOCK_SIZE)
            current1 += block1

            current += unicode(block, "ISO-8859-1").encode('utf8') ### How to do this part

            markerpos = current.find( MARKER )
            markerpos1 = current1.find( MARKER )

            if markerpos >= 0: print "Cython", markerpos

            if markerpos1 >= 0: print "Python", markerpos1, " ", len( current1)

            count = count + 1

            if count == 5: break

        #fl.close()
        fclose(cfl)
        fl.close()

        return ii_frame, frame_data

 

Nils Bruin

Feb 10, 2015, 3:11:29 PM
to cython...@googlegroups.com
On Tuesday, February 10, 2015 at 2:25:37 AM UTC-8, Francisco Lahuerta Calahorra wrote:

           
            block1 = fl.read(BLOCK_SIZE)
            current1 += block1

            current += unicode(block, "ISO-8859-1").encode('utf8') ### How to do this part
           

It looks like you want to *decode* an ISO-8859-1 byte stream into a Python unicode object. PyUnicode_DecodeLatin1 (CPython C-API) probably does this. Note that "current +=" will create a new object every time, so your complexity is O(<length>^2) rather than the O(<length>) it should be.
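As a rough illustration of both points, here is a minimal sketch in plain Python 3 (not Cython, and not the Python 2 of the original post). The function name decode_chunks and the sample bytes are made up for the example. It decodes each ISO-8859-1 chunk to text and collects the pieces in a list, joining them once at the end instead of growing a string with "+=".

```python
def decode_chunks(chunks):
    """Decode an iterable of ISO-8859-1 byte chunks into one text string."""
    parts = []
    for chunk in chunks:
        # bytes -> text; every byte value is valid in ISO-8859-1
        parts.append(chunk.decode("iso-8859-1"))
    # A single join at the end avoids building a new string on every iteration
    return "".join(parts)


# Hypothetical sample data: three blocks as they might come from a file
text = decode_chunks([b"caf\xe9 ", b"MARKER", b" tail"])
print(text.find("MARKER"))
```

Searching the joined text also finds a marker that happens to be split across two blocks, which a per-block search would miss.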

You can then of course *encode* your unicode string into a UTF-8 byte sequence with ".encode('utf8')" once you've done that, but that is not what fl.read does: on a file opened with 'rb', fl.read returns the raw bytes without any conversion.
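The difference can be seen in a small Python 3 sketch. The sample bytes are invented, and io.BytesIO stands in for a file opened with 'rb'. The binary read returns the bytes unchanged, while decoding from ISO-8859-1 and re-encoding as UTF-8 produces a different byte sequence as soon as a non-ASCII character is present, so marker offsets no longer agree.

```python
import io

raw = b"caf\xe9 MARKER"        # ISO-8859-1 bytes, as they would sit in the file
fl = io.BytesIO(raw)           # stand-in for open(filename, "rb")
block = fl.read(len(raw))      # binary read: no decoding or encoding happens

text = block.decode("iso-8859-1")   # bytes -> text
utf8 = text.encode("utf-8")         # text -> UTF-8 bytes (the accented letter now takes two bytes)

print(block.find(b"MARKER"), utf8.find(b"MARKER"))
```

With pure ASCII input the two byte sequences would be identical, so a mismatch like this only shows up when the data contains bytes above 0x7F.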

Francisco Lahuerta Calahorra

Feb 11, 2015, 7:43:11 AM
to cython...@googlegroups.com
Yes, that is what fl.read does, but I want to avoid Python's fl.read and use fgets instead, which is a C function.

However, my problem is that the text I read with fgets is not encoded like the text I read with fl.read,

and I would like to know how to read with fgets and encode that text the same way as when it is read with fl.read.

Nils Bruin

Feb 11, 2015, 12:32:30 PM
to cython...@googlegroups.com
On Wednesday, February 11, 2015 at 4:43:11 AM UTC-8, Francisco Lahuerta Calahorra wrote:
Yes, that is what fl.read does, but I want to avoid Python's fl.read and use fgets instead, which is a C function.

However, my problem is that the text I read with fgets is not encoded like the text I read with fl.read,

and I would like to know how to read with fgets and encode that text the same way as when it is read with fl.read.

I think you are using the word "encode" for what the Python documentation calls "decode". It might be a good idea to read the Python reference section about unicode, and in particular the C-API documentation of PyUnicode_DecodeLatin1, which is almost certainly what you need.

(also: don't top-post).
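For a feel of what PyUnicode_DecodeLatin1 takes and returns, here is a hedged sketch that reaches it through ctypes from plain Python 3 on CPython. In Cython one would declare or cimport the function rather than go through ctypes. The wrapper name latin1_to_text and the sample bytes are made up for the example. The C function takes a buffer, an explicit byte count, and an optional error-handler name, and returns a Python text object.

```python
import ctypes

# PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
_decode_latin1 = ctypes.pythonapi.PyUnicode_DecodeLatin1
_decode_latin1.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t, ctypes.c_char_p]
_decode_latin1.restype = ctypes.py_object


def latin1_to_text(buf, size):
    """Decode the first `size` bytes of `buf` as ISO-8859-1 (hypothetical helper)."""
    # None is passed as a NULL `errors` argument, i.e. the default error handling
    return _decode_latin1(buf, size, None)


data = b"caf\xe9 MARKER"
text = latin1_to_text(data, len(data))
print(text.find("MARKER"))
```

Because the length is passed explicitly, the caller decides how many bytes of the buffer are decoded.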