maximum length for value


andreea...@gmail.com

May 23, 2013, 1:20:20 PM
to scal...@googlegroups.com
Hi,

I am trying to add files to scalaris running on a cluster (the key being the name of the file, 
while the value is the content of the file). Is there a maximum length for the value? 

When I have a 2MB file, the write is performed, but with a 2.9MB file the write 
fails with the error "failed with connection error". I am using Python to add data to the 
datastore.

Thank you,
Andreea

Andreea Sandu

May 24, 2013, 11:28:58 AM
to scal...@googlegroups.com, Magnus Müller
Hi,

I modified the value and it still fails. I tried it with both nolimit and 33554432 (and a 2MB file),
but it still does not work. 
Could it be because I am using Python (and not Ruby or Java)?

This is (part of) my code:

        sc = TransactionSingleOp()
        key = str(input_file)

        try:
            f = open(input_file, 'r')
            value = f.read()
            f.close()
           
            sc.write(key, value)

        [...]

Thank you,
Andreea

2013/5/24 Magnus Müller <mamu...@informatik.hu-berlin.de>
Hey.

The Python and Ruby clients both use the REST API provided by yaws.
When writing data, a post request is issued. The size of accepted post
requests is restricted by the parameter `yaws_max_post_data`, which can
be configured in either `bin/scalaris.cfg` or (preferably) in
`bin/scalaris.local.cfg`. When I set that value to 33554432 (bytes),
writing a 3 MB file using the Ruby API works flawlessly:

sc = Scalaris::TransactionSingleOp.new
sc.write("hello", "0" * 1024**2* 3)
# => true

I didn't try that with Python yet, but it should work as well.
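
A quick way to see whether a value is anywhere near the limit is to compare its size with the configured number before writing. The following is only a sketch: `fits_post_limit` is a hypothetical helper, not part of the Scalaris API, and it looks at the raw value only. The actual POST body also carries the key and the protocol framing, so it will be somewhat larger than the value itself.

```python
MIB = 1024 ** 2


def fits_post_limit(value, max_post_data):
    """Return True if the raw value is within yaws_max_post_data.

    max_post_data is either a number of bytes or the string "nolimit",
    mirroring the two settings mentioned in this thread.
    Hypothetical helper: the real request body is larger than the value.
    """
    if max_post_data == "nolimit":
        return True
    if isinstance(value, str):
        value = value.encode("utf-8")
    return len(value) <= max_post_data


value = "0" * (3 * MIB)   # same payload as the Ruby example: 3 MiB
limit = 32 * MIB          # 33554432 bytes
print(len(value), limit, fits_post_limit(value, limit))
```

Passing this check does not guarantee that the write succeeds; it only rules out the most obvious size mismatch.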


Magnus




Magnus Müller

May 25, 2013, 5:49:05 AM
to scal...@googlegroups.com
The following works for me. Could you try the same procedure?

========================================
# ./bin/scalaris.cfg
{yaws_max_post_data, nolimit}. % works with 33554432 as well

# prepare test data (2^22 bytes, i.e. a 4 MiB file)
cat /dev/urandom| tr -dc 'a-zA-Z0-9' | head -c `echo 2\^22 | bc` \
> test.dat

# write the data with python
from scalaris import TransactionSingleOp
sc = TransactionSingleOp()

f = open("test.dat", "r")
value = f.read()
f.close()

sc.write("hello", value)
sc.read("hello")
========================================
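
The test-data step above can also be done in Python, which makes it easier to try several sizes in one script. This is a minimal sketch: `make_test_data` is a made-up helper name, and the alphabet matches the `tr -dc 'a-zA-Z0-9'` filter used in the shell pipeline. Note that 2**22 is 4194304 bytes.

```python
import random
import string

# Same character set as tr -dc 'a-zA-Z0-9' in the shell pipeline.
ALPHABET = string.ascii_letters + string.digits


def make_test_data(num_bytes, seed=None):
    """Return a random alphanumeric string of exactly num_bytes characters."""
    rng = random.Random(seed)
    return "".join(rng.choices(ALPHABET, k=num_bytes))


if __name__ == "__main__":
    data = make_test_data(2 ** 22)
    # Every character is ASCII, so the file size equals the string length.
    with open("test.dat", "w") as f:
        f.write(data)
```

Changing the argument to, say, `8 * 1024 ** 2` gives a larger file for probing where writes start to fail.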

Which error message do you receive? If I set the parameter
yaws_max_post_data to a smaller value than the file size, YAWS
complains with `Unhandled reply fr. do_recv() {error,emsgsize}`.


Magnus

andreea...@gmail.com

May 25, 2013, 10:46:39 AM
to scal...@googlegroups.com
The example you gave works for me too. But if I put 8MB (or more) into the test.dat file, it
does not work. I noticed that the accepted size also depends on the number of \n characters
(if the file has many lines, only smaller files are accepted). I also tried it with Ruby.
These are the errors I get:

Python: 

  File "./scalaris_post.py", line 51, in <module>
    main()
  File "./scalaris_post.py", line 48, in main
    write_value_to_scalaris(sys.argv[i])
  File "./scalaris_post.py", line 29, in write_value_to_scalaris
    sc.write(key, value)#compressed)
  File "/mnt/test/montage_test/scripts/scalaris.py", line 850, in write
    result = self._conn.callp('/api/tx.yaws', 'write', [key, value])
  File "/mnt/test/montage_test/scripts/scalaris.py", line 49, in callp
    return self.call(function, params, path = path, retry_if_bad_status = retry_if_bad_status)
  File "/mnt/test/montage_test/scripts/scalaris.py", line 89, in call
    raise ConnectionError(data, response = response, error = instance)
scalaris.ConnectionError: error: error(111, 'Connection refused')

Ruby:
./scalaris.rb:207:in `call': end of file reached (Scalaris::ConnectionError)
from ./scalaris.rb:616:in `write'
from ./scalaris_client.rb:33:in `write'
from ./scalaris_client.rb:116

Sometimes, after these errors the scalaris node is also killed and I have to restart it.
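
One possible explanation for the newline effect, assuming the Python client sends the value as a JSON string (the traceback shows the call going to '/api/tx.yaws' with [key, value] as parameters): JSON escapes every newline as the two characters `\` and `n`, so a file with many lines produces a larger request body than a file of the same size without them. The sketch below only illustrates that effect. `json_body_size` and the envelope it builds are assumptions, not the client's actual framing.

```python
import json


def json_body_size(key, value):
    """Size in bytes of a JSON-encoded write request for key/value.

    The envelope is a hypothetical JSON-RPC style guess; the real
    client's framing may differ.
    """
    body = json.dumps({"method": "write", "params": [key, value], "id": 0})
    return len(body.encode("utf-8"))


plain = "a" * 1000    # 1000 bytes, nothing to escape
lines = "\n" * 1000   # 1000 bytes, each one escaped to two bytes
print(json_body_size("k", plain), json_body_size("k", lines))
```

If this is what happens, the limit applies to the encoded body rather than to the file size on disk.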

Thank you,
Andreea

Magnus Müller

May 29, 2013, 5:46:50 PM
to scal...@googlegroups.com, andreea...@gmail.com

>>Python:
>>
>>  File "./scalaris_post.py", line 51, in <module>
>>    main()
>>  File "./scalaris_post.py", line 48, in main
>>    write_value_to_scalaris(sys.argv[i])
>>  File "./scalaris_post.py", line 29, in write_value_to_scalaris
>>    sc.write(key, value)#compressed)
>>  File "/mnt/test/montage_test/scripts/scalaris.py", line 850, in write
>>    result = self._conn.callp('/api/tx.yaws', 'write', [key, value])
>>  File "/mnt/test/montage_test/scripts/scalaris.py", line 49, in callp
>>    return self.call(function, params, path = path, retry_if_bad_status = retry_if_bad_status)
>>  File "/mnt/test/montage_test/scripts/scalaris.py", line 89, in call
>>    raise ConnectionError(data, response = response, error = instance)
>>scalaris.ConnectionError: error: error(111, 'Connection refused')

This is thrown here:

    except Exception as instance:
        #print 'HTTP STATUS:', response.status, response.reason, params_json
        self.close()
        raise ConnectionError(data, response = response, error = instance)

I can't make much out of the error message. Maybe somebody else can
jump in here? I am not sure which part of the try statement is
responsible for the thrown error.
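
For what it's worth, `error(111, 'Connection refused')` looks like the underlying socket error wrapped by the client. On Linux, errno 111 is ECONNREFUSED, which usually means nothing was listening on the port at that moment. That would be consistent with the node having been killed, as reported above. The sketch below shows one way to tell such cases apart. `describe_socket_error` is a hypothetical helper, and how to get at the wrapped exception in the Scalaris client is not shown here.

```python
import errno
import socket


def describe_socket_error(exc):
    """Return a short human-readable description of a low-level error."""
    # Check for timeouts first: in Python 3 socket.timeout is an OSError too.
    if isinstance(exc, socket.timeout):
        return "timed out waiting for the server"
    if isinstance(exc, OSError) and exc.errno == errno.ECONNREFUSED:
        return "connection refused: nothing is listening on that host/port"
    return "other error: %r" % (exc,)


refused = ConnectionRefusedError(errno.ECONNREFUSED, "Connection refused")
print(describe_socket_error(refused))
print(describe_socket_error(socket.timeout("timed out")))
```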

>>Ruby:
>>./scalaris.rb:207:in `call': end of file reached (Scalaris::ConnectionError)
>>from ./scalaris.rb:616:in `write'
>>from ./scalaris_client.rb:33:in `write'
>>from ./scalaris_client.rb:116

Can you give a minimal example for this? I am unsure under which cases
this error is raised. My guess is that YAWS stops the upload and the
produced error code results in Ruby's net/http library throwing that
error. I guess that the file is too large and we should use
HTTP streaming instead of a bulk upload, but I lack the experience with
both APIs to say anything substantial on that matter.


Cheers, Magnus

Nico Kruber

Jul 4, 2013, 6:14:38 AM
to scal...@googlegroups.com
Actually, the error in this case occurs because the default timeout in the Python API
is too short for your file to be uploaded: Python closed the connection because it did
not receive a response within our default of 5s. I just committed a patch which uses
socket.getdefaulttimeout() instead of our 5s, and this was enough for me to upload 16MB.

You can always tune the timeout to your needs:
=============
from scalaris import TransactionSingleOp, JSONConnection
sc = TransactionSingleOp(JSONConnection(timeout=30))

f = open("test.dat", "r")
value = f.read()
f.close()

sc.write("hello", value)
sc.read("hello")
=============
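
How large the timeout should be depends on the file size and on how fast your setup can take the data. The following is a rough sketch for picking a value, not a measured recommendation: `suggest_timeout`, the safety factor and the throughput figure are all assumptions that you would have to replace with numbers from your own cluster.

```python
MIB = 1024 ** 2


def suggest_timeout(size_bytes, bytes_per_second, floor=5.0, safety=2.0):
    """Rough timeout in seconds for uploading size_bytes.

    bytes_per_second is an assumed sustained throughput (measure it
    yourself); floor keeps the old 5s default as a minimum; safety is
    an arbitrary headroom factor.
    """
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return max(floor, safety * size_bytes / bytes_per_second)


# 16 MiB at an assumed 2 MiB/s with a safety factor of 2.
print(suggest_timeout(16 * MIB, 2 * MIB))
```

The result can then be passed as the `timeout` argument of `JSONConnection`, as in the example above.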


> >>Ruby:
> >>./scalaris.rb:207:in `call': end of file reached (Scalaris::ConnectionError)
> >>from ./scalaris.rb:616:in `write'
> >>from ./scalaris_client.rb:33:in `write'
> >>from ./scalaris_client.rb:116
>
> Can you give a minimal example for this? I am unsure under which cases
> this error is raised. My guess is that YAWS stops the upload and the
> produced error code results in Ruby's net/http library throwing that
> error. I guess that the file is too large and we should use
> HTTP streaming instead of a bulk upload, but I lack the experience with
> both APIs to say anything substantial on that matter.

The same applies here: in Ruby we also set a default timeout of 5s. I made
the same change to our Ruby API. The timeout can be adjusted there as well.


One general warning though:
Scalaris is not optimised for big values and I saw big (temporary) memory
allocations when I tried to read my 16MB test file back from Scalaris.
Also keep in mind that we store 4 replicas so every size is multiplied by 4!
We do compress values internally though...
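
As a back-of-the-envelope illustration of the replica factor: `replicated_size` is a hypothetical helper, and `compression_ratio` (stored size divided by original size) is an assumption, since how well a value compresses depends entirely on its content.

```python
MIB = 1024 ** 2
REPLICAS = 4  # replication factor stated in this thread


def replicated_size(value_bytes, replicas=REPLICAS, compression_ratio=1.0):
    """Estimated total bytes stored across all replicas.

    compression_ratio=1.0 means no savings from compression, which
    gives an upper bound for this simple model.
    """
    return int(value_bytes * replicas * compression_ratio)


# A 16 MiB value, uncompressed, in MiB across all replicas.
print(replicated_size(16 * MIB) // MIB)
```

This ignores the temporary allocations during reads mentioned above, which come on top.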

Nico