If I use an integer to represent bits:
e.g. 99 represents '1100011'
How can I locate, say, the second bit of 99 (i.e. '1')?
Although bin(99)[4] could be used to locate it, this transformation costs
too much memory (99 only needs 2 bytes, while the string '1100011' needs
7 bytes).
Does anyone know how to locate the second bit without using the bin() function?
Thank you very much:D
--
Li
------
Time is all we have
and you may find one day
you have less than you think
You mean
def get_bit(number, bit):
    return (number >> bit) & 1
?
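(A quick illustration, added here for the record — this is the function above run in Python 3 against the 99 from the original post:)

```python
def get_bit(number, bit):
    # shift the bit of interest down to position 0, then mask it out
    return (number >> bit) & 1

# 99 is 0b1100011; counting from bit 0 at the right,
# bit 1 is set and bit 2 is clear
print(get_bit(99, 1))  # -> 1
print(get_bit(99, 2))  # -> 0
```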
-tkc
If we consider 8 bits, a solution may be the following:
>>> a = 99 # bin(a) = 0b1100011
>>> 0b11111101 & a
97
>>> 0b11101111 & a
99
As you see, you set the nth bit to 0 and check whether the result is the
same as a. If it is the same, then the nth bit is not set. If you put the
above code in a loop, you can test whichever bit you want.
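Wrapped in a helper, that loop might look like this — a sketch assuming 8-bit values; the function name bit_is_set is mine, not from the thread:

```python
def bit_is_set(a, n):
    # clear bit n with an AND mask of all ones except position n,
    # then compare: if the value changed, the bit was set
    mask = 0xFF & ~(1 << n)
    return (a & mask) != a

# 99 is 0b1100011: bits 0, 1, 5 and 6 are set
print([n for n in range(8) if bit_is_set(99, n)])  # -> [0, 1, 5, 6]
```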
Hi.
So do you know how I could locate the bit in O(1) time? Transforming it
into a string is one method, but it takes too much space (when I tried to
process a 2 MB file, it used more than 100 MB of memory...).
Thank you very much.
My solution is fine, but Tim's solution is better: it is O(1) in both
time and space.
What are you searching for? I don't know your general problem, but
searching for the value of a single bit in a 2 MB file...
is strange.
Hi.
I'm not sure I follow here -- your original post said that you
have an integer. With an integer, my function is O(1) as you
request. It sounds like you're *not* representing your bits as
an integer. Either that, or it's a gargantuan number (say, more
than 64 bits? Maybe more than 1k bits?) that you want to
bit-slice. In that case, you need to know a bit more about the
storage structure. However, assuming it's a raw bit-stream, you
might be able to do something like this (untested, but should be
fairly close):
data = file('source.bin').read()

def get_bit(source, bit):
    idx, bit = divmod(bit, 8)
    byte = ord(source[len(source) - (1 + idx)])
    return (byte >> bit) & 1

print get_bit(data, 3141592)  # the 3,141,592nd bit
you might have to tweak for endian'ness.
-tkc
The function extracts the byte that contains the bit of interest,
and then extracts the corresponding bit from that byte. More
verbosely:
idx, bit = divmod(bit, 8)
# idx is now the Nth byte from the end we want
# bit is now 0-7 for the bit inside the corresponding byte
offset = len(source) - (1+idx)
# convert the left-hand idx to a right-hand idx
byte = ord(source[offset])
# we now have the byte containing the bit we want
# my original O(1) bit-extract
return (byte >> bit) & 1
it only shifts once because it can precisely target the exact
byte without having to visit the other bytes. Trust me...it's
O(1) for extracting a single bit (unless indexed access to "data"
is not O(1), but that would assume something other than a string
or list data-type...like a data-stream or generator).
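(For readers on Python 3, the same byte-indexed lookup can be rendered as follows — indexing a bytes object already yields an int, so ord() drops out; the sample data here is made up purely for illustration:)

```python
def get_bit(source, bit):
    # source is a bytes object; bit 0 is the lowest bit
    # of the *last* byte (adjust for your endianness)
    idx, bit = divmod(bit, 8)
    byte = source[len(source) - (1 + idx)]
    return (byte >> bit) & 1

data = bytes([0b10110010, 0b01000001])
print(get_bit(data, 0))  # lowest bit of the last byte -> 1
print(get_bit(data, 1))  # next bit of the last byte   -> 0
print(get_bit(data, 9))  # bit 1 of the first byte     -> 1
```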
-tkc
You omit some key details -- namely how do you know that "1001"
is 4 bits and not "00001001" (8-bits)? If it's a string (as your
current code shows), you can determine the length. However, if
they are actually ints, your code should work fine & be O(1).
This can be abstracted if you need:
def combine_bits(int_a, int_b, bit_len_b):
    return (int_a << bit_len_b) | int_b

a = 0x09
b = 0x07
print combine_bits(a, b, 3)
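(Worked out by hand, with the function repeated so the snippet is self-contained: 0x09 is 1001, and 0x07 taken as 3 bits wide is 111, so the call concatenates 1001 and 111:)

```python
def combine_bits(int_a, int_b, bit_len_b):
    # shift a left to make room, then OR b into the low bits
    return (int_a << bit_len_b) | int_b

result = combine_bits(0x09, 0x07, 3)  # 1001 followed by 111
print(bin(result))  # -> 0b1001111  (i.e. 79)
```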
However, if you're using gargantuan ints (as discussed before),
it's a lot messier. You'd have to clarify the storage structure
(a byte string? a python long?)
-tkc
PS: You may want to CC the mailing list so that others get a
crack at answering your questions...I've been adding it back in,
but you've been replying just to me.
So... I can only conclude you are looking for bit x in the entirety of a
file. First you'll have to figure out what byte to look at w/ a little
integer division, then read to that point and test for the specific bit
-- I'm thinking a bitwise AND with a bit mask. Should be
really fast.
--David
Actually, what I have is a list of integers [3,55,99,44], and by using
Huffman coding or fixed-length coding I know the bit-length of each
number. When I try to concatenate them all together (say 10,000 items in
the list), the speed drops quickly (because of the shift operations on
the Python long).
>
> This can be abstracted if you need:
>
> def combine_bits(int_a, int_b, bit_len_b):
> return (int_a << bit_len_b) | int_b
>
> a = 0x09
> b = 0x07
> print combine_bits(a, b, 3)
>
> However, if you're using gargantuan ints (as discussed before), it's a lot
> messier. You'd have to clarify the storage structure (a byte string? a
> python long?)
I am using a single python long to store all the items in the list
(say, 10,000 items), so the work does become messier...
>
> -tkc
>
> PS: You may want to CC the mailing list so that others get a crack at
> answering your questions...I've been adding it back in, but you've been
> replying just to me.
Sorry, this is the first time I am using a mailing list... and I always
forget "reply to all".
Thank you very much:D
Ah, this makes more sense!
I'd try creating a bitwriter object that takes fragments of bytes
and writes them sequentially to a file-like object. Something
like this untested class:
class BitWriter:
    def __init__(self, outstream):
        self.out = outstream
        self.datum = 0  # the data accrued
        self.bits = 0   # the number of bits we've accrued

    def write(self, byte_data, bit_count):
        # maybe add some code to ensure
        # that byte_data doesn't have any
        # significant bits beyond bit_count
        self.datum = (self.datum << bit_count) | byte_data
        self.bits += bit_count
        while self.bits >= 8:
            overflow = self.bits - 8
            self.out.write(chr(self.datum >> overflow))
            # discard the stuff we wrote
            self.datum -= (self.datum >> overflow) << overflow
            self.bits -= 8

    def close(self):
        if self.bits:  # we have unflushed data
            data = self.datum << (8 - self.bits)
            self.out.write(chr(data))
        self.out.close()

out = file('compressed.dat', 'wb')
bw = BitWriter(out)
for data, bit_count in source():
    bw.write(data, bit_count)
bw.close()
As mentioned, it's 100% untested, I don't know what sort of
performance it has, nor the endian'ness of your data streams, so
you might have to twiddle bits in the opposite directions and
test the performance.
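(For Python 3 readers, here is one way the same idea can be written against an in-memory stream — io.BytesIO and bytes([...]) stand in for file and chr, close() here only flushes the final partial byte instead of closing the stream so it can be inspected, and the input data is a made-up sample:)

```python
import io

class BitWriter:
    def __init__(self, outstream):
        self.out = outstream
        self.datum = 0  # bits accrued so far, as an int
        self.bits = 0   # how many bits are accrued

    def write(self, data, bit_count):
        # append the low bit_count bits of data to the buffer
        self.datum = (self.datum << bit_count) | data
        self.bits += bit_count
        while self.bits >= 8:  # flush whole bytes as they fill up
            self.bits -= 8
            self.out.write(bytes([self.datum >> self.bits]))
            self.datum &= (1 << self.bits) - 1

    def close(self):
        if self.bits:  # pad the final partial byte with zeros
            self.out.write(bytes([self.datum << (8 - self.bits)]))
            self.datum = self.bits = 0

buf = io.BytesIO()
bw = BitWriter(buf)
for data, bit_count in [(0b1101, 4), (0b110, 3), (0b10100, 5)]:
    bw.write(data, bit_count)
bw.close()
print(buf.getvalue())  # -> b'\xdd@'  (0xDD = 11011101, 0x40 = 01000000)
```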
-tkc
The method looks great; I think I get your idea: when bits > 8, we
output 8 bits and discard them. I believe this will work efficiently,
because I have implemented the algorithm in a similar way by
manipulating character strings ('111001', not bits), and it worked
quite fast. :) Thank you very much.
You mentioned "twiddle bits in the opposite directions"; that reminds
me of another possible solution to this problem. Say I have bits
'1101' and '110'. Instead of doing (1101 << 3) | 110, we could try the
opposite way:
first, reverse '1101' to '1011', and '110' to '011';
then compute (011 << 4) | 1011;
if another chunk '10100' comes along,
reverse it again, 10100 to 00101,
and add it to the long bits: (00101 << 7) | 0111011;
and so on and so forth.
After all the processing is done, we reverse the final long bits.
Because this way you never shift the whole long bits, only the much
shorter chunk each time, the time efficiency should not be bad.
However, there is one issue I cannot figure out: how to reverse the
bits efficiently? Do you have any idea?
Thank you very much! :)
Best regards,
Li
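(On reversing the bits: one straightforward way, sketched here for illustration — the name reverse_bits is not from the thread — is to peel bits off one end and push them onto the other, which is O(width) per chunk:)

```python
def reverse_bits(x, width):
    # build the result by repeatedly shifting x's lowest bit
    # onto the low end of r, width times
    r = 0
    for _ in range(width):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

print(bin(reverse_bits(0b1101, 4)))  # -> 0b1011
print(bin(reverse_bits(0b110, 3)))   # -> 0b11  (i.e. 011)
```

For short, fixed widths a precomputed lookup table per byte would avoid the loop, at the cost of some memory.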
> Although bin(99)[4] could be used to locate it, this transformation costs
> too much memory (99 only needs 2 bytes, while the string '1100011' needs 7 bytes).
This is Python, not C. Your figures are completely wrong.
>>> sys.getsizeof(99)
12
>>> sys.getsizeof('1100011')
31
Everything in Python is an object. If you're expecting C performance for
bit-twiddling operations, you aren't going to get it.
--
Steven
Using GMPY (http://code.google.com/p/gmpy/) may offer a performance
improvement. When shifting multi-thousand bit numbers, GMPY is several
times faster than Python longs. GMPY also supports functions to scan
for 0 or 1 bits.
You can then look up a bit in the array trivially. And if you really need a Python long, I predict the single conversion will be much faster than doing all those shifts on the long as you go.
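(In Python 3 that single conversion is built in: int.from_bytes turns a whole byte string into one integer in a single call, with no per-item shifting; the three bytes below are a made-up sample:)

```python
data = bytes([0x12, 0x34, 0x56])  # sample raw bytes
n = int.from_bytes(data, 'big')   # one conversion for the whole buffer
print(hex(n))  # -> 0x123456
```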