bitarray implementation


Carlo Baldassi

Apr 9, 2012, 7:42:07 PM
to juli...@googlegroups.com
I have written a nearly complete implementation of bitarrays (https://github.com/carlobaldassi/julia/blob/bitarray/extras/bitarray.jl) which packs bits into contiguous Uint64 chunks and exposes the same functionality as standard Arrays. I tried to keep the code as efficient as I could (especially the dequeue functionality, and except for some operations which still have performance-related TODOs).
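
As a rough illustration of the packing idea described above, here is a minimal Python sketch. It is not the bitarray.jl code: the class name `PackedBits`, the 0-based indexing, and the choice to store bit `i` at offset `i % 64` of chunk `i // 64` are assumptions made for this example only.

```python
# Sketch only: one possible way to pack bits into 64-bit chunks.
CHUNK_BITS = 64


class PackedBits:
    """Fixed-length bit vector stored as a list of 64-bit chunks."""

    def __init__(self, n):
        self.n = n
        # One Python int per chunk; each value is kept below 2**64.
        self.chunks = [0] * ((n + CHUNK_BITS - 1) // CHUNK_BITS)

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        chunk, offset = divmod(i, CHUNK_BITS)
        return (self.chunks[chunk] >> offset) & 1

    def __setitem__(self, i, value):
        if not 0 <= i < self.n:
            raise IndexError(i)
        chunk, offset = divmod(i, CHUNK_BITS)
        if value:
            self.chunks[chunk] |= 1 << offset      # set the bit
        else:
            self.chunks[chunk] &= ~(1 << offset)   # clear the bit


bits = PackedBits(130)   # needs three chunks: 64 + 64 + 2 bits
bits[0] = 1
bits[64] = 1
bits[129] = 1
```

Every read or write costs one division to locate the chunk plus a shift and a mask, which is the price paid for the compact storage.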

Test suite is here: https://github.com/carlobaldassi/julia/blob/bitarray/test/bitarray.jl

In case there's an interest, I can issue a pull request; however, some implementation details may be questionable, so I suppose the dev list is the place for discussing that. Apart from the basics of the implementation itself, which is obviously up for discussion, most notable points are:

1) BitArray{N} is declared as being a subtype of AbstractArray{Int, N}. Another choice could have been using Bool instead of Int, I suppose.
2) I tried to port all kinds of operations available to Arrays; as a guideline, the result is always a BitArray whenever it is safe to assume that a BitArray can hold the result, while promotion to Array{Int} happens otherwise. For example, given two BitArrays b1 and b2:

~b1 will give a BitArray
-b1 will give an Array{Int}
b1 & b2 will give a BitArray
b1 .* b2 will give a BitArray
b1 + b2 will give an Array{Int}
b1 * b2 will give an Array{Int}
etc.

This may become tricky in some cases. A different approach could have been restricting the available operations and requiring explicit conversions in all other cases. Yet another approach could have been emulating Array{Bool} behaviour (see point 1). Yet another one could have been working in modulo 2.
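
The guideline in point 2 can be checked with a few lines of Python. This is only a sketch using plain lists of 0s and 1s in place of BitArrays; the helper name `fits_in_bits` is hypothetical and does not come from bitarray.jl.

```python
# Sketch: which elementwise results stay inside {0, 1}?
b1 = [1, 0, 1, 1]
b2 = [1, 1, 0, 1]

and_result = [x & y for x, y in zip(b1, b2)]   # bitwise and
mul_result = [x * y for x, y in zip(b1, b2)]   # elementwise product
sum_result = [x + y for x, y in zip(b1, b2)]   # can reach 2
neg_result = [-x for x in b1]                  # can reach -1


def fits_in_bits(values):
    """True if every value can be stored in a single bit."""
    return all(v in (0, 1) for v in values)


# The "modulo 2" alternative mentioned above: addition becomes xor.
sum_mod2 = [(x + y) % 2 for x, y in zip(b1, b2)]
xor_result = [x ^ y for x, y in zip(b1, b2)]
```

Here `and_result` and `mul_result` fit in one bit per element, while `sum_result` and `neg_result` do not, which matches the split between BitArray and Array{Int} results in the list above.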

I cut it short here; more minor points can be discussed if needed.

Viral Shah

Apr 10, 2012, 12:43:11 AM
to juli...@googlegroups.com
That's awesome. We have often talked about having a bitarray implementation, and it's nice to have one. Do send it to extras when you think it is ready for that.

-viral

Stefan Karpinski

Apr 10, 2012, 1:20:16 AM
to juli...@googlegroups.com
This is wonderful. One lovely thing about it is that implementing it didn't take any sort of crazy voodoo, just a very impressive amount of very solid but voodoo-free coding. Please do open a pull request for this. I want to start work on banging out bugs and integrating it more deeply. And implementing the fast transpose from Hacker's Delight if Jeff doesn't beat me to it ;-)

Jeff Bezanson

Apr 10, 2012, 1:55:19 AM
to juli...@googlegroups.com
Needs more voodoo! :)
del(BitVector, Range1) is especially impressive.
There is a fair amount of copied code; I wonder if we need to
introduce a DenseArray type between AbstractArray and Array?

I have a bunch of C code we can reuse for fast ranged indexing,
reversing, and copy_to with offsets.

If we can extend this to packed arrays of n-bit data types (especially
n < 16), those who like to sequence genes might be interested, though
of course it is non-trivial.

Carlo Baldassi

Apr 10, 2012, 4:38:34 AM
to juli...@googlegroups.com
Ok, pull request issued.
Of course, one can call 'make bitarray' from within test to verify
that everything's working. Testing is fairly slow, but I tried to be
thorough given the number of pitfalls which are there, waiting between
the bits.

About copied code: that's right; the main reason was that I couldn't
make BitArray be a StridedArray without touching array.jl. This could
create some ugly maintenance issues, and I think DenseArray may make
sense; if nothing else, to have automatic promotions of BitArrays.

About voodoo: there's only a little really, in nnz. I agree we need
moar of that :)
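
For readers wondering what kind of "voodoo" counting set bits can involve: the thread does not say which trick nnz uses, so the following Python sketch simply shows one well-known branch-free population count for a 64-bit chunk (the SWAR method). The function names `popcount64` and `count_set_bits` are made up for this example.

```python
MASK64 = (1 << 64) - 1


def popcount64(x):
    """Count the set bits of a 64-bit value without looping over bits."""
    x = x - ((x >> 1) & 0x5555555555555555)                          # 2-bit sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)   # 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                          # 8-bit sums
    # Multiplying adds all byte sums into the top byte; Python ints are
    # unbounded, so truncate to 64 bits before extracting that byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56


def count_set_bits(chunks):
    """Number of set bits in a packed bit vector given as 64-bit chunks."""
    return sum(popcount64(c) for c in chunks)


total = count_set_bits([0b1011, MASK64, 0])
```

This assumes the unused bits in the last chunk are kept at zero; otherwise they would be counted too.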

copy_to with offsets has a partial implementation in _jl_copy_chunks
but that could be generalized (currently it copies the whole source
array to some position in a target array). It was in my todo list,
mainly for speeding up concatenation.

del with Range1 was a nightmare. It could probably be done more
elegantly, too, but at least it works fine.
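
To give an idea of what deleting a range of bits involves, here is a Python sketch that side-steps the hard part: it stores the whole vector in one arbitrary-precision integer (element `i` in bit `i`, 0-based), so no 64-bit chunk boundaries have to be crossed. The function name `delete_bit_range` is hypothetical; the real del(BitVector, Range1) has to do the equivalent shifting chunk by chunk.

```python
def delete_bit_range(bits, n, lo, hi):
    """Remove positions lo..hi-1 from an n-bit vector stored in the int `bits`.

    Returns the new value and the new length.
    """
    if not 0 <= lo <= hi <= n:
        raise ValueError("invalid range")
    low_part = bits & ((1 << lo) - 1)   # elements below the deleted range
    high_part = bits >> hi              # elements above it, moved down to bit 0
    return low_part | (high_part << lo), n - (hi - lo)


# Vector [1, 0, 1, 1, 0, 1] stored with element i in bit i.
vector = 0b101101
shorter, new_length = delete_bit_range(vector, 6, 1, 3)   # drop elements 1 and 2
```

After the call the remaining elements are [1, 1, 0, 1]. In a chunked layout the same two steps (keep the low part, shift the high part down) apply, but the shift generally straddles two source chunks for every destination chunk, which is where most of the pitfalls come from.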

One more issue, about function naming: I have used the prefix bit- for
BitArray-specific operations, e.g. bitzeros, bitones, bitrand etc. I
think this makes sense, but there's a potential confusion between
bitrand and randbit (I surely do get confused); I didn't have any good
idea about how to solve this naming clash. Also, it's impossible to
distinguish a BitVector from a Vector in the REPL by just showing it.

Patrick O'Leary

Apr 10, 2012, 8:27:45 AM
to juli...@googlegroups.com

On Tuesday, April 10, 2012 3:38:34 AM UTC-5, Carlo Baldassi wrote:
> One more issue, about function naming: I have used the prefix bit- for
> BitArray-specific operations, e.g. bitzeros, bitones, bitrand etc. I
> think this makes sense, but there's a potential confusion between
> bitrand and randbit (I surely do get confused); I didn't have any good
> idea about how to solve this naming clash. Also, it's impossible to
> distinguish a BitVector from a Vector in the REPL by just showing it.

You could modify show.jl either to add a call to summary(v) in function show(v::AbstractVector) or provide a BitVector specific dispatch for show(bv::BitVector).

Carlo Baldassi

Apr 10, 2012, 9:47:48 AM
to juli...@googlegroups.com
On Tue, Apr 10, 2012 at 2:27 PM, Patrick O'Leary
<patrick...@gmail.com> wrote:
> You could modify show.jl either to add a call to summary(v) in function
> show(v::AbstractVector) or provide a BitVector specific dispatch for
> show(bv::BitVector).

Ah yes, the wording of my last sentence was unfortunate to say the
least. It should read as: given the current convention for showing
Vectors, it is impossible to be sure of the element type (e.g. Int64
vs Int32 vs. Integer) and this gets even worse with BitVectors, as one
cannot even see the container type. This means that bitrand and
randbit produce indistinguishable outputs in the REPL. So the course of
action would be one of the following (matching Patrick's suggestions):
1) have BitArrays move away from the convention, e.g. by using a
summary, or by prepending something to the square brackets ->
specialized show(::BitVector)
2) change the convention and use a summary in all cases -> change
show(::AbstractVector)
3) ignore the issue

I personally favour option number 2, which could also show the length
of the vector, just as is done for all other arrays, but I don't know
the reason for the different behaviour, or if there were previous
discussions about that.

Patrick O'Leary

Apr 10, 2012, 9:59:09 AM
to juli...@googlegroups.com
On Tuesday, April 10, 2012 8:47:48 AM UTC-5, Carlo Baldassi wrote:
(2) sounds like a good approach to me. I'd be happy with it.
 

Jeff Bezanson

Apr 10, 2012, 1:31:48 PM
to juli...@googlegroups.com
I think a good option is to add the summary() to repl_show, so you
just get it for top-level results that are vectors. Otherwise
structures with vectors inside like {[1,2], [3,4]} become impossible
to read.

Stefan Karpinski

Apr 11, 2012, 3:20:17 PM
to juli...@googlegroups.com
On Tue, Apr 10, 2012 at 4:38 AM, Carlo Baldassi <carlob...@gmail.com> wrote:
 
> One more issue, about function naming: I have used the prefix bit- for BitArray-specific operations, e.g. bitzeros, bitones, bitrand etc. I think this makes sense, but there's a potential confusion between bitrand and randbit (I surely do get confused); I didn't have any good idea about how to solve this naming clash.

I have a solution: make them the same thing. As it is, one returns a random BitArray and the other returns a random binary array of Ints. Seems redundant to me.
 
> Also, it's impossible to distinguish a BitVector from a Vector in the REPL by just showing it.

Problem solved by yesterday's controversial commit:

julia> bitrand(5)
5-element BitArray:
 0
 1
 1
 0
 1

julia> randbit(5)
5-element Int64 Array:
 0
 1
 0
 1
 0

At the very least, I think that showing vectors like this should stay this way.

Carlo Baldassi

Apr 13, 2012, 6:45:18 AM
to juli...@googlegroups.com
On Wed, Apr 11, 2012 at 9:20 PM, Stefan Karpinski <ste...@karpinski.org> wrote:
> I have a solution: make them the same thing. As it is, one returns a random
> BitArray and the other returns a random binary array of Ints. Seems
> redundant to me.

What if we wait for the time when namespaces will be available
instead, and see how this is going to work out? My perplexity arises
because I suppose it's likely bitarrays will stay out of base and be
used under special circumstances; therefore, removing the current
randbit (and making it an alias to bitrand) is going to force loading
bitarray.jl (perhaps just for that).

Stefan Karpinski

Apr 13, 2012, 10:15:20 AM
to juli...@googlegroups.com
Ah, but I think bit array is going straight into base as soon as it's well enough integrated and tested. I've always wanted compact Boolean arrays — why wouldn't we use them all the time? They give 8x improvements in both storage and time. We just need to ensure that core operations that make Boolean arrays generate them. E.g. v == w and such, where v and w are vectors needs to make a bitvector.
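
The 8x storage figure follows from simple arithmetic, sketched here in Python. The assumption is that an unpacked Boolean array uses one byte per element and that the packed form rounds up to whole 64-bit chunks; the function names are invented for this example, and the sketch says nothing about the claimed speed-up.

```python
def bytes_unpacked(n):
    """Storage for n Booleans at one byte per element."""
    return n


def bytes_packed(n, chunk_bits=64):
    """Storage for n bits packed into whole chunks of chunk_bits bits."""
    n_chunks = (n + chunk_bits - 1) // chunk_bits
    return n_chunks * (chunk_bits // 8)


n = 1_000_000
ratio = bytes_unpacked(n) / bytes_packed(n)
```

For a million elements the packed form needs 125,000 bytes instead of 1,000,000. For very short arrays the rounding to whole chunks dominates: a single element still occupies one 8-byte chunk.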

Carlo Baldassi

Apr 14, 2012, 9:09:52 AM
to juli...@googlegroups.com
On Fri, Apr 13, 2012 at 4:15 PM, Stefan Karpinski
<stefan.k...@gmail.com> wrote:
> Ah, but I think bit array is going straight into base as soon as it's well enough integrated and tested. I've always wanted compact Boolean arrays — why wouldn't we use them all the time? They give 8x improvements in both storage and time. We just need to ensure that core operations that make Boolean arrays generate them. E.g. v == w and such, where v and w are vectors needs to make a bitvector.

Ok, let's wait for bitarray to get into base then before making
randbit an alias to bitrand (or removing it entirely).
BTW the usage of bitarrays to store Bool values was one of the reasons
I was not 100% sure of BitArray <: AbstractArray{Int} and all its
consequences I mentioned in the first email in this thread.
Maybe we should have BitArrays with type then, possibly restricting
the type to integers? I must admit, I never thought about it before,
but I think it makes sense for dispatch, even if the internal
representation is of course the same.

Kevin Squire

May 1, 2012, 9:56:11 PM
to juli...@googlegroups.com
Hi there,

I'd like to use bitarrays in a project I'm working on, but I've run into a couple of minor issues:


> 2) I tried to port all kinds of operations available to Arrays; as a guideline, the result is always a BitArray whenever it is safe to assume that a BitArray can hold the result, while promotion to Array{Int} happens otherwise. For example, given two BitArrays b1 and b2:
>
> ~b1 will give a BitArray

As of commit 539f9b121a7f243ecf4a921fa8296e5381fe9677, which added types to BitArrays, this doesn't seem to be true (or at least it doesn't work how I would expect):

julia> a = bitzeros(1)
1-element Int64 BitArray:
 0

julia> ~a
1-element Int64 Array:
 -1 

Perhaps if I back it with a Uint...

julia> a = bitzeros(Uint8, 1)
1-element Uint8 BitArray:
 0x00

julia> ~a
1-element Uint8 Array:
 0xff

Not quite what I was expecting.  It does, of course, work with Bool, but then we're restricted to an 8-bit underlying representation.  Would it be okay to go back to the old (~) function (currently specialized to Bool), which explicitly flipped the underlying bits?  Or is the current behavior of (~) intentional?

I also need logical shift functions (>>>) and (<<).  If you (Carlo) haven't thought these through, I'll attempt an implementation and generate a pull request, if that's okay.

Cheers!

   Kevin

Carlo Baldassi

May 2, 2012, 4:31:29 AM
to juli...@googlegroups.com
Sorry but I don't understand this remark about being restricted to an
8-bit underlying representation for Bools (BitArrays always use the
same representation, type is only used for dispatch).

> Would it be
> okay to go back to the old (~) function (currently specialized to Bool),
> which explicitly flipped the underlying bits?  Or is the current behavior of
> (~) intentional?

The behaviour of ~ is intentional and mimics what happens with Arrays
of the same type, which is the main principle I followed when
introducing types in BitArrays. Therefore, ~ only flips bits (and
returns a BitArray) with BitArray{Bool}, in all other cases it returns
Arrays. One way to get the Bool's behaviour on other bitarray types is
using reinterpret, e.g. you could define:

flipbits{T}(b::BitArray{T}) =
    reinterpret(T, ~reinterpret(Bool, b))

(note: convert would produce the same result but it copies the data,
reinterpret doesn't).

Another option could be xor'ing with a bitarray of ones, but that
would be slower I think.
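
For illustration, here is what flipping the stored bits looks like at the chunk level, as a Python sketch rather than the Julia code. It assumes the vector is a list of 64-bit chunks and that the unused padding bits of the last chunk must stay zero; whether bitarray.jl maintains that invariant is not stated in this thread. The name `flip_chunks` is hypothetical.

```python
CHUNK_BITS = 64
FULL = (1 << CHUNK_BITS) - 1


def flip_chunks(chunks, n):
    """Invert all n bits of a packed vector, keeping padding bits at zero."""
    flipped = [c ^ FULL for c in chunks]     # xor with all ones inverts a chunk
    tail = n % CHUNK_BITS                    # bits actually used in last chunk
    if flipped and tail:
        flipped[-1] &= (1 << tail) - 1       # clear the padding again
    return flipped


small = flip_chunks([0b0101], 4)
longer = flip_chunks([0, 0], 70)
```

Inverting a chunk and xor'ing it with a chunk of ones are the same operation on the stored bits; the difference between the two options discussed above is only whether a second bitarray of ones has to be created first.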

> I also need logical shift functions (>>>) and (<<).  If you (Carlo) haven't
> thought these through, I'll attempt an implementation and generate a pull
> request, if that's okay.

That's a good idea, and I don't currently have plans for it (probably,
circular shift functions would be useful as well).
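
A logical shift on a packed bit vector has to carry bits across chunk boundaries. The Python sketch below shows one direction only and makes several assumptions: 64-bit chunks, element `i` stored in bit `i % 64` of chunk `i // 64`, and "shift up" meaning every element moves `k` positions toward higher indices with zeros filling in. Which of (<<) and (>>>) this corresponds to depends on the indexing convention chosen, and the name `shift_up` is made up here.

```python
CHUNK_BITS = 64
FULL = (1 << CHUNK_BITS) - 1


def shift_up(chunks, n, k):
    """Move every bit k positions toward higher indices, filling with zeros."""
    whole, part = divmod(k, CHUNK_BITS)
    out = [0] * len(chunks)
    for i in range(len(chunks)):
        src = i - whole
        if src < 0:
            continue                          # nothing shifts into this chunk
        value = (chunks[src] << part) & FULL  # bits staying within one chunk
        if part and src >= 1:
            # Bits carried over from the top of the previous source chunk.
            value |= chunks[src - 1] >> (CHUNK_BITS - part)
        out[i] = value
    tail = n % CHUNK_BITS
    if out and tail:
        out[-1] &= (1 << tail) - 1            # drop bits pushed past the end
    return out


shifted = shift_up([0x8000000000000001, 0x3], 70, 1)
```

The opposite direction mirrors this, reading from `i + whole` and carrying bits from the bottom of the next chunk instead; a circular shift would wrap the discarded bits around rather than dropping them.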


> Cheers!
>
>    Kevin

Kevin Squire

May 2, 2012, 5:28:20 PM
to juli...@googlegroups.com
 
>> Not quite what I was expecting.  It does, of course, work with Bool, but
>> then we're restricted to an 8-bit underlying representation.
>
> Sorry but I don't understand this remark about being restricted to an
> 8-bit underlying representation for Bools (BitArrays always use the
> same representation, type is only used for dispatch).

Okay, got it.  I had interpreted the types as the underlying representation.  Which, now that I think about it, doesn't make much sense.  So the actual interpretation is as an array of the underlying type, which takes on values of {0,1} or {true, false}, and when used in ways which would go outside of this range, is either upcast or throws an error.  Is this correct?

I can see where this would be useful, e.g., in setting masks for arrays.  Do you have other use cases?
 
>> Would it be
>> okay to go back to the old (~) function (currently specialized to Bool),
>> which explicitly flipped the underlying bits?  Or is the current behavior of
>> (~) intentional?
>
> The behaviour of ~ is intentional and mimics what happens with Arrays
> of the same type, which is the main principle I followed when
> introducing types in BitArrays. Therefore, ~ only flips bits (and
> returns a BitArray) with BitArray{Bool}, in all other cases it returns
> Arrays. One way to get the Bool's behaviour on other bitarray types is
> using reinterpret, e.g. you could define:
>
> flipbits{T}(b::BitArray{T}) =
>        reinterpret(T, ~reinterpret(Bool, b))
>
> (note: convert would produce the same result but it copies the data,
> reinterpret doesn't).
>
> Another option could be xor'ing with a bitarray of ones, but that
> would be slower I think.

For the mask use case, I think that it would be good to have an invert (or flipbits) function (which I would still argue should be handled by (~), but which could be a separate function).  

Some further thoughts....  I found it unexpected that the default element type is an Int64 (or Int32) represented as a single bit.  It seems that your target use case is in interaction with arrays, and not as a long bitstring, which is my use case.  I also don't really want an array of Bools (true and false), I really want an array of Bits (0s and 1s), even if they happen to have the same underlying type.  

Given all of this, here are some thoughts:
  • implement a Bit type, which is virtually identical to Bool, but which semantically allows me to treat a BitArray as an array of bits
  • alternatively, referring back to Jeff's comments, extend BitArray (or implement separately) a PackedArray type, which would allow arrays of (currently non-existent) Uint4, Uint2, Uint1, etc. Bit could be an alias for Uint1.
(~) would also work as I expect for these cases.

Any thoughts on these?

In my case, I'm actually trying to represent genetic sequence as an array of bits.  The current implementation is in C++ and uses 2 bitsets to encode values.  I'd much rather have a packed array of Uint2 or Uint4.  
 
>> I also need logical shift functions (>>>) and (<<).  If you (Carlo) haven't
>> thought these through, I'll attempt an implementation and generate a pull
>> request, if that's okay.
>
> That's a good idea, and I don't currently have plans for it (probably,
> circular shift functions would be useful as well).
 
Okay, will work on it.

Cheers,

   Kevin

Carlo Baldassi

May 2, 2012, 7:46:22 PM
to juli...@googlegroups.com
> Okay, got it.  I had interpreted the types as the underlying representation.
>  Which, now that I think about it, doesn't make much sense.  So the actual
> interpretation is as an array of the underlying type, which takes on values
> of {0,1} or {true, false}, and when used in ways which would go outside of
> this range, is either upcast or throws an error.  Is this correct?

Yes, that is correct.

> I can see where this would be useful, e.g., in setting masks for arrays.  Do
> you have other use cases?

I'd say the main purpose is compressed storage; the addition of types
lets you specialize the behaviour to whatever you wish (e.g. Bools and
other Integers already have different behaviours, user-defined types
could be made to act differently if so desired).

> For the mask use case, I think that it would be good to have an invert (or
> flipbits) function (which I would still argue should be handled by (~), but
> which could be a separate function).
>
> Some further thoughts....  I found it unexpected that the default element
> type is an Int64 (or Int32) represented as a single bit.  It seems that your
> target use case is in interaction with arrays, and not as a long bitstring,
> which is my use case.  I also don't really want an array of Bools (true and
> false), I really want an array of Bits (0s and 1s), even if they happen to
> have the same underlying type.
>
> Given all of this, here are some thoughts
>
> - implement a Bit type, which is virtually identical to Bool, but which
>   semantically allows me to treat a BitArray as an array of bits
> - alternatively, referring back to Jeff's comments, extend BitArray (or
>   implement separately) a PackedArray type, which would allow arrays of
>   (currently non-existent) Uint4, Uint2, Uint1, etc.  Bit could be an alias
>   for Uint1.
>
> (~) would also work as I expect for these cases.
>
> Any thoughts on these?

In some sense, Bool already is the type representing a Bit in Julia.
It "just happens" to also be used as a truth value when evaluating
expressions. If you don't care about algebra, you should definitely
use Bool as the type of choice in BitArrays, I don't see any real
problem in treating a BitArray{Bool} as an array of bits (correct me
if I'm wrong). Note that you can do things like b[5]=1 even if b is a
BitArray{Bool}, and that when you read out the value you can treat it
like it was 1 even though it prints as "true" (in fact, printing is
the only thing which may be a little annoying with Bools intended as
Bits).

The fact that Ints are the "default" bitarray type is somewhat
questionable: they are only the default with bitones, bitzeros and
bitrand, but there are also bitfalses and bittrues which give you Bool
bitarrays. It seems to me that in this way the name of the function
reflects better the actual output (least surprising behaviour); I'd
still like to hear others' opinions on that however.

If instead you care about algebra, you may a) use a "true" numeric
type and have it behave as expected, or even b) define your own numeric
type (for example, someone may want to work in modulo 2).

Finally, a flipbits function can be implemented trivially, I'll
surely add it at some point.

Having a PackedArray type would be great and I was planning to try
implementing it (once bitarrays are settled), but it's non-trivial and
it could take a while to even have some basic functionality.

> In my case, I'm actually trying to represent genetic sequence as an array of
> bits.  The current implementation is in C++ and uses 2 bitsets to encode
> values.  I'd much rather have a packed array of Uint2 or Uint4.

Now that I think about it, it should be possible to define a type like

type GeneSequence
    b1::BitVector
    b2::BitVector
end

and use (b1[i],b2[i]) to represent a base in {'A','C','G','T'}. Then,
it would just be a matter of defining a few operators with some very
simple definitions. Its performance shouldn't be horribly worse than
that of contiguous packing of pairs of bits: you have to operate on
two bitarrays every time, but the operations are simpler. This could
even be easily generalized to any number of bits.

If you're ok with it I could prototype this quickly, I guess.
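
A small Python sketch of the two-bit-vector idea, for illustration only. Each bit vector is held in a single Python integer (position `i` in bit `i`), and the encoding A=0, C=1, G=2, T=3 is an assumption chosen for this example; the thread does not fix one. The method names are hypothetical as well.

```python
BASES = "ACGT"   # assumed encoding: index in this string is the 2-bit code


class GeneSequence:
    """Sequence of bases stored as two bit planes, b1 (low bit) and b2 (high bit)."""

    def __init__(self, text):
        self.n = len(text)
        self.b1 = 0
        self.b2 = 0
        for i, ch in enumerate(text):
            code = BASES.index(ch)          # raises ValueError for other letters
            self.b1 |= (code & 1) << i
            self.b2 |= (code >> 1) << i

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        code = ((self.b1 >> i) & 1) | (((self.b2 >> i) & 1) << 1)
        return BASES[code]

    def __str__(self):
        return "".join(self[i] for i in range(self.n))

    def complement(self):
        """Swap A<->T and C<->G, which in this encoding flips both bit planes."""
        mask = (1 << self.n) - 1
        out = GeneSequence("")
        out.n = self.n
        out.b1 = ~self.b1 & mask
        out.b2 = ~self.b2 & mask
        return out


seq = GeneSequence("GATTACA")
comp = str(seq.complement())
```

With this particular encoding the complement is just a bit flip of each plane, an example of the "very simple definitions" that operating on two separate bit vectors allows.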

Kevin Squire

May 4, 2012, 2:02:55 AM
to juli...@googlegroups.com
Hi Carlo,


>> - implement a Bit type, which is virtually identical to Bool, but which
>>   semantically allows me to treat a BitArray as an array of bits
>> - alternatively, referring back to Jeff's comments, extend BitArray (or
>>   implement separately) a PackedArray type, which would allow arrays of
>>   (currently non-existent) Uint4, Uint2, Uint1, etc.  Bit could be an alias
>>   for Uint1.
>>
>> (~) would also work as I expect for these cases.
>>
>> Any thoughts on these?
>
> In some sense, Bool already is the type representing a Bit in Julia.
> It "just happens" to also be used as a truth value when evaluating
> expressions. If you don't care about algebra, you should definitely
> use Bool as the type of choice in BitArrays, I don't see any real
> problem in treating a BitArray{Bool} as an array of bits (correct me
> if I'm wrong). Note that you can do things like b[5]=1 even if b is a
> BitArray{Bool}, and that when you read out the value you can treat it
> like it was 1 even though it prints as "true" (in fact, printing is
> the only thing which may be a little annoying with Bools intended as
> Bits).

Yep, that's the only real reason I suggested the bit type: by default, a BitArray of "true" bits has a string representation of "trues" and "falses".   Yuck. :-)  (I already wrote a bitstring() method to complement your bitshow() method.)
 
> The fact that Ints are the "default" bitarray type is somewhat
> questionable: they are only the default with bitones, bitzeros and
> bitrand, but there are also bitfalses and bittrues which give you Bool
> bitarrays. It seems to me that in this way the name of the function
> reflects better the actual output (least surprising behaviour); I'd
> still like to hear others' opinions on that however.

Other than the "true"/"false" thing, I concur.  (It seems this thread isn't active enough right now to attract other responses...)

> If instead you care about algebra, you may a) use a "true" numeric
> type and have it behave as expected, or even b) define your own numeric
> type (for example, someone may want to work in modulo 2).
>
> Finally, a flipbits function can be implemented trivially, I'll
> surely add it at some point.
>
> Having a PackedArray type would be great and I was planning to try
> implementing it (once bitarrays are settled), but it's non-trivial and
> it could take a while to even have some basic functionality.

Looking forward to it, and would be interested in helping/testing. 

>> In my case, I'm actually trying to represent genetic sequence as an array of
>> bits.  The current implementation is in C++ and uses 2 bitsets to encode
>> values.  I'd much rather have a packed array of Uint2 or Uint4.
>
> Now that I think about it, it should be possible to define a type like
>
> type GeneSequence
>      b1::BitVector
>      b2::BitVector
> end
>
> and use (b1[i],b2[i]) to represent a base in {'A','C','G','T'}. Then,
> it would just be a matter of defining a few operators with some very
> simple definitions. Its performance shouldn't be horribly worse than
> that of contiguous packing of couples of bits: you have to operate on
> two bitarrays every time, but the operations are simpler. This could
> even be easily generalized to any number of bits.
>
> If you're ok with it I could prototype this quickly, I guess.

That's pretty close to what I was doing in C++ already.  If it's not too much of a distraction and would be quick, sure, I'd love to see it.  (Though I'm actually more interested in an eventual PackedArray representation, which, as you pointed out, won't be quick.)
  
>> > I also need logical shift functions (>>>) and (<<).  If you (Carlo)
>> > haven't
>> > thought these through, I'll attempt an implementation and generate a
>> > pull
>> > request, if that's okay.
>>
>> That's a good idea, and I don't currently have plans for it (probably,
>> circular shift functions would be useful as well).

I'm done with these, will generate a pull request soon, although I'm traveling this weekend.

Cheers!

   Kevin

Diego Javier Zea

Dec 29, 2012, 11:53:58 AM
to juli...@googlegroups.com
Hi Carlo & Kevin!
I'm interested to know whether there has been any progress on the GeneSequence type or on PackedArray.
I like Carlo's idea of


type GeneSequence
     b1::BitVector
     b2::BitVector
end

Something like a matrix of bits [ BitArray{2} ] could also be a good definition for unambiguous IUPAC sequences, and useful for applying neural networks to sequence types [ where people commonly represent an amino acid with 21 bits ].
For other uses, like handling ambiguous IUPAC sequences, Pfam alignments, FASTQ support... an 8-bit sequence type may be better.
Over the next few days I'm going to work on an 8-bit representation of amino acids and nucleic acids for Julia (and sequences of these types).

I hope you are having nice holidays, and a happy New Year ;)


Diego

PS: I haven't tested this properly yet, but it looks like slicing columns in a BitArray{2} may be a little slower [ 3e-5 s ] than in other matrices [ in an Array{Int64,2} it takes 5e-6 s ].

Diego Javier Zea

Jan 8, 2013, 2:30:53 PM
to juli...@googlegroups.com
I've been thinking about this GeneSequence and PackedArrays, and I prototyped this.
I think it could do better in the number of slicing operations than the previous version.
It's only a few-minutes prototype, but it could work well with more effort. What do you think?

type NucleotideSeq <: String
  data::BitArray
  bits::Int
end

import Base.length, Base.next
length(s::NucleotideSeq) = length(s.data)/s.bits
next(s::NucleotideSeq, i::Int) = NucleotideSeq(s.data[i:(i+s.bits-1)],2),i+s.bits

a = NucleotideSeq(convert(BitArray,[true; false; true; true; false; false; false; true]),2);

import Base.done, Base.isempty
done(s::NucleotideSeq,i) = (i > (length(s.data)-s.bits+1))
isempty(s::NucleotideSeq) = done(s,start(s))
ref(s::NucleotideSeq, i::Int) = next(s,i)[1]
ref(s::NucleotideSeq, i::Integer) = NucleotideSeq(s.data[int(i)],2)
ref(s::NucleotideSeq, x::Real) = NucleotideSeq(s.data[to_index(x)],2)
ref{T<:Integer}(s::NucleotideSeq, r::Range1{T}) = NucleotideSeq(s.data[int(first(r)):int(last(r))],2)

import Base.show, Base.print, Base.write
print(io::IO, s::NucleotideSeq) = for c in s print(convert(Array{Int,1},c.data)) end
write(io::IO, s::NucleotideSeq) = print(io, s)
show(io::IO, s::NucleotideSeq) = print(io,s)




julia> a  # I'm thinking on a Dict for printing as Char
#[1, 0][1, 1][0, 0][0, 1]
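
The "Dict for printing as Char" idea mentioned in the comment above could look roughly like the following, sketched in Python rather than Julia. The mapping from bit pairs to letters is purely hypothetical; the thread never specifies which pair stands for which base.

```python
# Hypothetical mapping; the actual assignment of pairs to bases is not given.
PAIR_TO_BASE = {
    (0, 0): "A",
    (0, 1): "C",
    (1, 0): "G",
    (1, 1): "T",
}


def decode(bits):
    """Turn a flat list of bits into a string, two bits per base."""
    if len(bits) % 2:
        raise ValueError("expected an even number of bits")
    pairs = ((bits[i], bits[i + 1]) for i in range(0, len(bits), 2))
    return "".join(PAIR_TO_BASE[p] for p in pairs)


# Same bits as the example value `a`: [1, 0][1, 1][0, 0][0, 1]
decoded = decode([1, 0, 1, 1, 0, 0, 0, 1])
```

Under this assumed mapping the example sequence would print as GTAC instead of a run of bracketed bit pairs.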



Best