Hi all,
New Scala user here, coming from years of Java. Great language so
far. However, I can't seem to find a pattern to read a file into a
byte array in a "Scala" way that doesn't take for freaking ever to
complete, computationally speaking.
Certain PDF libraries (iText) require the entire file as a byte array to initialize their data structures, so we need to be able to read the whole file and pass it to the iText functions. We also have proprietary files, currently no larger than 10MB but likely to grow in the future, that we need to work with atomically.
So Java NIO is probably the way to go? From what I understand of Java streams, they use the relevant parts of NIO under the covers, especially in later versions (6 and 7).
What I meant by "atomically" was as a whole unit, not pieces thereof, so reading only parts of the file would not work. It would need to be the full file contents in memory at the very least.
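For reference, a minimal sketch of the whole-file read using the Java 7 NIO file API from Scala. It assumes a JDK 7 or later runtime and a file that fits in memory; the object name `WholeFileRead` and the helper `readWhole` are made up for illustration and are not part of any library.

```scala
import java.nio.file.{Files, Paths}

object WholeFileRead {
  // Reads the entire file into memory in a single call.
  // Files.readAllBytes (Java 7+) does bulk reads internally, so there is
  // no per-byte iteration on the Scala side.
  def readWhole(path: String): Array[Byte] =
    Files.readAllBytes(Paths.get(path))
}
```

The resulting array can then be handed to whatever needs the full contents, which matches the "whole unit" requirement above.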
-------- Original Message --------
> Date: Mon, 19 Mar 2012 16:25:06 +0100
> From: "√iktor Ҡlang" <viktor...@gmail.com>
> To: Scott Spillmann <scott.s...@gmail.com>
> CC: scala-user <scala...@googlegroups.com>
> Subject: Re: [scala-user] Better way to read binary file...
> --
> Viktor Klang
>
> Akka Tech Lead
> Typesafe <http://www.typesafe.com/> - The software stack for applications
> that scale
>
> Twitter: @viktorklang
They are binary files. Source.fromFile reads text files.
--
Daniel C. Sobral
I travel to the future all the time.
Have you tried this?
val in4 = new java.io.BufferedInputStream(this.getClass().getClassLoader().getResourceAsStream(res_name))
> var stream = Iterator continually in4.read takeWhile (-1 !=) map
> (_.toByte) toArray
Yeah, reading one byte at a time and then converting the result into a
single array is not going to be efficient, in particular because you
don't know beforehand what the array size will be when you are working
with an Iterator. That means the array will be resized, which means it
will be copied multiple times as it grows.
There are many ways in which you could optimize this, but, in the end,
it comes down to whether it matters or not. If speed is not crucial,
then you might keep this functional version. If it is of utmost
importance, then you should go with the fastest version.
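One possible optimization, sketched under the assumption that the input is any `java.io.InputStream`: read in chunks rather than one byte per `read()` call. The object name `StreamBytes`, the helper `readFully`, and the 8192-byte default chunk size are arbitrary choices for illustration.

```scala
import java.io.{ByteArrayOutputStream, InputStream}

object StreamBytes {
  // Drains the stream in chunks instead of one byte per read() call.
  // The output buffer still grows as needed, but it is filled and copied
  // in bulk rather than element by element.
  def readFully(in: InputStream, chunkSize: Int = 8192): Array[Byte] = {
    val out = new ByteArrayOutputStream(chunkSize)
    val buf = new Array[Byte](chunkSize)
    var n = in.read(buf)
    while (n != -1) {          // read returns -1 at end of stream
      out.write(buf, 0, n)     // copy only the bytes actually read
      n = in.read(buf)
    }
    out.toByteArray
  }
}
```

With the resource stream from the earlier snippet, the call would look like `StreamBytes.readFully(getClass.getClassLoader.getResourceAsStream(res_name))`. Closing the stream is left to the caller.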
> }
>
> and took well over 1000 ms to complete due to iterating over the
> stream multiple times. Is there any way that I can buffer the "Scala"
> method so that it rivals the java streams way in terms of
> performance? If there isn't a way, I'm just as happy using the java
> wrapper way.
>
> Thanks in advance
--
Tony Morris
http://tmorris.net/
You could use an iteratee, which will scale to any size file and perform
much better, if you were willing to forgo the iText library and
implement something useful yourself.
I've done it recently. It's relatively easy with appropriate general
library support -- much easier than using iText, which I last had the
displeasure of doing about 3 or 4 years ago.
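To give a rough idea of why chunk-wise processing scales to any file size, here is a hand-rolled left fold over the chunks of a stream. This is only the core idea behind iteratees, not an iteratee library and not the code referred to above; `ChunkFold` and `foldChunks` are hypothetical names.

```scala
import java.io.InputStream

object ChunkFold {
  // Feeds the stream to `step` one chunk at a time, threading a state
  // value through. Only a single chunk is held in memory at any moment.
  // `step` receives the current state, the shared buffer, and the number
  // of valid bytes in it; it must not keep a reference to the buffer.
  def foldChunks[A](in: InputStream, zero: A, chunkSize: Int = 8192)(
      step: (A, Array[Byte], Int) => A): A = {
    val buf = new Array[Byte](chunkSize)
    var acc = zero
    var n = in.read(buf)
    while (n != -1) {
      acc = step(acc, buf, n)
      n = in.read(buf)
    }
    acc
  }
}
```

For example, `ChunkFold.foldChunks(in, 0L)((total, _, n) => total + n)` counts the bytes in a stream without ever materializing the whole file.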
On 20/03/12 03:03, √iktor Ҡlang wrote:
> 2012/3/19 Tony Morris <tonym...@gmail.com>
>
>> I've done it recently. It's relatively easy with appropriate general
>> library support -- much easier than using iText, which I last had the
>> displeasure of doing about 3 or 4 years ago.
>>
> Yeah? Would love to see the sauce. I ended up switching from iText to a
> proprietary 3rd party lib because it had support for AcroForms.
The company I work for gets cold feet over open-sourcing our work. I've
spent the last few months trying to get a very basic library to OSS, not
that it is too useful, but to get over this artificial barrier. Once I
do that, maybe I can look at more...
--
Tony Morris
http://tmorris.net/
What??? I thought blocking IO is the reason we have threads in the first place… :)
How's this for a thought: if EVERYTHING was asynchronous – would we need threads?
Of course, a complete rewrite is mostly out of scope. But even a
library which is just able to read the basic PDF file structure (which
is kind of a filesystem in itself) would already be immediately
useful.
--
Johannes
-----------------------------------------------
Johannes Rudolph
http://virtual-void.net
Well – because you want to occupy the CPU with something else while the user is waiting for the… mouse to move or the keys to be accumulated. At the same time you want to simplify programming, so it all looks synchronous and stupid. Hence you invent threads, which are nothing but mini-processes, or the suspendable state of a path carved through code, all the while the I/O in fact works via hardware interrupts – not really blocking :)
I remember Windows 3 was process-based cooperative multitasking and it worked relatively fine if you remembered to call yield() from all events :) Unix-style preemption was black magic voodoo stuff :) … until the advent of the i386 I think, which had dedicated TAS and task switching instructions and made it simple?
IF you didn't have to deal with blocking I/O, i.e. waiting for stuff – if ALL programming was reactive, workflow or actor-based, would we need threads? If everything, down to the AX+DX (or whatever the registers are called) was asynchronous units of work – would we need threads? If all programming was in the form of "WHEN this DO that", what is a thread?
All a thread is, is a suspendable state of execution as seen by a dumb lower-level OS that doesn't know what the heck is going on upstairs. But if the upstairs was all reactive… what then?
Yeah – time for the second coffee of the day.