
in_array performance in unsorted vs sorted array


William Gill

May 23, 2012, 2:20:19 AM
I am reading transaction records from files. Each record has an
alphanumeric GUID but that record may be repeated in more than one file
(because of overlapping samples). I don't want to process duplicate
records, so I am considering a simple flat file to store the GUIDs of
previously processed records.

To keep things simple, I plan to use $done=file() to read the flat file
and a simple in_array() check to see whether the current GUID has already
been processed; if not, I process the current record and add its GUID to $done.

Does anyone know if sorting an array has any significant impact on
in_array, or can I simply push values into $done?

Also is there a better way than foreach() write() to get $done back into
the flat file?
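
The plan described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions: loadGuids() and claimGuid() are hypothetical helper names, and the file is assumed to hold one GUID per line. One detail matters in practice: without FILE_IGNORE_NEW_LINES, file() keeps the trailing newline on every element, so in_array() would never match a bare GUID.

```php
<?php
// Hypothetical helper: read one GUID per line, without trailing newlines.
function loadGuids(string $path): array
{
    if (!is_file($path)) {
        return [];                       // nothing processed yet
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? [] : $lines;
}

// Hypothetical helper: returns true if $guid is new, and records it in $done.
function claimGuid(array &$done, string $guid): bool
{
    // Third argument true = strict comparison, no type juggling.
    if (in_array($guid, $done, true)) {
        return false;                    // already processed, skip the record
    }
    $done[] = $guid;                     // plain push; no sorting involved
    return true;
}
```

A caller would load $done once, call claimGuid($done, $guid) for each record, and process the record only when it returns true.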

Captain Paralytic

May 23, 2012, 5:36:52 AM
Why not load the files into a database table with the GUID as its
primary key? Then you have one record for each GUID.
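
A minimal sketch of that idea, assuming SQLite through PDO (the pdo_sqlite extension). The table name, column name, and both function names are made up for illustration, and INSERT OR IGNORE is SQLite syntax; other databases spell this differently.

```php
<?php
// Hypothetical helper: open the store and make sure the table exists.
function openGuidStore(string $dsn = 'sqlite::memory:'): PDO
{
    $db = new PDO($dsn);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // The primary key guarantees at most one row per GUID.
    $db->exec('CREATE TABLE IF NOT EXISTS processed (guid TEXT PRIMARY KEY)');
    return $db;
}

// Hypothetical helper: true if the GUID was newly inserted, false if it existed.
function markProcessed(PDO $db, string $guid): bool
{
    $stmt = $db->prepare('INSERT OR IGNORE INTO processed (guid) VALUES (?)');
    $stmt->execute([$guid]);
    return $stmt->rowCount() === 1;      // 0 rows changed means a duplicate
}
```

With this shape the duplicate check and the bookkeeping happen in one statement, so there is no separate list to load or write back.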

Jerry Stuckle

May 23, 2012, 8:16:47 AM
Arrays in PHP are associative; their keys are handled as hash values.
So I suspect it makes no difference on whether the array is sorted or not.

Also, what's wrong with foreach() write()? That's how you get arrays
back into a file. Ensure you lock the file so that you don't have two
scripts running against it at the same time.

But it sounds like you should be using a database. That would solve a
lot of your problems.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu...@attglobal.net
==================

William Gill

May 23, 2012, 10:57:17 AM
On 5/23/2012 8:16 AM, Jerry Stuckle wrote:
>
> Arrays in PHP are associative; their keys are handled as hash values. So
> I suspect it makes no difference on whether the array is sorted or not.
>
OK, that seems to be in sync with what I'm (not) finding in Google.

> Also, what's wrong with foreach() write()? That's how you get arrays
> back into a file. Ensure you lock the file so that you don't have two
> scripts running against it at the same time.
>
Nothing. I just wanted to be sure I wasn't unaware of something better,
like a function inverse to file(). Probably looking at fewer than 100k
records at any time so I guess performance won't be a problem.

> But it sounds like you should be using a database. That would solve a
> lot of your problems.
>
Yes it does, and that's where this is heading. Right now it is just
summarizing some information, but eventually this will become a
pre-processor for a db.
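
On the question of a function inverse to file(): file_put_contents() comes close, since it writes a whole string (or an array of strings) in one call and accepts LOCK_EX. A minimal sketch, where saveGuids() is a hypothetical helper name and one GUID per line is assumed:

```php
<?php
// Hypothetical helper: write one GUID per line in a single call.
function saveGuids(string $path, array $done): bool
{
    // Join with "\n" and end with a newline so file() reads it back cleanly.
    $data = $done ? implode("\n", $done) . "\n" : '';
    // LOCK_EX holds an exclusive lock for the duration of this write only.
    return file_put_contents($path, $data, LOCK_EX) !== false;
}
```

Note that LOCK_EX here only protects the write itself. If two scripts could each read, check, and write the file at the same time, the whole cycle would need to sit inside a flock() on an open handle instead.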

William Gill

May 23, 2012, 11:01:07 AM
Pulled the trigger too soon. Forgot to say thanks.

Thomas Mlynarczyk

May 24, 2012, 4:20:57 PM
Jerry Stuckle wrote:

>> [Performance of in_array()]
> Arrays in PHP are associative; their keys are handled as hash values. So
> I suspect it makes no difference on whether the array is sorted or not.

The /keys/ are hash values, yes. But in_array() searches through the
/values/, not the keys, so I suppose (haven't tested it) the performance
is O(n) rather than O(1). On the other hand, I doubt that PHP keeps track
of whether the array is sorted, so it probably makes no difference
indeed, as you said.
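
If the linear scan ever became a concern, one way to get the hash lookup on the keys is to store the GUIDs as array keys and test with isset(). This is a sketch only: claimGuidKeyed() is a hypothetical name and the sample GUIDs are invented.

```php
<?php
// Hypothetical helper: same contract as an in_array() check plus a push,
// but the GUID is the array key, so the check is a hash lookup.
function claimGuidKeyed(array &$seen, string $guid): bool
{
    if (isset($seen[$guid])) {
        return false;                    // already processed
    }
    $seen[$guid] = true;                 // the value is just a marker
    return true;
}

// Turn the list read from the flat file into a keyed set.
$done = ['9f1c-a', '77b0-b'];            // made-up sample values
$seen = array_fill_keys($done, true);
```

array_keys($seen) recovers the list for writing back to the file. Sorting plays no part in either approach.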

Greetings,
Thomas

--
Just because many of them are wrong doesn't mean they are right!
(Coluche)