On Thu, Dec 10, 2009 at 7:59 AM, Phlip <phli...@gmail.com> wrote:
> On Dec 9, 3:10 pm, Russell Keith-Magee <freakboy3...@gmail.com> wrote:
>
>> Ok; using some non-pk value for PK references is certainly one way to
>> handle this. There is an issue around how to resolve a hash into an
>> actual pk value, but that shouldn't be impossible.
>
> In Rails, a YAML (JSON) fixture like this...
>
> norbert:
>   name: the nark
>
> ...expands into this...
>
> norbert:
>   id: <%= hash('norbert') %>
>   name: the nark
>
> ...hence this:
>
> norbert:
>   id: 39779393
>   name: the nark
>
> Then the id stamps into the database as that record's pk.

Holy mother of what? Randomly inventing a PK and hoping you don't have
a collision? Ah, no. No. Not ever.

>> The big issue is how to format a hash so that it can be differentiated
>> from a primary key value. To shortcut a couple of obvious (but wrong)
>> solutions:
>> * "just use hash values all the time" isn't an acceptable answer,
>> because we have backwards compatibility to consider
>> * "if it's an int, use pk, if it's a string, use hash" doesn't work,
>> because Django allows strings as primary keys. It isn't common, but it
>> is possible.
>
> That's why I suggested adding the templating layer to the JSONs.
>
> In general, Django "encourages" screwing with the Admin, then
> extruding sample records, while RoR "encourages" writing very terse,
> very templated YAML files as test code source.

What rubbish. Django encourages nothing of the sort. Django provides
an admin tool. It's a useful tool, especially as - surprise surprise -
an administration interface. You *can* use it to generate data for
fixtures if you want to. I challenge you to find anywhere in the docs
that says the admin interface is *the* way to generate fixtures.

Personally, I never use the admin to build my test fixtures - I hand
write them. Django's fixture format is simple (at least, in JSON and
YAML it is - XML is verbose, but that's XML for you). When you're in
complete control of all the data (as you should be during testing),
hard coding primary keys isn't a problem.
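
For illustration, a hand-written JSON fixture with hard-coded pks might
look like the string below. This is only a sketch: the "myapp.person"
model and its "name" field are made up, and duplicate_pks() is a
hypothetical helper, not part of Django. It just shows that when you
write every pk yourself, checking for collisions is trivial.

```python
import json

# Hand-written fixture in Django's JSON serialization layout:
# a list of objects, each with "model", "pk", and "fields".
# The "myapp.person" model and its "name" field are made up for this example.
FIXTURE = """
[
  {"model": "myapp.person", "pk": 1, "fields": {"name": "the nark"}},
  {"model": "myapp.person", "pk": 2, "fields": {"name": "the mark"}}
]
"""


def duplicate_pks(fixture_text):
    """Return sorted (model, pk) pairs that appear more than once."""
    seen = set()
    duplicates = set()
    for obj in json.loads(fixture_text):
        key = (obj["model"], obj["pk"])
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return sorted(duplicates)


# An empty list means no two records claim the same pk.
print(duplicate_pks(FIXTURE))
```
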

On the subject of which - why on earth do you even *need* pk-less
objects in a test fixture? If you're in control of all the data - as
you should be during testing - the original problem you describe of
avoiding PK collisions at run time doesn't really exist.

I accept that this problem exists for loading new data into the
database - i.e., "create 10 new people - here are their names and
addresses, but I don't have pks for them" - but for testing? Not as
far as I can see.

>> > The problem now, at loaddata time in production, is the hashes still
>> > might (one in a million chance) collide with a preexisting PK. And the
>> > next problem is the hashes will bump their PK incrementors way up,
>> > throwing away whole ranges of valid fictitious IDs, when the next
>> > natural record inserts.
>
>> Hash collisions aren't a huge concern to me. As long as whatever you
>> are hashing has sufficient entropy that collisions on *input* to the
>> hash aren't possible (or especially likely).
>
> But abandoning all those fictitious numbers, say between our highest
> record of 204 and our hash of 39779393. The auto-incrementor will use
> 39779394 next, and so on. Then all of those numbers between 204 and
> 39779393 will feel bad, because they never got to index a record.

We're on a completely different page here. I have no problem with
using a hash as a fixture-internal reference to an object until such
time as the object is assigned a real pk by the database. Using a hash
of content as an actual primary key value is a completely different
matter (and, to reinforce my previous point - no, no, not ever).
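
To make the distinction concrete, here is a rough sketch of the
"fixture-internal reference" idea. It is not how loaddata works; the
person table, the "label" and "boss" keys, and load_fixture() are all
hypothetical, and it uses plain sqlite3 rather than Django so that it
runs on its own. The label is only ever used to look up the pk that the
database handed out.

```python
import sqlite3

# Hypothetical fixture: each record carries a label used only inside the
# fixture, and "boss" refers to another record by that label, not by pk.
FIXTURE = [
    {"label": "norbert", "name": "the nark", "boss": None},
    {"label": "wilma", "name": "the mark", "boss": "norbert"},
]


def load_fixture(conn, records):
    """Insert records in order and return a {label: database-assigned pk} map.

    Assumes a record is listed after any record it refers to.
    """
    pk_for_label = {}
    for rec in records:
        boss_pk = pk_for_label[rec["boss"]] if rec["boss"] is not None else None
        cur = conn.execute(
            "INSERT INTO person (name, boss_id) VALUES (?, ?)",
            (rec["name"], boss_pk),
        )
        # The database picks the pk; the label never reaches the table.
        pk_for_label[rec["label"]] = cur.lastrowid
    return pk_for_label


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "id INTEGER PRIMARY KEY, name TEXT, boss_id INTEGER REFERENCES person(id))"
)
# A pre-existing row, to show that loading does not collide with it.
conn.execute("INSERT INTO person (name) VALUES ('already here')")

pks = load_fixture(conn, FIXTURE)
print(pks)
```

Because the pks come from the database, the new rows simply take the
next free ids after the existing row, and nothing in the fixture has to
guess at a value that might already be taken.
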
Yours,
Russ Magee %-)