That said, checking jobs is going to be expensive with your current
scheme, especially when you run into jobs with multiple skills, and
people with multiple skills. It can be done *much* faster with Redis.
That is, there is a simple way of calculating "get me all jobs that
user A can do" and "get me all users that can perform job Z". Think
it over for a while. If you need a hint or are pressed for time and
want the answer, ping this thread.
Regards,
- Josiah
> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>
>
I'm going to put a bunch of blank lines, then I'm going to provide the
answer (I'm going to be out much of this weekend, so I won't be able
to help otherwise). If you want to figure it out, use the hint. If
you want the answer, scroll down :)
...
...
The answer is as follows:
Rather than storing for each job the set of skills necessary to
perform the job, instead store ZSETs for each skill. Each skill zset
includes all of the jobs that require that skill, with their scores
all being 1 (if you are using Redis master for the 2.2 release, you
can use plain sets for this, as we are going to be using zinterstore,
which can take sets as an input, which default the scores to 0). You
also have a single zset with all of the jobs and the number of skills
that it requires. To be more clear, for joba that requires skill1 and
skill2, and jobb that requires skill2, skill3, and skill4:
ZADD skill1 1 joba
ZADD skill2 1 joba
ZADD jobs 2 joba
ZADD skill2 1 jobb
ZADD skill3 1 jobb
ZADD skill4 1 jobb
ZADD jobs 3 jobb
Say that we have someone with 3 skills: skill1, skill2, skill3. To
find out if there are any available jobs for them:
ZUNIONSTORE results 3 skill1 skill2 skill3 AGGREGATE SUM
results -> {joba: 2, jobb: 2}
ZINTERSTORE results 2 results jobs WEIGHTS -1 1 AGGREGATE SUM
results -> {joba: 0, jobb: 1}
Every job in the results ZSET with a score of 0 is able to be done
with the person you just queried for. You can trim off all jobs with
scores >= 1 and keep results for that person, or you can calculate it
on the fly whenever you need it. Either way is fine. Heck, you could
store it, then every day, calculate the difference and email out the
new jobs to the user.
You *can* perform the entire operation with a single ZUNIONSTORE
(weights for the skills are all -1, weight for jobs is 1) with the
same result semantics, but you always end up with a set as large as
your 'jobs' zset. It's mostly personal choice.
Regards,
- Josiah
- Josiah
Thank you, I'm glad to help. :)
> This solution performs well for testing and prototyping.
> With 10,000 jobs and 1,000 workers, ZUNIONSTORE/ZINTERSTORE
> can find jobs for one worker in about 50 ms.
>
> As the dataset grows, however, performance drops dramatically.
> With 100,000 jobs and 10,000 workers, it takes 2,100 ms to find
> jobs for one worker. Blocking other Redis calls for that long
> won't work.
>
> My use cases involve 30-40 skills per worker, and 2-10 skills
> per job.
If you are ok with the same or greater wall-clock time, without
blocking other operations, you can chunk your jobs into (perhaps)
groups of 1k-10k (as you tested), and do them chunk by chunk; that is,
check skill1:chunk1, skill2:chunk1, ... to look for a job. Then move
onto skill1:chunk2, skill2:chunk2, ..., unioning the result into the
result from the first chunk. Since the smaller unions/intersections
finish much faster, you won't be blocking everyone else, and you can
still end up with one list with everything.
> I'm guessing this solution uses a lot of cycles doing work
> that the problem doesn't require. For example, I only need
> to show the jobs that match a worker's skills. Since I don't
> need to compute or store the exact score for every combination,
> it would be more efficient to stop comparing a job as soon
> as one skill doesn't match.
Depending on the total number of skills you have, you could perform
differences instead of unions, and it might be faster...
Say that a person has skill1, skill3, skill4... but there are skills 1..5 .
SDIFFSTORE ok jobs skill2 skill5
Technically, anything that is in 'ok' is a job that the person can do,
as we've excluded those jobs that he can't do. Straight up set
arithmetic. Good eye about probably doing more work than was
necessary :)
In this updated version, "jobs" is the set of all jobs, with 'skillX'
being the set of all jobs that require skillX.
> I appreciate your help on this, I now have enough to complete
> the prototype. Though a more efficient approach will be
> needed if the app is successful.
>
> Thanks again for your analysis and recommendations, I've
> learned a lot!
You are welcome, it was a fun problem to solve :)
Regards,
- Josiah