Gee, you've answered your own question.
You say that you are "puzzled about the separate semaphore", and that
you "think semaphore is useful only for shared resources".
Two separate, simultaneous invocations of a single device driver will have a
lot of "shared resources" that should not be confused between invocations.
In your example, each invocation of the driver kmalloc()s some memory and,
with the semaphore to protect it, will kfree() only the memory that it alone
kmalloc()ed.
Without the semaphore, each invocation would still kmalloc() memory, but
because of a race condition on the shared variable that holds the pointer to
that memory, one invocation can overwrite the pointer stored by another. The
invocations could then free the same block several times, while the other
blocks would be lost to the kernel: kmalloc()ed but never used or kfree()ed.
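For illustration, here is a userspace sketch of the same pattern. It assumes
Linux with POSIX unnamed semaphores (sem_init() is not supported on macOS) and
uses pthreads, malloc() and free() as stand-ins for concurrent driver
invocations, kmalloc() and kfree(). The names shared_buf, invocation() and
run_invocations() are hypothetical, not part of any kernel API. Compile with
-pthread.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INVOCATIONS 64
#define BUF_SIZE 64

/* Hypothetical shared state, standing in for a pointer that every
 * invocation of the driver can see (e.g. in a per-device struct). */
static char *shared_buf = NULL;
static sem_t buf_sem;           /* binary semaphore guarding shared_buf */
static int allocs = 0, frees = 0;

/* One "invocation": allocate, use and free a buffer via shared_buf.
 * Without sem_wait()/sem_post(), two threads could both assign
 * shared_buf; one block would be freed twice and the other leaked. */
static void *invocation(void *arg)
{
    (void)arg;
    sem_wait(&buf_sem);                 /* enter critical section */
    shared_buf = malloc(BUF_SIZE);      /* analogous to kmalloc() */
    if (shared_buf != NULL) {
        allocs++;
        memset(shared_buf, 0, BUF_SIZE);/* "use" the memory */
        free(shared_buf);               /* analogous to kfree(): frees only
                                           the block this thread allocated */
        shared_buf = NULL;
        frees++;
    }
    sem_post(&buf_sem);                 /* leave critical section */
    return NULL;
}

/* Run up to MAX_INVOCATIONS concurrent invocations and return the
 * number of blocks freed. */
int run_invocations(int n)
{
    pthread_t tids[MAX_INVOCATIONS];
    int created = 0;

    if (n < 0) n = 0;
    if (n > MAX_INVOCATIONS) n = MAX_INVOCATIONS;
    allocs = frees = 0;
    if (sem_init(&buf_sem, 0, 1) != 0)  /* initial count 1: a mutex */
        return -1;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, invocation, NULL) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    sem_destroy(&buf_sem);
    return frees;
}
```

With the semaphore in place, run_invocations(8) frees 8 blocks and every
allocation is matched by exactly one free. In a real driver you would more
likely keep the pointer in a local variable, or use the kernel's own
semaphore or mutex primitives, rather than copy this userspace code.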
HTH
--
Lew Pitcher
"In Skills, We Trust"