Hi Lei,
As long as you are sharing one StreamFactory instance across threads, I wouldn't worry too much about creating a new Unmarshaller instance for each record. The code is optimized to minimize object instantiation when a new un/marshaller is created. In my limited benchmarking, creating 1 million unmarshallers added about 500 ms of overhead in total, or roughly 0.5 µs per unmarshaller. If that is too high for your needs, you could keep one Unmarshaller per thread, either in a ThreadLocal variable (which has its own performance drawbacks) or as a field of your own Thread subclass.
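A minimal sketch of the first two options, written in plain Java so it runs without the mapping library on the classpath. `RecordReader` and `ReaderFactory` are hypothetical stand-ins for the Unmarshaller and the shared StreamFactory, and `parsePerRecord` / `parseWithThreadLocal` are illustrative names only; in real code the factory call would be the StreamFactory's own unmarshaller-creation method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerThreadUnmarshallerSketch {

    // Hypothetical stand-in for an Unmarshaller: cheap to create, but stateful,
    // so a single instance must never be shared between threads.
    static final class RecordReader {
        private String lastRecord; // per-instance state, like grouping/ordering tracking

        String read(String line) {
            String record = line.trim().toUpperCase(Locale.ROOT);
            lastRecord = record;
            return record;
        }
    }

    // Hypothetical stand-in for the StreamFactory: one instance shared by all threads.
    static final class ReaderFactory {
        RecordReader create() {
            return new RecordReader();
        }
    }

    static final ReaderFactory FACTORY = new ReaderFactory();

    // Each thread lazily gets its own RecordReader on its first READER.get().
    static final ThreadLocal<RecordReader> READER = ThreadLocal.withInitial(FACTORY::create);

    // Option 1: create a fresh reader for every record from the shared factory.
    static List<String> parsePerRecord(List<String> lines, int threads) {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String line : lines) {
            tasks.add(() -> FACTORY.create().read(line));
        }
        return runAll(tasks, threads);
    }

    // Option 2: reuse one reader per worker thread via ThreadLocal.
    static List<String> parseWithThreadLocal(List<String> lines, int threads) {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String line : lines) {
            tasks.add(() -> READER.get().read(line));
        }
        return runAll(tasks, threads);
    }

    // Runs the tasks on a fixed-size pool and returns results in submission order.
    private static List<String> runAll(List<Callable<String>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Option 1 is the simplest and never shares state. With Option 2 on a pooled executor, each per-thread instance lives as long as the pool thread and carries its state from one record to the next on that thread, so it only fits if that is acceptable for your records. A field on a Thread subclass behaves the same way, just without the ThreadLocal lookup.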
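An un/marshaller will never be thread-safe, because it is stateful: it keeps state between calls in order to enforce record grouping and ordering rules. You could synchronize all access to a single instance, but then every read or write would go through one lock, one thread at a time, which would probably defeat the purpose of multi-threading.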
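To illustrate the synchronization trade-off, here is a small sketch with a hypothetical `CountingReader` whose counter stands in for the grouping/ordering state a real un/marshaller keeps. The hypothetical `SynchronizedReader` wrapper keeps that state consistent under concurrent use, but only one thread can be inside `read` at any moment, so the parsing itself gets no parallel speed-up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SharedReaderSketch {

    // Hypothetical stateful reader: the counter stands in for the state a real
    // un/marshaller keeps between calls. Not thread-safe on its own.
    static final class CountingReader {
        private int recordsRead;

        String read(String line) {
            recordsRead++; // unsynchronized read-modify-write
            return line.trim();
        }

        int recordsRead() {
            return recordsRead;
        }
    }

    // Wrapper that serializes every call through one lock (this object's monitor).
    static final class SynchronizedReader {
        private final CountingReader delegate = new CountingReader();

        synchronized String read(String line) {
            return delegate.read(line);
        }

        synchronized int recordsRead() {
            return delegate.recordsRead();
        }
    }

    // Reads `records` lines from `threads` workers through one shared, locked reader.
    static int readConcurrently(int records, int threads) {
        SynchronizedReader shared = new SynchronizedReader();
        List<Callable<String>> tasks = new ArrayList<>();
        for (int i = 0; i < records; i++) {
            String line = " record-" + i + " ";
            tasks.add(() -> shared.read(line));
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Future<String> future : pool.invokeAll(tasks)) {
                future.get(); // surface any exception thrown by a task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return shared.recordsRead();
    }
}
```

The count comes out right because of the lock, but the workers spend their time waiting on it, which is why a per-record or per-thread instance is usually the better choice.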
Thanks,
Kevin