The tricky part is that XML has a lot more "features" than protocol buffers, and it's not clear how those should be represented. For example, how do you treat interleaved text (character data mixed in between child elements)? You could come up with all kinds of algorithms for mapping XML structures to protobufs, but I'm not sure there's any solution that would work well for everyone.
Given a mapping algorithm, though, it's not hard to write a program that uses protobuf reflection to map an arbitrary XML document to a protocol message according to that algorithm.
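To make that concrete, here is a rough sketch of one such mapping. Once the rules are fixed, the converter is just a generic walk over the message type's field list. To stay dependency-free it uses Python's `dataclasses.fields()` introspection as a stand-in for protobuf reflection; with the real `google.protobuf` runtime you would iterate `msg.DESCRIPTOR.fields` instead. The `Person`/`PhoneNumber` types, the `xml_to_message` helper, and the mapping rules themselves are all hypothetical. Note that it simply drops interleaved text, which is exactly the kind of lossy choice that makes a one-size-fits-all mapping hard.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field, fields, is_dataclass
from typing import get_args, get_origin


# Hypothetical message types, standing in for protoc-generated classes.
@dataclass
class PhoneNumber:
    number: str = ""
    kind: str = ""


@dataclass
class Person:
    name: str = ""
    id: int = 0
    phone: list[PhoneNumber] = field(default_factory=list)  # repeated field


def _scalar(text, typ):
    # Convert XML text to a scalar field type; bool needs special handling.
    text = text.strip()
    if typ is bool:
        return text.lower() in ("1", "true")
    return typ(text)


def xml_to_message(elem, cls):
    """Build a cls instance from elem, driven only by cls's field list.

    One possible (assumed, non-standard) mapping:
      * attributes and direct child elements map to the field of the same name;
      * a repeated (list) field collects every matching child element;
      * message-typed fields recurse into the matching child element;
      * text interleaved between child elements is silently dropped.
    """
    msg = cls()
    for f in fields(cls):
        repeated = get_origin(f.type) is list
        item_type = get_args(f.type)[0] if repeated else f.type
        children = elem.findall(f.name)  # direct children with this tag
        if is_dataclass(item_type):
            values = [xml_to_message(c, item_type) for c in children]
        else:
            texts = [c.text or "" for c in children]
            if f.name in elem.attrib:
                texts.insert(0, elem.attrib[f.name])
            values = [_scalar(t, item_type) for t in texts]
        if repeated:
            setattr(msg, f.name, values)
        elif values:
            # Singular field: the last occurrence wins.
            setattr(msg, f.name, values[-1])
    return msg


doc = ET.fromstring(
    '<person id="42"><name>Ada</name>stray text'
    '<phone kind="home"><number>555-0100</number></phone>'
    '<phone kind="work"><number>555-0199</number></phone></person>'
)
person = xml_to_message(doc, Person)
print(person)
```

Swapping in a different algorithm (say, capturing mixed content in a dedicated string field) only means changing `xml_to_message`; the message types stay the same, which is the point of driving the mapping through reflection.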