Small addendum: if instead of Q-learning you are using an on-policy algorithm such as (Expected) Sarsa, you additionally need a GLIE policy (greedy in the limit with infinite exploration) if you want the action-value estimates to converge to the optimal values. If epsilon does not decay to zero, the action-value estimates will still converge (under some assumptions, e.g., on the step sizes), but then to the values of the actions under the (exploring) epsilon-greedy behaviour policy.
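For concreteness, here is a minimal sketch of one common GLIE schedule, epsilon_k = 1/k, where k counts episodes (the names `glie_epsilon` and `epsilon_greedy` are just illustrative, not from any particular library):

```python
import numpy as np

def glie_epsilon(k):
    """GLIE schedule epsilon_k = 1/k: epsilon decays to zero, yet the
    sum of epsilons diverges, so every action keeps being explored."""
    return 1.0 / k

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon,
    otherwise the greedy (argmax) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2])   # toy action values for one state

# Early episodes explore fully; late episodes are almost greedy.
print(glie_epsilon(1))      # 1.0
print(glie_epsilon(1000))   # 0.001
print(epsilon_greedy(q, glie_epsilon(1000), rng))
```

With this schedule an epsilon-greedy Sarsa agent becomes greedy in the limit while still visiting every state-action pair infinitely often, which is what the convergence-to-optimal result needs.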
And, to elaborate on Michael’s answer: it is perfectly fine to use a constant epsilon with Q-learning for a long time, until the values converge, and then switch exploration off entirely (epsilon = 0). This is what is sometimes done in practice when people use a constant epsilon: after “learning”, the exploration is turned off for “testing”.
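A minimal sketch of that train-then-test pattern, using tabular Q-learning on a hypothetical three-state chain (the environment and all names are made up for illustration): exploration stays at a constant epsilon throughout learning, then is switched off at once for evaluation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical chain MDP: states 0, 1, 2; state 2 is terminal.
# Action 1 ("right") moves toward the goal (reward 1 on reaching it);
# action 0 ("left") moves back with reward 0.
N_STATES, N_ACTIONS, GOAL = 3, 2, 2

def step(s, a):
    if a == 1:
        s2 = s + 1
        return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL
    return max(s - 1, 0), 0.0, False

q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # constant epsilon while learning

for _ in range(500):                     # "learning": keep exploring
    s, done = 0, False
    while not done:
        a = int(rng.integers(N_ACTIONS)) if rng.random() < epsilon else int(np.argmax(q[s]))
        s2, r, done = step(s, a)
        # Q-learning bootstraps on max over next actions (off-policy),
        # so the constant-epsilon behaviour policy is not a problem.
        q[s, a] += alpha * (r + gamma * (0.0 if done else q[s2].max()) - q[s, a])
        s = s2

# "testing": exploration switched off at once (epsilon = 0, i.e. greedy)
s, ret = 0, 0.0
for _ in range(10):          # step cap; the greedy policy needs only 2 steps
    a = int(np.argmax(q[s]))
    s, r, done = step(s, a)
    ret += r
    if done:
        break
print(ret)
```

Because Q-learning is off-policy, the estimates converge toward the optimal action values even under the constantly exploring behaviour policy, so the greedy test-time policy is (approximately) optimal without any epsilon decay during training.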