We need a more comprehensive rule of thumb, one that is not strictly linear in the number of parameters, but what we know at present is that it is safe to use a rule like the following. Suppose you have a binary Y and you observe e events and m non-events. Then for a model to perform as well in the future as we think it does now, when penalization (shrinkage) is not used, you need
min(e, m) ≥ 96 + 15p
where p is the number of parameters in the model. 96 is the sample size needed just to estimate the intercept (the overall event probability) with 0.95 confidence to within a margin of error of ±0.1 on the absolute risk scale. Note that you do not get much benefit from adding more subjects to the more frequent outcome category, although extra subjects never hurt and sometimes allow you to estimate absolute risk and not just relative odds.
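To see where the 96 comes from, here is a minimal sketch in Python, assuming the figure is derived from the usual worst-case (p = 0.5) sample-size formula for estimating a single proportion to within ±0.1 with 0.95 confidence; the function name is only illustrative.

```python
def n_for_proportion(moe=0.1, p=0.5, z=1.959964):
    # Normal-approximation sample size for estimating a proportion p
    # to within +/- moe with 0.95 confidence (z is the 0.975 normal quantile).
    return p * (1 - p) * (z / moe) ** 2

print(round(n_for_proportion(), 1))  # 96.0 -- the worst case, p = 0.5
```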
p includes one parameter for each category of each categorical predictor other than the reference category, and it includes all nonlinear terms. It is important to note that p is the number of pre-specified parameters in ordinary estimation, and equals the number of candidate predictors (not the number selected) if doing variable selection. If using shrinkage, p is the effective number of degrees of freedom.
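To make the bookkeeping concrete, here is a minimal sketch of the rule applied to a hypothetical model: a continuous predictor expanded with a restricted cubic spline having 4 knots (3 parameters), a 3-level categorical predictor (2 parameters), and a binary predictor (1 parameter). The predictor names and degrees of freedom are assumptions for illustration only.

```python
# Hypothetical parameter count, excluding the intercept
predictor_df = {
    "age (restricted cubic spline, 4 knots)": 3,  # nonlinear terms count toward p
    "stage (3 levels)": 2,                        # one per non-reference category
    "sex": 1,
}

p = sum(predictor_df.values())
required = 96 + 15 * p  # required min(events, non-events) without shrinkage
print(f"p = {p}, need min(e, m) >= {required}")  # p = 6, need min(e, m) >= 186
```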
Consider all the biomarker and genomics work that tries to create predictions and classifiers with small min(e, m), where the sample is hopelessly small even for estimating crude marginal probabilities that ignore the biomarkers.
When n < 96 and Y is binary (i.e., a minimum-information outcome variable), as in your case, the exercise is nearly futile: if you ignored all the predictors and just wanted to estimate the overall Prob(Y=1), you couldn't really do it. With n = 52, if 26 events were observed, the 0.95 Wilson confidence interval for the probability that Y=0 is [0.37, 0.63], which means you don't know very much. This is why multi-center clinical studies are usually needed, as opposed to getting data from only one center.
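As a check on that interval, here is a minimal sketch that computes the Wilson interval directly, using only the Python standard library.

```python
import math

def wilson_ci(events, n, z=1.959964):
    # 0.95 Wilson confidence interval for a binomial proportion
    phat = events / n
    denom = 1 + z ** 2 / n
    center = (phat + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(phat * (1 - phat) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half

lo, hi = wilson_ci(26, 52)  # 26 events (or non-events) out of n = 52
print(f"[{lo:.2f}, {hi:.2f}]")  # [0.37, 0.63]
```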