Convolution is a contextual operation: each output is computed from a neighborhood of adjacent inputs, so it is only useful if the order of your data is meaningful. If your input is readings from 512 sensors of various kinds, you probably don't need convolution. But if it's an audio signal represented by 512 samples, you'll find it useful.
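To make the point about order concrete, here is a minimal pure-Python sketch. The helper names (`conv1d`, `dense`), the toy 8-sample signal, the two-tap kernel and the weights are all made up for illustration. It shows that a convolution's output changes completely when the same values are reordered, while a single fully connected neuron gives the same result as long as its weights are reordered along with the inputs.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation (what most deep learning
    libraries call convolution)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def dense(inputs, weights):
    """One fully connected neuron, without bias or activation."""
    return sum(x * w for x, w in zip(inputs, weights))


# An ordered signal: a step from 0 to 1 halfway through.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
edge_kernel = [-1.0, 1.0]  # responds to a rise between neighboring samples

# Same values in a different order (a fixed, arbitrary permutation).
perm = [0, 4, 1, 5, 2, 6, 3, 7]
reordered = [signal[p] for p in perm]

ordered_response = conv1d(signal, edge_kernel)       # one clear edge
reordered_response = conv1d(reordered, edge_kernel)  # spurious "edges" everywhere

# A dense neuron has no notion of neighbors: permuting the inputs and
# the weights together leaves its output unchanged.
weights = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 1.0, 3.0]
dense_ordered = dense(signal, weights)
dense_reordered = dense(reordered, [weights[p] for p in perm])

print(ordered_response)
print(reordered_response)
print(dense_ordered, dense_reordered)
```

With heterogeneous sensors the column order is as arbitrary as the permutation here, so a kernel sliding over neighbors has no stable local pattern to pick up.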
As for the fully connected layers, choosing their sizes is a tradeoff between underfitting and overfitting: you don't want your network to "memorize" the data, but to generalize from it. More neurons means more capacity, but also a greater chance of overfitting. Fewer neurons means less overfitting, but possibly worse classification (insufficient capacity to accurately represent the data). There's no golden rule, though.
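As a rough illustration of how capacity grows with layer size, the sketch below counts trainable parameters for a network with one hidden fully connected layer. The 512 inputs come from the example above; the 10 output classes and the candidate hidden sizes are assumptions picked for illustration, and parameter count is only a crude proxy for capacity, not a rule for choosing a size.

```python
def mlp_param_count(layer_sizes):
    """Number of weights plus biases in a plain fully connected network.

    layer_sizes lists the width of every layer, input first,
    e.g. [512, 256, 10].
    """
    return sum(
        (n_in + 1) * n_out  # +1 accounts for the bias of each output neuron
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


n_inputs = 512   # from the example above
n_classes = 10   # assumed for illustration

# Hypothetical hidden-layer widths to compare.
counts = {
    hidden: mlp_param_count([n_inputs, hidden, n_classes])
    for hidden in (32, 256, 1024)
}

for hidden, n_params in counts.items():
    print(f"hidden={hidden:5d}  parameters={n_params}")
```

In practice you would compare a few such sizes on a held-out validation set: a growing gap between training and validation accuracy as the layer widens suggests the extra capacity is going into memorization.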