You are kind of on the right track.
1) First, get the intrinsic camera calibration. Either the camera provides it, or you calibrate it yourself with a checkerboard.
2) Collect a rosbag of images with the RealSense camera. I don't recall having dealt with color images, but it may just work.
3) Run sync_and_detect as you did. Adjust the ROS topics to match what's in the bag. Don't record compressed images unless you have to, and make sure the launch file for sync_and_detect has the compression setting right. Also make sure that your tag family matches your tag, that you are using the right tag detector (MIT can handle double-width tag borders; UMICH cannot, but it's more sensitive), and that the border width is set correctly (1 vs 2). You can also run the apriltag detector as a separate node and look at the debug images with rqt_image_view to confirm that the tag detector actually decodes the tags.
Normally sync_and_detect prints, for every frame, how many tags were detected. I see no output, so something is not right.
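
To check the topic names and compression in step 3, it can help to list what is actually in the bag and compare it with the launch file. Here is a rough sketch, assuming a ROS1 bag and the Python rosbag package. The helper names (check_topics, topics_in_bag) and the topic names are made up for illustration, so substitute your own.

```python
# Message types ROS uses for raw vs. compressed images.
RAW_IMAGE = "sensor_msgs/Image"
COMPRESSED_IMAGE = "sensor_msgs/CompressedImage"


def check_topics(bag_topics, expected_topics):
    """Compare topics found in a bag against the ones the launch file expects.

    bag_topics: dict mapping topic name -> message type string
    expected_topics: list of topic names configured in the launch file
    Returns a list of human-readable problems (empty if nothing stands out).
    """
    problems = []
    for topic in expected_topics:
        msg_type = bag_topics.get(topic)
        if msg_type is None:
            problems.append("missing topic: %s" % topic)
        elif msg_type == COMPRESSED_IMAGE:
            problems.append(
                "compressed images on %s: check the compression setting" % topic
            )
        elif msg_type != RAW_IMAGE:
            problems.append("unexpected type %s on %s" % (msg_type, topic))
    return problems


def topics_in_bag(bag_path):
    """Read topic -> message type from a ROS1 bag (needs the rosbag package)."""
    import rosbag  # imported here so check_topics works without ROS installed

    with rosbag.Bag(bag_path) as bag:
        info = bag.get_type_and_topic_info()
    return {name: t.msg_type for name, t in info.topics.items()}


if __name__ == "__main__":
    # Hypothetical topic names; replace with those in your bag and launch file.
    example = {
        "/camera/color/image_raw": RAW_IMAGE,
        "/camera/infra1/image_rect_raw/compressed": COMPRESSED_IMAGE,
    }
    expected = list(example) + ["/camera/depth/image_raw"]
    for problem in check_topics(example, expected):
        print(problem)
```

With a real bag you would call check_topics(topics_in_bag("your.bag"), [...]) with the topics from your launch file. An empty result only means the topics exist and carry raw images; it says nothing about the tag family, detector, or border width.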