"soup" is a BeautifulSoup object (defined at line 316)
_index_document walks the document recursively, visiting each tag. For each tag it looks up the tag's name in "_enter" and calls the matching function (note that _enter is a dictionary that maps tag names to *functions*, not a function itself). I don't think you need to modify _index_document in any way.
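In case it helps, here's a rough sketch of that dispatch pattern. This is not the crawler's actual code: index_document, enter, visit_a and the sample HTML are names I made up for illustration, and I'm assuming the bs4 package with the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def index_document(soup, enter):
    """Recursively visit every tag, calling enter[tag.name] when a handler exists."""
    visited = []

    def walk(node):
        for child in node.children:
            if not isinstance(child, Tag):
                continue  # skip text nodes, comments, etc.
            handler = enter.get(child.name)  # look up a function by tag name
            if handler is not None:
                handler(child)
            visited.append(child.name)
            walk(child)  # recurse into the tag's own children

    walk(soup)
    return visited


links = []


def visit_a(tag):
    # Hypothetical handler: just record the raw href of each <a> tag.
    links.append(tag.get("href"))


# The dictionary maps tag names to functions, as described above.
enter = {"a": visit_a}

html = (
    '<html><body><p>Hi <a href="/about">about</a></p>'
    '<a href="http://example.com/">ex</a></body></html>'
)
soup = BeautifulSoup(html, "html.parser")
visited = index_document(soup, enter)
```

The point is that adding behaviour for a new tag only means adding an entry to the dictionary; the traversal itself stays the same.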
The _visit_a function is called whenever the crawler runs across an <a> tag, which is almost invariably a link to another webpage.
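If it's useful, this is roughly the kind of thing a handler for <a> tags tends to do with the link. It's only a sketch under my own assumptions: resolve_link and base_url are hypothetical names, not something from your code, and "tag" can be anything with a .get method (a BeautifulSoup Tag has one; I use a plain dict below as a stand-in so the snippet runs on its own).

```python
from urllib.parse import urldefrag, urljoin, urlparse


def resolve_link(tag, base_url):
    """Return the absolute http(s) URL an <a> tag points to, or None."""
    href = tag.get("href")
    if not href:
        return None  # <a> without an href, e.g. a named anchor
    # Resolve relative links against the page URL and drop any #fragment.
    absolute, _fragment = urldefrag(urljoin(base_url, href))
    if urlparse(absolute).scheme not in ("http", "https"):
        return None  # mailto:, javascript:, and so on are not webpages
    return absolute


page = "http://example.com/a/index.html"
relative = resolve_link({"href": "../b.html#top"}, page)
mail = resolve_link({"href": "mailto:someone@example.com"}, page)
missing = resolve_link({}, page)
```

The "almost invariably" caveat is why the function can return None: not every <a> tag is a link you'd want to crawl.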
I feel like I'm missing your exact question though. Tell me if that's the case.