A more detailed description of your use case may help in determining your actual needs. A CSV file does not appear to be needed, since you already have the initial text file that served as the basis for generating the audio file through the API.
You may consider using the Speech to Text API to get some tags and metadata related to the audio file in question. If this is part of a programming project on your side, you may also consider asking the same question on forums such as Stack Overflow, where programmers are ready to help.