Hello Achim,
Thank you for your question about converting UCSC gene IDs into gene names. I agree that the Table Browser's "identifiers" option does not sound quite appropriate for your needs. To help with your troubleshooting, you can add the UCSC ID to each line of your query results by checking the "kgID" field in addition to "gene symbol" and "gene description". That said, I am also unsure why you received more than 1000 extra lines. If you would like some help looking into it, you are welcome to send the data files to me privately.
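One of our engineers notes that you should also be able to obtain these results from our public MySQL server (http://genome.ucsc.edu/goldenPath/help/mysql.html) with the following UNIX commands, where myUcscGenes.txt contains your list of UCSC IDs, one per line. The database name hg38 is only an example; substitute the database for the assembly you are working with (for example, hg19).
mysql -A -u genome -h genome-mysql.soe.ucsc.edu -e 'select name,geneSymbol from knownGene,kgXref where kgID = name order by name' hg38 > ucscGeneSymbols.txt
grep -Fwf myUcscGenes.txt ucscGeneSymbols.txt
The first command saves every UCSC ID and its gene symbol to ucscGeneSymbols.txt. In the second command, -f reads the search patterns from myUcscGenes.txt, -F treats each ID as a fixed string rather than a regular expression, and -w requires whole-word matches, so only the lines for your IDs are printed.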
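Because grep searches the whole line, an ID could in principle match in the gene symbol column as well. If you want to match only against the first column, one alternative is an awk lookup like the minimal sketch below. It assumes ucscGeneSymbols.txt is the tab-separated output of the MySQL command above and that your ID list has one ID per line; the function name lookup_symbols is hypothetical, not part of any UCSC tool.

```shell
# Hypothetical helper: print the rows of a tab-separated table whose FIRST
# column exactly matches one of the IDs in an ID list file.
# Usage: lookup_symbols ID_LIST_FILE SYMBOL_TABLE_FILE
lookup_symbols() {
  awk '
    BEGIN { FS = "\t" }            # the MySQL batch output is tab-separated
    FILENAME == ARGV[1] {
      # Reading the ID list: trim stray spaces, tabs or carriage returns.
      sub(/^[ \t\r]+/, ""); sub(/[ \t\r]+$/, "")
      if ($0 != "") want[$0] = 1
      next
    }
    # Reading the table: print a row only if column 1 is a requested ID.
    ($1 in want)
  ' "$1" "$2"
}
```

For example, lookup_symbols myUcscGenes.txt ucscGeneSymbols.txt > mySymbols.txt (the output file name is only illustrative). The header line written by mysql ("name" and "geneSymbol") is dropped automatically, since "name" is not one of your IDs.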
If you do not have access to a UNIX command line, you may instead be interested in the tools at Galaxy (https://usegalaxy.org), which can take the output of UCSC Table Browser queries and operate on it directly. You would start by loading a UCSC Table Browser query that returns the kgID and geneSymbol columns from the kgXref table, and then use Galaxy's "Join two Datasets" tool (part of the "Join, Subtract, and Group" tool set) to keep only the lines that match your list of UCSC IDs. Your ID list would be the first dataset, and the results of the UCSC Table Browser query would be the second. If you follow this path, please do not enter your identifiers in the UCSC Table Browser itself; the Galaxy Join tool takes care of restricting the results to your list.
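For anyone who does have a command line, the join step described above (keep only the table rows whose first column appears in the ID list) can be sketched locally with the standard sort and join utilities. This is a minimal sketch, not a description of how the Galaxy tool is implemented; it assumes the ID list has one ID per line and the query output is a tab-separated file keyed on the UCSC ID in column 1. The function name join_on_first_column is hypothetical.

```shell
# Hypothetical helper: a local analogue of joining an ID list (first dataset)
# with a tab-separated table keyed on column 1 (second dataset).
# Usage: join_on_first_column ID_LIST_FILE SYMBOL_TABLE_FILE
join_on_first_column() {
  tab=$(printf '\t')
  tmpdir=$(mktemp -d) || return 1
  # join(1) requires both inputs sorted on the join field with the same
  # collation, so sort and join all run in the C locale.
  LC_ALL=C sort -u "$1" > "$tmpdir/ids.sorted"
  LC_ALL=C sort -t "$tab" -k1,1 "$2" > "$tmpdir/table.sorted"
  # Default join field is column 1 of both files; output stays tab-separated.
  LC_ALL=C join -t "$tab" "$tmpdir/ids.sorted" "$tmpdir/table.sorted"
  status=$?
  rm -rf "$tmpdir"
  return "$status"
}
```

For example, join_on_first_column myUcscGenes.txt ucscGeneSymbols.txt. Note that the output comes back in sorted ID order rather than in the order of your list, and duplicate IDs in your list are collapsed by sort -u.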
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group