I've just seen your reply. Could you elaborate on how to check the size of the sharedStrings?
I've also run into another problem: I tried the same code on Windows and this is the output:
Traceback (most recent call last):
  File "C:\Python27\APP-related-wikis-url_associationv2.py", line 135, in <module>
    main("hpdepth0-8-1-2015.xlsx",
         "homepages.txt",
         "Wiki-RelatedIds.csv")
  File "C:\Python27\APP-related-wikis-url_associationv2.py", line 92, in main
    wb = load_workbook(fxlsx, read_only=True) #load xlsx file
  File "C:\Python27\lib\openpyxl\reader\excel.py", line 191, in load_workbook
    shared_strings = read_string_table(archive.read(strings_path))
  File "C:\Python27\lib\openpyxl\reader\strings.py", line 16, in read_string_table
    root = fromstring(text=xml_source)
  File "lxml.etree.pyx", line 3092, in lxml.etree.fromstring (src\lxml\lxml.etree.c:70473)
  File "parser.pxi", line 1828, in lxml.etree._parseMemoryDocument (src\lxml\lxml.etree.c:106307)
  File "parser.pxi", line 1716, in lxml.etree._parseDoc (src\lxml\lxml.etree.c:105098)
  File "parser.pxi", line 1086, in lxml.etree._BaseParser._parseDoc (src\lxml\lxml.etree.c:99780)
  File "parser.pxi", line 580, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:94254)
  File "parser.pxi", line 690, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:95690)
  File "parser.pxi", line 620, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:94757)
XMLSyntaxError: internal error: Huge input lookup, line 2, column 363349999
EDIT: Hm, I've tested again with half the lines (~200,000) and it works on Windows too.
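
For anyone else wondering about the sharedStrings size: an .xlsx file is a zip archive, so one way to check it might be to look at the archive entry directly with the standard zipfile module. This is only a rough sketch. The helper name shared_strings_size is made up for illustration, and the entry name "xl/sharedStrings.xml" is an assumption (it is the usual location, but a workbook can store the part elsewhere). The column number in the error above (363349999 on line 2) suggests the uncompressed part is at least several hundred megabytes.

```python
import zipfile


def shared_strings_size(xlsx_path, member="xl/sharedStrings.xml"):
    """Return (compressed, uncompressed) size in bytes of the shared
    strings part, or None if the archive has no such entry.

    Hypothetical helper: the default member name is the conventional
    location and is an assumption, not something read from the workbook.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        try:
            info = archive.getinfo(member)
        except KeyError:
            # No entry with that name in the archive
            return None
        return info.compress_size, info.file_size


if __name__ == "__main__":
    sizes = shared_strings_size("hpdepth0-8-1-2015.xlsx")
    if sizes is None:
        print("no xl/sharedStrings.xml entry found")
    else:
        print("compressed: %d bytes, uncompressed: %d bytes" % sizes)
```

The uncompressed figure is the one that matters here, since that is the amount of XML the parser has to read in one go.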