I use the delugesiphon Chrome plugin to add new torrents to the server.
However, after switching to Arch, which ships the latest deluge, the plugin
no longer works for rutracker.org. After investigating, I found the issue
to be in deluge itself, in httpdownloader. When a torrent download is
requested, rutracker responds with the header:
`Content-Type: application/x-bittorrent; charset=Windows-1251`
While providing a charset for this content type doesn't make sense IMO, I
suggest not re-encoding to UTF-8 for anything other than 'text/...'
MIME types.
Attached is a suggested fix produced by
`diff /usr/lib/python3.8/site-packages/deluge/httpdownloader.py
/usr/lib/python3.8/site-packages/deluge/httpdownloader_fixed.py`
--
Ticket URL: <https://dev.deluge-torrent.org/ticket/3440>
Deluge <https://deluge-torrent.org/>
Deluge Project
* Attachment "charset_fix.diff" added.
Comment (by Cas):
We need a bit more information about the exact problem. Are the torrent
downloads in UTF-8, and is decoding with Windows-1251 corrupting the data?
What error are you encountering?
I am wary of changing the way httpdownloader works as it could have
unintended consequences.
I would propose not re-encoding application/x-bittorrent (it should be
UTF-8...), so in request_callback don't set the encoding if the content
type is application/x-bittorrent.
{{{
if "application/x-bittorrent" not in content_type:
    encoding = charset
}}}
--
Ticket URL: <https://dev.deluge-torrent.org/ticket/3440#comment:1>
* milestone: needs verified => 2.0.4
--
Ticket URL: <https://dev.deluge-torrent.org/ticket/3440#comment:2>
Comment (by megaksa):
Correct. Example of torrent download headers:
{{{
Content-Type: application/x-bittorrent; charset=Windows-1251
Content-Disposition: attachment;
filename="[rutracker.org].t5778456.torrent";
filename*=UTF-8''%D0%91%D0%B8%D0%B1%D0%BB%D0%B8%D0%BE%D1%82%D0%B5%D0%BA%D0%B0%20%D0%9C%D1%83%D1%80%D0%B7%D0%B8%D0%BB%D0%BA%D0%B8%20-%20%D0%92%D0%B0%D1%80%D0%BC%D1%83%D0%B6%20%D0%92.%20-%20%D0%9C%D0%BE%D1%81%D1%82%D0%BE%D1%80%D0%B3%20%5B1930%2C%20PDF%2C%20RUS%5D%20%5Brutracker-5778456%5D.torrent
}}}
AFAIR charset handling is only defined generically for textual MIME types
(RFC 6657), i.e. those of type text/*. For everything else, the meaning of
charset is defined by each type's own documentation. For binary types it
may indicate, e.g., an internal encoding (such as the encoding of tags
inside the binary file representation, particularly inside a torrent file,
or maybe inside an mp3 file). So in general binary files cannot be
re-encoded the way textual files can. I think the right way is to skip the
re-encoding unless the file is textual. Your proposed solution is
therefore less correct, though it would also work in my particular case.
I'd go with the reverse: re-encode only text/* types.
Are there any known types besides text/* for which you want content
re-encoding?
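
To illustrate the point about binary content, here is a minimal sketch of
what a decode-then-encode round trip does to torrent-like bytes. The
payload is made up for illustration (it is not a real torrent), and the
code is not taken from httpdownloader; it only assumes that re-encoding
means decoding with the declared charset and encoding the result as UTF-8.

```python
# Hypothetical bencoded payload: the 20 raw bytes stand in for a SHA-1
# piece hash, which is arbitrary binary data rather than text.
raw = b"d6:pieces20:" + bytes(range(0xC0, 0xD4)) + b"e"

# Assumed re-encoding step: decode with the charset from the header
# (Windows-1251 is "cp1251" in Python), then encode as UTF-8.
reencoded = raw.decode("cp1251").encode("utf-8")

# The payload is no longer byte-identical to what the server sent.
print(len(raw), len(reencoded))
print(reencoded == raw)
```

In this payload every non-ASCII byte becomes a two-byte UTF-8 sequence, so
the data grows and the bencoded length prefix no longer matches it.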
--
Ticket URL: <https://dev.deluge-torrent.org/ticket/3440#comment:3>
* status: new => closed
* resolution: => Fixed
Comment:
Yeah, I see what you mean and agree we should only be re-encoding text
content types. So I have modified your patch, added a test and merged it
to develop: [4d970754a4a]
Thanks for the detailed report and suggested fix!
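
For illustration, the agreed rule might look roughly like the sketch
below. This is not the merged patch: `charset_to_decode` is a hypothetical
helper name, and parsing the header with the standard library's
`email.message.Message` is an assumption made to keep the example
self-contained.

```python
from email.message import Message


def charset_to_decode(content_type_header):
    """Return the declared charset for text/* types only, else None."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # Anything that is not text/* is treated as binary and left untouched.
    if msg.get_content_maintype() != "text":
        return None
    # None is also returned when a text/* type declares no charset.
    return msg.get_param("charset")


print(charset_to_decode("application/x-bittorrent; charset=Windows-1251"))
print(charset_to_decode("text/html; charset=Windows-1251"))
```

With this rule the rutracker header from the report yields no charset, so
the torrent bytes would be passed through unchanged.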
--
Ticket URL: <https://dev.deluge-torrent.org/ticket/3440#comment:4>