Yeah, I think it shouldn't come up. But I'm not sure I fully understand
Vasili's concern. Maybe if it were more specific, with more details, I could
better understand it.
Django's documentation states:
https://docs.djangoproject.com/en/dev/ref/unicode/#creating-the-database

> Make sure your database is configured to be able to store arbitrary string
> data. Normally, this means giving it an encoding of UTF-8 or UTF-16. If you
> use a more restrictive encoding – for example, latin1 (iso8859-1) – you won’t
> be able to store certain characters in the database, and information will be
> lost.
>
> ...
>
> All of Django’s database backends automatically convert strings into the
> appropriate encoding for talking to the database. They also automatically
> convert strings retrieved from the database into strings. You don’t even need
> to tell Django what encoding your database uses: that is handled
> transparently.
So, if these non-UTF-8 articles are stored in the database, this doesn't
involve FILE_CHARSET. Are the articles stored as text or binary data? If text,
this violates Django's existing documentation and assumptions. The database is
expected to be configured for UTF-8. If binary data, then the project's code
will be responsible for decoding it to a text string.
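To illustrate the binary case, here is a minimal sketch of what I mean by the project's code doing the decoding. The function name and the ISO-8859-1 default are my own assumptions for the example, not anything from the ticket, and nothing here touches a Django setting:

```python
def decode_article(raw, encoding="iso-8859-1"):
    """Decode an article stored as binary data into a text string.

    `decode_article` and the default encoding are hypothetical; the
    project would pass whatever encoding it knows the article uses.
    """
    # bytes() also accepts a memoryview, which some database drivers
    # return for binary columns instead of a plain bytes object.
    return bytes(raw).decode(encoding)


# Simulate a non-UTF-8 article as it might come back from a binary column.
raw = "café".encode("iso-8859-1")
text = decode_article(raw)
text_from_view = decode_article(memoryview(raw))
```

The point is only that the encoding is chosen by the caller, so FILE_CHARSET never enters the picture.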
If, on the other hand, these articles are stored as files, how are they being
loaded? If they are loaded through a Django code path, which one, and how is
FILE_CHARSET involved in it? Or are these articles loaded by project code, where
the encoding can be specified directly?
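For the project-code case, I'm picturing something like the sketch below. The helper name, the sample file, and the ISO-8859-1 encoding are all assumptions made up for illustration; I don't know how the articles are actually loaded:

```python
import tempfile
from pathlib import Path


def load_article(path, encoding="iso-8859-1"):
    """Read an article file using an explicitly chosen encoding.

    Hypothetical helper: the encoding is passed in by the project,
    so no Django setting is consulted.
    """
    return Path(path).read_text(encoding=encoding)


# Write a sample non-UTF-8 file, then read it back with the right encoding.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "article.txt"
    sample.write_bytes("naïve café".encode("iso-8859-1"))
    loaded = load_article(sample)
```

If the loading looks anything like this, the encoding is already under the project's control.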
So, IIUC, it doesn't seem like FILE_CHARSET should be involved for this use
case.