A weird unicode puzzle after upgrade from 2.0.20 to 2.5.4


er...@tibco.com

May 13, 2016, 2:35:38 PM
to reviewboard
After I migrated my server to 2.5.4, I'm seeing a weird error. I restarted both memcached and apache2, and then browsed to a specific review request.

Then I clicked on the "Diff" tab. With DEBUG = True set in the settings_local.py file, I see this instead of the diffs:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/views.py", line 275, in get
    response = renderer.render_to_response(request)
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/renderers.py", line 56, in render_to_response
    return HttpResponse(self.render_to_string(request))
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/renderers.py", line 74, in render_to_string
    large_data=True)
  File "/usr/lib64/python2.7/site-packages/Djblets-0.9.3-py2.7.egg/djblets/cache/backend.py", line 295, in cache_memoize
    compress_large_data))
  File "/usr/lib64/python2.7/site-packages/Djblets-0.9.3-py2.7.egg/djblets/cache/backend.py", line 249, in cache_memoize_iter
    items = items_or_callable()
  File "/usr/lib64/python2.7/site-packages/Djblets-0.9.3-py2.7.egg/djblets/cache/backend.py", line 292, in <lambda>
    lambda: [lookup_callable()],
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/renderers.py", line 73, in <lambda>
    lambda: self.render_to_string_uncached(request),
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/renderers.py", line 87, in render_to_string_uncached
    request=request)
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/diffutils.py", line 429, in populate_diff_chunks
    chunks = list(generator.get_chunks())
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/chunk_generator.py", line 756, in get_chunks
    for chunk in super(DiffChunkGenerator, self).get_chunks(cache_key):
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/chunk_generator.py", line 107, in get_chunks
    large_data=True)
  File "/usr/lib64/python2.7/site-packages/Djblets-0.9.3-py2.7.egg/djblets/cache/backend.py", line 295, in cache_memoize
    compress_large_data))
  File "/usr/lib64/python2.7/site-packages/Djblets-0.9.3-py2.7.egg/djblets/cache/backend.py", line 249, in cache_memoize_iter
    items = items_or_callable()
  File "/usr/lib64/python2.7/site-packages/Djblets-0.9.3-py2.7.egg/djblets/cache/backend.py", line 292, in <lambda>
    lambda: [lookup_callable()],
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/chunk_generator.py", line 106, in <lambda>
    lambda: list(self.get_chunks_uncached()),
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/chunk_generator.py", line 763, in get_chunks_uncached
    new = get_patched_file(old, self.filediff, self.request)
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/diffutils.py", line 230, in get_patched_file
    diff = tool.normalize_patch(filediff.diff, filediff.source_file,
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/models.py", line 218, in _get_diff
    self._migrate_diff_data()
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/models.py", line 421, in _migrate_diff_data
    diff_hash_is_new = self._set_diff(self.legacy_diff_hash.binary)
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/models.py", line 225, in _set_diff
    RawFileDiffData.objects.get_or_create_from_data(diff)
  File "/usr/lib64/python2.7/site-packages/ReviewBoard-2.5.4-py2.7.egg/reviewboard/diffviewer/managers.py", line 345, in get_or_create_from_data
    'compression': compression,
  File "/usr/lib64/python2.7/site-packages/django/db/models/manager.py", line 154, in get_or_create
    return self.get_queryset().get_or_create(**kwargs)
  File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 383, in get_or_create
    obj.save(force_insert=True, using=self.db)
  File "/usr/lib64/python2.7/site-packages/django/db/models/base.py", line 545, in save
    force_update=force_update, update_fields=update_fields)
  File "/usr/lib64/python2.7/site-packages/django/db/models/base.py", line 573, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/usr/lib64/python2.7/site-packages/django/db/models/base.py", line 654, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "/usr/lib64/python2.7/site-packages/django/db/models/base.py", line 687, in _do_insert
    using=using, raw=raw)
  File "/usr/lib64/python2.7/site-packages/django/db/models/manager.py", line 232, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 1514, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/usr/lib64/python2.7/site-packages/django/db/models/sql/compiler.py", line 903, in execute_sql
    cursor.execute(sql, params)
  File "/usr/lib64/python2.7/site-packages/Djblets-0.9.3-py2.7.egg/djblets/log/middleware.py", line 32, in execute
    return self.cursor.execute(sql, params)
  File "/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py", line 124, in execute
    return self.cursor.execute(query, args)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 207, in execute
    if not self._defer_warnings: self._warning_check()
  File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 117, in _warning_check
    warn(w[-1], self.Warning, 3)
Warning: Invalid utf8 character string: '890600'

   
Note that if I set DEBUG = False, I do see the diffs in the browser, but warnings still appear in the log file. I'm nervous that moving my production system to 2.5.4 will corrupt data, so I'm holding off for now.
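A minimal Python sketch of why the same warning can either break the page or only show up in the log: Python's `warnings` module can be told to raise a given warning class as an exception (the "error" action) or to let it through so it gets logged. It assumes, rather than confirms, that DEBUG = True in this Django/Review Board setup installs an "error" filter for the MySQL driver's warning class, which would match the behavior reported here. `DriverWarning`, `insert_row` and `save` are hypothetical names, used so the example runs without MySQLdb.

```python
import warnings


class DriverWarning(Warning):
    """Hypothetical stand-in for the MySQL driver's Warning class."""


def insert_row():
    # Simulates a driver that reports a non-fatal problem as a warning,
    # similar to the _warning_check() frame in the traceback above.
    warnings.warn("Invalid utf8 character string: '890600'", DriverWarning)
    return "row saved"


def save(debug):
    """Return (result, recorded warning messages); raise if debug is on."""
    with warnings.catch_warnings(record=True) as caught:
        # Assumed DEBUG = True behavior: escalate driver warnings to exceptions.
        # Assumed DEBUG = False behavior: let them through to be logged.
        action = "error" if debug else "always"
        warnings.simplefilter(action, DriverWarning)
        result = insert_row()
    return result, [str(w.message) for w in caught]


print(save(debug=False))  # ('row saved', ["Invalid utf8 character string: '890600'"])
try:
    save(debug=True)
except DriverWarning as exc:
    print("request fails:", exc)
```

In both cases the row is written by the simulated driver call; only the handling of the warning differs.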

Looking back through the stack trace, I see there's some logic in there about "_migrate_diff_data" - code which didn't exist in 2.0.X.

I suspect this is a manifestation of this bug:

What can I do?

Eric.
    

Christian Hammond

May 13, 2016, 4:49:53 PM
to revie...@googlegroups.com
Hi Eric,

Hmm, we'll need to look into that. Is there a way you'd be able to send us the diff for that? (I can help you find it.) We will need a copy in order to diagnose this. We can sign an NDA for it.

Christian
--
Supercharge your Review Board with Power Pack: https://www.reviewboard.org/powerpack/
Want us to host Review Board for you? Check out RBCommons: https://rbcommons.com/
Happy user? Let us know! https://www.reviewboard.org/users/
---
You received this message because you are subscribed to the Google Groups "reviewboard" group.
To unsubscribe from this group and stop receiving emails from it, send an email to reviewboard...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
-- 
Christian Hammond
President/CEO of Beanbag
Makers of Review Board

er...@tibco.com

May 14, 2016, 4:40:06 PM
to reviewboard, chri...@beanbaginc.com
I re-ran the migration and, before doing anything else, set DEBUG = True, so no diffs had been fetched yet.

I can confirm that it fails on *every* diff, not just random ones here and there.

Running with DEBUG = False, the warning doesn't throw an exception, but I do see an entry in the log:
WARNING:py.warnings:/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py:124: Warning: Invalid utf8 character string: '81FE48'

After I've viewed the diffs of a review request once, the warnings stop appearing (consistent with the data being migrated to compressed form). When I switch back to DEBUG = True, I no longer see failures for the review request diffs that I looked at while DEBUG = False.
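The observation that warnings stop once a diff has been viewed fits a migrate-on-read pattern, which the traceback hints at (`_get_diff` calls `_migrate_diff_data`, which calls `_set_diff`). The sketch below is a hypothetical, heavily simplified model of that pattern: `LegacyDiffRecord` and its attributes are invented for illustration and are not Review Board's actual code, and zlib is an arbitrary choice of compressor. The point is that the write which triggered the MySQL warning happens only on the first access.

```python
import zlib


class LegacyDiffRecord:
    """Hypothetical record holding a diff in legacy or compressed form."""

    def __init__(self, legacy_diff):
        self.legacy_diff = legacy_diff  # old, uncompressed storage
        self.compressed_diff = None     # new, compressed storage
        self.writes = 0                 # storage writes (where MySQL warned)

    def get_diff(self):
        # Migrate lazily: only the first read converts the legacy data.
        if self.compressed_diff is None:
            self._migrate()
        return zlib.decompress(self.compressed_diff)

    def _migrate(self):
        # In this model, this write is the step that would produce the warning.
        self.compressed_diff = zlib.compress(self.legacy_diff)
        self.legacy_diff = None
        self.writes += 1


record = LegacyDiffRecord(b"--- a/foo.c\n+++ b/foo.c\n@@ -1 +1 @@\n-old\n+new\n")
first = record.get_diff()
second = record.get_diff()
print(first == second, record.writes)  # True 1
```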

Trying a different approach, I ran "rb-site manage ______ -- condensediffs", and that also generated the same warnings. Here's a sample:

WARNING:py.warnings:/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py:124: Warning: Invalid utf8 character string: 'D76700'

  return self.cursor.execute(query, args)


WARNING:py.warnings:/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py:124: Warning: Invalid utf8 character string: 'A9C813'

  return self.cursor.execute(query, args)


...


This generated 169000+ lines of output, corresponding to 56435 individual warning messages out of a total of 76032 diff files condensed.
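For reference, 56435 of 76032 is roughly 74% of the condensed diffs. With output this large, a small script can tally the warnings from a captured log. A minimal sketch, assuming every warning line has the `Warning: Invalid utf8 character string: 'XXXXXX'` shape shown above; `find_utf8_warnings` and the log file name are hypothetical.

```python
import re

# Captures the hex bytes MySQL reported in each warning line.
WARNING_RE = re.compile(r"Invalid utf8 character string: '([0-9A-Fa-f]+)'")


def find_utf8_warnings(lines):
    """Return the reported hex strings, one per warning, in log order."""
    found = []
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            found.append(match.group(1))
    return found


sample = [
    "WARNING:py.warnings:/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py:124: Warning: Invalid utf8 character string: 'D76700'",
    "",
    "  return self.cursor.execute(query, args)",
    "WARNING:py.warnings:/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py:124: Warning: Invalid utf8 character string: 'A9C813'",
]
print(find_utf8_warnings(sample))  # ['D76700', 'A9C813']

# Against a real log (hypothetical file name):
# with open("condensediffs.log") as log:
#     print(len(find_utf8_warnings(log)))
```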


This gave me an idea.


I ran "rb-site ... condensediffs" before the upgrade, then once again after the upgrade.


Problem went away.


Weird. Do you want me to try to find out more, and if so, what?


Eric.




er...@tibco.com

May 15, 2016, 4:11:30 PM
to reviewboard, chri...@beanbaginc.com
I have finally re-run my entire automated migration process from start to finish.

After running condensediffs before the upgrade, I was then able to run condensediffs after the upgrade as well, all without any warnings or errors.

Perhaps you should add a step to the upgrade guide advising administrators to rebuild their search index and condense diffs?

So I seem to have found a work-around to the problem - in case anyone else runs into it.

Eric.

Christian Hammond

May 15, 2016, 6:37:17 PM
to revie...@googlegroups.com
Hi Eric,

I'd still really love to get a copy of those entries from your database so I can replicate the problem. I simply can't make it happen here, and have no idea why you'd be seeing what you're seeing. It should not be necessary to run condensediffs prior to upgrading.

Christian


Koushik Roy

Dec 19, 2016, 6:23:30 PM
to reviewboard, chri...@beanbaginc.com

Christian,
  I too have faced this problem while upgrading from 2.0.21 to 2.5.7.
  The 'invalid utf8 character' warning may be only a symptom of the problem.
  My only guesses are the following:
    - The database was using 'latin1' as its default charset, while RB 2.5.x expects 'utf8', so some mismatch/warning results.
    - The database was using 'MyISAM', with which the evolution fails; one has to change the engine to 'InnoDB' before the evolution.
  If you want a sample diff, I can possibly produce it. Please give me step-by-step instructions for how to do so.

  Bottom line, this upgrade has not yet been successful for me.
  Any help is appreciated.
  Thanks

Koushik Roy.

Christian Hammond

Dec 19, 2016, 7:35:44 PM
to revie...@googlegroups.com
Hi,

Sorry you're hitting this too. To confirm, did the actual database upgrade itself complete without errors, once the table type and such were adjusted?

Can you show me the full error information?

Thanks,

Christian


Koushik Roy

Dec 20, 2016, 1:51:42 PM
to reviewboard, chri...@beanbaginc.com

Yes, the database upgrade completed without errors after changing the tables to InnoDB.
I will send you the full information soon. My database is very large (15GB),
so I am trying to create a smaller test DB to reproduce the problem.

Thanks,

Koushik Roy

Dec 21, 2016, 11:41:49 AM
to reviewboard, chri...@beanbaginc.com
Christian,
  As promised, more information in the file attached.
  Please HELP, HELP!!

  With the utf8 conversion warning, MY CONCERN IS DATA LOSS/CORRUPTION.
  If the text contains only plain English, then maybe we are fine.
  But where multi-byte encoding is required, is there any guarantee that
  the warning did not cause data loss/corruption?

  Also, I don't understand whether the encoding warning has anything to do
  with 'condensediffs', as Eric mentioned.
  Could performing 'condensediffs' have merely bypassed the warning?

  Let me know if you need any more information.

Thank you.

Koushik Roy.


capture

Christian Hammond

Dec 21, 2016, 7:19:35 PM
to revie...@googlegroups.com
Hey,

So the good news is, this is not a corruption issue, nor is it related to the contents of any of your files.

This error is occurring when saving diff data using our new compressed storage mechanism. We populate a binary field with compressed diff data when uploading a new diff, viewing an existing diff, or condensing all diffs.

What's happening is that MySQL is validating the compressed data as Unicode. It's then seeing something it doesn't like in there and complaining about it, but still permitting it to be stored. The storage is therefore working as expected (since you can then successfully view the diff), just the validation is wrong. This happens regardless of the contents of your diffs.
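A minimal Python sketch of the byte-level argument: every hex string reported in this thread fails to decode as UTF-8 (for example, 0x89 is a continuation byte that cannot start a character, and 0xD7 must be followed by a continuation byte, not 0x67), while compressed data, including multi-byte text, still round-trips exactly. It does not reproduce MySQL's validation logic, and using bz2 is an assumption about which compressor Review Board applies.

```python
import bz2

# Hex strings copied from the warnings quoted in this thread.
REPORTED = ["890600", "81FE48", "D76700", "A9C813", "C96A49"]


def is_valid_utf8(data):
    """Return True if the bytes decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for hex_bytes in REPORTED:
    print(hex_bytes, is_valid_utf8(bytes.fromhex(hex_bytes)))  # e.g. "890600 False"

# Compression yields arbitrary binary, but decompression restores it exactly,
# including non-ASCII (multi-byte) content.
diff = "--- a/readme.txt\n+++ b/readme.txt\n@@ -1 +1 @@\n-café\n+naïve\n".encode("utf-8")
packed = bz2.compress(diff)
print(bz2.decompress(packed) == diff)  # True
```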

Newer versions of Django work around this, but we don't use those versions. I'll see if there's a workaround we can put in place.

If you keep DEBUG set to False (which you should for production), then the warnings shouldn't crash the page. If need be, you can instruct MySQL to ignore warning code 1300.
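If changing the MySQL side isn't practical, a narrower option on the Python side is a warnings filter that matches only this message. A minimal sketch, with two stated assumptions: `DriverWarning` stands in for the MySQL driver's warning class, and the filter would have to be installed after any "error" filter the database backend adds, because filters added later are checked first. Where that point is for a given Django/Review Board version is not verified here. The second warning message is a hypothetical example of a warning that should still surface.

```python
import warnings


class DriverWarning(Warning):
    """Hypothetical stand-in for the MySQL driver's Warning class."""


def save_row():
    warnings.warn("Invalid utf8 character string: 'C96A49'", DriverWarning)
    return "saved"


def other_problem():
    # Hypothetical unrelated warning that should still be treated as an error.
    warnings.warn("Data truncated for column 'path'", DriverWarning)


def run_with_filters():
    results = []
    with warnings.catch_warnings():
        # Roughly what an assumed DEBUG setup does: escalate every driver warning.
        warnings.simplefilter("error", DriverWarning)
        # Added later, so checked first: silence only the UTF-8 validation warning.
        warnings.filterwarnings(
            "ignore",
            message=r"Invalid utf8 character string",
            category=DriverWarning,
        )
        results.append(save_row())
        try:
            other_problem()
            results.append("not raised")
        except DriverWarning:
            results.append("raised")
    return results


print(run_with_filters())  # ['saved', 'raised']
```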

What version of MySQL are you using?

Christian


Koushik Roy

Dec 21, 2016, 8:20:22 PM
to revie...@googlegroups.com

MySQL 5.5

Thanks for the explanation.
So it is not necessary to run condensediffs before the database migration, right?

In fact, diffs are condensed on the fly. Is there any advantage to running condensediffs from time to time?




Christian Hammond

Dec 21, 2016, 11:59:34 PM
to revie...@googlegroups.com
condensediffs only needs to be run once. It's optional, but it can reduce database size significantly. You don't need to run a periodic condense operation. We also condense individual diffs when they are viewed.
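As a rough illustration of why condensing can shrink the database: diff text tends to be repetitive and compresses well. This sketch compresses a synthetic diff with bz2; the input is made up, real savings depend entirely on your data, and bz2 is an assumption about the compressor Review Board uses.

```python
import bz2

# Synthetic diff: many similar hunks (an exaggerated, made-up example).
hunk = "@@ -10,3 +10,3 @@\n context line\n-old_value = 1\n+new_value = 2\n"
diff = ("--- a/config.py\n+++ b/config.py\n" + hunk * 500).encode("utf-8")

compressed = bz2.compress(diff)
ratio = len(compressed) / len(diff)
print(len(diff), len(compressed), f"{ratio:.1%}")

# The condensed form is lossless: decompressing gives back the original bytes.
assert bz2.decompress(compressed) == diff
```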

I'll try to find out more info about which versions of MySQL are impacted by this warning bug.

Christian




Koushik Roy

Dec 22, 2016, 1:40:34 PM
to reviewboard, chri...@beanbaginc.com
Christian,
  There is still one problem, though.
  On a development server, where this utf8 warning is raised as an exception, creating a new review request fails:

rbt post --server=http://localhost:8080
Generating diff for pending changeset default

==> HTTP Authentication Required
Enter authorization information for "Web API" at localhost:8080
Username: admin
Password: 
ERROR: Error uploading diff


One or more fields had errors (HTTP 400, API Error 105)

    path: Invalid utf8 character string: 'C96A49'

Your review request still exists, but the diff is not attached.


  Any advice?

Thanks,

Koushik

Christian Hammond

Dec 23, 2016, 1:37:04 AM
to revie...@googlegroups.com
What's the purpose of your dev server? Depending on needs, SQLite might be a better option, but probably not if this is a staging server.

You might want to try a newer version of MySQL. I might be wrong, but I don't believe I've hit this issue with 5.6. When in DEBUG = True mode, warnings will propagate as errors. You may have to configure MySQL to ignore the error code.

Christian


Koushik Roy

Dec 23, 2016, 2:08:23 PM
to reviewboard, chri...@beanbaginc.com
My devserver is indeed a staging server.
Also, I have written a couple of extensions for our specific workflow in the organization. For that too I need a devserver.

I validated that MySQL 5.6 does not produce this warning.
I will upgrade to MySQL 5.6 soon.

Thanks

Koushik