The use case is that we have sensitive data that our SMEs are looking through to establish how much sensitive information is actually in the data we are receiving, so the EC2 instance is secured with restricted access. At the moment, the SMEs RDP into the instance (only one at a time can review data).
What is happening is that while the SMEs are looking through the data, Prodigy will throw an "Oops, something went wrong :(" error. We've been working around this by killing the Prodigy process and restarting it, which I've been doing through SSM in the EC2 console. The problem is that this happens every 15 minutes or so, which is, understandably, very disruptive while the data is being reviewed.
I tried debugging it here by injecting an artificial error in the code to force it to throw an Internal Server Error, but when triggering it I still get the full traceback leading to the point where I injected the error. So the fact that you are not getting any extra info about the error, no traceback at all, would be the first thing to debug. A log line like that wouldn't show up if you were running locally, so I suspect it could actually come from the load balancer and not Prodigy (even if the underlying error is in code related to Prodigy), and the load balancer might be hiding the actual error underneath.
Another option is to test with a small FastAPI app instead of Prodigy in that container, inject an error there, and see if you get a traceback. That would help narrow down where the error comes from: the Prodigy configs, the environment, etc.
I don't understand how a load balancer makes a difference. Eventually the request would hit the Prodigy code or my custom recipe code and that's where the exception occurs and should be logged. Or am I missing something?
Yes, but if the INFO log is not shown locally, that would probably mean it's coming from somewhere else, not from Prodigy: possibly the load balancer. It could mean, for example, that you are getting the logs of the load balancer but not the logs of the container behind it. Or the container with Prodigy could be configured with an internal load balancer, e.g. an Nginx with a proxy_pass to Prodigy, on the same container.
Something else you could try is adding something like Sentry to your recipe to log the error with its traceback to a remote server. That could help you figure out what's happening independently of how the logs are handled in your cluster.
Prodigy has its own logger, and all other log statements are produced by uvicorn. I think uvicorn also surfaces log statements from other applications if they log via logging.info, so that's what could be happening here. But yeah, Sentry is probably the most systematic approach here.
I'm now able to reproduce this locally when running Prodigy in a Docker container. @tiangolo you were right that the log entry stating the Internal Server Error is indeed coming from the nginx Ingress in Kubernetes, as this does not appear when running locally.
The frontend still receives an Internal Server Error and the logs show nothing. Only when I manually exit Prodigy (using Ctrl-C in the terminal) does the traceback appear. I also tried Sentry, but it doesn't log anything at all.
I suspect this has something to do with the exception not bubbling up from an async worker until the worker is forcefully exited. That can sometimes happen when the exception can't be pickled. Just guessing here. It's odd, though, that this only happens in Docker (Linux) and not when running on macOS.
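To illustrate that guess (a toy example, not Prodigy's actual internals): an exception object holding an unpicklable attribute, such as an open file handle, can't be sent from a worker process back to the parent, so the parent never sees it until shutdown:

```python
# Toy demonstration of the suspected failure mode: an exception carrying
# an unpicklable attribute can't cross a process boundary.
import pickle
import tempfile

class WorkerError(Exception):
    def __init__(self, fh):
        super().__init__("worker failed")
        self.fh = fh  # open file handles are not picklable

try:
    # This is roughly what a multiprocessing worker does to report an error
    # back to the parent process.
    pickle.dumps(WorkerError(tempfile.TemporaryFile()))
except TypeError as e:
    print("could not pickle the exception:", e)
```

When this happens inside a worker, the error report itself fails, and the original exception effectively vanishes until the worker is torn down.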
Yes, I think that's what's going on.
However, the exception doesn't even surface as an INFO log message. The INFO message seen above comes from a load balancer, not from anything in Prodigy. I verified that by running Prodigy in Docker without a load balancer, and nothing showed up in the logs. The exception only surfaces once Prodigy exits.
The only pattern I can see is that the errors seem to occur more often when both annotators are working at the same time, but we have also seen this error with only one annotator annotating. This will sometimes result in:
I updated instant_submit today from false to true in order to auto-save in the hopes that we would not lose any additional documents the annotators had annotated. We are saving the annotations in a CloudSQL database, which the prodigy web servers connect to via a proxy.
One annotator has some connection issues (a slow connection), the other does not, and both are experiencing this. Restarting our server, logging in and out, and closing and reopening their browser doesn't seem to improve the situation. The annotators annotate quickly, and I'm wondering if this is causing the issue, as I saw something similar in another forum post. However, annotating quickly shouldn't result in an error. The annotators are annotating the same dataset so we can assess inter-annotator agreement. Right now, the current set is 5k documents (each document is a short sentence), but this also occurred with a much smaller dataset of 500.
The prodigy.json file above shows my settings. I'm wondering if there is a setting that I need to change to work alongside instant_submit? I added instant_submit in order to auto-save the documents. The dataset is shared between two annotators, and they are annotating the same documents (inter-annotator agreement), but only one annotator is working on the set right now.
One more issue to report: our server was restarted, and when I restarted the tasks for each annotator, I am again seeing the document where the saved-annotations error showed up last time, even though many documents have been annotated since that one was in the queue. I don't see anything odd about that particular line in the .jsonl file, though.
I checked your settings again but they don't seem strange. The auto_exclude_current setting is set to true, which is the default value just like auto_count_stream, so that seems fine. A batch_size of 1 certainly isn't unheard of. The feed_overlap setting also seems fine, although this depends more on how you'd like your colleagues to label. It shouldn't cause an error.
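For readers following along, a prodigy.json with the settings discussed in this thread would look roughly like this (values are illustrative; auto_exclude_current and auto_count_stream are shown at their defaults):

```json
{
  "batch_size": 1,
  "instant_submit": true,
  "feed_overlap": true,
  "auto_exclude_current": true,
  "auto_count_stream": true
}
```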
I'll ping some colleagues to check if they can spot something, but in the meantime, I am curious about the logs from CloudSQL or the proxy that you use. Can you see anything on the Google Cloud side that might indicate a connection/authentication mishap?
We assume the "Client" is the prodigy web server, which presumably uses a pool of connections that are periodically refreshed? Our own code does nothing more than provide the proxy's host name and port number to the prodigy servers.
I'm thinking out loud here, but I recall from previous work with Postgres that hitting the maximum number of connections can cause trouble. I think the default is 100 connections, which I believe is also what Google assumes. The Google Cloud docs also reflect this quota here and here.
If you run prodigy.serve() for each permutation of input file/name/recipe, is it likely that you hit this limit? Does your proxy close down connections to the database after a "task" is considered completed? I'm not 100% sure if this is the issue, but it seems good to check and rule out.
One of my colleagues suggested you may want to try out our new alpha release. Details are listed here. It feels worth a try given what you're experiencing. One of the key differences is that we're switching from peewee to SQLAlchemy as our ORM, which might address some of the database issues you're seeing. Do let me know if this does/doesn't help!
So it turns out that our annotators may have been using Edge as their browser, but not consistently. So that is a possible issue. I have them exclusively using Chrome, and will check to see if we have any additional issues.
I installed Prodigy and it appears in my packages when I run pip list, and I followed the instructions on -first-steps. I then ran, in the Anaconda prompt:
python -m prodigy dataset new_dataset
python -m prodigy ner.teach new_dataset en_core_web_sm news_headlines.jsonl
with sample data but the interface keeps showing me this error message, Oops! Something went wrong! Anyone know what the problem could be?? Thanks in advance!!
If anyone comes looking for the answer to this, I ran into the same error and it had to do with the model itself. I had attempted to remove/replace the NER pipe and export the model to a new one and something must have gone sideways with the process.
In the beginning of the video, around the 1:55 mark, Ines runs the command prodigy dataset insult_seeds "collect seed terms for insult classifier". When I try to run the command as !python -m prodigy dataset insult_seeds "Collect seed terms for insult classifier", I get the error:
I'm not sure why it's not recognizing the command, as I don't believe the dataset argument has changed in the current version of Prodigy. I am also on Windows, hence my use of !python -m before each command.
Hi Ines! I was able to do as you said and just run the next command and prodigy automatically created the dataset for me. When I try to run your next command to load the reddit comments, I get this error saying it doesn't recognize the dataset I am trying to pass in...not sure if I'm using a deprecated command again?
Older versions of Prodigy let you use the seeds dataset directly in the textcat recipe, but newer versions all standardise on a patterns file, which gives you more flexibility. This is expected to be a JSONL file on disk, and you can create it with terms.to-patterns. There's also a section in the video description that explains the differences in newer versions of Prodigy (since the video is already a couple of years old):
Since this video was recorded, the textcat.teach command has changed in one detail: instead of a --seeds argument, you can now pass in --patterns, which lets you describe single words but also more complex combinations of tokens based on their attributes. To convert a seed dataset to patterns, you can use the terms.to-patterns recipe. For more details, see here: Seeds not recognized by textcat.teach
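For anyone landing here, a patterns file is a JSONL file with one match pattern per line. A minimal example for the insult classifier (the label and terms here are illustrative) might look like:

```json
{"label": "INSULT", "pattern": [{"lower": "idiot"}]}
{"label": "INSULT", "pattern": [{"lower": "total"}, {"lower": "jerk"}]}
```

You can generate a file like this from the seeds dataset with something like `python -m prodigy terms.to-patterns insult_seeds ./insult_patterns.jsonl --label INSULT` (check `prodigy terms.to-patterns --help` for the exact arguments in your version).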