Lemmy.ml has been returning generic nginx 500 errors on page loads for over a week. Typically the errors last about 1 to 5 seconds, clear up, and then return within 5 to 10 minutes.

I’ve also seen them on Beehaw and Lemmy.world, and there are other reports:

https://lemmy.ml/post/1206363
https://lemmy.ml/post/1213640
https://lemmy.ml/post/1128019

Two Concerns Moving Forward

  1. How can we configure nginx to show a custom error message telling visitors that server admins are aware of the problem and tracking it, instead of just the generic 500 error page?

  2. Why is lemmy_server not responding to nginx? Can we start sharing tips on how to parse the log files and what to look for, so we can count how many of these errors we are getting?
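As a starting point for the counting question, here is a rough Python sketch that tallies 5xx responses per minute from an nginx access log. It assumes the log uses nginx's default "combined" format; if your instance uses a custom `log_format`, the regex will need adjusting. The function names and the log path in the comment are my own placeholders, not anything from Lemmy or its deployment files.

```python
import re
from collections import Counter

# Assumption: nginx's default "combined" access log format, for example:
# 203.0.113.7 - - [12/Jun/2023:14:03:27 +0000] "GET / HTTP/1.1" 502 157 "-" "Mozilla/5.0"
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) ')


def count_5xx_per_minute(lines):
    """Return a Counter keyed by (minute, status) for every 5xx response."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like combined-format entries
        status = match.group("status")
        if status.startswith("5"):
            # Truncate "12/Jun/2023:14:03:27 +0000" to "12/Jun/2023:14:03"
            minute = match.group("ts")[:17]
            counts[(minute, status)] += 1
    return counts


def report(path):
    """Print one line per (minute, status) pair, in the order first seen in the log."""
    with open(path, errors="replace") as log_file:
        counts = count_5xx_per_minute(log_file)
    for (minute, status), total in counts.items():
        print(minute, status, total)


# Hypothetical usage; the path depends on how nginx is deployed:
# report("/var/log/nginx/access.log")
```

Bucketing by minute should make the pattern of short bursts every 5 to 10 minutes visible, and splitting by status code helps tell a plain 500 apart from 502 or 504, which point more directly at the upstream not answering.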

I think lemmy_server should start keeping internal error logs of in-code failures. My experience with Lemmy 0.17.4 is that it tends to conceal errors when a database transaction or a federation network transaction fails. I think we need these errors to bubble up into an organized, purpose-named log and back to the end user, so they know which code path is failing and can communicate more precisely with developers and server operators.