So, looks like the web interface is back up again
(reinstalled the mailman3-web python3-mailman-hyperkitty
packages even though they were already installed,
that seemed to suffice) ...
but ... was that earlier message archived, and will this one
be archived?
On Sat, Aug 31, 2024 at 6:12 AM Michael Paoli via BALUG-Test
<balug-test(a)lists.balug.org> wrote:
> HyperKitty archiver (or web interface thereof) is down,
> presumably a configuration issue/glitch and
> presumably mail interface is fine and Django and remainder of web
> interface looks okay.
Did Debian 11 Bullseye --> 12 Bookworm upgrade
HyperKitty archiver (or web interface thereof) is down,
presumably a configuration issue/glitch and
presumably mail interface is fine and Django and remainder of web
interface looks okay.
Michael Paoli via BALUG-Test wrote on 2024-08-23 05:29:
> restarted the relevant service:
> # systemctl stop mailman3-web.service
> # systemctl start mailman3-web.service
This reminded me - I've modified my unit files so that only one service
needs be started - same for re-starting or stopping:
Create a file at
/etc/systemd/system/mailman3.service.d/override.conf
with these contents:
[Unit]
Upholds = mailman3-web.service
Before = mailman3-web.service
Probably need a `systemctl daemon-reload` for systemd to read it.
Now, starting mailman3.service will *also* start mailman3-web.service.
Now, restarting mailman3.service will *also* restart mailman3-web.service.
Now, stopping mailman3.service will *also* stop mailman3-web.service.
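For anyone who wants to stage that drop-in before touching the live system, here is a minimal Python sketch. It writes the same three lines under a staging directory and reads them back as a basic sanity check. The `write_override` and `parse_unit_section` helpers and the staging `root` are hypothetical names for illustration, and as far as I know `Upholds=` needs a reasonably recent systemd, so check `man systemd.unit` on the target host.

```python
import configparser
import tempfile
from pathlib import Path

# Drop-in contents exactly as given in the message above.
OVERRIDE_TEXT = """\
[Unit]
Upholds = mailman3-web.service
Before = mailman3-web.service
"""


def write_override(root: Path, unit: str = "mailman3.service") -> Path:
    """Write override.conf under <root>/etc/systemd/system/<unit>.d/.

    `root` is a hypothetical staging directory. Pointing it at "/" would
    modify the live system and requires root privileges.
    """
    dropin_dir = root / "etc/systemd/system" / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    target = dropin_dir / "override.conf"
    target.write_text(OVERRIDE_TEXT)
    return target


def parse_unit_section(path: Path) -> dict:
    """Return the [Unit] keys of a drop-in as a dict (syntax check only)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read(path)
    return dict(parser["Unit"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as staging:
        path = write_override(Path(staging))
        print(path)
        print(parse_unit_section(path))
```

After copying the file into place for real, `systemctl daemon-reload` is still needed, as noted above.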
Also, there's really no need for NNTP bridging IMHO, but it's enabled by
default, which causes an entire extra Python interpreter to be loaded
into memory.
Disable this via:
/etc/mailman3/mailman.cfg
[runner.nntp]
start: no
Save some RAM...
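To double-check that the override is spelled the way one intends, a rough Python sketch along these lines can read the local config text. It assumes the standard library `configparser` is close enough to the `key: value` syntax used here (Mailman itself uses its own config loader), and it only looks at the local file, not at Mailman's built-in defaults. The `runner_enabled` helper and its `default=True` are assumptions for illustration.

```python
import configparser

# The override from the message above, as it would appear in mailman.cfg.
SAMPLE_CFG = """\
[runner.nntp]
start: no
"""


def runner_enabled(cfg_text: str, runner: str, default: bool = True) -> bool:
    """Return the `start` flag for [runner.<runner>] in the given config text.

    `default` is what we assume when the local file says nothing about the
    runner; it is an assumption here, not read from Mailman's defaults.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    section = f"runner.{runner}"
    if not parser.has_section(section):
        return default
    return parser.getboolean(section, "start", fallback=default)


if __name__ == "__main__":
    print("nntp runner enabled:", runner_enabled(SAMPLE_CFG, "nntp"))
```

Mailman needs a restart before a change like this takes effect.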
I believe this nntp support will be removed from default settings in the
future, since I called attention to it on MM3 mailing list.
rb
Okay, good actually, that "failed",
held for moderator approval, because, with
attachment, too large.
So, now, forwarded below, without attachment,
and there's link in body if one wants to view the attachment.
And here's at least part of the bounce bit that let me know it "failed":
From: <balug-test-owner(a)lists.balug.org>
Date: Thu, Aug 22, 2024 at 11:27 PM
Subject: balug-test(a)lists.balug.org post from
michael.paoli(a)berkeley.edu requires approval
To: <balug-test-owner(a)lists.balug.org>
As list administrator, your authorization is requested for the
following mailing list posting:
List: balug-test(a)lists.balug.org
From: michael.paoli(a)berkeley.edu
Subject: And the crash and the issue and image attachment test -
test - ignore
The message is being held because:
The message is larger than the 40 KB maximum size
At your convenience, visit your dashboard to approve or deny the
request.
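A quick way to guess ahead of time whether a post will trip a size limit like that is to build the message locally and measure it. The Python sketch below assumes the limit is compared against the serialized message size (how Mailman actually measures it isn't shown here), and the addresses and helper names are made up for illustration. The point it illustrates: base64 encoding inflates an attachment by roughly a third, so an image well under 40 KB on disk can still push the message over.

```python
from email.message import EmailMessage
from typing import Optional

MAX_MESSAGE_SIZE_KB = 40  # limit quoted in the moderation notice above


def build_message(body: str, attachment: Optional[bytes] = None) -> EmailMessage:
    """Build a test message, optionally with a JPEG-typed attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"        # hypothetical address
    msg["To"] = "somelist@lists.example.org"  # hypothetical address
    msg["Subject"] = "size test"
    msg.set_content(body)
    if attachment is not None:
        # Binary attachments are base64-encoded when serialized.
        msg.add_attachment(
            attachment, maintype="image", subtype="jpeg", filename="crash.jpg"
        )
    return msg


def size_kb(msg: EmailMessage) -> float:
    """Serialized size of the message in KB (1 KB = 1024 bytes)."""
    return len(msg.as_bytes()) / 1024


def would_be_held(msg: EmailMessage, limit_kb: float = MAX_MESSAGE_SIZE_KB) -> bool:
    """True if the serialized message exceeds the assumed size limit."""
    return size_kb(msg) > limit_kb


if __name__ == "__main__":
    for raw_kb in (20, 35):
        msg = build_message("test - ignore", b"\xff" * (raw_kb * 1024))
        print(raw_kb, "KB attachment ->", round(size_kb(msg), 1), "KB,",
              "held" if would_be_held(msg) else "ok")
```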
---------- Forwarded message ---------
From: Michael Paoli <michael.paoli(a)berkeley.edu>
Date: Thu, Aug 22, 2024 at 11:26 PM
Subject: And the crash and the issue and image attachment test - test - ignore
To: BALUG-Test <balug-test(a)lists.balug.org>
So,
All was working fine until ...
There was crash(/lockup) ... see "attached"
image (if it makes it through to list?).
And ... if you really want to see that image and it's
not (or no longer) attached, or not (or no longer) in the
archive, I've also, at least temporarily,
placed it here:
https://www.balug.org/tmp.lists/crash.jpg
That's from physical host "vicki",
upon which sometimes the BALUG VM runs.
It was running on there this past Tuesday (US/Pacific) evening,
and then fairly late evening - a
crash - relatively rare short of having a
power glitch/outage (do have a moderate bit of those,
don't have UPS, and when running on laptop (with battery
that holds no charge), if I manage to accidentally pull/nudge/wiggle
the cord connection out - it drops hard).
Anyway, the image - that's photo I took of the console
screen of the physical vicki host (powered on ye olde CRT -
relatively rare I do that), and took picture from "smart" phone
(and then trimmed excess bits out of the image ... also dropped the
quality a bit to reduce image file size while not losing much, if anything
in readability of the text on the screen).
So, before that, all was working fine.
After that all seems to be working fine except
for the web interface to archives and such
(postorius / hyperkitty).
And thus far all my testing and troubleshooting and
isolating, it actually looks like the Mailman 3 parts of it
(all, or almost all) are working properly. The mail interface
still appears to be working fine. The django portion of web
interface works fine. But on the Apache server, the
postorius portion (mostly*) fails ... however, when I dug (most notably
with strace) into the communication between the two,
it appears postorius is responding perfectly fine with good
content, yet somehow Apache has issues or doesn't get that
content, and ends up giving a 500 error page.
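When wading through a large strace capture for this kind of thing, it can help to tally the HTTP status lines seen in the read/write buffers. Here is a small Python sketch of that idea. The sample lines are hypothetical, made up to resemble strace output, not taken from the actual capture, and `count_statuses` is just an illustrative helper.

```python
import re
from collections import Counter

# Matches an HTTP/1.0 or HTTP/1.1 status line and captures the 3-digit code.
STATUS_RE = re.compile(r"HTTP/1\.[01] (\d{3})")

# Hypothetical strace-style lines, for illustration only.
SAMPLE = r'''
write(7, "HTTP/1.0 200 OK\r\nContent-Type: application/json\r\n", 49) = 49
read(12, "HTTP/1.1 500 Internal Server Error\r\n", 8192) = 36
'''


def count_statuses(strace_text: str) -> Counter:
    """Count HTTP status codes that appear anywhere in the captured text."""
    return Counter(int(code) for code in STATUS_RE.findall(strace_text))


if __name__ == "__main__":
    for code, n in sorted(count_statuses(SAMPLE).items()):
        print(code, n)
```

Running strace separately per process and comparing the tallies is one way to see which side is producing the 200s and which the 500s.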
*mostly ... I did, the other day, stumble across a bit that still works.
List membership roster ... with cookie ... can't login now, but still
having older
valid authentication cookie, I'm able to load up that page fine:
https://lists.balug.org/mailman3/postorius/lists/balug-test.lists.balug.org…
... well, it actually wants to download it, and downloads it fine with
the correct data.
So, yeah, odd, that bit of postorius works all the way through
Apache and to client ... haven't, however, found other parts that
make it through successfully like that while this issue is otherwise
still present.
And yes, did also do strace(1) data gathering on that too ... haven't yet
isolated why that works and (most of the rest) doesn't - but may use
significantly different interface/component(s) under the covers.
Anyway, that's where things presently stand.
Still working to (isolate and) fix the issue.
Meantime, did also do some updates on the main page,
notably: https://www.balug.org/#Lists
So folks at least have a clue/information about that (and
workarounds as feasible).
I'm guesstimating there's like maybe some lock or state file that didn't
get properly cleared or reset, or some subtle corruption or the like - and
something along those lines is what's causing the issue. Don't know for sure,
but given circumstances, that seems possible/probable. Also possible (but
perhaps not as likely) there was some subtle latent defect, as I'd not rebooted
in some moderate while, and the (crash and) reboot exposed that issue,
where it wasn't seen before that. But looking over the
(re)boot history, that doesn't seem most probable.
Let's see, peeking at that again ... (and times UTC / GMT0),
let's see, the not quite latest (Aug 21 06:20)
was the crash and subsequent boot, and at least all of quite a number
immediately before that were regular normal reboots.
And the one I did after that was another reboot just to see if
that might happen to clear the issue/error (no luck on that).
$ { who -H | head -n 1; who -r /var/log/wtmp | tac; } | head -n 20
NAME LINE TIME COMMENT
run-level 5 Aug 21 08:57
run-level Aug 21 08:56
run-level 5 Aug 21 06:20
run-level 5 Aug 19 07:23
run-level Aug 19 07:22
run-level 5 Jul 30 15:13
run-level Jul 30 14:50
run-level 5 Jul 28 00:10
run-level 5 Jul 6 14:45
run-level 5 Jul 5 09:02
run-level Jul 5 09:01
run-level 5 Jul 5 03:31
run-level Jul 5 03:30
run-level 5 Jun 23 00:03
run-level 5 Jun 18 15:44
run-level 5 Jun 17 08:16
run-level 5 Jun 17 05:53
run-level 5 May 26 22:58
run-level 5 May 26 19:44
---------- Forwarded message ---------
From: Michael Paoli <michael.paoli(a)berkeley.edu>
Date: Wed, Aug 21, 2024 at 6:14 PM
Subject: Re: [sf-lug] BALUG: meeting Tuesday 2024-08-20 Supporting
Older Systems, etc. & other BALUG News: Mailman 2 --> Mailman 3 &
other upgrades
To: Ronald Barnes <ron(a)ronaldbarnes.ca>
Cc: SF-LUG <sf-lug(a)linuxmafia.com>
Ah, well, the Mailman 2 --> Mailman 3
migration was mostly going quite well ... until it wasn't. 8-O
Bumped into a significant glitch yesterday evening,
still haven't got that sorted out yet, but about to start
looking into it again.
Fair bit of detail on the balug-test(a)lists.balug.org list.
And appears the mail interfaces are working, so can
post (if otherwise permitted to), subscribe, unsubscribe, etc.
via email,
but fair bit of web interfaces isn't working currently.
So, for the curious:
at least in part, email interface:
subscribe: balug-test-subscribe(a)lists.balug.org
unsubscribe: balug-test-unsubscribe(a)lists.balug.org
post: balug-test(a)lists.balug.org
help: balug-test-request(a)lists.balug.org with Subject: (or body) of:
help
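The addresses above follow a simple pattern: a suffix on the local part of the posting address. A small Python sketch of that pattern, using a made-up list address; `command_addresses` is a hypothetical helper, and it covers only the four addresses listed above.

```python
def command_addresses(list_address: str) -> dict:
    """Derive the email-interface addresses from a list's posting address."""
    local, sep, domain = list_address.partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid list address: {list_address!r}")
    return {
        "post": list_address,
        "subscribe": f"{local}-subscribe@{domain}",
        "unsubscribe": f"{local}-unsubscribe@{domain}",
        "help": f"{local}-request@{domain}",  # with Subject: (or body) of: help
    }


if __name__ == "__main__":
    # Hypothetical list address, for illustration only.
    for name, addr in command_addresses("somelist@lists.example.org").items():
        print(f"{name}: {addr}")
```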
As for web, postorius/hyperkitty (archive, etc.) portion isn't working,
but the Django portion is working.
On Wed, Aug 21, 2024 at 5:26 PM Ronald Barnes <ron(a)ronaldbarnes.ca> wrote:
>
> Michael Paoli wrote on 2024-08-15 22:04:
>
> > Mailman 2 --> Mailman 3
>
> Oops, went out for bike ride, forgot about meeting.
>
> Was interested in this topic especially.
Well ... getting closer ...
So, Apache communicates with
srwxr-xr-x 1 www-data www-data 0 Aug 21 08:56 /run/mailman3-web/uwsgi.sock
and apparently writes that socket fine,
but when reading it ...
HTTP/1.1 500 Internal Server Error
So, looks like somehow some kind of error between there and postorius.
Looks like postorius is successfully writing lots of:
HTTP/1.0 200 OK
responses ... but (presumably) those somehow aren't making it to Apache?
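One basic thing worth ruling out with a socket like that is ownership and permissions. The Python sketch below reports whether a path really is a socket, its mode string (as `ls -l` would show it), and whether the current user could write to it. On Linux, connecting to a Unix socket needs write permission on the socket file. `describe_socket` is a hypothetical helper, and to be meaningful for this case it would need to run as the same user Apache runs as.

```python
import os
import stat
import sys


def describe_socket(path: str) -> dict:
    """Summarize type, mode, and ownership of a (presumed) Unix socket path."""
    info = os.stat(path)
    return {
        "is_socket": stat.S_ISSOCK(info.st_mode),
        "mode": stat.filemode(info.st_mode),  # e.g. "srwxr-xr-x"
        "uid": info.st_uid,
        "gid": info.st_gid,
        # Write permission for the *current* user, not for Apache's user.
        "writable_by_me": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Pass the socket path as the first argument, e.g. the uwsgi.sock above.
    if len(sys.argv) > 1:
        for key, value in describe_socket(sys.argv[1]).items():
            print(f"{key}: {value}")
```

If this all checks out (as the `ls` output above suggests it would), then permissions are unlikely to be the culprit and the problem lies further along.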
On Wed, Aug 21, 2024 at 3:50 AM Michael Paoli via BALUG-Test
<balug-test(a)lists.balug.org> wrote:
>
>
>
>
> ---------- Forwarded message ----------
> From: Michael Paoli <michael.paoli(a)berkeley.edu>
> To: BALUG-Test <balug-test(a)lists.balug.org>
> Cc:
> Bcc:
> Date: Wed, 21 Aug 2024 03:50:08 -0700
> Subject: [BALUG-Test] Re: "Oops" ... lists down for a bit (back by the time you see this?)
> Well ... at least that's good - looks like the mail part is probably
> working fine.
> I thought for a bit it wasn't with one quick test I ran ... but I may
> not have been
> patient enough with that or may have not done that test quite right.
> So, looks like just web/postorius interface I need to get working again.
> And by the way, the Django administration interface appears to be fine,
> so looks like issue is limited to postorius, and between postorius and Apache.
>
> So at least in theory, subscribe/unsubscribe via email should also be
> working fine too.
>
> On Wed, Aug 21, 2024 at 3:43 AM Michael Paoli via BALUG-Test
> <balug-test(a)lists.balug.org> wrote:
> >
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: Michael Paoli <michael.paoli(a)berkeley.edu>
> > To: BALUG-Test <balug-test(a)lists.balug.org>
> > Cc:
> > Bcc:
> > Date: Wed, 21 Aug 2024 03:42:38 -0700
> > Subject: [BALUG-Test] "Oops" ... lists down for a bit (back by the time you see this?)
> > "Oops" ... lists down for a bit.
> > Drats ... and hopefully all better by the time this mail makes it to list.
> >
> > So ... all was fine and dandy until ...
> > managed to have a host booboo,
> > basically locked up solid, and I power cycled it
> > (this was physical host upon which the VM was running).
> > Shouldn't be any biggie ... after that all up and fine again ...
> > except the Mailman 3 lists.
> >
> > So, ... isolating and working towards fixing that.
> > I'm guesstimating maybe issue with some kind of lock file or
> > the like that didn't get cleaned up upon (re)boot,
> > or possibly, since I perhaps didn't do as many reboots as I ought
> > to have made sure it would "always" come up clean, perhaps there was
> > some misconfiguration or the like that somehow snuck in, that would be
> > effectively a latent defect/issue, and that wouldn't show until a reboot or
> > attempted restart of the relevant service(s).
> >
> > At this point still troubleshooting to narrow down the issue.
> >
> > As far as I can tell so far, looks like web server hits some kind of issue
> > and generally gives or passes along a 500 response, that then ends up
> > with the relevant web page(s) failing. When I dug deeper into the
> > backend service(s) ... notably postorius, looks like it gets requests okay
> > and responds to them okay ... but something goes wrong somewhere
> > between there and web server properly getting and passing that
> > along to client browser.
> > E.g.:
> > https://lists.balug.org/mailman3/postorius/lists/
> > I see in postorius bit returning (from strace(1)) including:
> > {\"display_name\": \"BALUG-Admin\", \"fqdn_listname\":
> > \"balug-admin(a)lists.balug.org\",
> > {\"display_name\": \"BALUG-Announce\", \"fqdn_listname\":
> > \"balug-announce(a)lists.balug.org\",
> > {\"display_name\": \"BALUG-Talk\", \"fqdn_listname\":
> > \"balug-talk(a)lists.balug.org\",
> > {\"display_name\": \"BALUG-Test\", \"fqdn_listname\":
> > \"balug-test(a)lists.balug.org\",
> > But somehow that doesn't make it to the web page the server serves ... so I'm
> > guesstimating there's some issue somewhere between postorius and Apache.
> >
> >
> > ---------- Forwarded message ----------
> > From: Michael Paoli via BALUG-Test <balug-test(a)lists.balug.org>
> > To: BALUG-Test <balug-test(a)lists.balug.org>
> > Cc:
> > Bcc:
> > Date: Wed, 21 Aug 2024 03:42:38 -0700
> > Subject: [BALUG-Test] "Oops" ... lists down for a bit (back by the time you see this?)
> > _______________________________________________
> > BALUG-Test mailing list -- balug-test(a)lists.balug.org
> > To unsubscribe send an email to balug-test-leave(a)lists.balug.org
>
>
> ---------- Forwarded message ----------
> From: Michael Paoli via BALUG-Test <balug-test(a)lists.balug.org>
> To: BALUG-Test <balug-test(a)lists.balug.org>
> Cc:
> Bcc:
> Date: Wed, 21 Aug 2024 03:50:08 -0700
> Subject: [BALUG-Test] Re: "Oops" ... lists down for a bit (back by the time you see this?)
Well ... at least that's good - looks like the mail part is probably
working fine.
I thought for a bit it wasn't with one quick test I ran ... but I may
not have been
patient enough with that or may have not done that test quite right.
So, looks like just web/postorius interface I need to get working again.
And by the way, the Django administration interface appears to be fine,
so looks like issue is limited to postorius, and between postorius and Apache.
So at least in theory, subscribe/unsubscribe via email should also be
working fine too.
On Wed, Aug 21, 2024 at 3:43 AM Michael Paoli via BALUG-Test
<balug-test(a)lists.balug.org> wrote:
>
>
>
>
> ---------- Forwarded message ----------
> From: Michael Paoli <michael.paoli(a)berkeley.edu>
> To: BALUG-Test <balug-test(a)lists.balug.org>
> Cc:
> Bcc:
> Date: Wed, 21 Aug 2024 03:42:38 -0700
> Subject: [BALUG-Test] "Oops" ... lists down for a bit (back by the time you see this?)
> "Oops" ... lists down for a bit.
> Drats ... and hopefully all better by the time this mail makes it to list.
>
> So ... all was fine and dandy until ...
> managed to have a host booboo,
> basically locked up solid, and I power cycled it
> (this was physical host upon which the VM was running).
> Shouldn't be any biggie ... after that all up and fine again ...
> except the Mailman 3 lists.
>
> So, ... isolating and working towards fixing that.
> I'm guesstimating maybe issue with some kind of lock file or
> the like that didn't get cleaned up upon (re)boot,
> or possibly, since I perhaps didn't do as many reboots as I ought
> to have made sure it would "always" come up clean, perhaps there was
> some misconfiguration or the like that somehow snuck in, that would be
> effectively a latent defect/issue, and that wouldn't show until a reboot or
> attempted restart of the relevant service(s).
>
> At this point still troubleshooting to narrow down the issue.
>
> As far as I can tell so far, looks like web server hits some kind of issue
> and generally gives or passes along a 500 response, that then ends up
> with the relevant web page(s) failing. When I dug deeper into the
> backend service(s) ... notably postorius, looks like it gets requests okay
> and responds to them okay ... but something goes wrong somewhere
> between there and web server properly getting and passing that
> along to client browser.
> E.g.:
> https://lists.balug.org/mailman3/postorius/lists/
> I see in postorius bit returning (from strace(1)) including:
> {\"display_name\": \"BALUG-Admin\", \"fqdn_listname\":
> \"balug-admin(a)lists.balug.org\",
> {\"display_name\": \"BALUG-Announce\", \"fqdn_listname\":
> \"balug-announce(a)lists.balug.org\",
> {\"display_name\": \"BALUG-Talk\", \"fqdn_listname\":
> \"balug-talk(a)lists.balug.org\",
> {\"display_name\": \"BALUG-Test\", \"fqdn_listname\":
> \"balug-test(a)lists.balug.org\",
> But somehow that doesn't make it to the web page the server serves ... so I'm
> guesstimating there's some issue somewhere between postorius and Apache.
>
>
> ---------- Forwarded message ----------
> From: Michael Paoli via BALUG-Test <balug-test(a)lists.balug.org>
> To: BALUG-Test <balug-test(a)lists.balug.org>
> Cc:
> Bcc:
> Date: Wed, 21 Aug 2024 03:42:38 -0700
> Subject: [BALUG-Test] "Oops" ... lists down for a bit (back by the time you see this?)