May be mostly moot by the time this makes it through to the BALUG
list server and out. In any case ...
----- Forwarded message from Michael.Paoli(a)cal.berkeley.edu -----
Date: Tue, 20 Nov 2018 18:59:24 -0800
From: "Michael Paoli" <Michael.Paoli(a)cal.berkeley.edu>
Subject: outage: [www.]{sf-lug,balug}.org
To: SF-LUG <sf-lug(a)linuxmafia.com>
And ... outage again :-\
Guesstimating if it's the "usual", will likely be on-line again
by sometime Wednesday evening (I may or may not get to it before then).
Per earlier:
impacts all [*.]sf-lug.{org,com} & [*.]balug.org
SF-LUG lists remain up and on-line (at least as far as I'm aware).
Also, DNS mostly not impacted (in general, slaves remain functional),
though there may be some additional latencies on DNS due to failovers
to other nameservers.
> From: "Michael Paoli" <Michael.Paoli(a)cal.berkeley.edu>
> Subject: "all better now:" Re: outage: [www.]{sf-lug,balug}.org
> Date: Wed, 10 Jan 2018 20:15:04 -0800
> And ... again, same deal, went off-line at:
> <~= 2018-01-11T01:23:21+00:00 2018-01-10T17:23:21-08:00
> and back on-line by:
> >~= 2018-01-11T04:04:06+00:00 2018-01-10T20:04:06-08:00
> ... again, swift kick to the power switch on DSL modem to reset it,
> and "all better" - again ... at least for now.
>
>> From: "Michael Paoli" <Michael.Paoli(a)cal.berkeley.edu>
>> Subject: "all better now:" Re: outage: [www.]{sf-lug,balug}.org
>> (ETR: eveningish)
>> Date: Wed, 06 Dec 2017 18:08:09 -0800
>
>> And ... swift kick to the power switch on DSL modem to reset it,
>> and "all better now".
>> Looks like the outage started around:
>> BALUG PING: ping6 2001:470:1f04:19e::2 FAILED 2017-12-06T20:53:56+00:00
>> GW PING: ping 198.144.194.233 FAILED 2017-12-06T20:54:09+00:00
>> and service restored a few minutes ago
>>
>>
>>> From: "Michael Paoli" <Michael.Paoli(a)cal.berkeley.edu>
>>> Subject: outage: [www.]{sf-lug,balug}.org (ETR: eveningish)
>>> Date: Wed, 06 Dec 2017 14:46:38 -0800
>>
>>> So,
>>>
>>> They were up and on-line this morning, at least as late as about
>>> mid-morning or later, but off-line now.
>>> POTS line appears to be working, but not IP,
>>> likely the DSL modem needs to be reset (powercycled) again ...
>>> but don't have means to do that remotely.
>>>
>>> Presumably I'll have this resolved sometime this evening once I'm
>>> on-site again.
>>>
>>> impacts all [*.]sf-lug.{org,com} & [*.]balug.org
>>> SF-LUG lists remain up and on-line (at least as far as I'm aware).
----- End forwarded message -----
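For anyone wanting to log outages in the same shape as the
"GW PING: ... FAILED <timestamp>" lines quoted above, here is a minimal
sketch. It is not the actual monitoring in use for these hosts;
check_probe and its arguments are hypothetical names for illustration.

```shell
# Hypothetical helper: run a probe command and, on failure, print a line
# shaped like the "GW PING: ping ... FAILED <timestamp>" entries above.
# usage: check_probe LABEL COMMAND [ARGS...]
check_probe() {
    local label=$1
    shift
    # "$@" is now the probe command, e.g.: ping -c 3 198.144.194.233
    if ! "$@" >/dev/null 2>&1; then
        # UTC timestamp formatted like 2017-12-06T20:54:09+00:00
        printf '%s: %s FAILED %s\n' "$label" "$*" \
            "$(date -u +%Y-%m-%dT%H:%M:%S+00:00)"
        return 1
    fi
}
```

Run periodically (cron or similar), something like
check_probe "GW PING" ping -c 3 198.144.194.233 >> outage.log
would leave the first FAILED timestamp as a rough outage start time.
The log file name is just an example.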
Tossing this one onto (suitable) list, because ... well, why not! :-)
> From: "Rick Moen" <rick(a)linuxmafia.com>
> Subject: Re: Login to BALUG Wiki
> Date: Thu, 15 Nov 2018 17:57:40 -0800
> Quoting Michael Paoli (Michael.Paoli(a)cal.berkeley.edu):
>
>> Seems more logical would be to have that information on the
>> "self-"registration wiki page ... otherwise folks may not see it
>> anyway.
>
> After successful login, I put a notice on
> https://www.wiki.balug.org/wiki/doku.php , near the top.
Thanks ... I also tweaked the language (very) slightly just for slightly
better approximation of (mostly historical) reality.
> I ack your point that autoregistrations were few and far between, but
> lack of any information about how to get a login meant that anyone
> wanting to help would have no idea how to proceed.
>
> Sorry, but what 'self-' registration page? Not sure what this reference
> is to.
Heck, been so long I forget where the self-registration wiki page even
is/was.
>> I figured manual was "quite good enough" to stop the immediate issue
>> (the high volume of bot registrations was causing the wiki to bog down
>> and fail in annoying ways).
>
> I sympathise.
>
> Restoring login _with_ CAPTCHA plugin would be a 'have your cake and eat
> it too' solution, IMO -- if/when you get around to it.
Yep *somewhere* on the "todo" list.
Background/history - have a look at:
curl -s --range 134225-268633 https://www.archive.balug.org/log.txt
And *also* very handy for me too ... looks like I'd *disabled* the
registration page - so likely it doesn't show at all, or just won't
let one self-register.
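On the --range usage above: curl sends that as an HTTP Range request,
and both offsets are zero-based and inclusive, so 134225-268633 asks
for 134409 bytes (provided the server honors ranges). With a local copy
of the file, the same slice can be sketched as below. byte_range is a
hypothetical helper name, and head -c is assumed available (GNU and BSD
have it, though it is not strictly POSIX).

```shell
# byte_range START END FILE
# Print bytes START..END of FILE, zero-based and inclusive,
# matching the offsets given to curl --range START-END.
byte_range() {
    local start=$1 end=$2 file=$3
    # tail -c +N counts from 1, hence the +1 on a zero-based offset.
    tail -c +"$((start + 1))" "$file" | head -c "$((end - start + 1))"
}
```

So byte_range 134225 268633 log.txt on a downloaded log.txt should
give the same bytes as the curl command, assuming the file has not
changed in the meantime.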
Greetings! This is an advisory about ns1.linuxmafia.com DNS nameserver
downtime having ended. Root cause: AT&T (_not_ my ISP) sabotaged
my ADSL at their local exchange around 8am Tuesday, then took about
2 days and 7 hours to find and fix their problem. All services
are back.
ns1.linuxmafia.com is back to doing auth. nameservice, as
arranged, for the following domains of yours:
balug.org (slave)
sf-lug.com (slave)
sf-lug.org (slave)
Evidence below is via fugly shell script ~/bin/testns that
I just cranked out:
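#!/bin/bash
# Print the SOA serial each registry-listed nameserver reports for a domain.
domain=$1
# Nameserver names are the third field of whois "Name Server" lines.
for ns in $(whois "$domain" | grep "Name Server" | \
    awk '{ print $3 }' | tr '\r\n' ' ')
do
    printf '%s is ' "$ns"
    # Third field of the short SOA answer is the zone serial.
    dig +short @"$ns". "$domain". SOA | awk '{ print $3 }'
done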
:r! bin/testns balug.org
NS1.LINUXMAFIA.COM is 1540540908
NS1.SVLUG.ORG is 1540540908
NS1.BALUG.ORG is 1540540908
:r! bin/testns sf-lug.com
NS.PRIMATE.NET is 1540541199
NS1.LINUXMAFIA.COM is 1540541199
NS1.SF-LUG.COM is 1540541199
:r! bin/testns sf-lug.org
NS1.LINUXMAFIA.COM is 1540541435
NS.PRIMATE.NET is 1539414019
NS1.SVLUG.ORG is 1540541435
NS1.SF-LUG.ORG is 1540541435
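In the sf-lug.org output above, NS.PRIMATE.NET reports serial
1539414019 while the others report 1540541435, which may just mean that
secondary had not yet picked up the newer zone. A rough sketch for
spotting that automatically from testns-style output follows.
lagging_ns is a hypothetical name, and the comparison is plain numeric,
ignoring serial-number wraparound.

```shell
# Reads "<nameserver> is <serial>" lines on stdin and prints any
# nameserver whose SOA serial is below the highest serial seen.
lagging_ns() {
    awk '$2 == "is" && $3 ~ /^[0-9]+$/ {
             ns[NR] = $1; raw[NR] = $3; serial[NR] = $3 + 0
             if (serial[NR] > max) { max = serial[NR]; maxraw = $3 }
         }
         END {
             for (i = 1; i <= NR; i++)
                 if (i in ns && serial[i] < max)
                     printf "%s lags: %s < %s\n", ns[i], raw[i], maxraw
         }'
}
```

Usage would be along the lines of: bin/testns sf-lug.org | lagging_ns
No output means every listed nameserver reported the same serial.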
DO NOT REPLY ALL! (*unless* you're a member of **both** lists)
Just an FYI mostly, but temporarily we have a bit of reduced
redundancy on DNS for balug.org/sf-lug.org/sf-lug.com
Expecting this to be relatively temporary (at least once AT&T finally
gets their sh*t together ... at least for a little bit).
Anyway, in case anyone wonders or notices. Hopefully it will
be *all better soon* ... at least **relatively** soon.
Impacts should mostly be pretty minimal - some initial queries might
occasionally take a bit longer (possibly timing out on first
(randomly selected) authoritative nameserver), but should otherwise
generally have almost zero impact (due to TTLs, data will generally
be cached for a while once successfully resolved).
$ (for d in balug.org sf-lug.org sf-lug.com; do dig +noall +answer \
+nottl "$d". NS | sed -e 's/[ ]\{1,\}/ /g'; done) | fgrep linuxmafia.com
balug.org. IN NS ns1.linuxmafia.com.
sf-lug.org. IN NS ns1.linuxmafia.com.
sf-lug.com. IN NS ns1.linuxmafia.com.
$
----- Forwarded message from rickmoen(a)gmail.com -----
Date: Wed, 24 Oct 2018 01:59:36 -0700
From: "Rick Moen" <rickmoen(a)gmail.com>
Reply-To: rick(a)deirdre.net
Subject: ns1.linuxmafia.com downtime
To: "Michael Paoli" <Michael.Paoli(a)cal.berkeley.edu>
Greetings! This is an advisory about current downtime of my
ns1.linuxmafia.com DNS nameserver, starting about 8am on Tuesday, Oct.
23rd. Near as I and Mike Durkin, proprietor of Raw Bandwidth
Communications ('RBC', my ISP) have been able to determine, AT&T somehow
sabotaged my household ADSL, and thus took my entire household including
the server offline. Mike is now trying to get them to fix their screw-up.
Meantime, ns1.linuxmafia.com is _not_ doing auth. nameservice, as
arranged, for the following domains of yours:
balug.org (slave)
sf-lug.com (slave)
sf-lug.org (slave)
I'm advising everyone I'm doing auth. DNS for of the ongoing outage,
so this is your notice. Hope to give better news soon.
----- End forwarded message -----
BALUG VM was down for a fair while earlier today.
Has been up again for over 7 hours now.
Looks like there was an I/O hiccup on the physical host,
which didn't particularly impact the physical host itself, but
was enough of an interruption (delay) that the BALUG VM kernel
panicked.
Did have a 3rd hard drive testing, etc. on the physical host
at the time ... might've hit issues and possibly it did a bus
reset? Who knows for sure. Anyway ...
Went down sometime after:
2018-09-02T01:27:36-07:00
and was brought back up around:
2018-09-02T13:39:30-07:00
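To put a number on that window, the gap between two such timestamps
can be worked out as sketched here. This assumes GNU date (for -d);
duration is a hypothetical helper name. Since the crash happened
"sometime after" the first timestamp, the result is an upper bound on
the actual downtime.

```shell
# duration START END
# Print the elapsed time between two ISO 8601 timestamps.
# Assumes GNU date (-d); BSD/macOS date needs different options.
duration() {
    local start end secs
    start=$(date -d "$1" +%s) || return 1
    end=$(date -d "$2" +%s) || return 1
    secs=$((end - start))
    printf '%dh %dm %ds\n' \
        "$((secs / 3600))" "$((secs % 3600 / 60))" "$((secs % 60))"
}
```

For the two timestamps above,
duration 2018-09-02T01:27:36-07:00 2018-09-02T13:39:30-07:00
gives 12h 11m 54s.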
Various bits I noted in log:
$ curl -s --range 375155-378925 http://www.archive.balug.org/log.txt
2018-09-02 Michael Paoli
host crashed sometime after:
2018-09-02T01:27:36-07:00
but probably before about:
2018-09-02T01:35:00-07:00
on console, we got:
# [54894.969741] sd 0:0:0:0: [sda] tag#3 ABORT operation started
[54900.078084] sd 0:0:0:0: ABORT operation timed-out.
[54900.080312] sd 0:0:0:0: [sda] tag#2 ABORT operation started
[54905.198438] sd 0:0:0:0: ABORT operation timed-out.
[54905.200517] sd 0:0:0:0: [sda] tag#1 ABORT operation started
[54905.357128] Kernel panic - not syncing: assertion "i &&
sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file
"/build/linux-AcJpTp/linux-4.9.110/drivers/scsi/sym53c8xx_2/sym_hipd.c", line
3399
[54905.357128]
[54905.367774] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-8-amd64
#1 Debian 4.9.110-3+deb9u4
[54905.370776] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[54905.372768] 0000000000000000 ffffffff84f31e54 ffff9e2f75d5a300
ffff9e2f7fc03e50
[54905.375471] ffffffff84d7f6ad 0000000000000020 ffff9e2f7fc03e60
ffff9e2f7fc03df8
[54905.378226] 3ea9db08406f9671 0000000100d04ae4 ffffffffc048a250
ffffffffc0489e80
[54905.380982] Call Trace:
[54905.381867] <IRQ> [54905.382541] [<ffffffff84f31e54>] ?
dump_stack+0x5c/0x78
[54905.384428] [<ffffffff84d7f6ad>] ? panic+0xe4/0x23f
[54905.386164] [<ffffffffc048512e>] ? sym_interrupt+0x1c9e/0x1e80 [sym53c8xx]
[54905.388543] [<ffffffffc03aa010>] ?
usb_hcd_poll_rh_status+0x170/0x170 [usbcore]
[54905.391102] [<ffffffffc03a9fc9>] ?
usb_hcd_poll_rh_status+0x129/0x170 [usbcore]
[54905.393627] [<ffffffffc03aa010>] ?
usb_hcd_poll_rh_status+0x170/0x170 [usbcore]
[54905.396144] [<ffffffff84ce7562>] ? call_timer_fn+0x32/0x120
[54905.398071] [<ffffffffc047ea4b>] ? sym53c8xx_intr+0x3b/0x70 [sym53c8xx]
[54905.400386] [<ffffffff84cd418e>] ? __handle_irq_event_percpu+0x7e/0x1a0
[54905.402673] [<ffffffff84cd42e0>] ? handle_irq_event_percpu+0x30/0x70
[54905.404898] [<ffffffff84cd4359>] ? handle_irq_event+0x39/0x60
[54905.406901] [<ffffffff84cd7870>] ? handle_fasteoi_irq+0xa0/0x170
[54905.409001] [<ffffffff84c27faf>] ? handle_irq+0x1f/0x30
[54905.410834] [<ffffffff852187ee>] ? do_IRQ+0x4e/0xe0
[54905.412528] [<ffffffff85216556>] ? common_interrupt+0x96/0x96
[54905.414523] <EOI> [54905.415216] [<ffffffff852151f0>] ?
__sched_text_end+0x1/0x1
[54905.417231] [<ffffffff852154c2>] ? native_safe_halt+0x2/0x10
[54905.419235] [<ffffffff8521520a>] ? default_idle+0x1a/0xd0
[54905.421137] [<ffffffff84cbc7da>] ? cpu_startup_entry+0x1ca/0x240
[54905.423215] [<ffffffff8593df5e>] ? start_kernel+0x447/0x467
[54905.425186] [<ffffffff8593d120>] ? early_idt_handler_array+0x120/0x120
[54905.427438] [<ffffffff8593d408>] ? x86_64_start_kernel+0x14c/0x170
[54905.429842] Kernel Offset: 0x3c00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[54905.433484] ---[ end Kernel panic - not syncing: assertion "i &&
sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file
"/build/linux-AcJpTp/linux-4.9.110/drivers/scsi/sym53c8xx_2/sym_hipd.c", line
3399
[54905.433484]
... also noted within that same timeframe, on physical host, there
were some storage related events ... but no hard failures seen on that
physical host and no outages or failures or such observed on that
physical host:
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sda [SAT], SMART
Usage Attribute: 195 Hardware_ECC_Recovered changed from 63 to 69
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sdb [SAT], SMART
Usage Attribute: 190 Airflow_Temperature_Cel changed from 69 to 70
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sdb [SAT], SMART
Usage Attribute: 194 Temperature_Celsius changed from 31 to 30
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sdb [SAT], SMART
Usage Attribute: 195 Hardware_ECC_Recovered changed from 63 to 66
$