Discord Digest 2019-09-13

This is part of a weekly (or biweekly) installment of happenings on Discord, a community chat staffed by the developers and curated by its users.

Enabling MongoDB

howdy! i'm trying to use mongo on apnscp. i've enabled mongo on apnscp-vars-runtime.yml and re run the bootstrap. but the mongod executable still not found on the client user terminal. is there anything i missed?

MongoDB is bundled with apnscp and activated at install time via mongodb_enabled. Bootstrapper performs several optimizations to decrease installation time, one of which is en masse population of filesystem packages in BoxFS; enabling a service after installation skips that population step unless it's forced.
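
For a reminder of where the flag lives, here's a minimal sketch of the override in the vars file the question references (everything besides mongodb_enabled is untouched):

# apnscp-vars-runtime.yml
mongodb_enabled: true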

Breaking out the specific role and forcing population heuristics quickly solves the issue,

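# Re-run only the filesystem template role, forcing repopulation of BoxFS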
cd /usr/local/apnscp/resources/playbooks
ansible-playbook bootstrap.yml --tags=apnscp/initialize-filesystem-template --extra-vars=populate_filesystem_template=true

File ghosting

File ghosting occurs when a filesystem layer is reloaded (systemctl reload fsmount) while at least one file handle remains open on the updated file. A file handle exists whenever a process holds the file open. Rebooting the server is one way to flush the filesystem cache, but so is manipulating the drop_caches kernel tunable. Flummoxed that top reports less than 1 GB free on a 64 GB machine under a light workload? linuxatemyram.com would like a word with you!

Dropping filesystem caches releases every cached dentry and inode: echo 2 > /proc/sys/vm/drop_caches. The inode of the file will change, now reflecting the inode installed in /home/virtual/FILESYSTEMTEMPLATE.
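
Assuming an illustrative account path (adjust the site ID and service layer to taste), a quick stat comparison confirms the ghost is gone:

# Inode as seen inside the account's composite filesystem (illustrative path)
stat -c '%i' /home/virtual/site1/fst/usr/bin/mongod
# Inode of the same file in the template layer
stat -c '%i' /home/virtual/FILESYSTEMTEMPLATE/mongodb/usr/bin/mongod
# After echo 2 > /proc/sys/vm/drop_caches the two numbers should agree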

DNSBL behavior

Now shouldn't a local resolver be setup for dns caching for rbl lookups, etc?

apnscp uses Cloudflare DNS for resolution, chosen for optimal performance based upon metrics from dnsperf.com. Local DNS resolvers may be used by changing dns_robust_nameservers, but this is discouraged except in large-scale installations. Postfix performs a variety of DNSBL lookups, which rely on the libc resolver; each DNSBL query therefore flows through the upstream resolver before the result comes back.
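
To illustrate the flow, a DNSBL query is an ordinary DNS lookup with the client's IP octets reversed and the list's zone appended; zen.spamhaus.org is used below purely as an example:

# 127.0.0.2 is the conventional DNSBL test entry
dig +short 2.0.0.127.zen.spamhaus.org
# A 127.0.0.x answer means "listed"; NXDOMAIN means the address is clean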

Upside: all DNSBL queries come from a single source, Cloudflare's nameservers, thus deduplicating redundant queries across all servers. DNSBL services strongly discourage repeated queries; leveraging existing infrastructure via Cloudflare ensures apnscp honors this request.

Downside: vulnerable to DNS poisoning as Cloudflare cannot be 100% trusted.

Contrapositive upside: Cloudflare has a proven record of neutrality in all but the most insalubrious situations. Over 79% of proxied content, and approximately 10% of websites, run through Cloudflare, a staggering 1.7 million and counting, with more arriving as accessibility evolves. Further, DNSBL lookups are just one layer in a variety of filtering checks used to thwart spam.

Increasing mod_evasive sensitivity

I think I did more.  In my experience, i see more brute force attacks on wp-login than anything else, as long as fail2ban stops that I’m good

You’ll want to consider the GET and POST and a typical password typo and not block someone for being an idiot, but yeah good start

On my cPanel servers, because it's not as cool as apnscp, I just block access to xmlrpc and allow customers to override that via .htaccess rules because most don't even use it

There's a solution!

mod_evasive is context-aware using Apache directives. mod_evasive now ships with a filter to restrict POST attempts to xmlrpc.php and wp-login.php, as these are an enormous vector for compromised accounts and burnt CPU cycles. If apnscp was installed prior to this week, enable the behavior via cpcmd config:set apache.evasive-wordpress-filter true. A stringent filter protects the wp-login.php and xmlrpc.php resources, allowing 3 attempts within a 2-second window.

Let's expound upon this (as I did myself when working through a solution):

  • The following rule applies to files named "wp-login.php"
    - glob is quicker than regular expression patterns by a factor of 5-10x!
  • If the request method isn't a POST, disable bean counting.
  • If more than 3 POST attempts to the same resource occur within a 2 second interval, then return a DOSHTTPStatus response (429 Too Many Requests) and log the message via syslog to /var/log/messages.
  • fail2ban will pick up the request and place the IP address into the temporary ban list.
# Block wp-login brute-force attempts
<Files "wp-login.php">
    <If "%{REQUEST_METHOD} != 'POST'">
        DOSEnabled off
    </If>
    DOSPageCount 3
    DOSPageInterval 2
</Files>
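
The shipped filter covers xmlrpc.php as well; a companion block following the same pattern would look roughly like this (a sketch, not the verbatim ruleset):

# Block xmlrpc.php brute-force attempts
<Files "xmlrpc.php">
    <If "%{REQUEST_METHOD} != 'POST'">
        DOSEnabled off
    </If>
    DOSPageCount 3
    DOSPageInterval 2
</Files>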

As an exercise for the reader, whose write-up would be much appreciated as a guest contribution, vallumd can be used to distribute these banlists across all participating nodes.

Surging load escape hatch

I had to write a script that uses the apache status page to count the number of login attempts from an IP then I block that IP if the count is greater than 6 such a pain, can't wait to be done with all of that

Blocking 35.202.163.48 for 119 simultaneous logins to wp-login.php

Ideally the above example should work. Should is a word I ought to stop using, but it highlights an important attribute of this industry: hindsight is always 20/20. Before delegating sophisticated brute-force blocks to a mod_evasive => fail2ban pipeline, I relied upon an escape hatch to quash load surges.

apnscp includes an ultimate response: a shutdown via watchdog (cpcmd config:get system.watchdog-load). If the run-queue depth (the oft-misnamed "load average") exceeds this level, the watchdog forces a shutdown to regain control. By default the threshold is logical processor count * 25, a pretty good approximation that the server is out to lunch.
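
To see where that threshold sits on a given machine:

# Current watchdog threshold
cpcmd config:get system.watchdog-load
# The default works out to logical cores * 25, e.g. 2 * 25 = 50 on a 2-core server
echo $(( $(nproc) * 25 ))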

Another idea is to integrate server resuscitation into Monit, part of the Argos framework, for unassisted recovery. Drop the following script into /etc/monit.d/bins/ban_spam.sh, then chmod 755 /etc/monit.d/bins/ban_spam.sh.

#!/bin/bash

# Report the top remote IPs holding 100+ lingering connections to the server
function find_spam {
        netstat -nptu | grep -E 'TIME_W|SYN_' | awk '{print $5}' | sort -n | cut -d: -f1 | sort -n \
                | uniq -c | sort -n | tail -n 5 \
                | grep -E '^[[:space:]]*[[:digit:]]{3,}'
}

# ban excessive apache connections
function ban_spam {
        MYIPS=($(ip -o addr | awk '!/^[0-9]*: ?link\/ether/ {gsub("/", " "); print $4}'))
        find_spam | awk '{print $2}' | while read IP ; do
                [[ "${MYIPS[@]}" =~ "${IP}" ]] && continue
                echo "Banning $IP"
                /sbin/iptables -A INPUT -s "$IP" -j DROP
        done
}

ban_spam

Then modify /etc/monit.d/physical.conf, adding the following lines,

if loadavg (1min) > 50 for 2 times within 5 cycles then exec "/etc/monit.d/bins/ban_spam.sh"
if loadavg (5min) > 25 for 2 times within 15 cycles then exec "/etc/monit.d/bins/ban_spam.sh"

Let's assume a 2-processor system (grep -c ^processor /proc/cpuinfo, or nproc, for specific numbers). When the 1-minute run queue exceeds 50 twice within 5 checks (each cycle is 30 seconds by default), fire off the ban script; otherwise the server is toast. Likewise, if the 5-minute average is lower but still abnormal, run it as a cautionary measure to free resources. Argos will pick up the change and send a notification, provided it's configured.

Eventually I'd like to step away from this extreme response to extreme resource endangerment, instead relying on cgroups embedded in the kernel to catch issues sooner rather than later. 'Tis a topic for another day though.