Sunday, 13 March 2016

Using someonewhocares.org/hosts to create a BIND domain blacklist

2016-06-17 update:
Switched to using uniq for duplicate removal, since uniq -i ignores case when comparing lines.

Original post:
Some awesome guy named Dan Pollock (pollock@theorem.ca) maintains a list of advertising, malware, shock and otherwise potentially unwanted domains at someonewhocares.org/hosts. I've been using this list for years on most of my private computers, but one day I got tired of manually copying the list into my hosts files and decided to set up a more centralized solution in the form of a Linux box running BIND.

Now I'm not going to do a write-up on how to install BIND since it's readily available in the repositories of most Linux distros and there are already plenty of good tutorials on installing and configuring it. Instead, I will supply the Bash script that I use to convert the hosts file from someonewhocares.org into .zones and .db files for BIND to load.

Once the script has run, the BIND server returns 127.0.0.1 for lookups of any blacklisted domain and, thanks to a wildcard record, of its subdomains. It's probably worth pointing out that I'm not mainly a Linux guy and that my Bash skills are not the greatest, so the script probably has plenty of room for improvement and could do things more elegantly. But hey, at least it works. Feel free to leave a comment if you have suggestions for improvements.
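A quick way to confirm that a blacklisted name really resolves to 127.0.0.1 is to ask the server directly. This is only a sketch: it assumes dig is installed, and the function name is_blackholed is made up for illustration.

```shell
# Sketch: succeed if the given DNS server answers 127.0.0.1 for a name.
# Assumes dig is installed; is_blackholed is a hypothetical helper name.
is_blackholed() {
  local name="$1"
  local server="${2:-127.0.0.1}"   # defaults to a BIND running on this box
  local answer
  # +short prints only the answer data; keep the first line
  answer=$(dig +short "@$server" "$name" A | head -n 1)
  [ "$answer" = "127.0.0.1" ]
}
```

For example, is_blackholed some.blocked.domain 192.168.1.2 && echo "blocked", where both the domain and the server address are placeholders for your own values.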

Oh, and please note that this script was written on a Raspberry Pi running Raspbian. On a different flavor of Linux, BIND's config files may live in different locations, or the service may have a different name. Adjust the script accordingly.
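If you are unsure which layout you have, something like the following can make a first guess. Treat it as a rough sketch: the /etc/bind and bind9 values are the ones used by the script in this post, while the /etc/named.conf and named values are my assumption about Red Hat style systems. The function name detect_bind_layout is hypothetical.

```shell
# Sketch: guess the BIND layout from what exists on disk.
# The first argument is an optional path prefix, only useful for testing.
detect_bind_layout() {
  local root="${1:-}"
  if [ -d "$root/etc/bind" ]; then
    # Debian family (including Raspbian), as used by the script in this post
    echo "confdir=/etc/bind service=bind9"
  elif [ -f "$root/etc/named.conf" ]; then
    # Assumption: Red Hat family, where the service is usually called named
    echo "confdir=/etc service=named"
  else
    echo "BIND configuration not found" >&2
    return 1
  fi
}
```

Calling detect_bind_layout with no arguments inspects the real /etc and prints one line you can use to adjust the paths and the service name in the script.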

Make sure that the user you're running the script as has sufficient rights to write to BIND's config directories and restart the BIND service.
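A small pre-flight check can catch missing permissions before the script deletes anything. This is a minimal sketch with a hypothetical helper name, check_writable. It only tests write access, not the right to restart the service.

```shell
# Sketch: report every given path that the current user cannot write to.
# check_writable is a hypothetical helper, not part of the original script.
check_writable() {
  local path
  local status=0
  for path in "$@"; do
    if [ ! -w "$path" ]; then
      echo "Cannot write to $path" >&2
      status=1
    fi
  done
  return "$status"
}
```

Placed near the top of the script, a line such as check_writable /etc/bind /etc/bind/blacklist || exit 1 would stop the run early if either directory is missing or read-only for the current user.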

If Blogger is messing up the formatting for you, try downloading a plaintext version from here: holmvikit.no/Personal/update-dns-blacklist.sh

##############################################################################
#!/bin/bash

# THE DOMAIN BLACKLIST USED BY THIS SCRIPT IS
# CREATED AND MAINTAINED BY DAN POLLOCK
# (pollock@theorem.ca), NOT BY THE CREATOR OF
# THIS SCRIPT
#
# For more details on Dan Pollock, please see
# http://someonewhocares.org/. For full credits
# on the domain blacklist, please go to
# http://someonewhocares.org/hosts/ and scroll
# down to the bottom of the page.

# For BIND to actually load the blacklisted domains
# you need to add include "/etc/bind/blacklisted.zones";
# to /etc/bind/named.conf.local

# What this script basically does is this:
# 1: Grab a fresh list of nasty domains from
#    someonewhocares.org (the 0.0.0.0 version)
# 2: (Optional) Skip the Windows10 section of
#     the list since it causes problems with
#     Windows Update
# 3: Pull the domain names only from the list
#    of unwanted domains, sort them and remove
#    duplicates
# 4: Create the necessary .zones and .db files
#    for BIND to load the blacklisted domains
#
# Once this is done, BIND will return 127.0.0.1 as
# the IP address for any blacklisted domain.

# Clean up old hosts file and BIND blacklist files
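# (-f keeps rm quiet when a file does not exist yet, e.g. on the first run)
rm -f hosts
rm -f /etc/bind/blacklisted.zones
mkdir -p /etc/bind/blacklist
rm -f /etc/bind/blacklist/*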

# Get the list of unwanted domains from someonewhocares.org
# (stop here if the download fails, so we never load an empty blacklist)
wget -O hosts http://someonewhocares.org/hosts/zero/hosts || exit 1

# Skip the Windows10 section, since as of 2016-03-13 it
# causes problems with Windows Update
#
# If you want to skip more than one section you should
# probably expand this part of the script so that it
# can take an array of unwanted sections or load them
# from a file or something.
mapfile -t domains < hosts

cleanedDomains=""

for i in "${domains[@]}"
do
  if [ "$i" == "#<Windows10>" ]
  then
    skipUntil="#</Windows10>"
    echo "# Skipping section $i"
  fi
  if [ "$skipUntil" != "" ]
  then
    if [ "$i" == "$skipUntil" ]
    then
      skipUntil=""
      echo "# Done skipping section $i"
    fi
  else
    cleanedDomains+="$i"$'\n'
  fi
done

# Quoted, so the shell does not split or glob-expand the hosts lines
printf '%s' "$cleanedDomains" > hosts
# End skip Windows10 section

# Grab the domain names only from lines starting with 0.0.0.0,
# sort them and remove duplicates.
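# Note: uniq -i keeps one copy of each name; adding -u would instead drop
# every name that appears more than once. sort -f folds case so that
# case-variant duplicates end up next to each other for uniq.
domains=$(grep -Po '0\.0\.0\.0[[:space:]]+\K(?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,6}' hosts | sort -f | uniq -i)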

# Create an entry in blacklisted.zones and a db-file for each blacklisted domain
for domain in $domains
do
  echo "zone \"$domain\" {type master; file \"/etc/bind/blacklist/$domain.db\";};" >> /etc/bind/blacklisted.zones
  dbline=""
  dbline+="\$TTL 3600\n"
  dbline+="@ IN SOA $domain. info.$domain. (2014052101 7200 120 2419200 3600)\n"
  dbline+="   NS      ns.$domain.\n"
  dbline+="   A       127.0.0.1\n"
  dbline+="*  A       127.0.0.1\n"
  # Quoted so the record layout is kept and the "*" is never glob-expanded
  echo -e "$dbline" > "/etc/bind/blacklist/$domain.db"
done

# Remove the hosts file that we used to temporarily
# store blacklisted domains
rm hosts

# Restart the BIND service
service bind9 restart

##############################################################################
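The comment in the Windows10 part of the script suggests expanding it to skip several sections. One possible way to do that, offered as a sketch rather than a tested replacement, is a filter that takes the section names as arguments. The function name strip_sections is hypothetical, and it assumes every section is delimited by exact "#<Name>" and "#</Name>" lines, like the Windows10 one.

```shell
# Sketch: drop every "#<Name>" ... "#</Name>" block whose name is given
# as an argument. Reads the hosts file on stdin, writes the rest to stdout.
strip_sections() {
  local line name
  local skip_until=""
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$skip_until" ]; then
      # Not skipping: does this line open one of the unwanted sections?
      for name in "$@"; do
        if [ "$line" = "#<$name>" ]; then
          skip_until="#</$name>"
          break
        fi
      done
      # Print the line only if it did not open a skipped section
      if [ -z "$skip_until" ]; then
        printf '%s\n' "$line"
      fi
    elif [ "$line" = "$skip_until" ]; then
      # Closing marker reached: resume printing from the next line
      skip_until=""
    fi
  done
  return 0
}
```

In the script this could look like strip_sections Windows10 < hosts > hosts.cleaned followed by mv hosts.cleaned hosts, with more section names added as further arguments.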