About this series
I have seen companies achieve great successes in the consumer internet and entertainment industries. I’ve been feeling less enthusiastic about the stronghold that these corporations have over my digital presence. I am the first to admit that using “free” services is convenient, but these companies are sometimes taking away my autonomy and exerting control over society. To each their own of course, but for me it’s time to take back a little bit of responsibility for my online social presence, away from centrally hosted services and to privately operated ones.
This series details my findings starting a micro blogging website, which uses a new set of super interesting open interconnect protocols to share media (text, pictures, videos, etc) between producers and their followers, using an open source project called Mastodon.
Similar to how blogging is the act of publishing updates to a website, microblogging is the act of publishing small updates to a stream of updates on your profile. You can publish text posts and optionally attach media such as pictures, audio, video, or polls. Mastodon lets you follow friends and discover new ones. It doesn’t do this in a centralized way, however.
Groups of people congregate on a given server, of which they become a user by creating an account on that server. Then, they interact with one another on that server, but users can also interact with folks on other servers. Instead of following @IPngNetworks, they might follow a user on a given server domain, like @IPngNetworks@ublog.tech. This way, all these servers can be run independently but interact with each other using a common protocol (called ActivityPub). I’ve heard this concept be compared to choosing an e-mail provider: I might choose Google’s gmail.com, and you might use Microsoft’s live.com. However we can send e-mails back and forth due to this common protocol (called SMTP).
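To make the addressing a bit more concrete: servers typically locate a remote account through a WebFinger lookup (RFC 7033) on the domain part of the handle, before they speak ActivityPub to it. The following is a minimal sketch that only builds the lookup URL; the function name `webfinger_url` is made up for this illustration.

```shell
#!/bin/sh
# Build the WebFinger lookup URL for a handle of the form @user@domain.
# The function name is hypothetical; only the URL layout is standard.
webfinger_url() {
    handle="${1#@}"         # drop the optional leading '@'
    user="${handle%%@*}"    # everything before the first remaining '@'
    domain="${handle#*@}"   # everything after it
    if [ -z "$user" ] || [ -z "$domain" ] || [ "$domain" = "$handle" ]; then
        echo "not a user@domain handle: $1" >&2
        return 1
    fi
    printf 'https://%s/.well-known/webfinger?resource=acct:%s@%s\n' \
        "$domain" "$user" "$domain"
}

webfinger_url '@IPngNetworks@ublog.tech'
```

Fetching the printed URL with `curl -s` should return a small JSON document that points at the account's ActivityPub actor, assuming the server is reachable and the account exists.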
I thought I would give it a go, mostly out of engineering curiosity but also because I more strongly feel today that we (the users) ought to take a bit more ownership back. I’ve been a regular blogging and micro-blogging user since approximately forever, and I think it may be a good investment of my time to learn a bit more about the architecture of Mastodon. So, I’ve decided to build and productionize a server instance.
I registered uBlog.tech. Incidentally, if you’re reading this and would like to participate, the server welcomes users in the network-, systems- and software engineering disciplines. But before I can get to the fun parts, I have to do a bunch of work to get this server in a shape in which it can be trusted with user generated content.
I’m running Debian on (a set of) Dell R720s hosted by IPng Networks in Zurich, Switzerland. These machines are all roughly the same, and come with:
- 2x10C/20T Intel E5-2680 (so 40 CPUs)
- 256GB ECC RAM
- 2x240G SSD in mdraid to boot from
- 3x1T SSD in ZFS for fast storage
- 6x16T harddisk with 2x500G SSD for L2ARC, in ZFS for bulk storage
Data integrity and durability are important to me. It’s the one thing that the commercial vendors typically do really well, and my pride prohibits me from losing data due to things like “disk failure” or “computer broken” or “datacenter on fire”. So, I handle backups in two main ways: borg(1) and zrepl(1).
- Hypervisor hosts make a daily copy of their entire filesystem using borgbackup(1) to a set of two remote fileservers. This way, the important file metadata, configs for the virtual machines, and so on, are all safely stored remotely.
- Virtual machines are running on ZFS blockdevices on either the SSD pool, or the disk pool, or both. Using a tool called zrepl(1) (which I described a little bit in a [previous post]), I create a snapshot every 12hrs on the local blockdevice, and incrementally copy away those snapshots daily to the remote fileservers.
If I do something silly on a given virtual machine, I can roll back the machine filesystem state to the previous checkpoint and reboot. This has saved my butt a number of times, during say a PHP 7 to 8 upgrade for Librenms, or during an OpenBSD upgrade that ran out of disk midway through. Being able to roll back to a last known good state is awesome, and completely transparent for the virtual machine, as the snapshotting is done on the underlying storage pool in the hypervisor. The fileservers run physically separated from the server pools, one in Zurich and another in Geneva, so this way, if I were to lose the entire machine, I still have a ~12h old backup in two locations.
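A rollback like this could look roughly as follows on the hypervisor. This is a dry-run sketch that only prints the commands and does not execute them. The libvirt domain name `ublog` and the snapshot name are hypothetical, and the exact procedure (for example how zrepl's own snapshot bookkeeping is handled) is an assumption, not a description of the actual runbook.

```shell
#!/bin/sh
# Print the hypervisor-side steps to roll a VM's zvols back to a snapshot.
# Nothing is executed; review the output before running any of it by hand.
rollback_plan() {
    domain="$1"   # libvirt domain name (hypothetical: ublog)
    snap="$2"     # snapshot name without the leading '@' (hypothetical)
    shift 2
    # Hard power-off: the state after the snapshot is discarded anyway.
    echo "virsh destroy $domain"
    for zvol in "$@"; do
        # -r also destroys snapshots newer than the one rolled back to.
        echo "zfs rollback -r $zvol@$snap"
    done
    echo "virsh start $domain"
}

rollback_plan ublog example_snapshot \
    ssd-vol0/libvirt/ublog-disk0 ssd-vol1/libvirt/ublog-disk1
```

Printing the plan first is a deliberate choice here: `zfs rollback -r` is destructive, so it seems safer to read the commands before pasting them.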
I provision a VM with 8 vCPUs (dedicated on the underlying hypervisor), 16GB of memory and two virtio network cards. One NIC will connect to a backend LAN in some RFC1918 address space, and the other will present an IPv4 and IPv6 interface to the internet. I give this machine two blockdevices, one small one of 16GB (vda) that is created on the hypervisor’s ssd-vol0/libvirt/ublog-disk0, to be used only for boot, logs and OS. Then, a second one (vdb) is created at 300GB on ssd-vol1/libvirt/ublog-disk1 and it will be used for Mastodon and its supporting services.
Then I simply install Debian into vda using virt-install. At IPng Networks we have some ansible-style automation that takes over the machine, and further installs all sorts of Debian packages that we use (like a Prometheus node exporter, more on that later), and sets up a firewall that allows SSH access for our trusted networks, and otherwise only allows ports 80 and 443 because this is to be a webserver.
After installing Debian Bullseye, I’ll create the following ZFS filesystems on vdb:
pim@ublog:~$ sudo zfs create -o quota=10G -o mountpoint=/home/mastodon data/mastodon
pim@ublog:~$ sudo zfs create -o quota=10G -o mountpoint=/var/lib/elasticsearch data/elasticsearch
pim@ublog:~$ sudo zfs create -o quota=20G -o mountpoint=/var/lib/postgresql data/postgresql
pim@ublog:~$ sudo zfs create -o quota=2G -o mountpoint=/var/lib/redis data/redis
pim@ublog:~$ sudo zfs create -o mountpoint=/home/mastodon/live/public/system data/mastodon-system
As a sidenote, I realize that this ZFS filesystem pool consists only of vdb, but its underlying blockdevice is protected in a raidz, and it is copied incrementally daily off-site by the hypervisor. I’m pretty confident about safety here, but I prefer to use ZFS for the virtual machine guests as well, because now I can do local snapshotting, of say data/mastodon-system, and I can more easily grow/shrink the datasets for the supporting services, as well as monitor them individually for wildgrowth.
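As an illustration of monitoring the datasets individually, here is a small sketch that flags datasets using more than a given number of bytes. It reads the tab-separated output of `zfs list -Hp -o name,used` on stdin; the function name `datasets_over` and the 8GiB threshold in the usage comment are made up for this example.

```shell
#!/bin/sh
# Read "name<TAB>used-bytes" lines on stdin and print the names of the
# datasets that use more than the given number of bytes.
datasets_over() {
    awk -F '\t' -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# Intended use on the guest (requires ZFS):
#   zfs list -Hp -o name,used -r data | datasets_over $((8 * 1024 * 1024 * 1024))
```

Keeping the parsing separate from the `zfs list` call makes it easy to try out with canned input, or to feed it from a monitoring job later.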
I then go through the public Mastodon docs to further install the machine. I choose not to go the Docker route, but instead stick to systemd installs. The install itself is pretty straightforward, but I did find the nginx config a bit rough around the edges (notably because the default files I’m asked to use have their ssl certificate stanzas commented out, while trying to listen on port 443, and this makes nginx and certbot very confused). A cup of tea later, and we’re all good.
I am not going to start prematurely optimizing, and after a very engaging thread on Mastodon itself [@firstname.lastname@example.org] with a few fellow admins, the consensus really is to KISS (keep it simple, silly!). In that thread, I made a few general observations on scaling up and out (none of which I’ll be doing initially), just by using some previous experience as a systems engineer, and knowing a bit about the components used here:
- Running services on dedicated machines (i.e. separate storage, postgres, Redis, Puma and Sidekiq workers)
- Fiddle with Puma worker pool (more workers, and/or more threads per worker)
- Fiddle with Sidekiq worker pool and dedicated instances per queue
- Put storage on local minio cluster
- Run multiple postgres databases, read-only replicas, or multimaster
- Run cluster of multiple redis instances instead of one
- Split off the cache redis into mem-only
- Frontend the service with a cluster of NGINX + object caching
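To put a number on the Puma item in the list above: the web tier can serve at most workers times threads requests concurrently. The sketch below assumes Mastodon reads these from the `WEB_CONCURRENCY` and `MAX_THREADS` environment variables, and that the shipped defaults are 2 and 5. Treat both the names and the defaults as assumptions to verify against the installed version.

```shell
#!/bin/sh
# Upper bound on concurrent web requests: Puma workers times threads per worker.
# The defaults (2 workers, 5 threads) are assumed, not read from Mastodon.
puma_capacity() {
    workers="${WEB_CONCURRENCY:-2}"
    threads="${MAX_THREADS:-5}"
    echo $((workers * threads))
}

# Example: four workers with ten threads each.
WEB_CONCURRENCY=4 MAX_THREADS=10 puma_capacity
```

The same product is a reasonable first guess for how many database connections the web tier may want, which is worth keeping in mind before turning either knob up.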
Some other points of interest for those of us on the adventure of running our own machines follow:
Mastodon is a chatty one - it is logging to stdout/stderr and most of its tasks in Sidekiq have a lot to say. On Debian, by default this output goes from systemd into journald which in turn copies it into syslogd. The result of this is that each logline hits the disk three (!) times. And also by default, Debian and Ubuntu aren’t too great at log hygiene. While /var/log/ is scrubbed by logrotate(8), nothing prevents the journal from growing unboundedly. So I quickly make the following change:
pim@ublog:~$ cat << EOF | sudo tee /etc/systemd/journald.conf
[Journal]
SystemMaxUse=500M
ForwardToSyslog=no
EOF
pim@ublog:~$ sudo systemctl restart systemd-journald
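An alternative to replacing the main configuration file is a drop-in under /etc/systemd/journald.conf.d/, which leaves the distribution's journald.conf untouched across package upgrades. This is a sketch of that variant and not what was done above; the file name `size-limit.conf` and the function name are made up.

```shell
#!/bin/sh
# Write a journald drop-in instead of replacing /etc/systemd/journald.conf.
# The target directory is a parameter so the function can be tried out safely.
write_journald_dropin() {
    dir="${1:-/etc/systemd/journald.conf.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/size-limit.conf" << 'EOF'
[Journal]
SystemMaxUse=500M
ForwardToSyslog=no
EOF
}

# As root on the guest:
#   write_journald_dropin && systemctl restart systemd-journald
#   journalctl --disk-usage
```

`journalctl --disk-usage` reports how much space the journal currently takes, which is a quick way to confirm the cap is being respected over time.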
Paperclip and ImageMagick
I noticed while tailing the journal with journalctl -f that lots of incoming media first gets spooled to /tmp and then run through a conversion step to ensure the media is of the right format/aspect ratio. Mastodon calls a library called paperclip which in turn uses file(1) and identify(1) to determine the type of file, and based on the answer for images runs convert(1) or ffmpeg(1) to munge it into the shape it wants. I suspect that this will cause a fair bit of I/O in /tmp so something to keep in mind is to either lazily turn that mountpoint into a tmpfs (which is in general frowned upon), or to change the paperclip library to use a user-defined directory and make that a memory backed filesystem instead. The log signature in case you’re curious:
Nov 20 21:02:10 ublog bundle: Command :: file -b --mime '/tmp/a22ab94adb939b0eb3c224bb9046c9cf20221123-408189-s0rsty.jpg'
Nov 20 21:02:10 ublog bundle: Command :: identify -format %m '/tmp/6205b887c6c337b1a72ae2a7ccb359c920221123-408189-e9jul1.jpg'
Nov 20 21:02:10 ublog bundle: Command :: convert '/tmp/6205b887c6c337b1a72ae2a7ccb359c920221123-408189-e9jul1.jpg' -auto-orient -resize "400x400>" -coalesce '/tmp/8ce2976b99d4b5e861e6c988459ee20c20221123-408189-1p5gg4'
Nov 20 21:02:10 ublog bundle: Command :: convert '/tmp/8ce2976b99d4b5e861e6c988459ee20c20221123-408189-1p5gg4' -depth 8 RGB:-
I will put a pin in this until it becomes a bottleneck, but larger server admins may have thought about this before, and if so, let me know what you came up with!
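One cheap way to tell when this starts to matter is to count how often each helper program gets spawned. The sketch below tallies the program name that follows the `Command ::` marker in journal lines shaped like the ones above. The function name is made up, and the unit names in the usage comment are an assumption about where these log lines originate.

```shell
#!/bin/sh
# Count Paperclip-spawned commands per program from journal lines on stdin.
# Lines without a "::" marker are ignored.
paperclip_command_counts() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "::") { count[$(i + 1)]++; break }
    }
    END { for (prog in count) print count[prog], prog }' | sort -k1,1nr -k2
}

# Intended use on the guest:
#   journalctl -u mastodon-sidekiq -u mastodon-web --since "1 hour ago" \
#       | paperclip_command_counts
```

A steadily growing count for convert or ffmpeg over the same time window would be a hint that the I/O in /tmp deserves a closer look.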
There’s a little bit of a timebomb with Elasticsearch, unfortunately. Following the [Full-text search] docs, the install and integration is super easy. But in an upcoming release, Elasticsearch is going to force authentication by default. Even though the current version is still tolerant of non-secured instances, those will break in the future. So I’m going to get ahead of that and create my instance with the minimally required security setup in mind [ref]:
pim@ublog:~$ cat << EOF | sudo tee -a /etc/elasticsearch/elasticsearch.yml
xpack.security.enabled: true
discovery.type: single-node
EOF
pim@ublog:~$ PASS=$(openssl rand -base64 12)
pim@ublog:~$ /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
(use this $PASS for the 'elastic' user)
pim@ublog:~$ cat << EOF | sudo tee -a ~mastodon/live/.env.production
ES_USER=elastic
ES_PASS=$PASS
EOF
pim@ublog:~$ sudo systemctl restart mastodon-streaming mastodon-web mastodon-sidekiq
Elasticsearch is a memory hog, which is not that strange considering its job is to supply full text retrieval in a large amount of documents and data at high performance. It’ll by default grab roughly half of the machine’s memory, which it really doesn’t need for now. So, I’ll give it a little bit of a smaller playground to expand into, by limiting its heap to 2 GB to get us started:
pim@ublog:~$ cat << EOF | sudo tee /etc/elasticsearch/jvm.options.d/memory.options
-Xms2048M
-Xmx2048M
EOF
pim@ublog:~$ sudo systemctl restart elasticsearch
E-mail can be quite tricky to get right. At IPng we’ve been running mailservers for a while now, and we’re reasonably good at delivering mail even to the most hard-line providers (looking at you, GMX and Google). We use relays from a previous project of mine called [PaPHosting], which you can clearly see comes from the Dark Ages when the Internet was still easy. These days, our mailservers run a combination of MTA-STS, TLS certs from Let’s Encrypt, DMARC, and SPF. So our outbound mail is simply using OpenBSD’s smtpd(8), and it forwards to the remote relay pool of five servers using authentication, but only after rewriting the envelope sender to always be @ublog.tech and match the e-mail sender (which allows for strict SPF):
pim@ublog:~$ cat /etc/smtpd.conf
table aliases file:/etc/aliases
table secrets file:/etc/mail/secrets

listen on localhost

action "local_mail" mbox alias <aliases>
action "outbound" relay host "smtp+tls://email@example.com" auth <secrets> \
    mail-from "@ublog.tech"

match from local for local action "local_mail"
match from local for any action "outbound"
Inbound mail to the @ublog.tech domain is also handled by the paphosting servers, which forward them all to our respective inboxes.
Rules are important. I didn’t give this as much thought, but I did assert some ground rules. Even though I do believe in [Postel’s Robustness Principle] (Be liberal in what you accept, and conservative in what you send.), I generally tend to believe that computers lose their temper less often than humans, so I started off with:
- Behavioral Tenets: Use welcoming and inclusive language, be respectful of differing viewpoints and experiences, gracefully accept constructive criticism, focus on what is best for the community, show empathy towards other community members. Be kind to each other, and yourself.
- Unacceptable behavior: Use of sexualized language or imagery, unsolicited romantic attention, trolling, derogatory comments, personal or political attacks, and doxxing are strictly prohibited, as is conduct considered inappropriate for a professional setting.
I also read an entertaining (likely insider-joke) post from [@firstname.lastname@example.org], in which she was asking about the internet explorer favicon on her instance, so I couldn’t resist replacing the mastodon favicon with the IPng Networks one. Vanity matters.
Now that the server is up, and I have a small number of users (mostly folks I know from the tech industry), I took some time to explore the Fediverse, reach out to friends old and new, participate in a few random discussions, and fiddle with the iOS apps (in the end, I settled on Toot! with a runner up of Metatext). I generally had an amazing time on Mastodon these last few days.
Now, I think I’m ready to further productionize the experience. My next article will cover monitoring - a vital aspect of any serious project. I’ll go over Prometheus, Grafana, Alertmanager and how to get the most signal out of a running Mastodon instance. Stay tuned!
If you’re looking for a home, feel free to sign up at https://ublog.tech/ as I’m sure that having a bit more load / traffic on this instance will allow me to learn (and in turn, to share with others)!