Optimizing Postgres for Sal

Over time, you may notice your Sal install getting slower and slower - this will happen faster the more devices you have checking in. You may even see ridiculous amounts of disk space being used - maybe even 1GB per hour. This can all be solved by tweaking some simple maintenance settings on your Postgres server.

Background

Before we crack on with how to stop this from happening, it will be useful to know how Postgres handles deleted data.

Take the following table (this is a representation of the facts table in Sal):

id | machine_id | fact_name | fact_data
---------------------------------------
01 | 01 | os_vers | 10.13.6
02 | 02 | os_vers | 10.13.6
03 | 01 | memory | 16Gb
04 | 02 | memory | 4Gb

When a device checks into Sal, rather than asking the database what facts are stored for the machine, iterating over each one, working out which ones have values that need updating, which ones are missing, and which ones need to be removed, Sal instructs the database to delete all of the facts for that device and then save the new ones. What could potentially be 1,000 operations becomes two, which is much faster.
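
Conceptually, the check-in handler does something like the following Django ORM-style sketch. The model and field names here are assumptions for illustration, not a copy of Sal's actual code:

# Sketch of the delete-then-insert pattern described above. The Fact model,
# its fields, and replace_facts() are illustrative assumptions.
from django.db import transaction
from server.models import Fact, Machine  # hypothetical import path

def replace_facts(machine, new_facts):
    with transaction.atomic():
        # One DELETE removes everything the machine previously reported...
        Fact.objects.filter(machine=machine).delete()
        # ...and one bulk INSERT writes the fresh check-in data.
        Fact.objects.bulk_create([
            Fact(machine=machine, fact_name=name, fact_data=value)
            for name, value in new_facts.items()
        ])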

You would expect Postgres to delete the rows from the database at this point. Unfortunately that isn't what happens. What actually happens is that Postgres marks the rows as dead, to be cleaned up later. There are various good reasons for this outlined in the documentation which I won't go into here, but when an application like Sal is updating and deleting data constantly, the disk usage can skyrocket.

After machine_id 01 has checked in:
id | machine_id | fact_name | fact_data
---------------------------------------
XX | XX | XXXXXXX | XXXXXXX
02 | 02 | os_vers | 10.13.6
XX | XX | XXXXXX | XXXXXXX
04 | 02 | memory | 4Gb
05 | 01 | os_vers | 10.13.6
06 | 01 | memory | 16Gb

As time goes on, these dead tuples will mount up. This is where the database's maintenance tasks come in. They are supposed to come along and vacuum the tables, removing these dead tuples.
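
You can check how bad things have gotten by asking Postgres itself. A minimal sketch using psycopg2 and the standard pg_stat_user_tables statistics view - the connection string is a placeholder for your own details:

# List the tables carrying the most dead tuples.
import psycopg2

conn = psycopg2.connect("dbname=sal user=sal host=localhost")  # placeholder DSN
with conn.cursor() as cur:
    cur.execute("""
        SELECT relname, n_live_tup, n_dead_tup
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 10
    """)
    for relname, live, dead in cur.fetchall():
        print(f"{relname}: {live} live tuples, {dead} dead tuples")
conn.close()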

So what can we do?

Unfortunately the defaults are basically useless for a workload like this. I am not going to go in depth about why I chose the following settings - I learned a lot from this post and adjusted their recommendations to meet our needs. My Postgres server is Amazon's RDS, so the settings are entered in the Parameter Group for the database. If you are running a bare metal install, you will be editing the Postgres configuration. I have added a few notes next to each setting about why we chose the value we did. Our general goal was to have maintenance performed more frequently, so each run has less work to do and takes less time, and to give the maintenance worker as many resources as possible so it completes as quickly as possible.

autovacuum_analyze_scale_factor = 0.01
# 1% of a table needs to change to trigger an automatic ANALYZE.
autovacuum_max_workers = 1
# The default is 3. We set this to 1 to allow maximum resources for each worker, so it can complete its work quickly and move on to the next table.
autovacuum_naptime = 30
# The delay between autovacuum runs in seconds. This is half the default - we want autovacuum to run as often as possible.
autovacuum_vacuum_cost_limit = 10000
# The 'cost' of autovacuuming is calculated from several factors (see the article linked for a good explanation) - we want autovacuum to do as much as possible, so this is high.
autovacuum_vacuum_scale_factor = 0.1
# The fraction of a table that must be dead tuples before an autovacuum is triggered (here 10%).
maintenance_work_mem = 10485760
# The amount of memory to assign to maintenance, in kB. We have assigned ~10GB, as we have lots of memory on our RDS instance and can spare it. Set it to the maximum amount of memory you can spare - maintenance runs much quicker if it can work on more of the table in memory rather than having to read it from disk every time.
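
Once the new settings are in place, it is worth checking that autovacuum is actually keeping up. A small sketch in the same vein as the earlier query (again, the connection string is a placeholder):

# Check when each table was last autovacuumed and how many dead tuples remain.
import psycopg2

conn = psycopg2.connect("dbname=sal user=sal host=localhost")  # placeholder DSN
with conn.cursor() as cur:
    cur.execute("""
        SELECT relname, last_autovacuum, autovacuum_count, n_dead_tup
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
    """)
    for relname, last_run, runs, dead in cur.fetchall():
        print(f"{relname}: last autovacuum {last_run}, {runs} runs, {dead} dead tuples")
conn.close()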

Conference Talks (Summer 2018 Edition)

It’s been three long months since I gave a talk at MacAd.UK with Brett, my lovely coworker, so it’s time to give some talks on the side of the pond on which I currently reside.

Firstly, I will be at MacDevOps:YVR on June 7th - 8th, where I will be joined by fellow beer snob Wes Whetstone to talk about Crypt, and probably about beer in the bar afterwards.

The next stop on my summer tour of places that aren’t the Bay Area will be PSU MacAdmins on July 10th - 13th. I’m speaking on July 11th at 3:15PM in the snappily named room “TBA2”, where I will be peering into my completely made up crystal ball, looking at where managing Apple devices is going and how that will affect our roles as Mac admins.

I’m looking forward to seeing you all at one or both of these great conferences, so you can all tell me I’m wrong in the bar ;)

If you haven’t got your tickets yet, here is a 20% discount link for MDO:YVR and if you register by May 15th you can get $200 off your tickets for PSU.

Google Chrome update notifications with Yo

Web browsers are critical to pretty much any organization. For many people, the browser is everything, right down to their email client. Since the browser is probably the most used piece of software, and users are putting all kinds of private information into it, it’s vital browsers are kept patched.

Fortunately our default browser is Google Chrome, and Chrome is really good at keeping itself updated. Unfortunately it completely sucks at letting the user know that there is an update to install. I mean really, we’re just going to leave it at three tiny lines changing from grey to orange?

Useless.

So what are our options?

Like most people, we used our software deployment tool to patch it initially - in our case, Munki. So the process was: we merge the update into our Munki repo, roughly an hour rolls by, and Managed Software Center pops up and asks the user to quit Chrome to update it. All done, right?

Well, not quite. We noticed high numbers of pending updates sitting there in Sal. So your intrepid author took a trip to the help desk to listen in on some of the users.

Turns out people are really protective about their tabs. It’s the modern equivalent of not wanting to restart because they will “lose their windows”.

If a user by some random chance finds their way to Google’s built-in update button, Chrome will restart gracefully and preserve their tabs. So we set about working out how we could do this ourselves.

Won’t someone just think of the tabs?

chrome://restart has been around for a while, but for obvious reasons doesn’t work anywhere outside of typing it into Chrome’s location bar, which isn’t exactly user friendly.

After various attempts to trigger this, Mike Lynn mentioned on MacAdmins Slack that they had found a way to do it - it wasn’t pretty, but it worked.

It involved (shudder) AppleScript.
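
The exact script isn't reproduced here, but the general shape is something like the sketch below: use osascript to drive Chrome to chrome://restart, so Chrome treats it like the user navigating there and restarts gracefully. The AppleScript wording is my own approximation, not necessarily the one Mike found:

# Approximate sketch of restarting Chrome (tabs preserved) by sending it to
# chrome://restart over AppleScript. Not the actual script from the post.
import subprocess

RESTART_SCRIPT = '''
tell application "Google Chrome"
    set URL of active tab of front window to "chrome://restart"
end tell
'''

def restart_chrome():
    # osascript runs the AppleScript; Chrome then relaunches with its tabs.
    subprocess.run(["osascript", "-e", RESTART_SCRIPT], check=True)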

So, we had a method to restart Chrome and keep our users’ tabs safe. We just needed a method to let our users know about it.

Which version is running?

When Chrome’s auto update runs, it actually replaces the app bundle from underneath the user. It took me a while (and some help on Slack from Tim Sutton) to work out what was going on. Google places a copy of the binary in a directory named after the version, which means multiple versions can live inside the same app bundle.

ls -la /Applications/Google\ Chrome.app/Contents/Versions/
total 0
drwxr-xr-x@ 4 root wheel 128B Mar 13 12:43 .
drwxr-xr-x@ 9 root wheel 288B Mar 12 18:30 ..
drwxr-xr-x 5 root wheel 160B Mar 5 23:57 65.0.3325.146
drwxr-xr-x 5 root wheel 160B Mar 12 17:45 65.0.3325.162

Now to work out if there is an update to install, we simply need to read the Info.plist in the app bundle (the version that should be there) and compare it with the version that is actually running. If the version in the Info.plist is newer than the version running, the user has an update to perform.
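
This isn't the actual script, but a rough sketch of one way to make that comparison, assuming the on-disk version lives in the app bundle's Info.plist and the running version can be recovered from the Versions/<version>/ framework path the Chrome process has open (via pgrep and lsof):

# Compare the on-disk Chrome version with the version the running process
# loaded. The pgrep/lsof approach is one option, not the post's exact method.
import plistlib
import re
import subprocess

APP = "/Applications/Google Chrome.app"

def on_disk_version():
    # The version Chrome's updater has put on disk.
    with open(APP + "/Contents/Info.plist", "rb") as f:
        return plistlib.load(f)["CFBundleShortVersionString"]

def running_version():
    # Guess the running version from the framework path the process has mapped.
    pids = subprocess.run(["pgrep", "-x", "Google Chrome"],
                          capture_output=True, text=True).stdout.split()
    if not pids:
        return None  # Chrome isn't running
    open_files = subprocess.run(["lsof", "-p", pids[0]],
                                capture_output=True, text=True).stdout
    match = re.search(r"/Versions/([\d.]+)/", open_files)
    return match.group(1) if match else None

if __name__ == "__main__":
    disk, running = on_disk_version(), running_version()
    if running and disk != running:
        print(f"Update pending: {running} -> {disk}")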

Yo!

I’m a big fan of Shea Craig’s Yo. We have our own version that we have branded with our company logo so users know the notification is coming from us - we’ve used it in the past to let users know our anti-malware tool has cleaned something up, or that they are going to need to update their operating system soon. It’s a nice way of giving the user information without getting in their face.

I have packaged up a generalized version of the script and put it on GitHub. The script will only notify the user once per version, so we don’t spam them.
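
The script on GitHub handles that bookkeeping itself; conceptually, the once-per-version check is as simple as something like this (the receipt path and the notify() call are placeholders, not the script's actual choices):

# Only notify once per pending Chrome version.
import os
import plistlib

RECEIPT = os.path.expanduser(
    "~/Library/Application Support/chrome_notify_receipt.plist")  # placeholder path

def already_notified(version):
    try:
        with open(RECEIPT, "rb") as f:
            return plistlib.load(f).get("last_notified_version") == version
    except FileNotFoundError:
        return False

def record_notification(version):
    with open(RECEIPT, "wb") as f:
        plistlib.dump({"last_notified_version": version}, f)

def maybe_notify(pending_version):
    if already_notified(pending_version):
        return
    notify(pending_version)  # placeholder: call Yo with your own branding/arguments
    record_notification(pending_version)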

MacAD.UK 2018: Curing operating system blindness

Thanks to those who came to our talk yesterday (if you didn’t come, I don’t blame you, the other talk was much better). If you feel so inclined, here are the slides.

Custom DEP Packages

I’m sure everyone who didn’t have an MDM a few weeks ago is scrambling to get one set up - I’m not going to go into anything about MDM, since it really isn’t that interesting. They install profiles and packages - all very unexciting.

This article will take you through some of the decisions we made when developing our DEP enrollment package.

First attempt

If you are of the open source management tool persuasion, chances are that like me, you are very happy with what you have already and see MDM merely as a method for delivering those tools. Before we considered MDM, our deployment workflow was essentially:

  • Imagr lays down a base image
  • Imagr installs Google’s Plan B
  • Plan B installs Puppet
  • Puppet performs the configuration
  • As part of that configuration, Puppet installs Munki
  • Munki installs the software

So on the face of it, it looked pretty simple for us to use our existing Plan B package with InstallApplication via an MDM.

DEPNotify

DEPNotify is a great tool by Joel Rennich - you can pass it various commands and it will let your users know what is going on. So we would open up DEPNotify and then kick off our Plan B installation, which could sit there for 10 minutes without letting the user know anything other than “something is happening”. Whilst this obviously wasn’t a great experience for our users, it got the job done.
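
DEPNotify is driven by appending commands to its control file (by default /var/tmp/depnotify.log). A minimal sketch of the kind of status updates we could have been sending while Plan B ran - the wording and steps here are illustrative:

# Feed status lines to DEPNotify while a long install runs.
DEPNOTIFY_LOG = "/var/tmp/depnotify.log"

def depnotify(line):
    # DEPNotify tails this file and reacts to each new command line.
    with open(DEPNOTIFY_LOG, "a") as f:
        f.write(line + "\n")

depnotify("Command: MainTitle: Setting up your Mac")
depnotify("Command: MainText: Installing management tools. This can take a few minutes.")
depnotify("Status: Downloading Plan B...")
# ...kick off the Plan B install here, updating Status: as it progresses...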

First optimization

Rather than make our users sit there and twiddle their thumbs whilst their computer sorted its life out, we stopped and thought about what our users needed to do first. From our perspective, we really wanted the computer encrypted before they did anything, and we needed them to get going with our SSO solution: change their password, set up 2FA etc. So this boiled down to two basic requirements:

  • Install Chrome - this is where the majority of ‘IT Time’ is spent during onboarding, so there was no need to wait for Munki to finally put it there.
  • Install and configure Crypt - let’s get the disruptive logout out of the way and let the user use their computer undisturbed.
