New server coming up

After the recent server failure and the restore problems that came with it, I decided to upgrade to a new server with some extra RAM and SSD space.

It took a bit longer to be provisioned due to some CPU shortages, but I now have access to the machine and have started setting it up. I hope to have it ready for testing by tomorrow, and it should be ready to fully take over next weekend.

The key figures: the new machine has 128GB of RAM instead of 64GB, and 7TB of SSD space instead of the previous 2TB SSD plus 3TB HDD.

The extra RAM should help with the “out of memory” errors during rendering which happen every once in a while with large map areas.

The extra SSD space allows me to keep the full database on fast storage, and leaves enough headroom for adding a few imposm-based styles, which was not possible so far.

Outage and new server

After a recent outage I had to restore the server from a backup. While it is mostly running again, some minor services, e.g. the translation backend, are still missing.

As I was planning to switch to a new server soon anyway, I have not yet bothered to look into all the minor things after getting the main services restored.

The new server should be available any day now. I will take my time setting it up properly in parallel while the current server keeps running, improving the setup, and especially the backup/restore plan, while I’m at it.

The new server should be ready for prime time in a week or two from now, with more RAM and more NVMe SSD space available.

Downtime notice for new DB import

Database bloat has taken its toll again, and I will soon run out of SSD space for the main databases.

So I’m going to take the server down over the next weekend, starting late Friday, Feb 20th, around 20:00 UTC. Assuming all goes well, everything should be back up and running on Sunday afternoon.

Import lag

The database is currently lagging behind. It is catching up now, but it will take a few more days to be fully up to date again.

The problem was that I had temporarily stopped the minutely diff imports for some maintenance about three weeks ago, and then forgot to turn them back on.

Problem #2 was that my monitoring only checked for failed diff imports, not for imports that were not running at all.
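A check for this second case could look at how old the last successfully applied diff is, instead of only looking for failures. The following is a minimal sketch, not the monitoring actually used here. It assumes the import keeps an osmosis-style state.txt with a `timestamp=` line; the function names and the 30 minute threshold are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(state_text, now=None):
    """Return how far the timestamp in state.txt-style text lags behind `now`."""
    now = now or datetime.now(timezone.utc)
    for line in state_text.splitlines():
        if line.startswith("timestamp="):
            # The properties format escapes colons as "\:", so drop the backslashes.
            raw = line.split("=", 1)[1].strip().replace("\\", "")
            stamp = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
            return now - stamp.replace(tzinfo=timezone.utc)
    raise ValueError("no timestamp line found in state text")


def import_is_stale(state_text, max_lag=timedelta(minutes=30), now=None):
    """True if the last applied diff is older than `max_lag` (hypothetical threshold)."""
    return replication_lag(state_text, now) > max_lag
```

Because this only looks at the age of the last applied diff, it raises an alarm both when imports fail and when they were never started again.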

Experimental: Larger areas supported

So far this service only supported map areas up to about 40×40 kilometers. I have now extended this to 300×300 kilometers.

This is experimental for now, and I may end up rolling this back, at least partially.

The main problem with this change, aside from longer render times, is that it may lead to out-of-memory errors while processing render requests. This especially seems to affect the compressed SVG output. The problem is that a failure to render one output format makes the whole render job fail, even if the other formats could have been rendered just fine.

I had experimented with raising the limit to 1000×1000 kilometers and tried to render a map of all of Germany with that setting. SVG output then failed with out-of-memory errors regardless of paper size, even though PDF output, which is also a vector format, could be created just fine.

Unfortunately an out-of-memory error in the rendering library is not something that can simply be caught and recovered from using Python exception handlers, so I need to come up with more complicated ways to deal with this.
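One possible direction, sketched here as an assumption and not as what the renderer actually does, is to run each output format in its own child process. If a child gets killed or aborts because of memory exhaustion, only that one format is lost and the parent can still deliver the others. The function name and the commands in the demo are hypothetical placeholders for a real per-format render call.

```python
import subprocess
import sys


def run_isolated(commands, timeout=None):
    """Run one command per output format and return {format: succeeded}."""
    results = {}
    for fmt, cmd in commands.items():
        try:
            proc = subprocess.run(cmd, timeout=timeout)
            # A non-zero or negative (killed by signal) return code counts as failure.
            results[fmt] = proc.returncode == 0
        except subprocess.TimeoutExpired:
            results[fmt] = False
    return results


if __name__ == "__main__":
    # Placeholder commands: one child exits cleanly, one simulates a crash.
    demo = {
        "pdf": [sys.executable, "-c", "pass"],
        "svgz": [sys.executable, "-c", "import sys; sys.exit(137)"],
    }
    print(run_isolated(demo))
```

The price is extra process startup and data loading per format, so this trades render time for robustness.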

Rendering the northern part of Germany within the 300×300 kilometer limit works fine, so I’ll keep the 300×300 kilometer setting for now. I have not tested such large maps with styles other than the default OSM one though, so there may still be error situations I’m not aware of.

I will watch the number of out-of-memory failures, the average render time, and the render wait queue size closely over the coming days or weeks, and may return to the smaller 40×40 kilometer limit if any of these monitored values get too high.

How not to test changes …

Yesterday I found out the hard way that the neighborhood POI frontend no longer worked, and must have been broken for quite a while already.

Why didn’t I spot this in testing? Well …

  • The problem was a typo in the OCitysMap renderer backend code, not in the frontend
  • I tested the change back then locally
  • But due to another typo in the test setup the local config for the POI frontend was copied to the wrong folder, and so not actually used
  • So the frontend fell back to default settings for the rendering host, and that happened to be the public, not the local server
  • Meaning that when testing the change I actually tested the frontend, which didn’t really change, against the not yet changed public render backend
  • As test results looked good I pushed the changes, pulled them on the public server, but didn’t test there once more
  • So the typo in the renderer went unnoticed as I actually didn’t test the local instance of that, and after the push/pull the public instance was broken, too 🙁

Renderer code and local test setup are fixed now, so something like this should hopefully not happen again.
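The silent fallback to the public rendering host is what hid the misplaced config file. One way to guard against that in a test setup is to fail loudly when the expected config is missing. The sketch below is only an illustration: the JSON config format, the `render_host` key, the default URL and the function name are all assumptions, and the real frontend may be structured quite differently.

```python
import json
import os

# Hypothetical default, standing in for the public rendering host.
DEFAULT_RENDER_HOST = "https://public.example.org"


def load_render_host(config_path, allow_default=False):
    """Return the rendering host from the config file at `config_path`.

    Unless `allow_default` is set, a missing file raises an error
    instead of silently falling back to the public default.
    """
    if not os.path.isfile(config_path):
        if allow_default:
            return DEFAULT_RENDER_HOST
        raise FileNotFoundError(f"expected local config at {config_path}")
    with open(config_path, encoding="utf-8") as fh:
        return json.load(fh)["render_host"]
```

In a local test run `allow_default` would stay off, so a config copied to the wrong folder stops the test right away instead of quietly testing against the public server.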

Now I need to work on making the alternative frontend send email notifications about rendering errors to me, like the main frontend already does, so that a failure like this can’t go unnoticed for this long …