(Failing to build) Templates in Xen-Orchestra

Let’s (fail to) build some golden images
linux
virtualization
packer
xcp-ng
Published

November 27, 2023

Up front warning

After an absurd amount of hacking I did not find a good way to automate template creation. You can read this to follow my descent into madness and/or to see if I already tried and failed at the approach you were considering trying. Let this be a warning. I’ll make a subsequent post about how I ended up doing this.

Introduction

When I was learning proxmox earlier this year I considered using packer. At the time it seemed like a lot of work, and I found some relatively simple scripts I could run to build templates for the operating systems I was interested in. Now I’m testing out xcp-ng and I need to create template images again. In the future if I find myself wanting to deploy any of those images to another environment and I’ve figured out some xcp-ng-specific way of doing things I’ll have to relearn everything again. At this point it’s starting to make more sense to bite the bullet and figure out packer. Plus, I’ve been writing a fair bit of terraform at work so the hashicorp language makes more sense to me. That should help with the learning curve. This post will document my learn-by-doing attempts to figure out packer. It’s definitely not a how-to guide, more a reference for myself in the future. That said, it might be helpful to others who are considering learning packer to see what they’re in for.

Installing packer

My initial instinct is to add packer to my IaC devcontainer. However, as a simple learning exercise I’ll probably be building some docker containers, and I don’t feel like troubleshooting whether the errors I’m getting are from packer or something specific to docker-in-docker. So for now I’ll just install it right into my WSL install of Ubuntu on my workstation.

The docs have a pretty easy install guide so that goes fine, except for the part where it was silently prompting me for my sudo password and I thought it froze.

Beyond that I can run packer -v to make sure it’s loaded and determine that I’m running version 1.9.4. Cool, let’s build something.

Getting sidetracked by XOA

I kept reading about how Xen had pre-packaged templates and even a full blown kubernetes deployment available in Xen, but I couldn’t see it in my UI. After some reading I realized it wasn’t included in the source built version I was running. I can deploy XOA free edition and get the templates, but I’ll still want the source built version to handle my backups and other stuff that’s not included in free tier. A little hacky, but fair enough since I’m not giving them any money. After signing up for an account I got this cool web based deploy link where I pointed it at a host, put in my password (after accepting self-signed certs) and it auto deployed XOA onto the host.

Taking a look at the templates, my options were fairly limited, and the kubernetes auto deploy seemed interesting, but I don’t think I’m ready for that yet. It’s cool that it exists, and I think if I built a cluster again I’d bootstrap something like this instead of going the docker route I did (unless xcp-lite actually works in which case I won’t need either).

I did want to test restoring my metadata backups, but unfortunately that feature is pro tier so I couldn’t do it. I was able to manually export my config and load it into XOA and have all my hosts show up, so that was nice. For now I’ll shut this VM down and go back to doing things with packer.

Make a template SR

One of the things I found annoying about proxmox was that even if my machines were clustered I had to create the same template on each machine. I think I can get around this with xcp, let’s find out. In my NFS share for XCP stuff I make a new folder called templates.

I then add that as an NFS SR for images on each node. It’s a little annoying that I have to replicate it, but I still think that’s better than worrying about pools at this point. I’ve been trying to use ChatGPT to help with this sort of work and I asked if I could export SR configs across pools. It hallucinated that there was an export option, which sure would have been nice. Anyway, it’s only a few hosts and my browser remembers all the configs in the boxes so it only took a few minutes to set up all hosts.

Ubuntu template

The xenserver packer plugin repo has an example section that builds Ubuntu 20.04. That’s a bit old for my taste at this point, but let’s try and get it working and then adapt to it and other OS installs later.

I initially wrote a script that exported my variables for username and passwords in the form PKR_ENV_<variable> but for whatever reason it didn’t seem to like that and claimed that I hadn’t set those variables. I modified the script to export to json instead and that seemed to work. Sort of, I got a new error after that:

The iso_checksum_type must be specified.

The ISO checksum seems like it should be coming from a data source that pulls it from a page maintained by Ubuntu with hashes for their ISOs. Everything looks ok, and if I hard code in the release version to the URL I can get to a page that shows hashes.

I’m not sure at this point if something is going wrong with my variable interpolation or I missed some other step. Adding PACKER_LOG=1 in front of my build command got me a lot more verbose output, but nothing useful.
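Going back to the environment variables for a moment: Packer's documented prefix for HCL2 input variables is PKR_VAR_, not PKR_ENV_, so if the script really exported PKR_ENV_ names that alone could explain the "variable not set" complaint. A minimal sketch of the documented form, assuming the template declares variables called remote_host, remote_username and remote_password (hypothetical names with placeholder values, not copied from the actual template):

```shell
# Packer maps PKR_VAR_<name> onto the HCL2 input variable called <name>.
# Names and values here are placeholders for illustration only.
export PKR_VAR_remote_host="xcp-host.example.net"
export PKR_VAR_remote_username="root"
export PKR_VAR_remote_password="changeme"

# Show what a following `packer build` would inherit, with the password masked.
env | grep '^PKR_VAR_' | sed 's/\(password=\).*/\1****/' | sort
```

Each exported name only takes effect if a variable block with the matching name exists in the template.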

Running packer inspect instead of packer build showed me some outputs (ChatGPT hallucinated some other terrible ideas about how to get this stuff, I still have mixed feelings about working with it).

local.ubuntu_sha256: "[\n  \"b8f31413336b9393ad5d8ef0282717b2ab19f007df2e9ed5196c13d8f9153c8b\",\n]"

That output appears basically correct since the variable I’m applying it to looks like this iso_checksum = "sha256:${local.ubuntu_sha256.0}".

Ugggh, it’s an error in the examples. There’s now an iso_checksum and an iso_checksum_type. I should have looked more closely at the error message. Ok, modifying the spec to break those two parts up and we’re back in business. On to the next error:
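To make the split concrete, here is a minimal shell sketch that pulls the hash for one ISO out of a SHA256SUMS-style listing and prints it as two separate values. The listing is inlined so the snippet runs offline. The ISO file name is an assumption (only the hash shows up in the packer inspect output), and the printed lines just mirror the two option names from the error message rather than a complete source block.

```shell
# One row in the "<hash> *<file>" layout used by SHA256SUMS files.
# The file name is assumed; the hash is the one packer inspect printed.
sums='b8f31413336b9393ad5d8ef0282717b2ab19f007df2e9ed5196c13d8f9153c8b *ubuntu-20.04.6-live-server-amd64.iso'
iso='ubuntu-20.04.6-live-server-amd64.iso'

# Pick the hash from the row whose file column matches the ISO name.
checksum=$(printf '%s\n' "$sums" | awk -v f="*$iso" '$2 == f { print $1 }')

# The algorithm and the bare hash end up in two separate settings.
printf 'iso_checksum_type = "sha256"\n'
printf 'iso_checksum      = "%s"\n' "$checksum"
```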

xenserver-iso.ubuntu-2004: output will be in this color.

Build 'xenserver-iso.ubuntu-2004' errored after 5 milliseconds 434 microseconds: Post "https://xo.<sensitive>.net": dial tcp 192.168.10.10:443: connect: connection refused

Oh, I’m not supposed to point this at my xen-orchestra, it’s supposed to go against an xcp-ng host. This is going to be confusing for a while, I’m pretty sure terraform goes against xo. I guess I’ll find out.

This time we get building, but it hangs on Step: Wait for VM's IP to become known to us. Bringing up the console on the machine in XO it’s hung on a prompt for autoinstall that’s expecting user input.

I tried running it again so I could see the console output and this time (after at least half an hour) it seemed to work.

Upon closer inspection it only shows up in one of my pools (we’ll try booting later) and I have a bunch of stuff in the root of my xcp-ng storage location on my NAS, whereas the templates were all supposed to be in a subfolder. Did I mess up my storage location? Hmmm, looks like I did, or it didn’t like my subfolder. Let’s delete this template and try again. Ahhh, yeah after you put in the subfolder when defining an SR path you have to hit the little search icon beside it or it doesn’t actually add the path. Probably there as a validation step but it sure tripped me up.

I’ll make it on the local and one other host, I would still like them to share. Ahhh, ok. Each pool is making a UUID subfolder, so they won’t be sharing templates unless I pool them. I can probably at least create them in one pool and then replicate them to my others. For how often I will want to create or update templates I probably don’t need to overengineer this. I could even use one host to build all my initial images and then migrate them, but that seems like a pain.

Get stuck on templates

There were a bunch of empty templates along with the provisioned machine I created that I didn’t think I needed, so I deleted them. Now when I try and install Ubuntu with packer it fails. Which is weird, I didn’t realize I was using them. So I either have to figure out how to get them back, or how to have packer build without them. I can always restore a backup of my config, I’ve been meaning to do that anyway, but let’s see if I can restore them some other way first. Allegedly there’s a way to export/import them, but I can’t see anything.

Get stuck on metadata restore while getting stuck on templates

Trying a restore…. I can’t seem to restore. This is why you test these things I guess. What’s going on here? Maybe it’s something about the pool metadata? I think I should have broken those into two tasks, since it’s asking me to do a pool restore at the same time. Nope that’s not it. Looking at the error it’s trying to find the backup in the wrong path. If I follow the mount point on my XO server I can see the backup file, but it’s trying to load it from a weird subdirectory.

The error is

ENOENT: no such file or directory, open '/run/xo-server/mounts/3f885501-70f9-4219-8707-2a6515b0814e/opt/xo/xo-builds/xen-orchestra-202312021218/packages/xo-server/xo-config-backups/9a515773-8912-4677-9e37-9e187341ecb9/20231129T070000Z/data.json

And the actual path is

/run/xo-server/mounts/3f885501-70f9-4219-8707-2a6515b0814e/xo-config-backups/9a515773-8912-4677-9e37-9e187341ecb9/20231129T070000Z/data.json

I’m not sure why it added all that other stuff in.

Just for kicks let’s head over to my XOA install. Backups are a premium feature so I turn on my trial version. The restore runs fine from there, so it is some issue with the source build. Now what do I want to do with this information? I mean, VM backup and restore worked. It’s not great that this doesn’t, but is it the end of the world? I did that manual backup restore to XOA and it worked ok. I wonder if I can reverse that to get this back in shape? If that works I’ll just have to remind myself to do manual xen-orchestra backups before I start messing with things. Or rely on actual VM backups instead.

Back to templates

Having fired up xen in the XOA appliance and the original docker container I was using I can no longer see templates on my hosts when I create a new VM anywhere. So that’s clearly not a xen-orchestra thing, but an xcp-ng thing. Still figuring all that out. That at least helps me figure out where to go for docs and possible solutions.

From reading it seems like I might just have to reinstall. I do want to test a reinstall eventually, but not right now.

Maybe just do cloud images

I’m starting to feel like packer is just not the tool for me. On proxmox I just used cloud images as a template and then applied some settings in the cloud-init config. XCP-ng has good support for cloud init (allegedly) and it looks like I can save template configs for it at the XO level, so that would be reproducible across hosts. This blog seems like a fairly straightforward example of what I want to do. Let’s give it a try.

Ok that worked great. I modified the cloud-init slightly and it installed the guest management utilities and everything. I think by just making a couple custom cloud-init configs I can have an easily built template system.

Try doing a shared SR for templates

When messing around with recreating SRs for the templates it kind of looked like I had the option to attach newly created SRs on hosts to the same path as existing ones I’d created. This forum post also makes it sound like I should be able to do that. Let’s give it a try.

To start I delete the old template SRs I created (after deleting the template I made) and clear them out from my NAS.

Next I create one on a single pool using the same config as before. Then I do the same on a second host. When I hit the search button on that folder the “storage usage” section fills in with an ID for the last one I created. I click the “Reattach SR” button, and get an error that it exists. So it doesn’t look like they can share. That’s ok. I can still create templates once and then copy them to any machines I want to provision them to. Not perfect but not terrible either.

Arch template (failed attempt)

The Ubuntu template was easy enough, and it’s handy to have an Ubuntu template available, but I generally prefer running Arch these days, so let’s build for that to start.

It doesn’t look like there is an OVA cloud image for Arch. There is a way to convert KVM but it feels like it might be more work than just making my own template.

As I’m doing this I’m realizing something is messed up about my ISO SRs. They should all be pointing to my NAS, which has a debian and an Ubuntu image on it. But on two hosts I see only ubuntu and some packer stuff, on another there’s an Arch ISO. None of which match up with what’s actually happening on my NAS. Ahhh, again, I didn’t actually add the path properly so there’s some overlap going on. That’s sorted now and I can actually see just the ISOs I was expecting. Apparently sharing ISOs across SRs is ok.

I can’t seem to create a VM though, probably because I borked my templates. I can’t seem to add a network or a disk when I go to create a VM. Something in the template setting must handle that in a way I can’t see. I think it’s reinstall time.

Reinstall

Let’s start with a host that’s not running XO. Not that I couldn’t get that back, but I’d rather not have to. The reinstall process is pretty straightforward. I didn’t have to re-enter any of my config, and the host came back up in its same pool, with all its SRs attached. I did need to apply patches again, but that’s to be expected. Once it came back up I could see all the default templates that ship with the install, and creating a new VM seemed to work better: I could see the network options and pick storage.

Migrate templates

Before I go further, let’s see if I can now migrate all these templates to my other hosts rather than reinstall. That didn’t seem to work, every copy got a no opaque ref found error. I think there must be something fancy about those.

Keep reinstalling

I was going to want to test this anyway, let’s do it now. I’ll bring down another node, this one with my XOA VM on it, and do a reinstall there. Install went fine, the VM was still there and booted ok when it came back up. Nice! Ok, in theory I can bring down my last node running XO, reinstall, and power it back on without any migration. Let’s test.

Came back up fine. That’s pretty nice.

Back to arch installing

Ok, now that templates are working again and I have my ISO SR sorted, let’s build an arch template. This blog suggests it won’t be that hard, do a basic Arch install, install yay and xen guest utils. I’ll probably add cloud-init stuff too.

The basic install went ok, but package downloading was super slow. I’m not sure if that’s a driver issue or something with my mirrors. I’ll have to do some more testing before I make a template of it.

As a start let’s try installing and running reflector. It’s probably a good practice to have that anyway. While I’m at it I’ll install ssh and iperf3. Being able to ssh in is definitely handy since it lets me paste commands. I still can’t seem to paste into the command console from xen-orchestra. Maybe I’ll look into that while reflector is running.

Get distracted trying to figure out how to paste into xen-orchestra console

Well this is discouraging

I did notice that there’s a little box above the console for copying, but it’s barely responsive when I type and doesn’t seem to let me paste. I guess the real solution is to set up SSH or a proper remote desktop ASAP.

Back to arch installing

Fix slow package downloads

So prior to running reflector I decided to try iperf3. Fortunately I’m pulling a solid gigabit between two machines on my network, so it’s not a weird driver thing, I probably just picked terrible mirrors in the installer. Let’s do reflector. Ok, after running reflector my package downloads are way faster. Might as well start and enable the systemd timer for it as well, I’d like that to happen somewhat regularly on all my installs.
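As a rough sketch of that step, the function below refreshes the mirror list once and then enables the timer that ships with the reflector package. The country, mirror count and sort order are assumptions to adjust, not the values actually used on this VM. The timer takes its own options from /etc/xdg/reflector/reflector.conf.

```shell
# Wrapped in a function so nothing runs until it is called on the Arch VM.
refresh_mirrors() {
  # Rank the 10 most recently synced HTTPS mirrors in one country by download rate.
  sudo reflector --country Canada --latest 10 --protocol https \
    --sort rate --save /etc/pacman.d/mirrorlist
  # Re-run reflector on a schedule via the packaged systemd timer.
  sudo systemctl enable --now reflector.timer
}
```

Calling refresh_mirrors on the VM runs both steps in order.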

Install yay

xen-guest-utilities aren’t in the base package manager, but they are in the AUR. So to get them I’ll need an AUR helper. I’ve generally been happy with yay and it’s still actively developed so why switch? I grabbed the pre-built binary from the releases section so I didn’t have to install a bunch of build tools onto my base template.

Install guest utilities

Looks like the package I want is called xe-guest-utilities-xcp-ng. Sounds right. I chose to remove the make dependencies after installation, since I’m trying to keep this install light. The install completed ok but XO didn’t detect it as having a management agent. When in doubt reboot. Nope. Ahh, here we go, in the notes from the AUR package someone is talking about an issue that’s preventing the xe-linux-distribution.service from starting. Once I start that service the management agent is detected and my IP shows up. Enable the service so it persists across reboots (realized I forgot to do that with ssh when I couldn’t get back in after a reboot.)

Install cloud-init

I think this is the last thing I want to install on this template. There are a few other things I basically always use, but to start at least I want this template to have just the absolute minimum of a VM so that I can customize it later. I’m even removing iperf3. I guess reflector is a bit of a cheat in that regard, but given how awful the mirrors I picked were maybe that’s ok. I install cloud-init and cloud-guest-utils. According to the wiki I need the latter if I want my disk to resize, which I certainly do. Ok, both are installed. I think we’re ready to turn this into a template.

Make a template

Based on this post I think all I have to do is run cloud-init clean and shut down the VM before I turn it into a template. I wasn’t sure if I had to do that command as root or not, it didn’t fail on either so I ran both to be safe and then shut down. Back in XO I head over to the advanced tab for my VM and click “convert to template”. That appears to be done.

Try building from the template

If/when this works I’ll have to mess around with saving cloud init configs, but for now let’s just hack one together quickly. I go to create a new VM on the host that has my template (I’ll copy it to the others later if I’m happy with it). I pick the template from the list, give the VM the very creative name archtest1.

Here’s the dump of the user cloud config I tried. I left the network one alone:

#cloud-config
hostname: {name}%
ssh_authorized_keys:
  - <My public key spec>
packages:
  - vim
users:
   - default
system_info:
   default_user:
     name: ipreston
     lock_passwd: true
     gecos: arch Cloud User
     groups: [wheel, adm]
     sudo: ["ALL=(ALL) NOPASSWD:ALL"]
     shell: /bin/bash

I had to be careful to create the SR on the local disk for the host instead of my templates SR, which runs on my NAS, I’m sure that wouldn’t have been amazing for performance. I gave it a 20G disk so I could see if the expansion worked properly out of the box.

With that it was time to hit create and see how things went.

Well it started, so that’s great. But my password still worked, which it shouldn’t, and it pulled the exact same IP as the template, which suggests the MAC address hadn’t changed. That will be problematic in the future. Taking a closer look, my ssh key hasn’t been added, vim hasn’t been installed, and my partition hasn’t been expanded. I can see 20G of available space, but I’ve only still got the 10 allocated. No bueno.

Figure out why the template didn’t really template

At this point I’m getting dangerously close to going back to packer. Ok, re-reading the Arch wiki maybe I was supposed to enable some of those services before shutting down? I can’t find any examples specific to Arch online but this blog doesn’t show anything like that for CentOS. From a bit more careful reading of the Arch wiki it does look like I want cloud-init.service and cloud-final.service enabled. Let’s give that a shot.

I create a new VM from my “template” so I at least don’t have to redo all that work. I enable those services above, run cloud-init clean as my user and root again to be safe and poweroff. Back to the advanced tab on the management interface to create a template.

Ok, new VM based off this updated template. The create button spins for a while after I hit it. That’s either good because cloud-init is doing things upon creation like installing vim, or bad because something is screwed up and stuck in a boot loop. Let’s wait and find out.

Ok, it pulled the same IP address, but I also can’t login with my password, so maybe something happened? Let’s try to ssh in with my key. Oooh, I get a big warning that the remote host key has changed, that’s actually good! It might mean that other stuff is different. Let’s remove that old key and try again. Well I definitely can’t connect via password, but it’s also rejecting my key, so maybe something went wrong with my cloud config? Kind of hard to test if I can’t actually get into the host though. Weirdly this feels like progress. I’ll stop and remove this machine and try creating another one with the easier way of passing just my key in, maybe it was just something dumb about how I did the cloud config file. I’ll have to sort that out eventually, but one thing at a time.

While I’m messing around, let’s make a note of what MAC this machine is given for future attempts to compare against: a6:81:06:16:c3:b5.

Ok, still can’t get in with the key I added, so something else is wrong here.

Let’s create a new VM from the first template, I can’t make anything from the one I just tried since I can’t boot back into it. For reference it gets a MAC of 06:31:62:32:8b:59

From here I want to take a look at /etc/cloud/cloud.cfg to see if there’s stuff I should be adding to make it pick up my data sources or whatever else I want it to do.

Ok, looking in this file there’s no datasource_list entry. Per the XO docs I need the NoCloud type to send in my config. Let’s try adding an entry with that.

datasource_list: [ NoCloud, ConfigDrive, OpenNebula, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, Ec2, CloudStack, None ]

Ok, with that added I think I can enable those two services, run clean and shut down again. Back in the menu convert it to a template, and try to spin up a machine from it. Again I just give it my key for now and set the disk to 20G so we can make sure that works (assuming I can actually get in this time).
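The edit and the sealing steps can be sketched as a pair of shell functions. This is a minimal sketch rather than a tested recipe: the function names are hypothetical, the datasource list is the one shown above, and the first function is exercised on a scratch file here because the real /etc/cloud/cloud.cfg needs root to modify.

```shell
# Append a datasource_list entry to a cloud.cfg file unless one already exists.
# $1 = path to the file to modify.
ensure_datasource_list() {
  cfg=$1
  if grep -q '^datasource_list:' "$cfg"; then
    echo "datasource_list already present in $cfg"
  else
    printf '%s\n' 'datasource_list: [ NoCloud, ConfigDrive, OpenNebula, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, Ec2, CloudStack, None ]' >> "$cfg"
    echo "datasource_list added to $cfg"
  fi
}

# Enable the services, reset cloud-init state and power off (run on the VM only).
seal_template() {
  sudo systemctl enable cloud-init.service cloud-final.service
  sudo cloud-init clean
  sudo poweroff
}

# Try the edit on a scratch file first rather than the real config.
scratch=$(mktemp)
printf 'users:\n  - default\n' > "$scratch"
ensure_datasource_list "$scratch"
ensure_datasource_list "$scratch"   # second call leaves the file unchanged
```

Checking for an existing entry first keeps the file from collecting duplicate lines if the template gets rebuilt from a VM that was already patched.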

First good sign, this one has a new MAC (5a:62:06:7d:c2:e3) and the guest tools seem to be working. I think because I didn’t set a default name here the user should be arch, let’s try and ssh in with that. I’m in! My disk is the full 20G too, so that worked as well! Awesome!

More fun with templates

At this point I am tempted to skip ahead to doing terraform deploys and triggering ansible upon deployment and other fun stuff like that. Before I get too crazy though, let’s clean up and do a bit more testing of what I’ve done. After that I think it might be worth going back to packer and trying again. What I did for arch was pretty manual, and even though I should really only have to have done it that one time, I’d like to have it in code just to be safe, and to possibly extend it.

Cleanup

It’s important to tidy up after yourself before moving on to the next thing:

  • Stop and delete the test VM created from the template.
  • Delete the broken Arch templates (the one that didn’t have cloud-init enabled and the one it was broken on)
  • Rename archtemplate2 to something more appropriate.

Try on different hosts

I could just create all my VMs on one host and then migrate them to their destination, but why? Let’s copy that template to another host and try provisioning there. The copy isn’t super fast, but it’s almost certainly faster than re-building the template a second time, even if I had automation around that.

While I’m at this, let’s try creating it with the cloud config I wrote up above this time. If I’ve got that right I should ssh as my usual username instead of arch, and I updated it slightly to install iperf3, since I added vim to the base image to edit the cloud config. There’s basically no machine I wouldn’t want vim on so I’m ok with that.

Ok, that didn’t work. Let’s try again with just the ssh key and I’ll figure out my cloud config later.

Doing it with just my basic ssh key passthrough worked fine. So there’s something weird about what I’m doing with my cloud config.

Figure out actual cloud-init config

To be honest I’m not sure how much I’m going to end up caring about this. I don’t really care about the default username, though it would be slightly convenient to have it be ipreston instead of arch or whatever. If I really cared I could make a new template and set that as the default in /etc/cloud/cloud.cfg as well. I guess I might want to specify static IPs, although generally I prefer static assignment of DHCP leases. Maybe that would matter for hosts on a management-only network that I’m handling outside of my router. For package installation and other post-install stuff, I think I’d generally rather go with ansible. Maybe there are some one-off things like signing my host keys with my CA that would fit there. Generally I think I just want to know how this works because it’s there and I don’t like not being able to understand it.

Rather than starting from the XO template, let’s try creating one based on what’s in the Arch wiki. From what I’ve read even though the cloud-init spec is supposed to be universal there’s a fair bit of bespoke stuff. I don’t know how valid that is, but this seems like a place to start from.

Ok, with that format I can set the default username and have it authorize my ssh key. If I’ve got that set up off the hop then I can do the rest with ansible. It might still be nicer to do some things with cloud-init, but I’ve got a template saved that I can use for at least arch that will get me a bare bones machine. That’s good enough for now.

Try packer again

Now that I know I can create templates, and I’ve fixed having those default templates available to me, let’s see if I can build that Ubuntu template yet.

Way back near the beginning of this post I was having flaky issues with completing the template build, which I then exacerbated by not specifying my SRs correctly and deleting the base templates that were required.

I’ve lost track of what exactly I was stuck on before, I think it was just that I broke a bunch of unrelated things after that last flaky attempt. Let’s just try running the packer install again and see what happens:

xenserver-iso.ubuntu-2004: Unable to get SR: Found more than one SR with the name 'templates'. The name must be unique

I guess since they’re not actually shared I might as well name them accordingly.

After fixing that and waiting 22 minutes my build finished! It didn’t delete the VM it built, but I had that as a setting, and it does seem to have created a new template.

Let’s clean up and try creating a VM from that template. I remove the built VM, then start a new VM from the packer template. I’ll use the custom cloud config that worked on Arch, and give it a 15G disk to see if resizing worked. I think there’s going to be other stuff I need to do to get this working the way I want, but we can test at least the basics here. Well, good start, it booted. And the management agent was detected. It didn’t update the hostname as I’d have expected from my cloud config so that’s not a great sign. I can log in with the testuser username and ubuntu password that was hard coded into packer and should have been overwritten. So I guess the cloud-init part didn’t work at all. That’s kind of the whole point of having templates so that’s obviously disappointing.

Interestingly, the cloud-init service appears to be enabled, and the default cloud config should have made an ubuntu user with a disabled password. I wonder if that one works with my key? Or maybe my default user will. If it’s just a matter of disabling this test user that’s one thing. Nope, no luck with any of those. Just for kicks I check /etc/passwd and confirm that’s the only user that exists (besides the system ones).

Taking a closer look at /etc/cloud/cloud.cfg it doesn’t look like I have that datasource_list array that I needed in Arch to make things work. So presumably I need some way to add that line to the file as part of template creation.

From the VM I created from the template I run cloud-id as per the docs to confirm (or at least support the theory) that the issue is it’s not seeing my no-cloud config format. It just returns none.

From reading up a bit on provisioners it looks like I should be able to add some steps to my build block to do the things I want.

There might be better ways than just calling shell commands though, let’s do a bit of reading and look into that first in the provisioners docs.

From a quick read there’s basically file and shell provisioners. I think if I want to do fancier things the idea is to write a shell script and copy it into the build with the file provisioner and then run it with the shell provisioner. That seems reasonable to me.

Let’s just try a little provisioner block and see what happens:

  provisioner "shell" {
    inline = [
      "echo provisioning all the things",
      "echo 'datasource_list: [ NoCloud, ConfigDrive, OpenNebula, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, Ec2, CloudStack, None ]' >> /etc/cloud/cloud.cfg",
    ]
  }

It just dumps it to the end of the file, we’re not being fancy here, but let’s see if it even works.

==> xenserver-iso.ubuntu-2004: Provisioning with shell script: /tmp/packer-shell475973010
==> xenserver-iso.ubuntu-2004: /tmp/script_510.sh: 3: cannot create /etc/cloud/cloud.cfg: Permission denied
    xenserver-iso.ubuntu-2004: provisioning all the things

Maybe it’s as simple as needing to put sudo on it. Let’s give that a shot before trying anything super fancy. The feedback loop on this is real slow though so if that doesn’t work I’m going to have to figure out a smarter way to do things.
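One shell detail worth keeping in mind here: prefixing the echo with sudo does not help on its own, because the >> redirection is still opened by the unprivileged shell before sudo ever runs. The usual workaround is to pipe into sudo tee -a instead. A minimal sketch, where append_line is a hypothetical helper, the shortened datasource list is only sample text, and SUDO can be emptied to try it on a scratch file:

```shell
# Append one line to a file through tee, so the write itself runs under sudo.
# $1 = line to append, $2 = target file. Set SUDO="" to run without privileges.
append_line() {
  printf '%s\n' "$1" | ${SUDO-sudo} tee -a "$2" >/dev/null
}

# Dry run against a scratch file; on the build VM the target would be
# /etc/cloud/cloud.cfg and SUDO would be left unset.
SUDO=""
target=$(mktemp)
append_line 'datasource_list: [ NoCloud, ConfigDrive, None ]' "$target"
cat "$target"
```

This still depends on the build user being allowed to use sudo at all, which is a separate question from the redirection.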

I’m back to having issues being prompted by the autoinstaller. I’ve put in the change in this issue, but that hasn’t sorted it. There’s also this issue; let’s see if I can figure out what to do from it. Ok, it’s still failing arbitrarily. Also, my sudo on the cloud config line didn’t fix it, probably because that user isn’t in the sudo group. Back in my user-data file let’s add some late commands to add that user to sudo, and also just do the cloud config line addition there.

Ok, you can’t add the user to sudo with late commands because it doesn’t exist yet. Let’s at least see if my cloud config update worked. The template built at least. Let’s see if I can create a VM with my cloud config.

It creates but it doesn’t apply my cloud config.

That’s not great. I think I’ll probably create Ubuntu images from cloud templates anyway, so I’m not sure how much time I want to invest in this particular issue. I also noticed that the latest release of the packer plugin allows building templates off other templates, so I could extend cloud images if I had to if that worked.

Try Arch with packer

So I know how to build an Ubuntu image with packer (but it doesn’t have the settings I want) and I know how to build a template of Arch (but manually and without packer). Let’s see if I can combine that knowledge to build an Arch template with Packer. If I get that I’ll have pretty quick and repeatable ways of building templates for the main OSs I want templates of. I should be able to extend one or the other pattern to Debian as well. Let’s see how I can do with Arch. There’s not a ton of content for building Arch images with packer, but I did find this repo. I’m not sure how closely I’m going to follow it since it’s a bunch of bash strung together. I think I’d like to try automating archinstall since that comes packaged with the installer now. If I can get my json together in the format it wants and copied onto the template machine it should be easy to do.

As a start I take the Ubuntu template and update it to point to Arch ISOs. I pick a random mirror in Canada for download. Normally I’d do torrents as they’re much faster, but I don’t feel like telling packer how to do torrents.

That’s going to take a while, so while it’s going let’s try and get a json template ready.

Ughh, my template installed version has no network connectivity. How am I even going to get the json out of it?

Give it a shot with cloud images

While I was messing around with this, the packer plugin I’m using got a new release that claims to fix the XVA template builder. At least for Ubuntu this might mean that I can automate the building of cloud images to include guest-utils etc. For Arch I might still be hosed though, depending on whether I can properly convert images to a raw format.

XO-CLI problems

To fully automate this there’s a few steps I have to do in advance outside of packer anyway, and honestly if I get them working I might be close enough that I just call it there. At this point I need node and some packages to npm install so we’re going to take a detour on a detour (on a…) and set up a devcontainer for this. Eventually if I get this going the way I like I’ll add it into my IaC devcontainer, but for now we’ll just do the basics. Packer install into the devcontainer goes ok. Figuring out the right way to get npm installed is always a bit of a pain, I just don’t use it enough to remember how I did it last time. Finally got xo-cli installed, but for the life of me I can’t figure out xo-ova-upload, which was going to be how I did the Ubuntu images. I guess we’ll try and be consistent on qcow2 format images for everything.

After a lot of messing around with my dockerfile I have a devcontainer with xo-cli, packer, and qemu-utils installed.

First I download the qcow2 version of Ubuntu 20.04 LTS, and then use qemu-img to convert it to VHD format.
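A minimal sketch of that download and conversion, not the exact commands used here. The image URL and file names are assumptions (the focal cloud image from cloud-images.ubuntu.com, which ships as qcow2 despite the .img extension). qemu-img calls the VHD format vpc. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: fetch an Ubuntu 20.04 cloud image and convert it to VHD.
# IMAGE_URL, SRC and DST are assumptions, not values taken from this post.
IMAGE_URL="${IMAGE_URL:-https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img}"
SRC="${SRC:-focal-server-cloudimg-amd64.img}"
DST="${DST:-focal-server-cloudimg-amd64.vhd}"

# Print each command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

convert_image() {
    run curl -fL -o "$SRC" "$IMAGE_URL" &&
    # "vpc" is qemu-img's name for VHD; dynamic keeps the file sparse.
    run qemu-img convert -f qcow2 -O vpc -o subformat=dynamic "$SRC" "$DST" &&
    # Show the format and virtual size of the converted disk.
    run qemu-img info "$DST"
}

convert_image
```

Running it with RUN=1 performs the download and conversion for real; the dry-run default is there because the image is a few hundred megabytes.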

I log into my XO VM with xo-cli. Now I have to figure out how to import this VHD, attach it to a VM, and make it a template. Easy, right? Upon closer inspection, xo-cli does not have an option to import a disk.

Start fresh

Ok, we’re just going to solve this piece by piece, systematically. No more jumping around, we’re going to solve things one at a time (he said with great hubris before almost certainly heading down another rabbit hole). I’m going to focus on Ubuntu since I almost had that working before.

Do a re-run, it takes half an hour but finishes. Let’s make a machine.

We’re going to start with just the ssh key option for cloud config. As usual give it 20GB to make sure it’s resizing disks. Ok, it created but it’s still got the testuser user, the disk didn’t resize, and basically nothing about cloud init worked. Let’s see if I can upload the xva and do some post install stuff to fix that.

First hurdle, no cloud-init

Packer keeps an XVA of the image locally, so I upload that to XO. Ok, that becomes a template by default so I’m not sure if that’s what I want. Let’s make a new VM from it and try messing around there. Make a new VM, don’t bother adding cloud configs or resizing disk since we know it won’t work.

From within the booted machine I run cloud-id to figure out what cloud it thinks it’s in. I get back none where I want NoCloud. Taking a look at the cloud-init docs I run sudo DEBUG_LEVEL=2 DI_LOG=stderr /usr/lib/cloud-init/ds-identify --force to see what all I can see. There’s a data source list in there that includes NoCloud so I guess my idea from earlier that I had to force that into it somehow was a dead end anyway. The fact that it’s not detecting a cloud source at this stage might just be because I’m booting it as a regular VM though.

Try manual

At this point I am building a template, but it’s just not accepting cloud init configs, at least it doesn’t seem to be. Let’s try manually building an image and see if I can get it set up with cloud init, maybe that will give me some clues about doing it properly with packer.

I’ve already got the ISO uploaded to my storage repository thanks to packer, so I create a new VM with that ISO attached.

  • select my language

  • have it update the installer

  • standard keyboard layout

  • pick just Ubuntu server without third party drivers

  • standard dhcp

  • no proxy

  • default mirror

  • custom storage layout, have to make sure the main partition is at the end so it can grow

  • just make one flat ext4 partition on the image

  • Make a user testuser and give them the password ubuntu (will have to make sure this gets wiped)

  • skip Ubuntu pro

  • Install ssh server, don’t import keys, allow password auth

  • No featured server snaps

  • Hit reboot

  • have trouble unmounting the CD so unmount it from XO and then reboot

  • reboot - cloud init ran, although there was no config

  • sudo apt update && sudo apt upgrade -y

  • sudo apt install xe-guest-utilities -y (guest agent is detected on host and I can see network)

  • sudo apt install cloud-initramfs-growroot -y (we’ll see if this works after template creation)

  • systemctl status cloud-init (it’s running)

  • sudo cloud-init clean (remove the cache; running without sudo fails)

  • sudo poweroff

  • Back in XO convert to template

  • Create a new VM from the template, use my cloudconfig that I had working with Arch

  • Give it 15G of disk to see if it will grow
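The in-guest part of the list above (everything after the first reboot and before converting to a template) can be sketched as one script. The commands are the ones from the list; the run wrapper and the dry-run default are additions for illustration, so nothing happens unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the post-install steps run inside the VM before templating it.

# Print each command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

prepare_template() {
    run sudo apt update &&
    run sudo apt upgrade -y &&
    # Guest agent so the host can see the VM's network details.
    run sudo apt install -y xe-guest-utilities &&
    # Grows the root partition on first boot of a cloned VM.
    run sudo apt install -y cloud-initramfs-growroot &&
    # Clear cloud-init's cache so it runs again on the next boot.
    run sudo cloud-init clean &&
    run sudo poweroff
}

prepare_template
```

The cloud-init clean step has to come last before poweroff, otherwise anything that triggers cloud-init again would repopulate the cache that the template is supposed to ship without.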

It partially worked. It got a new hostname, and the disk grew, but the testuser user was still there. I could ssh in as the new user I made though with my ssh key. So that’s pretty good. Maybe there’s something I can add to the cloud config to remove the test user.

According to my friend ChatGPT, I can add this line to my cloud-init to clean it out:

runcmd:
  - [userdel, -r, olduser]

Feels like there should be a better way to do this, but let’s at least try it first. It works! Ok, that’s pretty neat. From a little more reading around it seems like the trick is to either delete that user or set their shell to nologin or something. I think deleting is fine for my purposes, as long as I’m consistent with that cloud config.
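For reference, here is a sketch of a cloud config covering both options. It is a shell function that prints the YAML so it can be redirected to a file. OLD_USER is an illustrative variable defaulting to the testuser account from the manual install, and the nologin alternative is left commented out since only the delete option was tried here.

```shell
#!/bin/sh
# Sketch: print a cloud-config that cleans up the leftover install user.
# OLD_USER is an illustrative name; "testuser" is the account created above.
OLD_USER="${OLD_USER:-testuser}"

render_cloud_config() {
    cat <<EOF
#cloud-config
runcmd:
  # Option 1: delete the account and its home directory.
  - [userdel, -r, $OLD_USER]
  # Option 2 (untested alternative): keep the account but block logins.
  # - [usermod, -s, /usr/sbin/nologin, $OLD_USER]
EOF
}

render_cloud_config
```

Something like render_cloud_config > cloud-config.yaml gives a file whose contents can be pasted into the cloud config box in XO when creating the VM.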

Run packer again

There’s really not a lot I’ve done differently (that I can tell) between my packer auto install and the manual one. Maybe running cloud-init clean was required? I could see packer not automagically doing that and it causing a problem. To do that properly I need sudo, so I have to try and modify my autoinstall block to give testuser passwordless sudo, and then put a shell block at the end of the provisioner to run sudo cloud-init clean. It’s worth a shot.
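One way to get passwordless sudo without the user existing yet is a sudoers drop-in, since sudoers entries don’t require the account to be present when the file is written. This is a sketch under assumptions, not what was run here: it assumes the installed system is mounted at /target during autoinstall late commands, and the drop-in file name is made up for illustration.

```shell
#!/bin/sh
# Sketch: write a sudoers drop-in granting passwordless sudo to the build user.
# BUILD_USER and the "90-" file name are illustrative.
# TARGET_ROOT=/target assumes this runs from an autoinstall late command,
# where the installed system is mounted at /target.
BUILD_USER="${BUILD_USER:-testuser}"
TARGET_ROOT="${TARGET_ROOT:-/target}"

write_sudoers_dropin() {
    dropin="$TARGET_ROOT/etc/sudoers.d/90-$BUILD_USER"
    mkdir -p "$TARGET_ROOT/etc/sudoers.d" &&
    printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$BUILD_USER" > "$dropin" &&
    # sudo ignores drop-ins that are writable by group or others.
    chmod 0440 "$dropin" &&
    echo "$dropin"
}
```

Calling write_sudoers_dropin prints the path it wrote. With that in place, a shell provisioner at the end of the build could run sudo cloud-init clean without hitting a password prompt. The drop-in should be removed again (or the user deleted) before the template is used for anything real.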

Got most of the way through but it didn’t like my sudo command. After some looking around I found another packer script that had sudo -S -E bash in it and that seemed to work, so let’s give it another run with that. Nope, failed on the password prompt.

Ok, I found this post that has a slightly different setup. Let’s give it a shot. This is definitely getting ridiculous. I got the manual prompt during build this time for the first time in a while. I just typed it in from the console to continue, not sure if it’s something I changed or just a coincidence. It has shown up and disappeared without any code changes before.

Well, something I did this time made it so that I couldn’t ssh in, which made the installer hang.

Give up

I’ve sunk so much time into this it’s ridiculous. I can just build a few templates, document what I did and be done with it. I guess I learned a bit about cloud-init so that’s good? Wow.