I open-sourced the tool I use to run my Canton validator, would love some help

So, a bit of backstory. I run a Canton validator and for months I was doing everything through a pile of shell scripts and ssh sessions. Setting up the host, getting the Splice stack up, Keycloak, certs, then babysitting the thing. At some point I got fed up and started building a small web console to do it for me. It grew enough that I figured other people might find it useful, so I cleaned it up and put it on GitHub under MIT:

https://github.com/askardex/nodepilot

Right now it can install a validator from scratch on a remote box over SSH using Docker Compose, or push it to Kubernetes with Helm if that’s more your thing. It handles the DevNet/TestNet/MainNet switching so you’re not editing config files by hand, does the DNS and Let’s Encrypt bits, brings up Keycloak, and keeps an eye on each node’s CPU/memory/disk/network from one dashboard. The secrets it stores (SSH creds, onboarding secret) are encrypted at rest.
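For anyone wondering what "encrypted at rest" can look like in a Node.js app, here is a minimal sketch using AES-256-GCM from Node's built-in crypto module. To be clear, this is an illustration and not lifted from the NodePilot code: the helper names (`deriveKey`, `encryptSecret`, `decryptSecret`) and the `iv.tag.ciphertext` storage format are assumptions made up for the example.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Hypothetical helpers for illustration; not taken from the NodePilot codebase.
const ALGO = "aes-256-gcm";

// Derive a 32-byte key from a passphrase and a per-install salt.
function deriveKey(passphrase: string, salt: Buffer): Buffer {
  return scryptSync(passphrase, salt, 32);
}

// Encrypt a secret and return "iv.tag.ciphertext", each part base64-encoded.
function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv(ALGO, key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // authentication tag, checked on decrypt
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

// Reverse of encryptSecret; throws if the key is wrong or the data was altered.
function decryptSecret(payload: string, key: Buffer): string {
  const [iv, tag, ciphertext] = payload.split(".").map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv(ALGO, key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

The idea is that only the encoded string ever reaches the database, and the key material lives outside it (for example in an environment variable), so a leaked database dump on its own does not expose SSH credentials or the onboarding secret.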

I want to be straight about where it’s at though: it’s early. It works for me, but there are parts I’m not happy with yet and I’ve written them all down in the roadmap instead of hiding them. The party/wallet-user onboarding flow needs a rethink, backups and in-place upgrades aren’t done, and the Kubernetes path hasn’t had nearly as much real-world mileage as the Compose one. Not trying to oversell it.

That’s kind of why I’m posting. I’d really like a few extra pairs of eyes. If you’ve fought with validator onboarding or party allocation before, I’d love you to poke holes in how I’m doing it. If you live in Kubernetes, the Helm path could use someone who actually knows what they’re doing. Backup and recovery design for the node identities and participant DB is another thing I keep going back and forth on. And honestly, bug reports, doc fixes, or just trying it and telling me what broke are all just as helpful as code.

For the curious: it’s Next.js and MUI on the front, NextAuth for login, Prisma for storage, ssh2 for the Compose side and the official Kubernetes client for Helm. It’s meant for one operator running their own nodes, not a SaaS thing, so there’s no multi-tenant stuff to worry about.

If any of this sounds useful, come say hi on the repo. Happy to walk anyone through the code. Cheers.

Thanks for building this developer tooling.
It’s always great to see the community taking the initiative.

I will certainly test this out.
Regards,
Jatin Pandya
DevRel Manager, Canton Foundation

Thank you, this is amazing. I was going to submit it to the Canton Development Fund, but it seems it’s not of interest there, so it’s better to just build it together.

Anyone can submit a proposal to the Dev Fund, so feel free to apply. It’d be good to see more community-first tools. More info here: CF: Development Fund Guide (Google Docs)

So do you think this will be useful?

Hi all,

I’m Irpan from Askardex. We run a Canton validator on MainNet, and over the past months we built an internal tool to handle the parts of bringing a validator up that are tedious and easy to get wrong. We’ve cleaned it up, open-sourced it, and submitted it as a Development Fund proposal.

The tool is called NodePilot. In short, it takes a validator from a bare host or an empty Kubernetes cluster to a registered, healthy node: the participant, Postgres, Keycloak, ingress, TLS, and the one-time onboarding secret, all in one guided flow instead of a pile of shell scripts and hand-edited Helm values. It works two ways: Docker Compose over SSH for single-host operators, and Helm on Kubernetes for production. It already runs our own node, so the failure modes it guards against (the cold-start traffic deadlock, audience-mapper mismatches, onboarding-secret reuse, the topology delay on first registration) are ones we actually hit.

The PR was auto-closed because it needs a Tech & Ops Committee or Core Contributors champion, which makes sense for an external team like ours. So I’m reaching out here: if anyone working in node deployment and operations would be open to championing it, or can point me to the right person, I’d really appreciate it. We’re listed in the Node Deployment & Operations SIG, and I’m happy to give a short walkthrough or answer questions.

Proposal (PR): https://github.com/canton-foundation/canton-dev-fund/pull/402
Proposal text: https://github.com/askardex/canton-dev-fund/blob/proposal/nodepilot/proposals/nodepilot.md
Source code: https://github.com/askardex/nodepilot

Thanks for reading,
Askardex