SSL Handshake Failure (PKIX) during Onboarding - TestNet v2

Hi everyone,

I am attempting to onboard a validator node on TestNet v2 (version 0.5.16), but I am stuck in a loop with an SSL certificate issue.

Environment:

  • Splice/Canton Version: 0.5.16

  • Canton Motor: Running and initialized.

  • Target: auth.v2.sync.global

The Issue: The validator fails to acquire the auth token with the following error: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Technical Findings: Running openssl s_client -servername auth.v2.sync.global -connect 34.159.201.218:443 shows that the server is presenting a self-signed certificate (CN = 34.159.58.241, issued by CN = 53061de7-6ee7...).

Since this is not a trusted public CA, Java’s truststore rejects the connection.
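
The same check can be reproduced outside the JVM. What follows is a minimal Python sketch, not part of Splice or Canton: the function name `check_tls` and its status strings are made up for illustration, and it validates against Python's default CA bundle, which may differ from the truststore the validator's JVM uses. It attempts a fully verified handshake, optionally dialing a fixed IP while still sending the hostname via SNI, and reports why verification failed.

```python
import socket
import ssl


def check_tls(hostname, port=443, connect_to=None, timeout=5.0):
    """Try a fully verified TLS handshake and report how it ends.

    hostname   -- name sent via SNI and checked against the certificate
    connect_to -- optional IP to dial instead of resolving `hostname`
    Returns (status, detail); status is "trusted", "untrusted" or "failed".
    """
    # Default context: system CA bundle, hostname checking enabled.
    context = ssl.create_default_context()
    target = connect_to or hostname
    try:
        with socket.create_connection((target, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                subject = tls.getpeercert().get("subject", ())
                return "trusted", str(subject)
    except ssl.SSLCertVerificationError as exc:
        # Raised for an unknown issuer, a self-signed chain or a name mismatch.
        return "untrusted", exc.verify_message
    except OSError as exc:
        # Connection refused, timeout, DNS failure or another TLS error.
        return "failed", str(exc)
```

Calling `check_tls("auth.v2.sync.global", connect_to="34.159.201.218")` mirrors the openssl invocation above. An "untrusted" result would be the Python-side counterpart of the PKIX error, assuming both trust stores contain the same public roots.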

Questions:

  1. Is there a specific Root CA certificate I should import into my validator’s truststore for v2?

  2. Is this a known misconfiguration on the Frankfurt Load Balancer for the v2 environment?

Any guidance from the engineering team would be greatly appreciated.

Best regards, Manuel

Update: I’ve tried several technical workarounds to bypass the PKIX error, including forcing the Java truststore path and disabling revocation checks via JAVA_TOOL_OPTIONS, but the issue persists.

Technical evidence:

  1. Canton motor is running and stable.

  2. Connectivity to Frankfurt is verified.

  3. The server at auth.v2.sync.global is still serving a self-signed certificate (CN = 34.159.58.241).

Since this is a TestNet v2 environment, is there a specific Root CA file provided by the Foundation that we must manually mount into the container’s truststore? Or is the infrastructure team planning to update the Load Balancer with a trusted certificate soon?

Canton motor is running and stable.

What is Canton motor? That doesn’t sound like a component of the official splice distribution.

Are you running in docker compose or k8s? There shouldn’t be any need for custom root certificates.

I’m not sure where you got the URL auth.v2.sync.global from though. The only URLs you should need to connect to are the scan, SV and sequencer URLs.

Hi cocreature, thanks for your reply.

To clarify:

  1. By “Canton motor”, I refer to the Canton Participant node (running Canton 3.x) included in the official Splice Docker distribution.

  2. I am running on Docker Compose using the official images for version 0.5.16.

  3. Regarding auth.v2.sync.global: I didn’t “get” this URL from anywhere; it is being called automatically by the validator backend during the initialization process to acquire the auth token.

Here is the relevant log snippet from the official validator container:

{"@timestamp":"...","message":"Attempting to get 'user admin'","logger_name":"o.l.s.v.ValidatorApp","thread_name":"...","level":"INFO"}

Followed immediately by:

javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

When I inspect the connection manually from the host, the server at that endpoint is presenting a certificate for CN = 34.159.58.241 issued by an internal Kubernetes CA, which explains the PKIX failure.

If this URL is not supposed to be used, why is the 0.5.16 validator app attempting to connect to it for authentication? Or is the Frankfurt Load Balancer simply serving the wrong certificate for the v2 environment?


{"@timestamp":"...","message":"Attempting to get 'user admin'","logger_name":"o.l.s.v.ValidatorApp","thread_name":"...","level":"INFO"} 

That’s querying your own participant so I don’t think that’s your issue. You need to share more of the logs.

Hi cocreature,

I understand that Attempting to get 'user admin' is a local query, but the issue is that this process triggers the AuthTokenManager, which attempts to connect to the authentication server configured in the stack.

Here is the extended log showing the full sequence of the failure:

validator-1  | {"@timestamp":"...","message":"Waiting for user admin","logger_name":"o.l.s.v.ValidatorApp","thread_name":"...","level":"INFO"}
validator-1  | {"@timestamp":"...","message":"Attempting to get 'user admin'","logger_name":"o.l.s.v.ValidatorApp","thread_name":"...","level":"INFO"}
validator-1  | {"@timestamp":"...","message":"Token refresh failed","logger_name":"o.l.s.a.AuthTokenManager","thread_name":"...","level":"WARN","stack_trace":"javax.net.ssl.SSLHandshakeException: (certificate_unknown) PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target..."}
validator-1  | {"@timestamp":"...","message":"The operation 'Acquiring auth token' failed with a retryable error... javax.net.ssl.SSLHandshakeException..."}
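
To pull the failing component out of a longer log, lines like the ones above can be filtered mechanically. This is a rough Python sketch that assumes each line has the `service | {json}` shape shown above and that the fields `level`, `logger_name`, `message` and `stack_trace` appear as in these samples. The function names are illustrative only.

```python
import json


def parse_compose_log_line(line):
    """Split a "service | {json}" log line into (service, record).

    Returns None when the line does not have that shape.
    """
    service, sep, payload = line.partition("|")
    if not sep:
        return None
    try:
        record = json.loads(payload.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    return service.strip(), record


def warnings_with_stack_traces(lines):
    """Yield (logger_name, message, first stack-trace line) for WARN/ERROR records."""
    for line in lines:
        parsed = parse_compose_log_line(line)
        if parsed is None:
            continue  # skip anything that is not a JSON log record
        _, record = parsed
        if record.get("level") in ("WARN", "ERROR") and "stack_trace" in record:
            trace = record["stack_trace"]
            first = trace.splitlines()[0] if trace else ""
            yield record.get("logger_name"), record.get("message"), first
```

Fed the four lines above, this would yield a single entry for `o.l.s.a.AuthTokenManager`, since it is the only record that carries both a WARN level and a `stack_trace` field.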

My configuration context: in the official v2 deployment files, participant-client.ledger-api.auth-config points to well-known-config-url: "https://auth.v2.sync.global/.well-known/openid-configuration"

When the validator container tries to reach that URL to refresh the token for the local user, it fails because the server at auth.v2.sync.global presents a self-signed certificate (CN = 34.159.58.241).

If you are sure that we shouldn’t need custom root certificates, then auth.v2.sync.global should be serving a certificate signed by a public CA. Could you please verify if that endpoint is correctly configured on your side for the v2 TestNet?

Can you point out where you saw this in the official v2 deployment files? I’ve never seen that URL before and can’t find it anywhere. This should point at your own IAM’s well known config.
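
One way to sanity-check whichever well-known URL ends up configured is to fetch it and look at what it advertises. The Python sketch below is only an illustration: the helper names are invented, and the choice of `issuer`, `token_endpoint` and `jwks_uri` as the fields that matter for acquiring a token is an assumption based on the OpenID Connect Discovery field names, not on anything Splice-specific.

```python
import json
import urllib.request

# Assumed subset of discovery fields relevant to fetching and validating tokens.
REQUIRED_FOR_TOKENS = ("issuer", "token_endpoint", "jwks_uri")


def missing_discovery_fields(document):
    """Return the names in REQUIRED_FOR_TOKENS that are absent or empty."""
    return [key for key in REQUIRED_FOR_TOKENS if not document.get(key)]


def fetch_discovery_document(well_known_url, timeout=5.0):
    """Download and decode an OpenID Connect discovery document.

    TLS verification uses Python's default trust store, so an untrusted
    certificate raises urllib.error.URLError instead of being ignored.
    """
    with urllib.request.urlopen(well_known_url, timeout=timeout) as response:
        return json.load(response)
```

For example, `missing_discovery_fields(fetch_discovery_document("https://iam.example.com/.well-known/openid-configuration"))` should return an empty list for a usable IAM, where `iam.example.com` is a placeholder for your own provider.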

Hi cocreature,

I just received feedback from the Sync Foundation team (Pedro Neves) regarding this. They claim that all endpoints use standard publicly trusted certificates.

However, there is a clear technical contradiction:

  1. The official v2 setup points to https://auth.v2.sync.global.

  2. When my validator (0.5.16) tries to reach that URL, it hits the IP 34.159.58.241.

  3. That IP is serving a self-signed certificate issued by an internal Kubernetes CA (CN = 34.159.58.241).

This is why Java’s truststore is failing with the PKIX error. It’s not a local Java issue; it’s that the Auth endpoint is not presenting a publicly trusted certificate like the Scan or SV endpoints do.

If auth.v2.sync.global is indeed the correct IAM for TestNet v2, could you please confirm with the infrastructure team if it’s expected to have a self-signed certificate? If so, we definitely need the Root CA to be mounted in the validator container.

If auth.v2.sync.global is indeed the correct IAM for TestNet v2,

It is not; this needs to be your own IAM. Can you please point out where you got that URL from? I still have not been able to find that anywhere in the deployment files.

Hi cocreature,

I’ve followed your advice and removed all legacy URLs. I am now attempting to run the validator in standalone/unauthenticated mode for TestNet v2.

I want to highlight that the underlying infrastructure is working perfectly:

  1. Canton Engine is Ready: Logs confirm: “Success: Canton Participant Admin API is initialized” and “Participant tecnologia-solutions-val-1 is initialized”.

  2. Connectivity is Perfect: The node has no issues reaching the Scan or Sequencer URLs.

However, I’ve hit a wall with the Validator App. Even after setting auth.algorithm = "none" and auth-config.type = "none" (both in app.conf and via environment variables), the container stubbornly continues to attempt a connection to localhost:80 for a token provider, resulting in a persistent Connection refused loop.

It seems version 0.5.16 is ignoring these overrides and forcing an OIDC flow.

  1. Is there a specific hidden flag required to truly disable IAM in this version?

  2. Or is it now mandatory for every validator to provide a local OIDC/IAM (like Keycloak) even for initial testing?

I’d appreciate the specific configuration needed to bypass this “Acquiring auth token” loop, as the core node is clearly ready to go.

It is definitely possible to run without an OIDC flow as described in the docs. I’m again not sure where you set “localhost:80”, afaik that’s not one of the standard URLs configured.

I recommend starting over and following the instructions in "Docker Compose-Based Deployment of a Validator Node" in the Splice documentation. It seems like you may have some old state or URLs configured that do not match the instructions.

Hi,

I am currently setting up my validator node (version 0.5.18) and I have successfully resolved all local infrastructure issues. The database migrations are complete, the Participant is fully initialized, and the node identity has been established.

However, the node is unable to complete the onboarding process. It is stuck with an UNAVAILABLE error when attempting to connect to the Seed URL: https://34.159.201.218.

I have performed several connectivity tests from within the container and found the following:

  • Direct calls to the IP return a 403 Forbidden error from the Kubernetes API Server (User "system:anonymous" cannot get resource...).

  • Calls using the hostname scan.v2.sync.global via --resolve return the same 403 error.

This suggests that either the IP address 34.159.201.218 is no longer the correct entry point for the Scan Service in this cluster version, or the hostname has changed.
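
Whether a hard-coded IP still matches a hostname can be checked directly against DNS. The following Python sketch is a hedged illustration with invented helper names: it only compares the pinned address with what the local resolver returns and says nothing about which service actually answers on that address.

```python
import socket


def resolved_addresses(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses `hostname` resolves to.

    `resolver` is injectable so the logic can be exercised without DNS.
    """
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def pinned_ip_is_current(hostname, pinned_ip, resolver=socket.getaddrinfo):
    """True when `pinned_ip` is among the addresses DNS returns for `hostname`."""
    return pinned_ip in resolved_addresses(hostname, resolver=resolver)
```

If `pinned_ip_is_current("scan.v2.sync.global", "34.159.201.218")` returned False, that would suggest the IP is stale and the hostname should be used directly instead of `--resolve`.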

Could you please provide the following updated information for version 0.5.18?

  1. The updated Scan Service URL (Seed URL).

  2. The current IP address for the SV/Scan endpoint.

  3. A new Onboarding Secret, as the previous one has likely expired.

Thank you for your help.

Best regards, Manu