Deep Dive Post-Mortem

Technical Issues Resolved

An in-depth look at the engineering challenges faced while deploying the microservices architecture on Oracle Cloud (ARM64). Discover the root causes, the exact error logs, and the terminal commands used to fix them.

CI/CD Runner Network Isolation (Gitea Actions)

Issue: Connection Refused / Network Unreachable

CI/CD pipeline steps, which run in Docker containers started by act-runner, failed when cloning with git or pushing Docker images. The default Docker bridge network created by act-runner suffered from MTU fragmentation and NAT issues when talking to K3s Pod IPs and Services on this Oracle Cloud virtual network.

remote: Invalid username or password.
fatal: Authentication failed for 'http://git.khalilaliouich.com/...'
curl: (7) Failed to connect to gitea-http.gitea.svc.cluster.local port 3000

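Resolution: Host Network Namespace

We configured the Gitea act-runner to run all CI job containers in the host's network namespace, so ephemeral CI containers reach cluster Services without an extra NAT hop through the Docker bridge. The --add-host option additionally pins the .svc.cluster.local hostname of the Gitea service to a fixed IP, so job containers do not depend on cluster DNS to resolve it.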

# /data/gitea-runner/config.yaml
container:
  network: "host"
  options: "--add-host=gitea-http.gitea.svc.cluster.local:10.43.0.10"
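
The MTU hypothesis described above can be checked by hand before changing the runner configuration. The following is a minimal sketch, assuming the Linux iputils version of ping; the function names and the example MTU values are illustrative and not taken from the original setup.

```shell
#!/usr/bin/env bash

# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
max_ping_payload() {
  local mtu="$1"
  echo $(( mtu - 28 ))
}

# Print a ping command that sets the Don't Fragment bit (-M do), so an
# oversized packet fails visibly instead of being fragmented silently.
# The target is a hypothetical reachable address, for example a Pod IP.
df_ping_command() {
  local mtu="$1" target="$2"
  echo "ping -M do -c 3 -s $(max_ping_payload "$mtu") $target"
}

# Example (prints the command, does not run it):
#   df_ping_command 1500 <pod-ip>
```

If the printed command fails inside a bridge-networked container but succeeds on the host, the path MTU is probably smaller than the interface MTU that the container sees.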

ArgoCD gRPC Interference with Linkerd

Issue: 502 Bad Gateway / gRPC Connection Error

ArgoCD became completely inaccessible, and the argocd-server logs filled with TLS handshake failures. Installing the Linkerd service mesh had injected proxy sidecars into the ArgoCD namespace, and those proxies intercepted the gRPC traffic that ArgoCD relies on internally between its server, repo-server, and application-controller.

rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed"

Resolution: Disable Sidecar Injection

We disabled Linkerd proxy injection for the ArgoCD namespace only, then deleted all ArgoCD pods so they were recreated without the sidecar, which restored internal communication.

kubectl annotate namespace argocd linkerd.io/inject=disabled --overwrite
kubectl delete pods --all -n argocd
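
After the pods are recreated, it can help to confirm that none of them still carries the proxy. This is a minimal sketch, assuming the sidecar container is named linkerd-proxy (the usual Linkerd default) and that it runs as a regular container; the helper name is hypothetical.

```shell
#!/usr/bin/env bash

# Reads lines of "<pod-name> <container> <container> ..." on stdin and
# prints the names of pods that still include a linkerd-proxy container.
pods_with_linkerd_proxy() {
  local pod containers
  while read -r pod containers; do
    case " $containers " in
      *" linkerd-proxy "*) echo "$pod" ;;
    esac
  done
}

# Example usage (assumes kubectl access to the cluster):
#   kubectl get pods -n argocd \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{"\n"}{end}' \
#     | pods_with_linkerd_proxy
```

Empty output means no pod in the listing still has the sidecar.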

GitOps Manifest Push Authentication

Issue: Context Limitations

After the Docker image built successfully, the pipeline failed while updating the deployment manifest. The default GITHUB_TOKEN that Gitea Actions injects into the job context was not sufficient to push back to the repository over HTTPS.

Resolution: OAuth Token Injection

We modified .gitea/workflows/deploy.yaml to embed a dedicated access token, stored as the GITEA_TOKEN secret, in the remote URL before the push.

# Inside the CI/CD Pipeline step:
git remote set-url origin "http://oauth2:${{ secrets.GITEA_TOKEN }}@git.khalilaliouich.com/khalil/tamagotchi-service.git"
git push origin HEAD:main
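
Because the token ends up inside the remote URL, any step that echoes that URL can leak it. The sketch below shows one way to build the URL and a masked copy for logging; both helper names are hypothetical, and the oauth2 user name simply mirrors the command above.

```shell
#!/usr/bin/env bash

# Build an authenticated remote URL from caller-supplied parts.
build_remote_url() {
  local scheme="$1" token="$2" host="$3" repo_path="$4"
  printf '%s://oauth2:%s@%s/%s\n' "$scheme" "$token" "$host" "$repo_path"
}

# Replace the credential part of a URL with "***" before logging it.
mask_remote_url() {
  printf '%s\n' "$1" | sed -E 's#(://[^:/@]+:)[^@]+@#\1***@#'
}

# Example:
#   url="$(build_remote_url http "$TOKEN" git.example.com owner/repo.git)"
#   echo "Pushing to $(mask_remote_url "$url")"
```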
🐳

ImagePullPolicy Stale Caching

Issue: ErrImagePull / Stale Deployments

ArgoCD synced the new k8s.yaml manifest successfully, but the K3s worker nodes failed to pull the image it referenced. The registry URL initially pointed at the internal service gitea-http.gitea.svc.cluster.local:3000. Image pulls are performed by containerd on the node, which resolves DNS differently than pods do and couldn't authenticate against the registry without specific registry mirrors configured.

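Resolution: External Domain & Always Pull

We switched the image reference to the external proxy domain and set imagePullPolicy: Always, so each pod start checks the registry for the current image behind the tag instead of reusing a stale cached copy.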

# k8s.yaml
spec:
  containers:
    - name: api
      image: git.khalilaliouich.com/khalil/tamagotchi-api:v2
      imagePullPolicy: Always
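
To keep cluster-internal registry names out of manifests, a pipeline step could lint image references before ArgoCD syncs them. This is a minimal sketch with hypothetical helper names; it assumes fully qualified references that include a registry host.

```shell
#!/usr/bin/env bash

# Print the registry part of an image reference (everything before the first "/").
image_registry() {
  local ref="$1"
  echo "${ref%%/*}"
}

# Succeed only if the registry is NOT a cluster-internal service name,
# which the node's resolver typically cannot look up.
is_node_resolvable_registry() {
  local host
  host="$(image_registry "$1")"
  host="${host%%:*}"   # drop an optional :port suffix
  case "$host" in
    *.svc.cluster.local|*.svc) return 1 ;;
    *) return 0 ;;
  esac
}

# Example:
#   is_node_resolvable_registry "$IMAGE" || echo "internal registry name: $IMAGE" >&2
```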

ArgoCD RBAC "Guest" Credentials

Issue: Rejected Login

The showcase website advertised guest / visitor2026 as the ArgoCD credentials, but ArgoCD rejected the login even though the RBAC mapping in the ConfigMap was correct.

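Resolution: Bcrypt Secret Patching

RBAC rules only control what an account may do; the password itself is stored as a bcrypt hash in argocd-secret. We generated the hash with Python and patched argocd-secret directly, so the guest password is never stored in plain text.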

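# Hash the guest password with bcrypt (requires the Python bcrypt package).
BCRYPT_HASH=$(python3 -c "import bcrypt; print(bcrypt.hashpw(b'visitor2026', bcrypt.gensalt()).decode())")
# Values under .data must be base64-encoded; -w 0 disables line wrapping.
BASE64_HASH=$(printf '%s' "$BCRYPT_HASH" | base64 -w 0)

# Store the hash under the key ArgoCD reads for the local "guest" account.
kubectl patch secret argocd-secret -n argocd -p '{"data": {"accounts.guest.password": "'"$BASE64_HASH"'"}}'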
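
A quick sanity check on the patched value can catch the common mistake of storing the plain password or a doubly encoded hash. The sketch below only checks the shape of the decoded value; the helper name is hypothetical, and it assumes GNU coreutils base64 and grep.

```shell
#!/usr/bin/env bash

# Succeed if the base64 input decodes to something shaped like a bcrypt hash:
# "$2a$", "$2b$" or "$2y$", a two-digit cost, "$", then 53 more characters
# (22 for the salt and 31 for the digest), 60 characters in total.
looks_like_bcrypt_b64() {
  local decoded
  decoded="$(printf '%s' "$1" | base64 -d 2>/dev/null)" || return 1
  printf '%s' "$decoded" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
}

# Example usage (assumes kubectl access to the cluster):
#   value="$(kubectl get secret argocd-secret -n argocd \
#     -o jsonpath='{.data.accounts\.guest\.password}')"
#   looks_like_bcrypt_b64 "$value" && echo "stored value looks like a bcrypt hash"
```

This does not verify that the hash matches the password, only that the stored value has the expected format.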

Node.js vs Nginx Port Bindings

Issue: 502 Bad Gateway

A frontend CSS update accidentally reverted the container build to an old Nginx Dockerfile. The container then listened on port 80 while the Kubernetes Service still routed to port 3000, so requests to the live metrics backend failed.

Resolution: Architecture Restoration

We restored the server.js Node.js proxy architecture, rebuilt the image with nerdctl, and rolled out the updated Kubernetes deployment with the proper ServiceAccount bindings.

sudo nerdctl build -t showcase-website:latest .
kubectl set image deployment/showcase-website website=showcase-website:latest -n showcase
kubectl rollout restart deployment/showcase-website -n showcase
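
A port mismatch like this one can be detected before rollout by comparing the Service's targetPort with the ports the container declares. This is a minimal sketch; the helper name is hypothetical, the Service name showcase-website is an assumption, and it only works when the Deployment declares containerPort entries and the targetPort is numeric.

```shell
#!/usr/bin/env bash

# Succeed if the wanted port appears in a whitespace-separated list of ports.
port_in_list() {
  local wanted="$1" ports="$2" p
  for p in $ports; do
    [ "$p" = "$wanted" ] && return 0
  done
  return 1
}

# Example usage (assumes kubectl access; the Service name is an assumption):
#   target="$(kubectl get svc showcase-website -n showcase \
#     -o jsonpath='{.spec.ports[0].targetPort}')"
#   declared="$(kubectl get deployment showcase-website -n showcase \
#     -o jsonpath='{.spec.template.spec.containers[*].ports[*].containerPort}')"
#   port_in_list "$target" "$declared" \
#     || echo "Service targetPort $target is not declared by the container" >&2
```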