UPTIME: 99.9% // SLO: COMPLIANT

Engineering systems that never sleep.

I'm Nishant Kamal. I build resilient, automated cloud platforms that scale without breaking.

Live Metrics

Reliability by the Numbers

Years of hands-on experience across cloud infrastructure, platform engineering, and site reliability.

About Me.

Building Robust Systems & Scalable Infrastructure

DevOps Engineer II

Nishant Kamal

DevOps Engineer II working extensively with cloud-native and Kubernetes-based distributed systems at scale — driving platform engineering, CI/CD automation, and resilient multi-cloud infrastructure across production environments.

Location: Delhi, India

Focus: Platform Engineering

Education: M.Tech · BITS Pilani · 2026

Kubernetes · AWS · OCI · Crossplane · FluxCD · Helm · GitOps · Prometheus · Grafana · Istio · Terraform · Observability

Highlights

99.9%

Platform Uptime

SLO-compliant delivery

35%

Cloud Cost Reduction

via Karpenter + Spot

50+

Services Monitored

Prometheus + AlertManager

40%

MTTR Reduction

standardised runbooks

Awards

across 6+ years

What I Build

GitOps Workflows

Automated, auditable Kubernetes deployments with FluxCD and Helm across multi-environment CI/CD pipelines.

Cloud Infrastructure

Declarative cloud provisioning on AWS and OCI using Crossplane and Kubernetes CRDs for developer self-service.

Observability Stacks

Unified monitoring across 50+ services with Prometheus, Grafana, and Istio service mesh for end-to-end visibility.

Reliability Engineering

99.9% uptime delivery, 35% cost reduction via Karpenter, and MTTR reduction through automated runbooks.
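
To put the 99.9% figure in concrete terms, here is a minimal sketch of the standard error-budget arithmetic. The function name and the 30-day window are assumptions for illustration, not a description of how the SLO is actually tracked in production.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO allows in a window.

    `slo` is a fraction, e.g. 0.999 for 99.9%. The 30-day default window is
    an assumption; use whatever window the SLO is actually defined over.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


# 99.9% over 30 days leaves roughly 43.2 minutes of allowed downtime.
budget = error_budget_minutes(0.999)
print(f"{budget:.1f} minutes per 30 days")
```

Under these assumptions, a 99.9% target leaves about 43 minutes of downtime per month, which is why fast detection and rollback matter more than raw capacity.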

I was never the smartest in the room — I just never left until I was.
— Nishant Kamal

Always eager to collaborate on DevOps, SRE, and platform engineering initiatives — whether it's infrastructure design, automation strategies, or building resilient, scalable systems.

Open to Collaborate

Technical Stack & Ecosystem.

Kubernetes

95%

Docker

92%

Istio

80%

FluxCD

85%

Roles That Built Me.

DevOps Engineer II · Current

FarEye · Noida, India

Apr 2026 – Present

Leading platform engineering initiatives across multi-cloud environments (AWS, Azure, OCI), driving reliability, automation, and developer self-service capabilities.
Architecting and provisioning cloud infrastructure using Crossplane and GitOps workflows with FluxCD, enabling on-demand environment reproducibility.
Driving CI/CD pipeline improvements across GitHub Actions and Helm-based monorepos, reducing deployment friction and cycle times.
Conducting root-cause analyses after major incidents to identify process improvement and technical enhancement opportunities.

Crossplane · FluxCD · Multi-Cloud · GitHub Actions

Site Reliability Engineer

FarEye · Noida, India

May 2023 – Mar 2026

Architected and created cloud infrastructure on AWS EKS, ensuring high availability and scalability, with integrated observability using Prometheus, Grafana, and New Relic.
Implemented Karpenter for dynamic scaling, replacing ASG and improving cost efficiency in the Kubernetes environment.
Deployed Istio, Virtual Services, and Gateways to manage traffic routing and enhance microservices communication within clusters.
Leveraged Helm Charts extensively for automating application deployments and simplifying IaC processes.
Streamlined CI/CD pipelines by integrating FluxCD and GitHub Actions, improving deployment cycles across teams.
Implemented cost-saving measures by optimising resource utilisation across cloud-based infrastructure environments.

AWS EKS · Istio · Karpenter · Prometheus

Senior Tech Engineer

FarEye · Noida, India

Jun 2022 – Apr 2023

Visited Landmark Group in Dubai to collaborate with cross-functional teams, addressing infrastructure challenges and optimising system performance.
Proactively identified performance bottlenecks; implemented monitoring and automation techniques to enhance reliability.
Troubleshot production issues, working closely with dev teams to improve system uptime and minimise downtime.
Collaborated on migration initiatives, improving deployment consistency and platform stability across environments.

Incident Response · Monitoring · Migrations

Tech Engineer

FarEye · Remote

Jun 2020 – May 2022

Enhanced infrastructure performance through proactive system monitoring and troubleshooting, ensuring minimal downtime.
Provided infrastructure support for production environments, ensuring high availability and rapid incident response.
Collaborated with SRE and development teams to resolve operational issues and maintain SLA compliance.

Infra Support · SLA · Production Ops

Work that moves the needle.

Kubernetes GitOps Deployment

EKS · FluxCD · GitHub Actions · Helm · Prometheus · New Relic · Grafana

Deployed scalable, highly available infrastructure on AWS EKS with integrated monitoring via Prometheus, New Relic, and Grafana for proactive issue resolution. Streamlined CI/CD pipelines by integrating FluxCD and GitHub Actions, enhancing automation and improving deployment cycles across teams. Leveraged Helm Charts to automate application deployments, manage Kubernetes resources, and simplify infrastructure-as-code processes.

Key Highlights

Integrated Prometheus, New Relic & Grafana — proactive alerting and observability across all services

FluxCD + GitHub Actions CI/CD pipeline — eliminated manual deploy steps across all teams

Helm Charts as IaC — versioned, repeatable deployments with zero configuration drift

-70%

Deployment Time

Drift Incidents

Kubernetes · AWS EKS · FluxCD · GitHub Actions · Helm · Prometheus · New Relic · Grafana · GitOps

AWS Cost Optimization & Karpenter Migration

Karpenter · EC2 · ASG · Spot Instances · Resource Right-Sizing

Implemented Karpenter for dynamic node scaling, replacing the Auto Scaling Group (ASG) and improving cost efficiency across the Kubernetes environment. Ran multiple microservices on Karpenter-provisioned nodes while keeping traffic management seamless. Implemented cost-saving measures by right-sizing resources across all cloud-based infrastructure environments.

Key Highlights

Replaced ASG with Karpenter — intelligent bin-packing & Spot Instance utilization at scale

Right-sized all node groups — eliminated chronic over-provisioning across clusters

Cost-saving framework applied org-wide across all cloud workloads
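
The bin-packing idea mentioned above can be illustrated with a small first-fit-decreasing sketch. This is a textbook heuristic shown only to convey the intuition; it is not Karpenter's actual scheduling algorithm, and the function name, the single CPU dimension, and the uniform node size are all simplifying assumptions.

```python
from typing import List


def first_fit_decreasing(pod_cpu_requests: List[float],
                         node_cpu_capacity: float) -> List[List[float]]:
    """Pack pod CPU requests onto equal-sized nodes using first-fit decreasing.

    Illustrative only: real provisioners also weigh memory, instance types,
    zones, and pricing. Here every node has the same CPU capacity.
    """
    nodes: List[List[float]] = []
    # Placing the largest requests first tends to leave fewer unusable gaps.
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_cpu_capacity:
            raise ValueError(f"request {request} exceeds node capacity")
        for node in nodes:
            if sum(node) + request <= node_cpu_capacity:
                node.append(request)
                break
        else:
            # No existing node has room, so a new node is provisioned.
            nodes.append([request])
    return nodes


packed = first_fit_decreasing([2.0, 1.0, 3.0, 1.0, 0.5, 0.5], node_cpu_capacity=4.0)
print(len(packed), "nodes:", packed)
```

In this toy input the six pods request 8 vCPU in total and fit exactly onto two 4-vCPU nodes, which is the kind of consolidation that reduces over-provisioning.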

35%

Cost Reduction

3×

Provisioning Speed

Karpenter · AWS · EC2 · ASG · Cost Optimization · Spot Instances · Autoscaling

Oracle Cloud Infrastructure POC

OCI · Karpenter · Istio · Virtual Services · Gateway · Service Mesh · QA

Designed and executed a proof-of-concept for Oracle Cloud Infrastructure (OCI) and successfully deployed the first production environment. Deployed Karpenter on OCI to automate node provisioning and manage compute costs efficiently through intelligent bin-packing. Deployed Istio, Virtual Services, and Gateways to manage traffic routing and enhance microservices communication within Kubernetes clusters, streamlining service mesh configurations. Resolved all QA-reported issues to achieve a stable, live deployment.

Key Highlights

Istio + Virtual Services + Gateways — full service mesh traffic control on OCI

Karpenter on OCI — automated node provisioning with intelligent cost management

End-to-end QA ownership: resolved every issue from staging to stable live production

1st

Production Environment on OCI

100%

QA Issues Resolved

OCI · Oracle Cloud · Karpenter · Istio · Virtual Services · Gateway · Service Mesh · POC · Production Deployment · QA

ACR Migration & Decommission

Azure Container Registry · Image Cleanup · CI/CD Pipeline Cutover

Led decommissioning of the old Azure Container Registry and migrated all assets to a new ACR instance. Executed a full audit and cleanup of stale and unused images, reducing storage bloat and eliminating legacy registry dependencies across all pipelines. Troubleshot production issues effectively, working closely with development teams to implement solutions that improved system uptime and minimized downtime throughout the migration.

Key Highlights

Full image audit — all stale & orphaned layers identified and purged from registry

Zero-downtime pipeline cutover coordinated across all development teams

Cross-functional collaboration with dev teams ensured no regressions post-migration
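
The stale-image audit described above can be sketched as a simple filter. This is a hypothetical illustration: the `ImageTag` record, the 90-day cutoff, and the `in_use` set are assumptions, and a real audit would read this data from the registry and from deployed manifests rather than from in-memory lists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ImageTag:
    """Hypothetical record for one tagged image in a registry."""
    repository: str
    tag: str
    last_updated: datetime


def find_stale_images(images: Iterable[ImageTag], in_use: Set[str],
                      now: datetime, max_age_days: int = 90) -> List[ImageTag]:
    """Return images older than the cutoff that no workload references.

    `in_use` holds "repository:tag" strings collected from running workloads.
    The 90-day default is an assumed retention policy.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        img for img in images
        if f"{img.repository}:{img.tag}" not in in_use
        and img.last_updated < cutoff
    ]


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
images = [
    ImageTag("api", "v1", now - timedelta(days=200)),    # old and unused
    ImageTag("api", "v2", now - timedelta(days=10)),     # recent
    ImageTag("worker", "v7", now - timedelta(days=400)), # old but still deployed
]
stale = find_stale_images(images, in_use={"worker:v7"}, now=now)
print([f"{img.repository}:{img.tag}" for img in stale])
```

Checking both age and usage is the important design choice here: an old image that is still deployed must survive the cleanup, otherwise the cutover risks breaking running workloads.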

100%

Registry Migrated

↓ GB

Stale Images Removed

Azure · ACR · Container Registry · Migration · Image Cleanup · CI/CD · DevOps

Alerting Standardization & Observability

Prometheus · AlertManager · Helm · Grafana · RCA · Automation

Built a Helm-based centralized alerting system improving monitoring consistency across 50+ services. Reduced MTTR by establishing unified runbooks, alert routing rules, and escalation policies. Conducted root-cause analyses after major incidents to identify areas for process improvement or technical enhancement. Enhanced system reliability by implementing monitoring tools and automation techniques. Proactively identified performance bottlenecks, working on continuous improvements to system resilience and reliability.

Key Highlights

Post-incident RCA framework standardized — consistent learnings captured across all teams

Unified runbooks + routing rules + escalation policies covering 50+ services

Proactive bottleneck detection via automation reduced reactive firefighting significantly
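
As a rough illustration of what unified routing rules mean in practice, the sketch below picks a receiver for an alert by matching its labels against an ordered list of routes. It is a deliberate simplification of label-based routing, not the production AlertManager configuration, and the receiver names and labels are hypothetical.

```python
from typing import Dict, List, Tuple

# Each route is (label matchers, receiver name); the first full match wins.
Route = Tuple[Dict[str, str], str]

# Hypothetical routes: receiver names and label values are placeholders.
ROUTES: List[Route] = [
    ({"severity": "critical"}, "pager-oncall"),
    ({"severity": "warning", "team": "platform"}, "chat-platform"),
]


def pick_receiver(labels: Dict[str, str], routes: List[Route],
                  default_receiver: str = "chat-default") -> str:
    """Return the receiver of the first route whose matchers all equal the alert's labels."""
    for matchers, receiver in routes:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver
    # Alerts that match no route still reach a catch-all receiver.
    return default_receiver


print(pick_receiver({"severity": "critical", "team": "payments"}, ROUTES))
print(pick_receiver({"severity": "warning", "team": "platform"}, ROUTES))
print(pick_receiver({"severity": "info"}, ROUTES))
```

Keeping the rules in one ordered list with an explicit default is what makes routing consistent across services: every alert lands somewhere, and escalation paths are defined once rather than per team.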

50+

Alerts Standardized

-40%

MTTR

Prometheus · AlertManager · Helm · Grafana · MTTR · RCA · Observability · Automation · SRE

Crossplane Infrastructure Automation

Crossplane · Compositions · XRDs · Azure · OCI · Developer Self-Service

Implemented Crossplane across Oracle Cloud (OCI) and Azure to standardise infrastructure definitions and eliminate manual provisioning. Created reusable Compositions and XRDs enabling developer self-service for on-demand cloud environments. Reduced provisioning time by 40–60% and improved reproducibility across both cloud providers.

Key Highlights

Crossplane on OCI & Azure — unified control-plane IaC across two cloud providers

Reusable Compositions & XRDs — developers provision environments without ops involvement

40–60% cut in manual provisioning time with full environment reproducibility

-50%

Provisioning Time

Cloud Providers

Crossplane · OCI · Azure · Compositions · XRDs · IaC · GitOps · Platform Engineering · Self-Service

Metrics-Driven Self-Healing Canary Framework

Istio · Flux CD GitOps · Prometheus · Grafana · Python Controller · M.Tech Dissertation

Designed and built a three-layer self-healing architecture for canary deployments as part of my BITS Pilani M.Tech dissertation. A lightweight Python controller observes P99 latency and 5xx error rate from Istio telemetry every 30 seconds, evaluates canary health against threshold logic, and autonomously patches Istio VirtualServices to step traffic 10→30→50→80→100% or trigger instant rollback — with every promotion and rollback committed through Flux CD GitOps for full auditability. Validated across multiple controlled fault-injection scenarios on a live Minikube + Istio + Prometheus cluster.

Key Highlights

Three-layer architecture (Observability → Decision & Control → Execution) — fully autonomous canary promotion and rollback

Python controller evaluates canary health every 30s via PromQL and patches VirtualService weights with zero human intervention

>99% MTTR reduction validated across fault-injection scenarios — rollback in under 5 seconds vs. 20–30 min manual detection

GitOps-first design — every canary promotion and rollback is a Git commit via Flux CD, with full audit trail and drift prevention
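
The decision step of the controller can be sketched as a pure function. This is a minimal illustration of the threshold logic described above, not the dissertation code: the threshold values, the `Thresholds` class, and the function name are assumptions, and the PromQL queries, VirtualService patching, and Git commits are intentionally left out.

```python
from dataclasses import dataclass
from typing import Tuple

# Traffic steps taken from the description: 10 -> 30 -> 50 -> 80 -> 100 percent.
TRAFFIC_STEPS = (10, 30, 50, 80, 100)


@dataclass(frozen=True)
class Thresholds:
    """Assumed health thresholds; the real values are not given in the text."""
    max_p99_latency_ms: float = 500.0
    max_error_rate: float = 0.01  # fraction of requests returning 5xx


def next_canary_weight(current_weight: int, p99_latency_ms: float,
                       error_rate: float,
                       thresholds: Thresholds = Thresholds()) -> Tuple[str, int]:
    """Decide the next canary traffic weight from one evaluation cycle.

    Returns ("rollback", 0) when either metric breaches its threshold,
    ("promote", next_step) when healthy and a higher step exists,
    and ("hold", current_weight) once the canary already takes all traffic.
    """
    healthy = (p99_latency_ms <= thresholds.max_p99_latency_ms
               and error_rate <= thresholds.max_error_rate)
    if not healthy:
        return ("rollback", 0)
    for step in TRAFFIC_STEPS:
        if step > current_weight:
            return ("promote", step)
    return ("hold", current_weight)


print(next_canary_weight(0, p99_latency_ms=120.0, error_rate=0.001))
print(next_canary_weight(30, p99_latency_ms=120.0, error_rate=0.001))
print(next_canary_weight(50, p99_latency_ms=900.0, error_rate=0.001))
```

Keeping the decision free of side effects means the same function can be unit-tested against recorded metrics, while a separate layer applies the result to the mesh and commits it to Git.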

>99%

MTTR Reduction

<5s

Rollback Time
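
The >99% figure follows from the two numbers quoted above. A quick worked check, assuming the 5-second upper bound for automated rollback and treating the 20 to 30 minute manual detection time as the baseline:

```python
def reduction_percent(before_seconds: float, after_seconds: float) -> float:
    """Percentage reduction when going from `before_seconds` to `after_seconds`."""
    if before_seconds <= 0:
        raise ValueError("before_seconds must be positive")
    return (1.0 - after_seconds / before_seconds) * 100.0


# Baselines of 20 and 30 minutes versus an assumed 5-second automated rollback.
low = reduction_percent(20 * 60, 5)
high = reduction_percent(30 * 60, 5)
print(f"{low:.2f}% to {high:.2f}%")
```

Both ends of the range come out above 99.5%, so the headline figure holds even at the faster end of manual detection.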

Istio · Flux CD · GitOps · Prometheus · Grafana · Python · Canary Deployment · Self-Healing · SRE · M.Tech Dissertation

Building on Engineering Excellence.

M.Tech in Cloud Computing

BITS Pilani // Postgraduate · Completed

2024 – 2026

Specialized in Cloud-Native architectures and Distributed Systems. Dissertation on a metrics-driven traffic management framework for self-healing microservices, covering high-availability patterns and automated control-plane scaling for modern enterprise clusters.

Dist. Systems · K8s Logic · Platform Eng

B.Tech in Electrical & Electronics Engineering

VIT University, Vellore // Undergraduate

2015 – 2019

Core foundation in engineering logic and systematic problem-solving. Specialized in low-latency communication and hardware-level performance optimization.

Systems Logic · Engineering Design · Circuitry

Certifications in Production Readiness

AWS · CNCF · HashiCorp // Active Credentials

Ongoing

Active certifications in Kubernetes administration, AWS architecture, and Terraform-based automation.

CKA · AWS SAP · Terraform Associate

Recognition & Awards.

FarEye · Logistics Intelligence

7 Awards · 4 yrs tenure

Customer Happiness Champion

FarEye

Jun 2025

Quarter: OND 2024

Named Customer Happiness Champion for the OND 2024 quarter, reflecting outstanding dedication to client success and service quality.

Customer Excellence

FarEye Acers: Rising Star

FarEye

May 2024

Quarter: JFM 2024

Recognised for emerging leadership and innovation in SRE practices, demonstrating exceptional growth and technical initiative at FarEye.

Leadership & Innovation

Spotted Award

FarEye

Apr 2023

Quarter: JFM 2023

Recognised for consistently going above and beyond designated responsibilities to support customers, often extending beyond regular shift hours, reflecting strong ownership, team-first mindset, and a customer-centric approach in high-pressure situations.

Above & Beyond

Captain Marvel

FarEye

Mar 2023

Quarter: OND 2022

For demonstrating the superpower of passion for customers for OND 2022.

Customer Passion

Dark Knight

FarEye

Oct 2022

Quarter: JAS 2022

For demonstrating the superpower of complex problem-solving for JAS 2022.

Problem Solving

Dark Knight

FarEye

Jun 2022

Quarter: JFM 2022

For demonstrating the superpower of complex problem-solving for JFM 2022.

Technical Excellence

Captain Marvel

FarEye

Oct 2020

Quarter: JAS 2020

For demonstrating the superpower of passion for customers for JAS 2020.

Customer-Centric

Let's Connect.

Interested in building reliable, scalable infrastructure or discussing cloud-native architectures? Drop a message.
