Senior Site Reliability Engineer / Infrastructure Architect (Principal DevOps Engineer)

97EX

$2.6K-3K (monthly)
Remote | 3-5 years' experience | Bachelor's degree | Full-time

Remote Details

Open country: Worldwide

Language requirements: Chinese

Job Description


Benefits

  • Employee Recognition and Rewards

    Distributed team, No Monitoring System, No Politics at Work

  • Time Off & Leave

    Paid Time Off, Unlimited or Flexible PTO, Government Mandated Leave

Job Responsibilities

1. Cloud-Native Architecture Design and Governance
   - Design highly available architectures on AWS and Cloudflare, going beyond CDN configuration to implement edge logic with Cloudflare Workers and secure access layers with Argo Tunnel/Zero Trust.
   - Manage AWS multi-account structures via AWS Organizations, and architect cross-Region networking (Transit Gateway, VPC Peering, VPN) to resolve complex connectivity and latency challenges.
   - Enforce Infrastructure as Code (Terraform/Pulumi) for both edge rules and underlying resources to minimize manual console operations.

2. Deep Kubernetes Engineering
   - Maintain large-scale EKS or self-managed clusters, including performance tuning and troubleshooting of core components such as etcd, CNI plugins (Cilium/Calico), and CoreDNS.
   - Develop Kubernetes Operators/Controllers or kubectl plugins that extend platform automation to meet business requirements.
   - Bridge local development and production environments (Docker Compose to Helm/Kustomize) to keep them consistent.

3. Engineering Productivity and Observability
   - Design and maintain complex CI/CD pipelines that integrate code quality analysis (SonarQube), container image security scanning, and automated testing.
   - Implement GitOps workflows using ArgoCD or Flux.
   - Build a Prometheus-based monitoring system with in-depth runtime (Go/Java) and system-level (eBPF) performance analysis.

4. System-Level Support and Reliability
   - Maintain middleware such as Nginx, Redis, and Kafka, including source-level debugging and parameter tuning.
   - Address system bottlenecks under high concurrency (TCP queues, file handles, memory management).

Job Requirements

   - Linux Systems Expert: Deep understanding of Linux kernel internals and proficient use of perf, strace, tcpdump, eBPF, and other tools to diagnose CPU, I/O, and network issues in production.
   - Cloud and Networking Proficiency: Familiarity with AWS infrastructure limits (API rate limits, EBS IOPS) and Cloudflare fundamentals (Anycast, SSL/TLS handshake), with a deep understanding of the TCP/IP stack and the HTTP/2 and HTTP/3 protocols.
   - Kubernetes Hands-On Experience: In-depth knowledge of cgroups, namespaces, and service meshes (Istio/Linkerd), and the ability to rapidly diagnose pod scheduling failures and crashes.
   - Development Skills: Proficient in Go or Python; able to read open-source code, fix bugs, and develop backend tools.

Preferred Qualifications

   - Contributor to CNCF open-source projects.
   - Experience maintaining systems that handle hundreds of millions of requests per day.
   - Hands-on experience implementing chaos engineering in production environments.

Dora Lee

HR Manager, 97EX


Posted on 27 December 2025

