# Haja – ☁️ Cloud AI Infrastructure Engineer (Starting Jan 2026)

I build and operate AI systems that run in production: retrieval pipelines, evaluation + quality gates, and reliable cloud infrastructure for LLM-backed products. My work is backend-first, with a focus on measurable relevance, performance, and security.

🔎 Focus: AI platform infrastructure • Retrieval/RAG • Performance & security automation • Resilient deployments
📬 Contact: LinkedIn • Email


## 🧭 What I work on

  • 🧠 AI retrieval infrastructure: hybrid retrieval (graph + vector + structured filters), indexing, ranking, tuning
  • βœ… Production readiness: load testing, latency/concurrency modeling, rollout safety checks
  • πŸ” Security automation: OWASP ZAP scans wired into repeatable workflows
  • πŸ› οΈ Cloud reliability: active-active patterns, routing, TLS hardening, operational guardrails

## 🧰 Skills (backend-first, adaptable)

  • πŸ§‘β€πŸ’» Backend: highly adaptable across backend stacks (APIs, services, data pipelines, integrations). This is my primary strength.
  • 🎨 Frontend: not my focus (I can integrate and support, but I don’t position myself as a frontend specialist).

Languages: Python • TypeScript/JavaScript • Bash
AI/Retrieval: LangChain • embeddings pipelines • hybrid ranking • evaluation workflows
Data/Stores: Neo4j • PostgreSQL/pgVector • TiDB • MySQL
Infra/Delivery: Docker • Nginx • Linux • active-active deployments
Testing/Quality: k6 • Locust • Playwright • OWASP ZAP
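
On the load-testing side, here is a minimal Locust sketch (Locust is one of the tools listed above) for exercising a chat-style assistant API under concurrency. The endpoint paths, payload shape, and task weights are placeholders rather than a specific production target.

```python
from locust import HttpUser, task, between

class AssistantUser(HttpUser):
    """Simulated user for a hypothetical LLM-backed assistant API."""
    wait_time = between(1, 3)  # think time between requests, in seconds

    @task(3)
    def ask_question(self):
        # /api/chat and the payload are illustrative, not a real route.
        self.client.post(
            "/api/chat",
            json={"query": "What services are available online?"},
            name="POST /api/chat",
        )

    @task(1)
    def health_check(self):
        self.client.get("/healthz", name="GET /healthz")
```

Saved as, say, `loadtest.py`, this runs with `locust -f loadtest.py --host https://staging.example.com`; the p95 latency and error rate it reports are the kind of signals that feed rollout safety checks.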


πŸ›οΈ Production-grade deployments (publicly accessible)

Most of my production work serves state government use cases, so the codebases are confidential; some of the deployed products themselves are publicly viewable.


## 🚀 Selected public repositories (engineering examples)

These repos represent the kinds of systems I build (pipeline → retrieval → validation), even when production code is not public:

| Area | Repository | What it shows |
| --- | --- | --- |
| 🎬 Local multi-agent AI app | agentic-video-analyst | offline inference + multi-agent orchestration + desktop app engineering |
| 🕸️ Graph ingestion + retrieval | neo4j-document-pipeline | graph modeling + retrieval API patterns for LLM workflows |
| 📈 Vector + hybrid experiments | tidb-vector-llm-testbed | relevance/scoring experiments, indexing tradeoffs |
| 🧬 Embedding pipeline | mysql-to-pgvector-embeddings | extraction → embeddings → pgVector semantic layer (see the sketch after this table) |
| 📚 Structured retrieval | faq-retrieval-system | structured query layer for grounded answers |
| 🧪 Performance testing | playwright-dayang, k6-for-custom-dify | UX + API load testing approaches for assistants |
| 🛡️ Security automation | zap-security-api | ZAP baseline/quick/full scan exposed via API |
| 🧩 Experiments | playwright-study, besu-ibft2.0 | targeted learning repos (testing + distributed systems) |
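
As referenced in the 🧬 row above, here is a hedged sketch of the extraction → embeddings → pgVector pattern that a repo like mysql-to-pgvector-embeddings implements. The table names, credentials, embedding model, and upsert schema are all hypothetical.

```python
import pymysql                                   # source: MySQL
import psycopg2                                  # target: PostgreSQL + pgVector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative 384-dim model

def fetch_rows():
    """Pull raw text rows from a hypothetical source table."""
    conn = pymysql.connect(host="mysql-host", user="reader",
                           password="***", database="app")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, body FROM articles")
            yield from cur.fetchall()
    finally:
        conn.close()

def to_vector_literal(vec):
    """pgVector accepts a '[x1,x2,...]' text literal for vector columns."""
    return "[" + ",".join(f"{x:.6f}" for x in vec) + "]"

def load_embeddings():
    pg = psycopg2.connect("dbname=semantic user=writer host=pg-host")
    with pg, pg.cursor() as cur:
        for row_id, body in fetch_rows():
            emb = model.encode(body)
            # Assumes doc_embeddings has a UNIQUE constraint on source_id.
            cur.execute(
                "INSERT INTO doc_embeddings (source_id, content, embedding) "
                "VALUES (%s, %s, %s::vector) "
                "ON CONFLICT (source_id) DO UPDATE SET embedding = EXCLUDED.embedding",
                (row_id, body, to_vector_literal(emb)),
            )
    pg.close()

if __name__ == "__main__":
    load_embeddings()
```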

## 🧠 How I approach AI systems

  • πŸ“ Prefer measured improvements (evaluation + monitoring) over demo-only features
  • ⏱️ Treat quality, latency, and security as release criteria
  • πŸ” Build systems that are operable (clear failure modes, logs/metrics, runbooks)

## Pinned repositories

1. mysql-to-pgvector-embeddings (Public)

   Vectorizes data from a MySQL database into pgVector embeddings so they can be used by an LLM in Dify workflow orchestration.

   Python · ★ 2

2. tidb-vector-llm-testbed (Public)

   Experimental framework for evaluating TiDB’s vector search capabilities with LangChain-based LLM retrieval workflows. Includes setup scripts, indexing pipelines, and retrieval benchmarks to test hy…

   Python

3. neo4j-document-pipeline (Public)

   Uses Neo4j as a knowledge graph, complete with an API for an end-to-end ingestion, indexing, and retrieval pipeline ready for workflow integration.

   Python

4. besu-ibft2.0 (Public)

   Hyperledger Besu with IBFT 2.0 experiment.

   Shell · ★ 1

5. img-classification-api (Public)

   Image classification platform for self-training image models.

   JavaScript

6. zap-security-api (Public)

   Flask + Docker service for running OWASP ZAP security scans on demand via a simple REST API. Designed for centralized, repeatable application security testing in CI/CD or ad-hoc use. Supports multi…

   Python
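
To illustrate the zap-security-api idea above, here is a hedged sketch of a Flask service that triggers an OWASP ZAP scan through the official Python client (`zapv2`). The routes, API-key handling, and the ZAP daemon assumed at `zap:8080` are placeholders; the actual repository may be organized differently.

```python
from flask import Flask, jsonify, request
from zapv2 import ZAPv2  # official OWASP ZAP Python API client

app = Flask(__name__)

# ZAP daemon address and API key are deployment-specific placeholders.
zap = ZAPv2(apikey="changeme",
            proxies={"http": "http://zap:8080", "https": "http://zap:8080"})

@app.post("/scan")            # hypothetical route, not necessarily the repo's
def start_scan():
    target = request.json["target"]
    # Spider first so ZAP learns the site structure; a fuller job would
    # follow up with zap.ascan.scan(target) once spidering completes.
    spider_id = zap.spider.scan(target)
    return jsonify({"target": target, "spider_scan_id": spider_id}), 202

@app.get("/alerts")
def alerts():
    target = request.args["target"]
    # Return the alerts ZAP has accumulated for this target so far.
    return jsonify(zap.core.alerts(baseurl=target))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```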