AI Engineer (Agentic Sidecar & RAG)
About Intelletto.ai
Intelletto.ai builds Agentic AI sidecars—explainable companions that plug into existing enterprise systems (ATS/HCM/CRM/ERP) to transform operational data into decisions, not dashboards. Sidecars honor least-privilege access, write back outcomes for closed-loop learning, and are governed by design (RBAC/ABAC, audit trails, evidence packs). We ship practical AI that moves KPIs in recruiting, workforce intelligence, and customer/revenue domains—without rip-and-replace.
Role overview
Design and ship production-grade AI services that combine retrieval‑augmented generation (RAG), agentic workflows, and evaluation-first MLOps. You’ll work across embeddings, vector search, event streams, and LLM orchestration—optimizing for latency, accuracy, cost, and governance. Expect to own features end-to-end: prototype → evaluate → harden → ship → monitor.
Core technologies (you’ll touch)
- Amazon Bedrock
- Amazon OpenSearch Serverless (k‑NN + BM25)
- Amazon S3
- Amazon RDS
- Amazon Textract
- Amazon MSK / Apache Kafka
- Amazon API Gateway
- LangChain
- Python
- Node.js
- Llama 3
- Claude 3.5 Sonnet
- Amazon Titan Text Embeddings
- Keycloak (IAM)
- Flutter (client surfaces)
These reflect the reference stack detailed in our internal “AI Technology Used” brief.
What you’ll do
- Build sidecar intelligence: RAG pipelines over OpenSearch (hybrid k‑NN/BM25), chunking, Data Fusion enrichment, citations, and prompt templates via LangChain; agentic flows with tool calls, retries/fallbacks, and policy‑as‑code (see the retrieval sketch after this list).
- Ship production services: Expose capabilities behind API Gateway; author Python/Node microservices with structured logging, tracing, circuit breakers; stream deltas via Kafka/MSK to keep indexes and features fresh.
- Work the data edge: Ingest from S3; extract structure with Textract; persist metadata & features in RDS; build embedding jobs with Titan Text Embeddings; write parsers/normalizers for CVs/JDs, interaction logs, and outcome signals (30/90/180‑day).
- Model & prompt engineering: Compose Llama 3 and Claude 3.5 Sonnet for reasoning/structuring; maintain prompt/model registries; create offline/online evals (P@k, groundedness, latency, cost; a small P@k harness is sketched after this list); tune chunking, rerankers, and few‑shot seeds.
- Security & governance: Enforce Keycloak roles/claims (a token‑check sketch follows this list); implement least‑privilege access, redaction, PII handling, audit events, and evidence exports (e.g., hiring decisions); follow privacy‑by‑design (GDPR/PDPA).
- Reliability & cost: Hit SLOs; build autoscaling & caches; instrument cost observability (token/GB/index ops) with budgets and alerts.
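To make the retrieval work concrete, here is a minimal sketch in Python of what the hybrid retrieval leg might look like, assuming boto3 and opensearch-py: the query is embedded with Titan Text Embeddings on Bedrock, then a BM25 clause and a k‑NN clause are combined in one OpenSearch request. The collection host, index name (`sidecar-chunks`), field names, and model ID are illustrative placeholders; a production version would add score normalization (e.g., an OpenSearch search pipeline), retries, and citation metadata.

```python
"""Hybrid-retrieval sketch: Titan query embedding + BM25/k-NN in one OpenSearch query.
Host, index, field names, and model ID below are illustrative assumptions."""
import json

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

REGION = "ap-southeast-1"  # assumption: adjust to your deployment
AOSS_HOST = "your-collection-id.ap-southeast-1.aoss.amazonaws.com"  # placeholder

bedrock = boto3.client("bedrock-runtime", region_name=REGION)
credentials = boto3.Session().get_credentials()
search_client = OpenSearch(
    hosts=[{"host": AOSS_HOST, "port": 443}],
    http_auth=AWSV4SignerAuth(credentials, REGION, "aoss"),
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)

def embed(text: str) -> list[float]:
    # Titan Text Embeddings v2; swap the model ID for the version you actually use.
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def hybrid_search(query: str, k: int = 10) -> list[dict]:
    vector = embed(query)
    body = {
        "size": k,
        "query": {
            "bool": {
                "should": [
                    {"match": {"chunk_text": query}},                        # BM25 leg
                    {"knn": {"chunk_vector": {"vector": vector, "k": k}}},   # vector leg
                ]
            }
        },
        "_source": ["doc_id", "chunk_text"],
    }
    hits = search_client.search(index="sidecar-chunks", body=body)["hits"]["hits"]
    return [
        {"id": h["_source"]["doc_id"], "score": h["_score"], "text": h["_source"]["chunk_text"]}
        for h in hits
    ]
```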
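On the evaluation side, an offline harness can start as small as a labeled query set plus a precision@k function, as in the sketch below. The queries and relevance labels are invented for illustration, and a real harness would also track groundedness, latency, and cost per request.

```python
"""Tiny offline-eval sketch: mean precision@k over a labeled query set.
The labeled queries and document IDs are hypothetical."""

def precision_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    # Fraction of the top-k retrieved documents that a reviewer marked relevant.
    top_k = retrieved_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k

# Hypothetical labels: query text -> IDs of chunks judged relevant by a reviewer.
labeled = {
    "java backend engineer, 5 yrs, Kafka": {"cv-101", "cv-202"},
    "payroll analyst with PDPA exposure": {"cv-305"},
}

def evaluate(search_fn, k: int = 5) -> float:
    scores = []
    for query, relevant in labeled.items():
        retrieved = [hit["id"] for hit in search_fn(query, k=k)]
        scores.append(precision_at_k(retrieved, relevant, k))
    return sum(scores) / len(scores)

# Example: mean P@5 for the hybrid_search sketch above.
# print(f"mean P@5 = {evaluate(hybrid_search):.2f}")
```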
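For the Keycloak piece, the sketch below shows one way to enforce a realm role on an incoming request using PyJWT. The realm URL, audience, and role names are assumptions; `jwt.decode` verifies signature, expiry, audience, and issuer before the role check, and a production service would also emit an audit event.

```python
"""Role-enforcement sketch against a Keycloak-issued JWT, using PyJWT (>= 2.x).
Realm URL, audience, and role names are placeholders."""
import jwt

KEYCLOAK_ISSUER = "https://auth.example.com/realms/intelletto"  # placeholder realm
JWKS_URL = f"{KEYCLOAK_ISSUER}/protocol/openid-connect/certs"
jwks_client = jwt.PyJWKClient(JWKS_URL)

def require_role(token: str, role: str, audience: str = "sidecar-api") -> dict:
    # Resolve the signing key from the token's key ID via the realm's JWKS endpoint.
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    claims = jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=audience,
        issuer=KEYCLOAK_ISSUER,
    )
    # Keycloak places realm roles under realm_access.roles by default.
    roles = claims.get("realm_access", {}).get("roles", [])
    if role not in roles:
        raise PermissionError(f"missing role: {role}")
    return claims
```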
Minimum qualifications
- 5+ years in applied ML/AI or backend platforms, with production delivery of RAG/LLM features or search‑heavy APIs.
- Proficiency in Python (FastAPI/Flask) and practical LangChain patterns (tools, agents, retrievers, callbacks).
- Hands‑on with vector search (OpenSearch/Elastic/FAISS/etc.), embeddings, and hybrid retrieval.
- Solid AWS experience across S3, RDS, API Gateway, MSK/Kafka, containerized deployment, and IaC concepts.
- Experience building secure services (OAuth2/OIDC, RBAC/ABAC) with data minimization and auditability.
- Evaluation mindset: test harnesses, A/B tests, and post‑deploy telemetry.
Preferred qualifications
- Experience with Amazon Bedrock model access; comfort with Llama 3 and Claude 3.5 Sonnet.
- Document AI (Textract), resume/JD parsing, or HRTech/FinTech/eCommerce data patterns.
- Retrieval/ranking depth (BM25 tuning, rerankers, semantic caching).
- Event‑driven systems at scale (Kafka topologies, exactly‑once semantics).
- Familiarity with Keycloak; exposure to Flutter clients or GraphQL backends a plus.
- Compliance mindset: GDPR/PDPA, DPIAs, and data‑retention strategies.
Success looks like (first 90 days)
- 30 days: Dev env ready; seed corpus indexed; baseline RAG endpoint (Titan embeddings + OpenSearch) with eval harness and dashboards.
- 60 days: Sidecar API in staging behind API Gateway; agentic pipeline with at least one tool (Textract/metadata lookup); SLOs defined; security review passed.
- 90 days: Productionized service with autoscaling, caching, alerts; correctness uplift validated vs. baseline; cost/request within target; evidence pack export enabled.
How we work
- Evaluation‑first: Every feature ships with tests, metrics, and dashboards.
- Security‑by‑design: Least privilege, audit, and privacy in the requirements.
- Fast feedback: Trunk‑based dev, small PRs, weekly “show the thing,” and ruthless test de‑flaking.
Compensation & location
Competitive salary + early‑stage upside. Remote‑first with APAC‑friendly collaboration (GMT+8). We hire for craft and outcomes.
Equal opportunity
We celebrate difference and hire for capability, integrity, and impact. If you’re excited by the problem space but don’t tick every box, we’d still like to hear from you.
Apply with: a link to something you’ve shipped (repo, paper, demo), a short note on a retrieval or agentic optimization you’re proud of (what changed and why), and your preferred tool stack.
Apply now