I am a highly motivated Senior Data Engineer specializing in MLOps and Generative AI (GenAI) application development, with 5-6 years of professional experience. I design, build, and deploy reliable, scalable data and machine learning platforms.
My expertise is grounded in published research, including a first-authored publication in Information Fusion, a top-tier computer science journal. My recent professional work centers on delivering advanced RAG systems and automated AI-labeling solutions in production environments. I champion robust MLOps practices to manage the AI application lifecycle efficiently and deliver measurable business value.
- Groundbreaking Academic Achievement: First author of an article on an Attentive Gated Graph Sequence Neural Network for financial trading, published in Information Fusion, a top-tier computer science journal.
- GenAI/LLM System Development: Designed and deployed multi-modal RAG and structured-data RAG solutions, including a cost-effective structured-data RAG built on DuckDB and Gemini-2.5-flash-lite.
- Production MLOps: Built an end-to-end, automated AI-labeling system by integrating LLMs (AWS Bedrock, PydanticAI) with Flask, Apache Kafka, and Dagster. The system's output is actively used in production recommendation and search engines.
- Data Platform Leadership: Led architecture optimization for Google BigQuery, reducing query costs by 90%. Served as Code Owner for the BI codebase and mentored junior engineers.
- Deep Learning Systems: Developed a deep learning-based personalized recommendation system (achieving a lift of 11.2% or more over baseline) and a GNN-based user preference system.

| Category | Core Expertise & Tools |
|---|---|
| GenAI / RAG / LLM | AWS Bedrock (Knowledge Bases), LlamaIndex, PydanticAI, Hugging Face, MLflow (GenAI Tracking) |
| MLOps & Lifecycle | MLflow, Databricks, SageMaker, Docker, CI/CD, uv / poetry |
| Data Orchestration | Dagster, Airflow, Airbyte, dbt (SQL) |
| Data Storage & DB | BigQuery (Optimization), DuckDB, MongoDB (Document DB), PostgreSQL, AWS S3, AWS Neptune (Graph DB) |
| Cloud & IaC | AWS (S3, EC2, Kinesis, Personalize, Rekognition), Terraform, Git |
| Programming | Python (Core), SQL, PySpark, Flask / FastAPI (API), Gremlin |
| ML Models | GNN, Deep Learning-based Recommendation Systems, Financial Time-Series, PyTorch, Keras |
