Hufflepuff · Data Wizard

Manas Dani ✦ AI & Data Science

Indiana University Bloomington · Master of Data Science

Enter the Castle → GitHub

ᚺ About the Wizard ᚺ

"Loyalty · Hard Work · Dedication"

4+ Years XP

MS Data Science

AI Engineer

Greetings, Fellow Wizard!

I'm Manas Dani — a Data Scientist and AI Engineer currently pursuing my Master's in Data Science at Indiana University Bloomington (Class of 2026). Like a true Hufflepuff, I believe in the magic of perseverance, loyalty, and putting in the real work.

My journey runs from enterprise-scale SAP automation at Cognizant (for British Airways) to AI research, where I build LLM pipelines, RAG systems, and multi-modal inference engines. I specialize in turning raw data into intelligent systems that actually ship.

🎓

Indiana University Bloomington

MS Data Science · 2024–2026

🔬

Research Assistant

AI Visibility & AEO · Kelley School of Business

🏛️

Vishwakarma Institute of Technology

B.Tech Instrumentation & Control · Pune, India

"It matters not what someone is born, but what they grow to be."
— Albus Dumbledore

⚗ Magical Experience ⚗

🔬

Research Assistant — AI Visibility & AEO

Indiana University Kelley School of Business

Jan 2026 – Present

Built AEO tool for theLLMStore processing 100+ websites weekly — modular extraction pipeline transforming content into LLM-ready JSON-LD schemas with 95%+ accuracy
Developed AI visibility scoring system evaluating 15+ content quality metrics and tracking 50+ brand mentions daily across ChatGPT, Perplexity, Gemini & Claude

Python · RAG · LangChain · JSON-LD · LLM APIs
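
As an illustration of the "content into LLM-ready JSON-LD" step described above, here is a minimal sketch using only the Python standard library. It assumes the extraction stage has already produced a plain dict; the input keys (`title`, `url`, `summary`, `author`) and the function name `to_json_ld` are hypothetical and are not taken from the actual tool. Only the schema.org `Article` vocabulary is standard.

```python
import json


def to_json_ld(page: dict) -> str:
    """Map extracted page fields (hypothetical keys) onto a schema.org Article."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": page["title"],
        "url": page["url"],
        "description": page.get("summary", ""),
    }
    # Only emit an author node when the extractor found one.
    if page.get("author"):
        doc["author"] = {"@type": "Person", "name": page["author"]}
    return json.dumps(doc, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Example input; the values are placeholders, not real extracted data.
    print(to_json_ld({"title": "Hello", "url": "https://example.com/a", "author": "Ada"}))
```

Keeping the mapping in one small function makes it easy to add further schema.org types later without touching the extraction code.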

🧠

Research Assistant — Social Media User Learning

Indiana University Bloomington

Aug – Dec 2025

Built scalable pipeline processing 26M+ users and billions of Bluesky social media posts for similarity, clustering & behaviour analysis
Fine-tuned Sentence-BERT with contrastive learning; engineered distributed data pipeline converting compressed JSON → Parquet with FAISS similarity search and UMAP on JetStream

PyTorch · SBERT · FAISS · PySpark · UMAP

🤖

AI Developer Intern

Ohacks / Heritage Square Foundation · Remote

Jun – Aug 2025

Implemented Archyx AI — production-grade LLM system automating search, tagging & reorganisation for 100+ archival collections, cutting manual curation by 40%
Designed RAG pipeline using ChromaDB for 10K+ document chunks with API-level cost tracking across 3 LLM providers (OpenAI, Groq, OpenRouter)

ChromaDB · OpenAI · Supabase · FastAPI · JWT
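
The API-level cost tracking mentioned above could look roughly like the sketch below. It is an assumption-laden illustration, not the Archyx AI implementation: the `CostTracker` class is hypothetical, and the per-token prices are made-up placeholders, since real provider pricing varies by model and changes over time.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical USD prices per 1,000 tokens. These are placeholders,
# not actual OpenAI, Groq, or OpenRouter rates.
PRICE_PER_1K = {"openai": 0.002, "groq": 0.0005, "openrouter": 0.001}


@dataclass
class CostTracker:
    """Accumulates estimated spend per provider from token counts."""

    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, provider: str, prompt_tokens: int, completion_tokens: int) -> float:
        """Add one API call to the running total and return its estimated cost."""
        cost = (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K[provider]
        self.totals[provider] += cost
        return cost

    def report(self) -> dict:
        """Return per-provider totals rounded for display."""
        return {name: round(total, 6) for name, total in self.totals.items()}


if __name__ == "__main__":
    tracker = CostTracker()
    tracker.record("groq", prompt_tokens=1500, completion_tokens=500)
    tracker.record("openai", prompt_tokens=1000, completion_tokens=0)
    print(tracker.report())
```

Recording cost at the call site, rather than reconstructing it from invoices, is what makes per-provider comparisons possible while the system is running.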

💼

Programmer Analyst — SAP VIM (British Airways)

Cognizant · Pune, India

Feb 2022 – Jul 2024

Led SAP VIM invoice automation with OCR-based capture, cutting manual processing by 35% and accelerating turnaround by 2 days
Architected REST API data pipelines using Java & PySpark processing 10,000+ invoices/month with SAP FI/MM integration and HDFS/Parquet storage

SAP VIM · Java · PySpark · OCR · REST APIs

📖 Spellbook of Projects 📖

Accio Insights

Multi-Model Inference Pipeline

ML-Based Model Router

Production-grade inference API routing requests across 3 model backends using an ML-trained router (Sentence-BERT + lightweight classifier). Reduced p95 latency from 3200 ms to 1200 ms and cut cost per request by 55%.

82% Routing Accuracy

55% Cost Reduction

800+ Queries Eval'd

Python · FastAPI · PyTorch · MLflow · Docker
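
The "embed the query, then classify it to pick a backend" idea behind the router can be sketched in a few lines. To stay dependency-free, this sketch swaps in a toy bag-of-words vector for Sentence-BERT and a nearest-centroid rule for the lightweight classifier; both are stand-ins, and the backend labels (`small`, `medium`, `large`) and training queries are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class CentroidRouter:
    """Nearest-centroid classifier: one summed vector per backend label."""

    def __init__(self) -> None:
        self.centroids: dict[str, Counter] = {}

    def fit(self, examples: list[tuple[str, str]]) -> None:
        """Accumulate each labelled query into its backend's centroid."""
        for text, backend in examples:
            self.centroids.setdefault(backend, Counter()).update(embed(text))

    def route(self, query: str, default: str = "small") -> str:
        """Return the most similar backend, or `default` if nothing matches."""
        q = embed(query)
        scores = {name: cosine(q, c) for name, c in self.centroids.items()}
        best = max(scores, key=scores.get, default=default)
        return best if scores.get(best, 0.0) > 0.0 else default


# Hypothetical labelled queries; a real router would train on logged traffic.
TRAINING = [
    ("what is the capital of france", "small"),
    ("define the word entropy", "small"),
    ("summarize this long report", "medium"),
    ("translate this paragraph to spanish", "medium"),
    ("prove the theorem step by step", "large"),
    ("debug this multithreaded code step by step", "large"),
]

if __name__ == "__main__":
    router = CentroidRouter()
    router.fit(TRAINING)
    print(router.route("summarize this report"))
```

The fallback to a default backend matters in practice: a query unlike anything in the training set should go somewhere predictable rather than to whichever label happens to come first.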

Revelio Verbum

Archyx AI — Archival Intelligence

Production LLM System

LLM-powered system automating search, tagging and reorganisation for 100+ archival collections. RAG pipeline with ChromaDB for 10K+ document chunks, API cost tracking across 3 LLM providers.

40% Less Curation Time

10K+ Doc Chunks

3 LLM Providers

ChromaDB · OpenAI · Groq · Supabase · FastAPI

Marauder's Map

Social Media User Representation

Large-Scale ML Research

Scalable user representation learning pipeline processing 26M+ users and billions of Bluesky posts. Fine-tuned Sentence-BERT with contrastive learning + FAISS similarity search on JetStream.

26M+ Users Processed

3 Benchmarks

TB+ Data Scale

SBERT · FAISS · PySpark · Parquet · UMAP

🪄 Magical Abilities 🪄

🧠

AI & Machine Learning

PyTorch

TensorFlow

Hugging Face

LangChain

OpenAI APIs

LangGraph

LangSmith

Fine-tuning

Prompt Engineering

Scikit-learn

RAG Systems

Model Evaluation

⚗️

Data & Analytics

Python

Pandas

NumPy

PySpark

XGBoost

NLTK

SpaCy

Tableau

Power BI

Dash

☁️

Cloud & Infrastructure

AWS

Docker

FastAPI

Git / CI/CD

GCP

Azure

Kubernetes

MLflow

Snowflake

BigQuery

🗄️

Databases & Vector Stores

PostgreSQL

FAISS

ChromaDB

MongoDB

MySQL

Supabase

Pinecone

Neo4j

REST APIs

Programming Languages

Python

Archmagus

TypeScript / JS

Adept

Java

Adept

SQL

Expert

✍️ Writings from the Restricted Section ✍️

Thoughts on AI, LLMs and the craft of building intelligent systems — published on Medium.

Medium · LangGraph · AI Agents

Why LangChain Isn't Enough: 5 Surprising Truths About Building Real-World AI Agents with LangGraph

Most developers reach for LangChain first — but real production AI agents demand something more. Discover the hard-won lessons from shipping LangGraph-powered agents at scale.

Medium · Agentic AI · Future of Work

Beyond Answering Questions: How Agentic AI is Redefining How We Work

The shift from AI as a question-answerer to AI as an autonomous actor is already underway. Here's what it means for how we build, manage, and collaborate with intelligent systems.

View all writings on Medium →