Working on integrating a large-scale data warehouse into GCP BigQuery, the enterprise data platform; the migration covers 1.5M active members’ records, 2.6M historical members’ records, and 80M+ medical claims.
Designing and implementing scalable data staging and preliminary transformation pipelines in Ab Initio to convert data into an enterprise-compatible schema and format, then loading it into the on-premises IBM DB2 data warehouse for downstream analytics such as HEDIS reporting.
Developing functional, reusable dbt models hosted on GCP Compute Engine managed instance groups for schema transformation and data processing, reading from IBM DB2 and writing to BigQuery, reducing processing time by 35% and enhancing scalability.
Developed and maintained 18 enterprise-level full-stack web applications on the J2EE platform using Angular, Java, and Spring Boot, working from stakeholder requirements in an Agile environment.
Created RESTful APIs using test-driven development and supported CI/CD in GitLab as part of the infrastructure migration to the cloud, resulting in a 14% decrease in overhead.
Migrated applications and databases from on-premises infrastructure to AWS using Docker, AWS Managed Workflows, Amazon RDS, and Amazon EC2, reducing server costs by 18%.
Developed ETL pipelines using Spark Streaming and Airflow, reducing data processing time by 26% and improving overall pipeline performance.
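For illustration, here is a minimal sketch of the kind of Airflow DAG that schedules such a Spark job; the DAG id, script path, and connection name are placeholders rather than the actual production values.

```python
# Minimal Airflow DAG sketch: schedules a Spark transformation job hourly.
# DAG id, application path, and connection id are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="claims_etl",                 # hypothetical pipeline name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    run_spark_job = SparkSubmitOperator(
        task_id="transform_claims",
        application="/opt/jobs/transform_claims.py",  # placeholder PySpark script
        conn_id="spark_default",
        conf={"spark.sql.shuffle.partitions": "200"},
    )
```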
Implemented automation solutions for extracting and loading data from diverse sources such as PDF, Excel, JSON, PostgreSQL, and MongoDB, streamlining data ingestion processes and ensuring data integrity.
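A small sketch of what one of these ingestion steps might look like, assuming pandas and SQLAlchemy, with placeholder file, table, and connection names:

```python
# Sketch of a small ingestion step: read an Excel export and a JSON feed,
# then load both into PostgreSQL staging tables. Names are placeholders.
import json

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/claims")  # placeholder DSN

# Excel source -> DataFrame
excel_df = pd.read_excel("monthly_claims.xlsx", sheet_name=0)

# JSON source -> DataFrame
with open("provider_feed.json") as fh:
    json_df = pd.json_normalize(json.load(fh))

# Load both into staging tables, replacing any previous run
excel_df.to_sql("stg_monthly_claims", engine, if_exists="replace", index=False)
json_df.to_sql("stg_provider_feed", engine, if_exists="replace", index=False)
```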
Optimized SQL queries for downstream operations, improving query execution times by 30% and enhancing data retrieval efficiency and overall system performance.
Created a consolidated auto insurance management dashboard using Flask and MySQL to improve operational efficiency and streamline workflow processes.
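As a rough sketch, an endpoint backing such a dashboard could look like the following; the table, columns, and credentials are illustrative, not the actual schema.

```python
# Minimal sketch of a Flask endpoint backing the dashboard, assuming a MySQL
# "claims" table; table, column, and credential values are illustrative.
from flask import Flask, jsonify
import mysql.connector

app = Flask(__name__)

def get_connection():
    # Placeholder credentials; in practice these come from configuration.
    return mysql.connector.connect(
        host="localhost", user="app", password="secret", database="insurance"
    )

@app.route("/api/claims/summary")
def claims_summary():
    conn = get_connection()
    cursor = conn.cursor(dictionary=True)
    cursor.execute("SELECT status, COUNT(*) AS total FROM claims GROUP BY status")
    rows = cursor.fetchall()
    cursor.close()
    conn.close()
    return jsonify(rows)

if __name__ == "__main__":
    app.run(debug=True)
```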
Collaborated with cross-functional teams to integrate the dashboard with APIs, reducing latency by 40%.
Trained supervised machine learning models, including random forest (RF), support vector machine (SVM), and gradient boosting machine (GBM) classifiers, to detect fraudulent claims and integrated them with the dashboard, reducing detection time by 27% and improving accuracy by 4%.
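A simplified scikit-learn sketch of training and comparing these classifiers; the feature file and column names are hypothetical.

```python
# Sketch of training the fraud classifiers with scikit-learn; the feature
# table "claims_features.csv" and the "is_fraud" label column are placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

df = pd.read_csv("claims_features.csv")
X = df.drop(columns=["is_fraud"])
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "svm": SVC(kernel="rbf", probability=True, random_state=42),
    "gbm": GradientBoostingClassifier(random_state=42),
}

# Fit each model and report held-out accuracy
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(name, accuracy_score(y_test, preds))
```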
Developed a deep learning model in Python and TensorFlow, trained on a Twitter dataset, to flag inappropriate content.
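A minimal Keras sketch of such a binary text classifier; the vocabulary size, architecture details, and toy data below are illustrative stand-ins for the actual model and dataset.

```python
# Minimal Keras sketch of a binary text classifier for flagging tweets;
# vocabulary size, layer sizes, and the toy data are placeholders.
import tensorflow as tf

VOCAB_SIZE = 20_000
SEQ_LEN = 64

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_sequence_length=SEQ_LEN
)

model = tf.keras.Sequential([
    vectorizer,
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of inappropriate content
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hypothetical training data: tweet texts and 0/1 labels stand in for the real corpus.
texts = tf.constant(["example tweet one", "example tweet two"])
labels = tf.constant([0, 1])

vectorizer.adapt(texts)             # build the vocabulary from the corpus
model.fit(texts, labels, epochs=1)  # train on the real dataset in practice
```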
Utilized Hadoop’s MapReduce architecture with general-purpose GPU (GPGPU) computing and PyCUDA for extensive parallelization, leading to a ~12% reduction in processing time.
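For context, here is a toy PyCUDA kernel of the sort a map task might call to offload an element-wise transformation to the GPU; the kernel body and scaling factor are illustrative only.

```python
# Toy PyCUDA sketch: launch an element-wise kernel over a batch of records,
# as a map task might do to offload work to the GPU. Kernel is illustrative.
import numpy as np
import pycuda.autoinit          # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

kernel = SourceModule("""
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;        // placeholder per-record transformation
}
""")
scale = kernel.get_function("scale")

records = np.random.rand(1_000_000).astype(np.float32)
n = np.int32(records.size)

block = 256
grid_size = (records.size + block - 1) // block
scale(cuda.InOut(records), n, block=(block, 1, 1), grid=(grid_size, 1))
```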
Download my CV for my detailed work experience as well as links to my publications and projects!