

Clinia — Data Acquisition Platform

Data Acquisition Intern
C#, .NET, TypeScript, Apache Airflow, JSON

Highlights

  • Developed and maintained data connectors for medical clinic information using C#
  • Built web crawlers with TypeScript to extract medical journal datasets for ML
  • Enhanced BeActive website and Slack bot with Strava integration
  • Performed data QA using JSON Viewer and Apache Airflow

Overview

At Clinia, I worked on the data acquisition team, building the infrastructure that feeds the company’s healthcare search platform. The role involved designing ETL pipelines that collect, transform, and validate medical clinic data from sources across Canada.

Technical Highlights

ETL Pipelines (C# / .NET)

Built data connectors that pull medical clinic information from provincial health databases, standardize the data format, and load it into Clinia’s data lake. Each connector handled its own source format and validation rules.
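
As a rough illustration of the standardize-and-validate step, here is a minimal sketch. It is not the production code, which was written in C#; TypeScript is used here only to keep the examples on this page in one language. The record shapes, field names, and the `transformClinic` helper are all hypothetical, and the postal code pattern is a loose format check rather than a full validation.

```typescript
// Hypothetical raw record as one source might deliver it (field names are assumptions).
interface RawClinicRecord {
  name?: string;
  phone?: string;
  postal_code?: string;
  province?: string;
}

// Hypothetical standardized shape loaded downstream.
interface Clinic {
  name: string;
  phone: string | null;
  postalCode: string;
  province: string;
}

type TransformResult =
  | { ok: true; clinic: Clinic }
  | { ok: false; errors: string[] };

// Loose check for the Canadian postal code layout (letter-digit-letter digit-letter-digit).
const POSTAL_CODE = /^[A-Z]\d[A-Z]\d[A-Z]\d$/;

// Reduce a phone number to 10 digits, or null if it cannot be normalized.
function normalizePhone(raw: string | undefined): string | null {
  if (!raw) return null;
  const digits = raw.replace(/\D/g, "");
  const local =
    digits.length === 11 && digits.startsWith("1") ? digits.slice(1) : digits;
  return local.length === 10 ? local : null;
}

// Standardize one record and collect every validation error instead of stopping at the first.
function transformClinic(raw: RawClinicRecord): TransformResult {
  const errors: string[] = [];

  const name = (raw.name ?? "").trim();
  if (name === "") errors.push("missing name");

  const compact = (raw.postal_code ?? "").toUpperCase().replace(/\s+/g, "");
  if (!POSTAL_CODE.test(compact)) errors.push("invalid postal code");

  const province = (raw.province ?? "").trim().toUpperCase();
  if (province.length !== 2) errors.push("invalid province code");

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    clinic: {
      name,
      phone: normalizePhone(raw.phone),
      postalCode: `${compact.slice(0, 3)} ${compact.slice(3)}`,
      province,
    },
  };
}
```

Returning a result object rather than throwing lets a connector log rejected records with all of their errors and keep processing the rest of the batch.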

Web Crawlers (TypeScript)

Developed crawlers to extract medical journal datasets used for training machine learning models. These crawlers needed to handle rate limiting, pagination, and varying HTML structures across different journal websites.
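
The pagination and rate-limiting concerns above can be sketched as follows. This is a simplified illustration under stated assumptions, not the crawler that was actually built: `crawlAll`, `FetchPage`, and the `Page` shape are hypothetical names, and the site-specific download and HTML parsing is left to a caller-supplied function so that each journal site can have its own parser.

```typescript
// One page of parsed results plus a pointer to the next page, if any.
interface Page<T> {
  items: T[];
  nextUrl: string | null;
}

// Caller-supplied function that downloads and parses a single page.
// Site-specific HTML handling would live here (hypothetical interface).
type FetchPage<T> = (url: string) => Promise<Page<T>>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Follow "next page" links from startUrl, waiting delayMs between requests.
// Stops at the last page, after maxPages pages, or if a URL repeats.
async function crawlAll<T>(
  startUrl: string,
  fetchPage: FetchPage<T>,
  delayMs = 1000,
  maxPages = 100,
): Promise<T[]> {
  const results: T[] = [];
  const seen = new Set<string>();
  let url: string | null = startUrl;

  while (url !== null && seen.size < maxPages && !seen.has(url)) {
    seen.add(url);
    const page: Page<T> = await fetchPage(url);
    results.push(...page.items);
    url = page.nextUrl;
    // Pause before the next request so the target site is not overloaded.
    if (url !== null) await sleep(delayMs);
  }
  return results;
}
```

A fixed delay is the simplest form of rate limiting; a real crawler would likely also honor server hints such as HTTP 429 responses and retry with backoff.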

BeActive & Slack Bot

Enhanced the company’s internal wellness platform (BeActive) and its Slack bot integration with Strava. Added new features for tracking team fitness challenges and fixed existing bugs in the activity sync pipeline.

Data Quality Assurance

Validated data ingestion into the company’s data lake using Apache Airflow for pipeline monitoring and JSON Viewer for manual data inspection.
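
Some of the checks done by eye in a JSON viewer can also be scripted. The sketch below is an assumption about what such a check might look like, not a description of the tooling actually used: `summarizeJsonLines` is a hypothetical helper that takes newline-delimited JSON records and counts malformed lines and missing required fields.

```typescript
interface QaReport {
  total: number; // non-blank lines examined
  malformed: number; // lines that are not a JSON object
  missingByField: Record<string, number>; // per-field count of absent or empty values
}

function summarizeJsonLines(lines: string[], requiredFields: string[]): QaReport {
  const report: QaReport = { total: 0, malformed: 0, missingByField: {} };
  for (const field of requiredFields) report.missingByField[field] = 0;

  for (const line of lines) {
    if (line.trim() === "") continue; // ignore blank lines
    report.total += 1;

    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      report.malformed += 1;
      continue;
    }
    // Only plain JSON objects count as records.
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      report.malformed += 1;
      continue;
    }

    const record = parsed as Record<string, unknown>;
    for (const field of requiredFields) {
      const value = record[field];
      if (value === undefined || value === null || value === "") {
        report.missingByField[field] += 1;
      }
    }
  }
  return report;
}
```

A summary like this narrows down which records are worth opening for manual inspection.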