Hi, I'm David

Data Analyst | Data Scientist
with Data Engineering expertise

Python · SQL · ETL · BI · Analytics

I design analytical systems that transform complex data into reliable insights, enabling faster, smarter decision-making and measurable business impact.

About Me

From Logic to Insights

I am a Mathematics Educator turned Data Professional, specializing in building analytical systems that transform raw data into strategic intelligence. Over eight years as an Academic Head, I led data-driven initiatives across 19 institutions—optimizing resource allocation and reducing reporting cycles from days to hours.

Backed by an MSc in Data Science and specialized training in Data Engineering, I bridge the gap between complex analytics and clear communication. I design automated data pipelines using Python and SQL, develop interactive dashboards, and apply statistical techniques to build predictive models.

Technical Toolbox

Python · R · SQL (Postgres/MySQL) · Azure · Git · Scikit-Learn · Statsmodels · Power BI · Tableau · Excel

How I Add Value

  • ETL & Data Wrangling: Integrating diverse sources (APIs, SQL, scrapers) into clean, automated, and reliable datasets (see the sketch after this list)
  • Diagnostic Analytics: Uncovering patterns and anomalies to ensure data integrity and drive strategic decisions
  • BI & Visual Intelligence: Transforming KPIs into interactive dashboards using Power BI, Tableau, and Looker Studio
  • Statistical Modeling: Leveraging regression and classification to solve business problems and forecast trends
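
A minimal sketch of the first bullet above, assuming a hypothetical REST endpoint, a local SQLite table, and illustrative column names (not a production pipeline):

    # Minimal extract-transform-load sketch; all endpoint, table, and
    # column names are illustrative assumptions.
    import sqlite3

    import pandas as pd
    import requests

    def extract() -> pd.DataFrame:
        # API source: a hypothetical endpoint returning a JSON list of records.
        api_rows = requests.get("https://api.example.com/v1/orders", timeout=30).json()
        api_df = pd.DataFrame(api_rows)
        # SQL source: a hypothetical local SQLite database.
        with sqlite3.connect("warehouse.db") as conn:
            sql_df = pd.read_sql("SELECT order_id, amount, created_at FROM orders", conn)
        return pd.concat([api_df, sql_df], ignore_index=True)

    def transform(df: pd.DataFrame) -> pd.DataFrame:
        # Deduplicate, coerce types, and drop rows that fail coercion.
        df = df.drop_duplicates(subset="order_id")
        df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
        df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
        return df.dropna(subset=["created_at", "amount"])

    if __name__ == "__main__":
        transform(extract()).to_csv("orders_clean.csv", index=False)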

Skills

Strategic Value Proposition

Bridging the gap between complex mathematical theory and actionable business intelligence. I specialize in building robust data pipelines and translating raw metrics into strategic narratives.

Algorithm Validation: Applying statistical rigor to validate ML outputs and ensure mathematical integrity (illustrated in the sketch below).

ETL & Automation: Architecting automated workflows using SQL and Python to streamline ingestion and processing.

Data Storytelling: Communicating technical findings to non-technical stakeholders with clarity.
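
As one hedged illustration of the validation point above, a paired t-test can check whether a candidate model's errors are genuinely lower than a baseline's (the data here is synthetic; scipy.stats.ttest_rel is the real API):

    # Paired t-test on per-sample absolute errors of two models.
    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(42)
    errors_baseline = np.abs(rng.normal(0.0, 1.0, size=200))   # baseline model
    errors_candidate = np.abs(rng.normal(0.0, 0.8, size=200))  # candidate model

    stat, p_value = ttest_rel(errors_baseline, errors_candidate)
    print(f"t = {stat:.3f}, p = {p_value:.4f}")
    # A small p-value suggests the error reduction is unlikely to be noise.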

Data Engineering

Python · SQL (SQLite/PostgreSQL) · Pandas & NumPy · ETL Pipelines · Web Scraping

Analytics & ML

Scikit-Learn · Predictive Modeling · Hypothesis Testing · Data Visualization · Power BI / Tableau

Mathematics

Statistical Inference · Linear Algebra · Calculus · Probability Theory · Optimization

Professional Competencies

Statistical Reasoning
Programming & Data Handling
Problem Formulation
Model Evaluation
Insight Communication

Selected Projects

End-to-end work spanning data analysis, engineering pipelines, interactive dashboards, and predictive modeling.

Data Analysis

Airlines Sales & Operations

Analyzed 10k+ flight records to identify revenue leakage and optimize route scheduling based on delay patterns.

Key Finding: Identified July as the peak revenue window (63% of sales) and flagged Moscow-Kazan routes for high delay risks.
Data Source: Kaggle
SQL · Python · CTEs · Looker Studio
View Case Study →
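
A minimal pandas sketch of the two analyses behind this case study (monthly revenue concentration and per-route delay risk), assuming hypothetical column names such as flight_date, revenue, route, and delay_minutes:

    # Sketch: monthly revenue share and route delay risk from flight records.
    # Column names are illustrative assumptions.
    import pandas as pd

    flights = pd.read_csv("flights.csv", parse_dates=["flight_date"])

    # Revenue share per month (e.g., surfacing a July peak).
    monthly = flights.groupby(flights["flight_date"].dt.month)["revenue"].sum()
    revenue_share = (monthly / monthly.sum()).sort_values(ascending=False)

    # Routes ranked by mean delay, keeping only routes with enough flights.
    route_delays = (
        flights.groupby("route")["delay_minutes"]
        .agg(["mean", "count"])
        .query("count >= 30")
        .sort_values("mean", ascending=False)
    )

    print(revenue_share.head(3))
    print(route_delays.head(5))
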
Data Science

E-Commerce Recommendation Engine

Designed a Hybrid Collaborative Filtering engine to solve the product discovery challenge. Processed 100k+ orders to generate personalized Top-N product rankings based on price and category affinity.

Key Outcome: Built a high-integrity dataset from 9 relational tables (95K+ valid transactions) and implemented a ranking engine for real-time personalization.
Data Source: Olist / Kaggle
Python · Pandas (ETL) · Scikit-Learn · Collaborative Filtering
View Case Study →
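
A simplified sketch of the hybrid idea: item-item collaborative filtering blended with a content (category-affinity) score. The toy interaction matrix, affinity values, and 0.7/0.3 blend weights are all illustrative assumptions:

    # Hybrid Top-N ranking sketch: collaborative + content signals.
    import numpy as np
    import pandas as pd
    from sklearn.metrics.pairwise import cosine_similarity

    # Toy user-item purchase matrix (rows: users, columns: products).
    interactions = pd.DataFrame(
        [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]],
        columns=["p1", "p2", "p3", "p4"],
    )

    # Collaborative signal: cosine similarity between product purchase vectors.
    item_sim = cosine_similarity(interactions.T.values)

    # Content signal: hypothetical affinity of each product to the user's
    # preferred category and price band.
    content_score = np.array([0.9, 0.1, 0.8, 0.3])

    def top_n(user_idx: int, n: int = 2) -> list[str]:
        user_vec = interactions.iloc[user_idx].to_numpy().astype(float)
        cf_score = item_sim @ user_vec               # collaborative score
        blended = 0.7 * cf_score + 0.3 * content_score
        blended[user_vec > 0] = -np.inf              # hide already-bought items
        ranked = np.argsort(blended)[::-1][:n]
        return [interactions.columns[i] for i in ranked]

    print(top_n(user_idx=0))
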
Business Intelligence

Sprocket Central: Growth Strategy

Leveraged RFM analysis and predictive modeling to prioritize 1,000 new prospects. The goal was to move beyond descriptive analytics to target customers with the highest Customer Lifetime Value (CLV).

Impact: Isolated 382 high-value customers projected to drive 29,000+ unit sales, identifying NSW as the dominant region (56%).
Data Source: Sprocket Central
Power BI · DAX · RFM Modeling · Predictive Scoring
View Case Study →
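
A compact pandas sketch of the RFM scoring step behind this prioritization, assuming a transactions file with customer_id, order_date, and amount columns (quintile cut-offs are illustrative):

    # RFM sketch: quintile-score Recency, Frequency, and Monetary value.
    import pandas as pd

    tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])
    snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (snapshot - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )

    # Score 1-5 per dimension; low recency is good, so its ranks are reversed.
    rfm["R"] = pd.qcut(rfm["recency"].rank(method="first"), 5, labels=[5, 4, 3, 2, 1]).astype(int)
    rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
    rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)

    high_value = rfm[(rfm["R"] >= 4) & (rfm["F"] >= 4) & (rfm["M"] >= 4)]
    print(f"{len(high_value)} high-value customers")
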
Data Engineering

Multi-Source Data Ingestion

Engineered an automated ETL pipeline to synchronize fragmented data from three distinct sources (Twitter's REST API, cloud-hosted TSVs, and archival CSVs), solving complex "schema-on-read" challenges.

Architecture: Implemented a rigorous "Assess-Clean-Test" workflow using Python RegEx and Tweepy to parse unstructured text and ensure data integrity.
Data Source: Twitter API / Raw Files
Python · Tweepy · Requests · RegEx
View Case Study →
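
A stripped-down sketch of the "Assess-Clean-Test" pattern for one of the three sources, a cloud-hosted TSV (the URL, column names, and rating regex are illustrative; the Tweepy-specific API calls are omitted):

    # Assess-Clean-Test sketch for a hosted TSV source.
    import re
    from io import StringIO

    import pandas as pd
    import requests

    # Gather: download the raw file (hypothetical URL).
    raw = requests.get("https://example.com/data/archive.tsv", timeout=30).text
    df = pd.read_csv(StringIO(raw), sep="\t")

    # Assess: quantify missingness before touching anything.
    issues = df.isna().sum()

    # Clean: e.g., pull a "numerator/denominator" rating out of free text.
    pattern = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")

    def extract_rating(text: str) -> float | None:
        match = pattern.search(str(text))
        return float(match.group(1)) / float(match.group(2)) if match else None

    df["rating"] = df["text"].map(extract_rating)  # assumes a "text" column

    # Test: fail fast if the cleaned output violates expectations.
    assert df["rating"].notna().mean() > 0.9, "too many unparsed ratings"
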
Education

MSc. Data Science (Current)

Cooperative University of Kenya | Jul 2025 - Dec 2026

Specializing in advanced machine learning, predictive modeling, and big data architecture.

Data Engineering Specialization (In Progress)

ALX Africa | Sep 2025 - Jul 2026

Mastering distributed systems, ETL orchestration with Airflow, and cloud data warehousing.

B.Ed Science (Statistics Major)

Kenyatta University | 2006 - 2010

Strong foundation in probability distributions, statistical theory, time series analysis, regression models, quantitative analysis, and hypothesis testing.

Technical Articles

Architecting Reliability in Distributed Systems

David Owino Nov 5, 2025

The most common point of failure in modern data pipelines is not application code, but unannounced upstream schema changes.

Data contracts establish a formal agreement between data producers (software engineers) and data consumers (data engineers), ensuring consistency, compatibility, and pipeline stability.

Engineering principle: data contracts shift data quality left, transforming it from a reactive cleanup task into a proactive, pre-deployment requirement.

The End of Silent Failures

By introducing a contract layer (using technologies such as Protobuf or JSON Schema), breaking changes are detected before they reach the data lake. Any attempt to remove or alter a field fails fast within the CI/CD pipeline, preventing downstream disruptions.
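
As a hedged illustration of that contract layer, a JSON Schema check can gate each payload before ingestion. The schema and payload below are invented; jsonschema.validate is the real library call:

    # Data-contract sketch: reject a payload that drops or retypes a field.
    from jsonschema import ValidationError, validate

    ORDER_CONTRACT = {
        "type": "object",
        "required": ["order_id", "amount", "created_at"],
        "properties": {
            "order_id": {"type": "string"},
            "amount": {"type": "number"},
            "created_at": {"type": "string"},
        },
    }

    payload = {"order_id": "A-1001", "amount": "12.50"}  # retyped + missing field

    try:
        validate(instance=payload, schema=ORDER_CONTRACT)
    except ValidationError as err:
        # In CI/CD, this failure blocks the deploy instead of polluting the lake.
        print(f"Contract violation: {err.message}")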

Read Full Article →

Beyond the Black Box: The Art of Feature Engineering

David Owino Dec 28, 2025

Feature engineering, not model complexity, drives machine learning performance, enabling simple models to outperform complex ones when data quality is high.

Perspective: In production, transforming raw data into meaningful signals and prioritizing model explainability (XAI) often delivers more business value than marginal gains in accuracy.

ML Lifecycle Workflow (sketched below)
  • Feature Extraction: Scikit-Learn
  • Model Selection: XGBoost
  • Hyperparameter Tuning & CV: Optuna
  • Explainability: SHAP / LIME
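
A condensed sketch of that workflow using scikit-learn alone; GradientBoostingClassifier and GridSearchCV stand in here for the XGBoost and Optuna steps, and the data is synthetic:

    # ML lifecycle sketch: feature pipeline -> model -> tuned cross-validation.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    pipeline = Pipeline([
        ("features", StandardScaler()),                # feature extraction step
        ("model", GradientBoostingClassifier(random_state=0)),
    ])

    search = GridSearchCV(
        pipeline,
        param_grid={"model__n_estimators": [100, 200], "model__max_depth": [2, 3]},
        cv=5,
        scoring="roc_auc",
    )
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))
    # For explainability, SHAP or LIME would then be applied to
    # search.best_estimator_ rather than chasing marginal accuracy gains.
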
Read Full Article →
Get In Touch

Contact Details

Open to discussions on Data Engineering pipelines, ML research, or freelance analytics.