👋 Hi, I’m Jing You
**Data Engineer · Analytics Engineer · BI Developer** | Python · SQL · dbt · Snowflake · Power BI
I build scalable, production-grade data pipelines and analytics solutions, from raw ingestion to executive dashboards. My background combines 10+ years of business operations with modern data engineering, so I approach data problems with a clear view of the workflows and decisions they support.
📍 Nolensville, TN · Open to Nashville or Remote roles
⚡ Core Skills
Data Engineering
- End-to-end ELT pipeline development (batch & streaming)
- Medallion architecture (Bronze / Silver / Gold)
- Data modeling, transformation & quality assurance
- REST API integration & multi-source ingestion
- Orchestration, scheduling & monitoring
Technologies
- Languages: Python (Pandas, Polars, Requests), SQL
- Transformation: dbt, SQLMesh
- Warehouses: Snowflake, DuckDB, PostgreSQL
- Orchestration: Apache Airflow, Prefect
- Streaming: RabbitMQ, WebSocket
- Cloud: AWS (S3, Lambda), Azure Databricks
- BI & Visualization: Power BI (Advanced DAX), Metabase, Streamlit
- Tools: Docker, Git, GitHub Actions, VS Code Dev Containers
- Certifications: Advanced Power BI Certificate · Google AI Essentials · Microsoft DP-600: Fabric Analytics Engineer Associate (in progress)
💼 Featured Projects
⭐ MidTenn Lend Map — Small Business Lending Intelligence Platform
Capstone · Nashville Software School Data Engineering Apprenticeship
Tech Stack: Python · SQLMesh · Prefect · DuckDB · Snowflake · Power BI · Metabase · 5 Public APIs
End-to-end data engineering platform surfacing county-level small business lending gaps across Middle Tennessee — built to help community banks identify underserved markets.
Engineering Highlights
- Multi-source ingestion across SBA, FDIC, CFPB, FRED, and U.S. Census APIs
- 19 SQLMesh models orchestrated with Prefect through full medallion architecture (Bronze → Silver → Gold)
- 2,300+ loan records and 52,000+ complaint records processed and validated
- Dual serving layers: Snowflake + Power BI (executive dashboards) and PostgreSQL + Metabase (operational monitoring)
- Geospatial visuals identifying underserved lending regions across Middle Tennessee counties
📌 Crypto Market Streaming Pipeline
Tech Stack: Python · Coinbase WebSocket · RabbitMQ · PostgreSQL · dbt · DuckLake · Metabase · Apache Airflow
A real-time streaming data pipeline that ingests live cryptocurrency market data from Coinbase and transforms it into analytical models served through Metabase dashboards.
Architecture
Coinbase WebSocket → RabbitMQ → PostgreSQL → dbt → DuckLake + Metabase
Apache Airflow orchestrates the dbt step on an hourly schedule.
Engineering Highlights
- Real-time WebSocket ingestion from the Coinbase market feed into a RabbitMQ message queue
- PostgreSQL as the core data warehouse with dbt transformations for clean analytical models
- DuckLake lakehouse layer for cost-efficient analytical queries
- Metabase dashboards for live market monitoring
- Apache Airflow orchestrating hourly dbt runs for fresh reporting
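To make the message flow above concrete, here is a minimal, single-process sketch of the same pattern (producer, broker, warehouse, transform). It is not the project's code: `queue.Queue` stands in for RabbitMQ, an in-memory SQLite database stands in for PostgreSQL, a SQL view stands in for a dbt model, and the payloads are hand-written examples shaped like Coinbase ticker messages rather than captured data. All function and table names are hypothetical.

```python
import json
import queue
import sqlite3

# Hand-written, Coinbase-style ticker payloads (illustrative, not real market data).
SAMPLE_MESSAGES = [
    '{"type": "ticker", "product_id": "BTC-USD", "price": "64000.10", "time": "2024-01-01T00:00:00Z"}',
    '{"type": "heartbeat", "product_id": "BTC-USD"}',
    '{"type": "ticker", "product_id": "BTC-USD", "price": "64010.50", "time": "2024-01-01T00:00:01Z"}',
    '{"type": "ticker", "product_id": "ETH-USD", "price": "3100.00", "time": "2024-01-01T00:00:01Z"}',
]


def create_warehouse():
    """In-memory SQLite database standing in for PostgreSQL (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_ticks (product_id TEXT, price REAL, event_time TEXT)")
    return conn


def produce(raw_messages, broker):
    """Producer: parse WebSocket frames and publish only ticker events.

    queue.Queue stands in for a RabbitMQ queue (assumption).
    """
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("type") == "ticker":
            broker.put(msg)


def consume(broker, conn):
    """Consumer: drain the queue and land raw ticks in the warehouse table."""
    rows = []
    while not broker.empty():
        msg = broker.get()
        # Prices arrive as strings; cast once at load time.
        rows.append((msg["product_id"], float(msg["price"]), msg["time"]))
    conn.executemany("INSERT INTO raw_ticks VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def build_model(conn):
    """Transform step: a SQL view playing the role of a dbt model (assumption)."""
    conn.execute(
        """
        CREATE VIEW IF NOT EXISTS ticker_summary AS
        SELECT product_id, COUNT(*) AS ticks, MAX(price) AS high_price
        FROM raw_ticks
        GROUP BY product_id
        """
    )


if __name__ == "__main__":
    broker = queue.Queue()
    conn = create_warehouse()
    produce(SAMPLE_MESSAGES, broker)
    print("loaded", consume(broker, conn), "ticks")
    build_model(conn)
    print(conn.execute("SELECT * FROM ticker_summary ORDER BY product_id").fetchall())
```

The heartbeat message is filtered out at the producer, so only three ticks reach the warehouse. In a real deployment a broker typically decouples the WebSocket reader from database writes, so a slow insert does not block ingestion of live messages.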
📌 NPPES Healthcare Provider Pipeline
Tech Stack: Python · dbt · Snowflake · Apache Airflow · AWS S3 · Docker · GitHub Actions · Power BI · Medallion Architecture
A production-grade ELT pipeline that processes 8.85 million CMS provider records (9.9 GB of raw data) end to end.
Highlights
- Full medallion architecture with schema validation, null/duplicate detection, and root cause analysis at each layer
- 16+ automated dbt data quality tests with CI/CD via GitHub Actions
- Apache Airflow orchestration with structured logging for full traceability
- Dimensional modeling with star schema output layer optimized for BI consumption
- Power BI dashboards with DAX measures surfacing provider distribution insights
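The layer-by-layer null and duplicate checks described above can be sketched as a promotion gate between Bronze and Silver. This is an illustration only, not the project's dbt test suite: it assumes records arrive as Python dicts and uses hypothetical column names (`npi`, `provider_name`, `state`), rejecting rows with missing required values and keeping only the first occurrence of each key.

```python
# Hypothetical required columns for a provider record (assumption, not the real NPPES schema).
REQUIRED_FIELDS = ("npi", "provider_name", "state")


def validate_bronze(rows, key="npi", required=REQUIRED_FIELDS):
    """Gate between Bronze and Silver.

    Returns (clean_rows, report):
    - rows with a missing required value (None or "") are rejected and counted;
    - rows whose key was already seen are rejected as duplicates; the first wins.
    """
    report = {"null_rows": 0, "duplicate_keys": 0}
    seen = set()
    clean = []
    for row in rows:
        if any(row.get(col) in (None, "") for col in required):
            report["null_rows"] += 1
            continue
        if row[key] in seen:
            report["duplicate_keys"] += 1
            continue
        seen.add(row[key])
        clean.append(row)
    return clean, report


if __name__ == "__main__":
    # Made-up records; identifiers are placeholders, not real NPIs.
    bronze = [
        {"npi": "0000000001", "provider_name": "Clinic A", "state": "TN"},
        {"npi": "0000000001", "provider_name": "Clinic A", "state": "TN"},
        {"npi": "0000000002", "provider_name": None, "state": "TN"},
        {"npi": "0000000003", "provider_name": "Clinic C", "state": "KY"},
    ]
    silver, report = validate_bronze(bronze)
    print(len(silver), report)  # 2 {'null_rows': 1, 'duplicate_keys': 1}
```

In a dbt project the same two checks are usually declared as `not_null` and `unique` tests on the model's columns rather than written by hand.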
📌 COVID-19 Public Health Pipeline
Tech Stack: Python · Apache Airflow · Snowflake · dbt · Streamlit · Medallion Architecture
A fully orchestrated public health data pipeline with an interactive Streamlit dashboard for non-technical users.
Highlights
- Batch pipeline through Bronze / Silver / Gold layers with dbt transformations
- Airflow scheduling and monitoring for reliable daily runs
- Streamlit dashboard enabling stakeholders to explore metrics without SQL access
- Data quality validation at each medallion layer before promotion
📌 Brazilian E-Commerce Analytics Dashboard
Tech Stack: Power BI · DAX · Star Schema · Python · PostgreSQL
Interactive analytics dashboard built on the Olist Brazilian e-commerce dataset — published to Power BI Service with a public link.
Highlights
- Star schema data model connecting orders, customers, products, and sellers
- Advanced DAX measures for revenue trends, delivery performance, and seller ratings
- Executive-level dashboard published to Power BI Service
- Python + PostgreSQL ETL for data preparation and loading
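A star schema like the one described puts measurable events in a central fact table and descriptive attributes in dimension tables. The sketch below shows that shape using Python's built-in `sqlite3` as a stand-in for PostgreSQL, with a SQL aggregate standing in for a DAX measure. Table names, columns, and rows are hypothetical simplifications loosely modeled on the public Olist dataset, not the project's actual model.

```python
import sqlite3

# Hypothetical star schema: one fact table at order-item grain plus three dimensions.
DDL = """
CREATE TABLE dim_customer (customer_id TEXT PRIMARY KEY, customer_state TEXT);
CREATE TABLE dim_product  (product_id  TEXT PRIMARY KEY, category TEXT);
CREATE TABLE dim_seller   (seller_id   TEXT PRIMARY KEY, seller_state TEXT);
CREATE TABLE fact_order_items (
    order_id    TEXT,
    customer_id TEXT REFERENCES dim_customer(customer_id),
    product_id  TEXT REFERENCES dim_product(product_id),
    seller_id   TEXT REFERENCES dim_seller(seller_id),
    price       REAL
);
"""


def build_star(conn):
    """Create the schema and load a few made-up rows."""
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [("c1", "SP"), ("c2", "RJ")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [("p1", "toys"), ("p2", "books")])
    conn.executemany("INSERT INTO dim_seller VALUES (?, ?)", [("s1", "SP")])
    conn.executemany(
        "INSERT INTO fact_order_items VALUES (?, ?, ?, ?, ?)",
        [
            ("o1", "c1", "p1", "s1", 50.0),
            ("o1", "c1", "p2", "s1", 20.0),
            ("o2", "c2", "p1", "s1", 30.0),
        ],
    )


def revenue_by_category(conn):
    """Aggregate the fact table through a dimension, as a BI revenue measure would."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.price) AS revenue
        FROM fact_order_items AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY revenue DESC
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    print(revenue_by_category(conn))  # [('toys', 80.0), ('books', 20.0)]
```

Keeping the fact table at a single grain (here, one row per order item) lets every dimension slice the same measures consistently, which is why star schemas are the commonly recommended shape for Power BI models.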
🧠 How I Think About Data
Systems Over Scripts
I design pipelines that are maintainable, scalable, observable, and documented — not just code that runs once.
Data Quality First
Bad data = bad decisions. I prioritize validation, schema consistency, and reproducibility at every layer.
Business-Aware Engineering
10+ years in operations, e-commerce, and client-facing roles means I understand real business workflows, identify high-impact opportunities, and communicate clearly with non-technical stakeholders.
📬 Contact
Open to Analytics Engineer / Data Engineer / BI Developer roles — Nashville or Remote
- 📧 jingliuyou@gmail.com
- 🐙 GitHub
- 📍 Nolensville, Tennessee