👋 About Me

I’m Jing You, transitioning from hospitality management to data engineering.

My Background:

Why the Transition?

While running my restaurant, I found myself spending countless hours in Excel analyzing customer data and optimizing Facebook ad campaigns. I realized that I love solving problems with data. As an immigrant who came to the US at 16, I’ve always believed that education and skills are the keys to changing one’s destiny. After completing a Data Analytics bootcamp, I decided to go further and learn Data Engineering.


🚀 My Two-Month Learning Journey

Timeline

Tech Stack

Proficient:

Currently Learning:


🛠️ Technical Skills

Programming Languages

Data Engineering

Tools & Technologies

Data Formats

💼 Project Portfolio

Project 1: NPPES Medical Provider Data Pipeline

Tech Stack: Python, AWS S3, PostgreSQL, Docker

Built a complete batch ETL pipeline that processes 8.7 million medical provider records.

Key Learnings:

View Project →


Project 2: Weather Data Integration Pipeline ⭐ Latest

Tech Stack: Python, REST API, Pandas, PostgreSQL, DuckDB, Docker

Built a multi-source data integration pipeline that combines real-time weather data from the OpenWeatherMap API with weather station information stored in PostgreSQL.

Core Features:

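# API Integration with Rate Limiting
# (rate_limit, url, and params are assumed to be defined elsewhere in the project)
import requests

@rate_limit(max_per_second=2)
def fetch_weather(city, lat, lon):
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

# Data Transformation
# Inner join keeps only the cities present in both DataFrames
merged = weather_df.merge(stations_df, on='city', how='inner')
merged.to_parquet('weather_clean.parquet')

# Data Validation with DuckDB
# Count rows with a missing temperature; a clean dataset returns 0
null_count = con.execute("""
    SELECT COUNT(*) AS null_count
    FROM weather WHERE temperature_f IS NULL
""").fetchone()[0]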
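The snippet above uses a rate_limit decorator without showing it. Below is a minimal sketch of how such a decorator could work, assuming it simply spaces calls at least 1/max_per_second seconds apart. The implementation and the ping function are illustrative assumptions, not the project's actual code.

```python
import functools
import time


def rate_limit(max_per_second):
    """Decorator factory: keep calls at least 1/max_per_second seconds apart.

    Hypothetical sketch; the real project may implement this differently.
    """
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = None  # monotonic timestamp of the previous call

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_call
            if last_call is not None:
                wait = min_interval - (time.monotonic() - last_call)
                if wait > 0:
                    time.sleep(wait)  # pause until the interval has passed
            last_call = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Illustrative stand-in for an API call (hypothetical function)
@rate_limit(max_per_second=20)
def ping(n):
    return n


start = time.monotonic()
results = [ping(i) for i in range(3)]
elapsed = time.monotonic() - start
```

This version tracks a single timestamp per decorated function and is not thread-safe, which is usually enough for a sequential batch script that calls one API in a loop.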

Project Highlights:

Business Application: This pipeline can analyze weather impacts on restaurant business, such as:


🎯 From Restaurants to Code: My Unique Perspective

As someone transitioning from hospitality, I’ve found many parallels between the two fields:

Restaurant Management    | Data Engineering
Ingredient procurement   | Data ingestion (API + Database)
Food preparation         | Data cleaning (Pandas)
Recipe standardization   | Data transformation (Schema design)
Quality control          | Data validation (DuckDB)
Plating & serving        | Data delivery (Parquet + Reports)

This background makes me particularly attentive to:


💡 Key Takeaways from Two Months

Technical Side

1. Understanding the Essence of ETL

Extract   (Extraction)     → Acquire data from various sources
Transform (Transformation) → Clean, merge, standardize
Load      (Loading)        → Store in target location
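The three stages can be sketched end to end with only the Python standard library. This is a simplified illustration, not the project's pipeline: the sample records are made up, and an in-memory SQLite database stands in for the PostgreSQL target.

```python
import json
import sqlite3

# Made-up sample payload standing in for an API response
RAW_JSON = """
[
  {"city": "Nashville", "main": {"temp": 71.6}},
  {"city": "Memphis", "main": {"temp": null}},
  {"city": " nashville ", "main": {"temp": 68.0}}
]
"""


def extract(raw_text):
    """Extract: parse raw JSON text into Python objects."""
    return json.loads(raw_text)


def transform(records):
    """Transform: flatten nested fields, standardize city names,
    and drop rows that have no temperature."""
    rows = []
    for rec in records:
        temp = rec.get("main", {}).get("temp")
        if temp is None:
            continue
        rows.append((rec["city"].strip().title(), float(temp)))
    return rows


def load(rows, con):
    """Load: write the cleaned rows into the target table."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS weather (city TEXT, temperature_f REAL)"
    )
    con.executemany("INSERT INTO weather VALUES (?, ?)", rows)
    con.commit()


# SQLite in memory is an assumption made so the sketch runs anywhere
con = sqlite3.connect(":memory:")
load(transform(extract(RAW_JSON)), con)
row_count = con.execute("SELECT COUNT(*) FROM weather").fetchone()[0]
```

Keeping each stage in its own function means a failure can be traced to one step, and each step can be tested without running the others.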

2. Mastering Data Format Conversion

JSON (raw data)  →  Pandas (processing)  →  Parquet (storage)

Why? Because different stages have different needs:

3. Learning Systems Thinking

Not just “making code work,” but:

Mindset Shifts

From “Completing Tasks” to “Building Systems”

From “I Can Use Tools” to “I Understand Principles”


🚧 Challenges I’ve Faced

1. Path vs pathlib - The File Path Confusion

Problem: Windows uses \ as the path separator while macOS and Linux use /, so hard-coded path strings break across platforms

Solution: Learning to use pathlib.Path

# Old way (error-prone)
filepath = 'data\\raw\\weather.json'

# New way (cross-platform)
from pathlib import Path
filepath = Path('data') / 'raw' / 'weather.json'
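Beyond joining segments, a Path object exposes the pieces of a path as attributes, so there is no need to split strings by hand. A short sketch using the same filepath as above; the 'clean' folder name is a hypothetical example.

```python
from pathlib import Path

filepath = Path("data") / "raw" / "weather.json"

# Inspect the path without any string slicing
parts = filepath.parts    # path segments as a tuple
suffix = filepath.suffix  # file extension, including the dot

# Derive the output path for the cleaned file
# ('clean' is a hypothetical folder name)
parquet_name = filepath.with_suffix(".parquet").name
clean_path = Path("data") / "clean" / parquet_name

# as_posix() always renders forward slashes, which is handy for logs
posix_text = clean_path.as_posix()
```

The same code produces the correct separator on Windows, macOS, and Linux, because the separator is only chosen when the path is converted to a string.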

2. Parquet Format - Why Not Use CSV?

Confusion: CSV is so simple, why learn a new format?

Understanding:

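In big data scenarios, Parquet's columnar layout and compression can save up to around 90% of the space and make queries up to 10x faster, depending on the data and on how many columns a query reads.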

3. Docker Dev Container - Why Containerize?

Question: My local environment works, why Docker?

Understanding:

Dev Container Solution: Everyone uses the same environment!


🎓 Advantages of a Non-Traditional Background

Many ask me: “Can you learn data engineering without a CS degree?”

My answer: Absolutely! And you have unique advantages.

My hospitality background gave me:

✅ Real Business Understanding

✅ Problem-Solving Ability

✅ Customer-Oriented Thinking


📈 Learning Methods I Use

1. Learn Through Projects, Not Just Tutorials

2. Analogy Learning

Use familiar concepts to understand new ones:

3. Document Everything

4. Don’t Fear “Dumb Questions”

“What’s the difference between Path and pathlib?” isn’t dumb. It’s a great question!


📬 Contact Me

Currently seeking Data Analyst / BI Analyst positions (Nashville, graduating May 2025)


šŸ’­ Final Thoughts

Two months ago, I knew nothing about “ETL.” Today, I can confidently say: I am a data engineer (still learning, of course).

If you’re also from a non-traditional background, if you’re considering a career change to tech, I want to tell you:

The only question is: Are you willing to start?

I started two months ago, and today I have my own tech blog.
Where will you be two months from now?

Let’s connect! I’m happy to share my learning experience and look forward to hearing your story.


Last updated: December 16, 2024