Python Quant · Module One
Python Data Research Fundamentals
Use Python to clean data, calculate fields, and prepare reproducible research outputs.
PYTHON FUNDAMENTALS
Four core skills
Data loading
Read tables and inspect missing values.
Cleaning
Align formats, dates, and numeric fields.
Feature calculation
Create indicators and derived fields.
Output
Save tables, charts, and logs for review.
Data loading
Data loading is the first quality-control step in a Python research notebook. Learners practice importing structured files, checking column names, confirming date order, and identifying missing or duplicated rows before any calculations are made.
A reliable notebook should make the data source, time range, frequency, and field definitions easy to inspect. This helps prevent later analysis from relying on unclear assumptions or hidden manual edits.
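The checks described above can be sketched as a small loading helper. This is a minimal illustration, not the module's prescribed method: it assumes the pandas library (the module does not name one), and the sample data, the column names `date`, `close`, and `volume`, and the function `load_and_inspect` are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a real file; in practice a file path
# would be passed to pd.read_csv instead of an in-memory buffer.
RAW_CSV = """date,close,volume
2024-01-02,100.0,1200
2024-01-03,101.5,1350
2024-01-03,101.5,1350
2024-01-05,99.8,
2024-01-04,100.9,1100
"""


def load_and_inspect(source):
    """Load a CSV and return the table plus a small inspection report."""
    df = pd.read_csv(source, parse_dates=["date"])
    report = {
        "columns": list(df.columns),
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dates_sorted": bool(df["date"].is_monotonic_increasing),
        "date_range": (df["date"].min(), df["date"].max()),
    }
    return df, report


df, report = load_and_inspect(io.StringIO(RAW_CSV))
for key, value in report.items():
    print(f"{key}: {value}")
```

Returning the report as a dictionary, rather than only printing it, lets the same facts be written into the notebook's source notes later.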
Cleaning
Cleaning focuses on making data consistent enough for educational analysis. Common tasks include converting dates, standardizing numeric fields, removing duplicate records, and flagging unusual values for review rather than silently deleting them.
The goal is not to make a dataset look perfect, but to document what was changed, why it was changed, and what limitations remain after the cleaning process.
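One way to keep cleaning steps documented is to have the cleaning function return its own change notes. The sketch below assumes pandas; the sample rows, the `clean` function, the `needs_review` column, and the `max_close` threshold of 500 are hypothetical choices made only for illustration.

```python
import pandas as pd

# Hypothetical raw table: text dates, text numbers, one duplicate row,
# one unparseable date, and one unusually large value.
raw = pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03", "2024-01-03", "2024-01-04", "not a date"],
        "close": ["100.0", "101.5", "101.5", "1,009", "99.8"],
    }
)


def clean(df, max_close=500.0):
    """Return a cleaned copy and a list of notes describing each change."""
    notes = []
    out = df.copy()

    # Unparseable dates become NaT instead of raising, so they can be counted.
    out["date"] = pd.to_datetime(out["date"], format="%Y-%m-%d", errors="coerce")
    notes.append(f"dates that could not be parsed: {int(out['date'].isna().sum())}")

    # Strip thousands separators, then convert text to numbers.
    stripped = out["close"].str.replace(",", "", regex=False)
    out["close"] = pd.to_numeric(stripped, errors="coerce")

    before = len(out)
    out = out.drop_duplicates().reset_index(drop=True)
    notes.append(f"duplicate rows removed: {before - len(out)}")

    # Flag, rather than delete, values above the assumed review threshold.
    out["needs_review"] = out["close"] > max_close
    notes.append(f"rows flagged for review: {int(out['needs_review'].sum())}")
    return out, notes


cleaned, notes = clean(raw)
print(cleaned)
print(notes)
```

The flagged row stays in the table, so a reviewer can decide whether it is a data error or a genuine value.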
Feature calculation
Feature calculation turns raw fields into descriptive research variables, such as rolling summaries, percentage changes, range measures, or category labels. Learners practice writing these calculations in a way that can be repeated and checked.
Each derived field should have a clear name and a short explanation. This keeps later research discussions focused on what the data shows, rather than on unclear spreadsheet-style formulas.
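The four kinds of derived field mentioned above can be illustrated together, with each field paired with a one-line explanation. This is a sketch under assumptions: it uses pandas, and the sample prices, the column names, the three-row window, and the `FIELD_NOTES` dictionary are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
prices = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=5, freq="D"),
        "high": [101.0, 103.0, 102.0, 106.0, 105.0],
        "low": [99.0, 100.0, 100.0, 102.0, 103.0],
        "close": [100.0, 102.0, 101.0, 105.0, 104.0],
    }
)

# A short explanation for every derived field, kept next to the code.
FIELD_NOTES = {
    "pct_change": "Percentage change in close from the previous row.",
    "rolling_mean_3": "Mean of close over the current and two previous rows.",
    "range": "High minus low for the same row.",
    "direction": "Label 'up', 'down', or 'flat' from the sign of pct_change; 'n/a' if undefined.",
}


def label_direction(change):
    """Turn a numeric change into a category label."""
    if pd.isna(change):
        return "n/a"
    if change > 0:
        return "up"
    if change < 0:
        return "down"
    return "flat"


def add_features(df):
    """Return a copy of df with the derived fields described in FIELD_NOTES."""
    out = df.copy()
    # fill_method=None leaves gaps as gaps instead of forward-filling them.
    out["pct_change"] = out["close"].pct_change(fill_method=None) * 100
    out["rolling_mean_3"] = out["close"].rolling(window=3).mean()
    out["range"] = out["high"] - out["low"]
    out["direction"] = out["pct_change"].map(label_direction)
    return out


features = add_features(prices)
print(features)
```

Because the first row has no previous value, its percentage change is undefined and is labeled `n/a` rather than being forced into a category.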
Output
Outputs should be saved in a form that another learner can review: cleaned tables, summary charts, and short notes explaining the workflow. The module emphasizes simple file organization so results can be traced back to the original dataset.
This section remains general education. Outputs are used to support learning and review, not to provide personal advice, product recommendations, or promised outcomes.
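A simple file layout for reviewable outputs might look like the sketch below. It assumes pandas; the function `export_outputs`, the file names `cleaned_table.csv` and `workflow_notes.txt`, and the source name `sample_prices.csv` are hypothetical. Chart export is left out here, and a temporary directory is used only so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_outputs(df, notes, out_dir, source_name):
    """Write the cleaned table and a plain-text notes file to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    table_path = out_dir / "cleaned_table.csv"
    notes_path = out_dir / "workflow_notes.txt"

    df.to_csv(table_path, index=False)

    # Record the original source first so results can be traced back to it.
    lines = [f"source: {source_name}", f"rows: {len(df)}", *notes]
    notes_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return table_path, notes_path


# Hypothetical cleaned table and notes for illustration only.
cleaned = pd.DataFrame({"date": ["2024-01-02", "2024-01-03"], "close": [100.0, 101.5]})

with tempfile.TemporaryDirectory() as tmp:
    table_path, notes_path = export_outputs(
        cleaned,
        ["duplicate rows removed: 1"],
        Path(tmp) / "outputs",
        "sample_prices.csv",
    )
    reloaded = pd.read_csv(table_path)
    notes_text = notes_path.read_text(encoding="utf-8")

print(notes_text)
```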
WORKFLOW
A clean notebook sequence
Import
Load the dataset and record source notes, date range, and fields included.
Inspect
Check missing values, duplicate rows, ordering, and inconsistent field types.
Transform
Create derived fields with clear labels and keep assumptions visible.
Export
Save cleaned outputs and write a short summary for later review.
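The four steps above can be sketched as four small functions called in order, which mirrors how notebook cells might be arranged. This is an illustrative outline assuming pandas; the function names, the inline sample data, and the `close_diff` field are hypothetical, and the export step returns text instead of writing files to keep the example self-contained.

```python
import io

import pandas as pd

# Hypothetical inline sample; a real notebook would read from a file.
RAW = """date,close
2024-01-03,101.5
2024-01-02,100.0
2024-01-03,101.5
2024-01-04,
"""


def import_step(source):
    """Load the data and record source notes, date range, and fields."""
    df = pd.read_csv(source, parse_dates=["date"])
    notes = {
        "source": "inline sample",
        "fields": list(df.columns),
        "date_range": f"{df['date'].min().date()} to {df['date'].max().date()}",
    }
    return df, notes


def inspect_step(df):
    """Count missing values and duplicates, and check ordering and types."""
    return {
        "missing": int(df.isna().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
        "sorted": bool(df["date"].is_monotonic_increasing),
        "field_types": df.dtypes.astype(str).to_dict(),
    }


def transform_step(df):
    """Remove duplicates, sort by date, and add one clearly labeled field."""
    out = df.drop_duplicates().sort_values("date").reset_index(drop=True)
    # Assumption made visible: close_diff is the change from the previous row.
    out["close_diff"] = out["close"].diff()
    return out


def export_step(df, notes, checks):
    """Return the cleaned table as CSV text plus a short written summary."""
    summary = (
        f"Source: {notes['source']}. Date range: {notes['date_range']}. "
        f"Missing values found: {checks['missing']}. "
        f"Duplicate rows found: {checks['duplicates']}. "
        f"Rows after cleaning: {len(df)}."
    )
    return df.to_csv(index=False), summary


raw, notes = import_step(io.StringIO(RAW))
checks = inspect_step(raw)
result = transform_step(raw)
csv_text, summary = export_step(result, notes, checks)
print(summary)
```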
EXPECTED OUTPUTS
What learners should leave with
Reproducible file
A notebook that can reload data, rerun cleaning steps, and recreate the same descriptive outputs.
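One simple way to check that a notebook recreates the same outputs is to run the pipeline twice and compare a fingerprint of the result. The sketch below assumes pandas and uses a SHA-256 hash of the CSV text as the fingerprint; `run_pipeline`, `fingerprint`, and the sample data are hypothetical stand-ins for a learner's own steps.

```python
import hashlib
import io

import pandas as pd

# Hypothetical inline sample for illustration only.
RAW = """date,close
2024-01-03,101.5
2024-01-02,100.0
2024-01-03,101.5
"""


def run_pipeline(raw_text):
    """Reload the data and rerun the cleaning and feature steps from scratch."""
    df = pd.read_csv(io.StringIO(raw_text), parse_dates=["date"])
    df = df.drop_duplicates().sort_values("date").reset_index(drop=True)
    df["close_diff"] = df["close"].diff()
    return df


def fingerprint(df):
    """Hash the table's CSV text so two runs can be compared quickly."""
    csv_bytes = df.to_csv(index=False).encode("utf-8")
    return hashlib.sha256(csv_bytes).hexdigest()


first = fingerprint(run_pipeline(RAW))
second = fingerprint(run_pipeline(RAW))
print("outputs match:", first == second)
```

A matching fingerprint only shows that two runs on the same input agree; it does not show that the cleaning choices themselves are appropriate.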
Data quality notes
A short record of missing values, transformations, and any limitations that should be considered in later modules.
Plain-language explanation
A concise explanation of what each calculated field means and how it should be interpreted in a study context.