Objective 1
Data Manipulation & Cleansing
- Univariate outlier detection
- Epistemic errors in thinking about data: the data-reality gap, self-reporting bias, and the black swan fallacy
- Data preprocessing and normalization
- Random data sampling
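As a minimal illustration of three Objective 1 topics, the sketch below flags univariate outliers with the interquartile-range (IQR) rule, min-max normalizes values to [0, 1], and draws a reproducible simple random sample. It uses only the Python standard library. The IQR rule with a 1.5 multiplier, the function names, and the sample data are assumptions chosen for illustration, not methods prescribed by this outline.

```python
import random
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant data: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

data = [12, 14, 13, 15, 14, 13, 98, 12, 16, 14]  # hypothetical measurements

print(iqr_outliers(data))  # [98]
normalized = min_max_normalize(data)

rng = random.Random(42)         # seeded generator for reproducibility
sample = rng.sample(data, k=4)  # simple random sample without replacement
print(sample)
```

The IQR rule is used here because quartiles are less sensitive to the extreme value than the mean and standard deviation that a z-score rule would rely on.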
Objective 2
Statistical Analysis
- Descriptive data analysis with NumPy and SciPy: minimum, maximum, mean, and standard deviation
- Hypothesis testing: a test of means, a test of proportions, ANOVA
- Introduction to generalized linear modeling
- Linear algebra with NumPy: inverting matrices, computing eigenvalues, solving linear equations, and finding determinants
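A short NumPy sketch of the descriptive statistics and linear algebra operations listed in Objective 2. The sample values and the 2x2 system are hypothetical. Note that `ndarray.std()` computes the population standard deviation by default (`ddof=0`); pass `ddof=1` for the sample standard deviation.

```python
import numpy as np

# Descriptive statistics on a small hypothetical sample
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(x.min(), x.max(), x.mean(), x.std())  # 2.0 9.0 5.0 2.0 (population std)
print(x.std(ddof=1))                         # sample std

# Linear algebra on a hypothetical system A @ v = b
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

A_inv = np.linalg.inv(A)             # matrix inverse
det = np.linalg.det(A)               # determinant: 3*2 - 1*1 = 5 (up to rounding)
eigvals, eigvecs = np.linalg.eig(A)  # eigenvalues and column eigenvectors
v = np.linalg.solve(A, b)            # solves A @ v = b without forming A_inv

print(det, v, eigvals)
```

`np.linalg.solve` is generally preferred over computing `A_inv @ b`, since it avoids forming the inverse explicitly and is more numerically stable.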
Objective 3
Web Scraping & NLP
- Parsing webpages as trees and understanding the DOM
- Web scraping with requests, Selenium WebDriver, and Scrapy
- Text preprocessing and sentiment analysis of textual data
- Parsing HTML content with BeautifulSoup
Objective 4
Data Structures for Analysis
- Standard library objects: lists, dictionaries, and user-defined classes
- Multidimensional NumPy arrays: stacking, splitting, converting, views, and masks
- Pandas DataFrames and Series: aligning, aggregating, concatenating, pivoting, and appending
Objective 5
Visualization
- Constructing visualizations with the Matplotlib and Seaborn libraries
- Plotting data in logarithmic, scatter, and box-and-whisker plots
- Saving charts and plots for future use
- Exploring correlations and interactions between features with histograms and multivariate charts
Objective 6
Persistent Storage
- Retrieving, processing, and storing data: CSV files, serialized objects, PyTables and HDF5 stores, Excel workbooks, REST services, JSON, and parsed HTML
- Executing queries on relational databases: SELECT statements and joins
- Leveraging ORMs (SQLAlchemy, Pony) and the PyMongo driver for MongoDB
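To illustrate SELECT queries and joins from Objective 6, the sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for whichever relational database a course might use. The `customers` and `orders` tables and their rows are hypothetical, and the ORMs named above are not shown.

```python
import sqlite3

# In-memory SQLite database as a stand-in for any relational database
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL,
                     FOREIGN KEY (customer_id) REFERENCES customers(id));
""")
cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace"), (3, "Linus")])
cur.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                [(1, 25.0), (1, 15.0), (2, 40.0)])
conn.commit()

# Simple SELECT with a filter; values are bound via ? placeholders
rows = cur.execute("SELECT name FROM customers WHERE id = ?", (2,)).fetchall()

# INNER JOIN: only customers that have at least one order
inner = cur.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# LEFT JOIN: every customer; COALESCE turns a NULL total into 0
left = cur.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

conn.close()
print(rows, inner, left)
```

Binding values with `?` placeholders rather than string formatting keeps queries safe from SQL injection; the same pattern applies to other DB-API drivers, though placeholder syntax varies by driver.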