Objective 1


Data Manipulation & Cleansing

  • Univariate outlier detection
  • Epistemic errors in thinking about data: the data-reality gap, self-reporting bias, and the black swan fallacy
  • Data preprocessing and normalization
  • Random data sampling
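
A minimal sketch of three of the cleansing steps above. It assumes only the Python standard library; the course itself may use NumPy or pandas. The sketch flags univariate outliers with Tukey's IQR fences (k = 1.5 is the conventional multiplier, not a requirement), min-max normalizes the remaining values onto [0, 1], and draws a seeded simple random sample. The readings are hypothetical.

```python
import random
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def min_max_normalize(values):
    """Rescale values linearly onto [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


readings = [10, 12, 11, 13, 12, 11, 95]   # hypothetical sensor readings
outliers = iqr_outliers(readings)          # Q1 = 11, Q3 = 12.5, fences 8.75 and 14.75
cleaned = [v for v in readings if v not in outliers]
scaled = min_max_normalize(cleaned)

rng = random.Random(42)                    # seeded so the sample is reproducible
sample = rng.sample(cleaned, k=3)          # simple random sample without replacement

print(outliers)  # [95]
```

Removing the outlier before normalizing matters. If the reading of 95 were left in, every other value would be squeezed below 0.04 on the [0, 1] scale.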

Objective 2


Statistical Analysis

  • Descriptive data analysis with NumPy and SciPy: minimum, maximum, mean, and standard deviation
  • Hypothesis testing: tests of means, tests of proportions, and ANOVA
  • Introduction to generalized linear modeling
  • Linear algebra with NumPy: inverting matrices, computing eigenvalues, solving linear equations, and finding determinants
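
The descriptive statistics and linear-algebra operations listed above correspond directly to NumPy calls. The sketch below uses a hypothetical sample and a small 2×2 system. It leaves out hypothesis tests and GLMs because those would require SciPy or statsmodels.

```python
import numpy as np

# Descriptive statistics for a hypothetical sample
data = np.array([4.0, 8.0, 15.0, 16.0, 23.0, 42.0])
summary = {
    "min": data.min(),
    "max": data.max(),
    "mean": data.mean(),
    "std_population": data.std(),        # NumPy's default ddof=0
    "std_sample": data.std(ddof=1),      # ddof=1: the usual sample estimate
}

# Linear algebra on a small symmetric system A @ x = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)                # solves 2x + y = 3, x + 3y = 5, giving [0.8, 1.4]
A_inv = np.linalg.inv(A)                 # explicit inverse: (1/5) * [[3, -1], [-1, 2]]
det_A = np.linalg.det(A)                 # 2*3 - 1*1 = 5, up to floating-point rounding
eigenvalues, eigenvectors = np.linalg.eig(A)   # column i of eigenvectors pairs with eigenvalues[i]

print(summary)
print(x, det_A, eigenvalues)
```

data.std() returns the population standard deviation because NumPy's default is ddof=0. Pass ddof=1 to get the sample estimate. For solving systems, np.linalg.solve is generally preferred to computing inv(A) @ b, because it never forms the inverse explicitly and is more numerically stable.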

Objective 3


Web Scraping & NLP

  • Parsing webpages as trees and understanding the DOM
  • Web scraping with requests, Selenium WebDriver, and Scrapy
  • Text preprocessing and sentiment analysis of textual data
  • Parsing HTML content with BeautifulSoup
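
A minimal offline sketch of the parsing and preprocessing steps. BeautifulSoup, using Python's built-in "html.parser" backend, turns an HTML string into a navigable tree. A CSS selector then walks the DOM, and the extracted text is lowercased and tokenized. The HTML fragment and the two-word sentiment lexicon are hypothetical stand-ins. A real pipeline would fetch pages with requests or Selenium and would use a proper sentiment model or lexicon.

```python
import re
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical page fragment; in practice this might be requests.get(url).text
html = """
<html><body>
  <div class="review"><p>Great product, works great!</p></div>
  <div class="review"><p>Terrible support. Not great.</p></div>
  <a href="/reviews?page=2">Next page</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")     # build a tree from the markup

# Walk the DOM: every <p> that sits inside a <div class="review">
reviews = [p.get_text(strip=True) for p in soup.select("div.review p")]
next_href = soup.find("a")["href"]            # attribute access on a tag node


def preprocess(text):
    """Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


tokens = [preprocess(r) for r in reviews]
word_counts = Counter(tok for doc in tokens for tok in doc)

# Toy lexicon-based sentiment: sum of word scores (hypothetical lexicon, no negation handling)
LEXICON = {"great": 1, "terrible": -1}
scores = [sum(LEXICON.get(tok, 0) for tok in doc) for doc in tokens]
```

The toy scorer also shows a common pitfall. It rates the second review as neutral (0) because "not great" still counts as +1; handling negation requires more than a bag of words.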

Objective 4


Data Structures for Analysis

  • Core Python data structures: lists, dictionaries, and user-defined classes
  • Multidimensional NumPy arrays: stacking, splitting, converting, views, and masks
  • Pandas DataFrames and Series: aligning, aggregating, concatenating, pivoting, and appending
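
A short sketch of the array and DataFrame operations listed above, using hypothetical data. On the NumPy side it covers a reshape that returns a view (so writes propagate back to the original), stacking and splitting, and a boolean mask. On the pandas side it covers index alignment, concatenation, aggregation, and pivoting. Rows are appended with pd.concat because DataFrame.append was deprecated in pandas 1.4 and removed in 2.0.

```python
import numpy as np
import pandas as pd

# NumPy: views, stacking, splitting, converting, masks
a = np.arange(6)                 # [0 1 2 3 4 5]
m = a.reshape(2, 3)              # a view onto a's buffer, not a copy
m[0, 0] = 100                    # writing through the view also changes a[0]

stacked = np.vstack([m, m])      # new (4, 3) array built from two copies of m
left, right = np.hsplit(stacked, [1])  # split columns at index 1: shapes (4, 1) and (4, 2)

mask = a > 2                     # boolean mask, same shape as a
large = a[mask]                  # boolean indexing returns a copy: [100 3 4 5]
as_floats = a.astype(float)      # converting the dtype also makes a copy

# pandas: alignment, concatenation, aggregation, pivoting
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
aligned_sum = s1 + s2            # aligned on labels; "a" and "d" become NaN

sales = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 80, 90],
})
new_rows = pd.DataFrame({"region": ["N"], "quarter": ["Q3"], "revenue": [130]})
sales = pd.concat([sales, new_rows], ignore_index=True)   # append rows

revenue_by_region = sales.groupby("region")["revenue"].sum()   # N: 350, S: 170
wide = sales.pivot(index="region", columns="quarter", values="revenue")  # S/Q3 is missing, so NaN
```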

Objective 5


Visualization

  • Building visualizations with the Matplotlib and Seaborn libraries
  • Plotting data with logarithmic axes, scatter plots, and box-and-whisker plots
  • Saving charts and plots for future use
  • Exploring correlations and interactions between features with histograms and multivariate charts
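
A minimal Matplotlib-only sketch. Seaborn is omitted to keep dependencies small. For two hypothetical, correlated features, the sketch draws a histogram, a scatter plot with a logarithmic x-axis, and a box-and-whisker plot. It also reports the Pearson correlation and saves the figure to disk. The file name feature_overview.png is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")                  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # hypothetical right-skewed feature
y = 2.0 * x + rng.normal(scale=0.5, size=200)      # hypothetical feature correlated with x

corr = np.corrcoef(x, y)[0, 1]         # Pearson correlation coefficient

fig, (ax_hist, ax_scatter, ax_box) = plt.subplots(1, 3, figsize=(15, 4))

ax_hist.hist(x, bins=30)
ax_hist.set_title("Distribution of x")

ax_scatter.scatter(x, y, s=10)
ax_scatter.set_xscale("log")           # a log axis spreads out the long right tail
ax_scatter.set_title(f"y vs. x (Pearson r = {corr:.2f})")
ax_scatter.set_xlabel("x (log scale)")
ax_scatter.set_ylabel("y")

ax_box.boxplot([x, y])                 # whiskers default to 1.5 * IQR
ax_box.set_xticks([1, 2])
ax_box.set_xticklabels(["x", "y"])
ax_box.set_title("Box-and-whisker plot")

fig.tight_layout()
fig.savefig("feature_overview.png", dpi=150)   # output format inferred from the extension
plt.close(fig)                         # free the figure once it is saved
```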

Objective 6


Persistent Storage

  • Retrieving, processing, and storing data: CSV files, serialized objects, PyTables and HDF5 stores, Excel workbooks, REST services, JSON, and HTML documents
  • Executing queries on relational databases: SELECT statements and joins
  • Working with ORMs and database clients: SQLAlchemy and Pony (ORMs) and PyMongo (a MongoDB driver)
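
The relational-query bullet can be sketched with the standard library's sqlite3 module and an in-memory database. The customers and orders tables are hypothetical. An inner join with GROUP BY aggregates orders per customer, and a LEFT JOIN finds customers with no orders. The results are then serialized to JSON and CSV. An ORM such as SQLAlchemy would generate equivalent SQL from mapped classes; that layer is not shown here.

```python
import csv
import io
import json
import sqlite3

conn = sqlite3.connect(":memory:")    # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders (id, customer_id, total) VALUES (1, 1, 25.0), (2, 1, 17.5), (3, 2, 40.0);
""")

# SELECT with an inner join and aggregation: only customers that have orders
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.total) AS spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

# LEFT JOIN keeps every customer; unmatched order columns come back as NULL
no_orders = conn.execute("""
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()
conn.close()

# Persist the query result as JSON and as CSV (in-memory buffers stand in for files)
records = [{"name": n, "n_orders": k, "spent": s} for n, k, s in rows]
json_text = json.dumps(records, indent=2)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "n_orders", "spent"])
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()
```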
