HDDS Data Analytics – HD Data Science

Objective 1

Data Manipulation & Cleansing

Univariate outlier detection
Epistemic errors in thinking about data: the data-reality gap, self-reporting bias, and the black swan fallacy
Data preprocessing and normalization
Random data sampling
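
As an illustration of the first topic above, univariate outlier detection can be sketched with Tukey's interquartile-range rule. This is one common technique and not necessarily the one the course teaches; the helper name `iqr_outliers`, the sample readings, and the 1.5 multiplier are assumptions chosen for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside Tukey's fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR, where IQR = Q3 - Q1.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample data with one obviously extreme reading.
readings = [10, 12, 11, 13, 12, 11, 14, 95]
outliers = iqr_outliers(readings)
print(outliers)
```

A rule like this only flags candidates; whether a flagged value is an error or a genuine rare event still has to be judged against the domain.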

Objective 2

Statistical Analysis

Descriptive data analysis with NumPy and SciPy: minimum, maximum, mean, and standard deviation
Hypothesis testing: a test of means, a test of proportions, ANOVA
Introduction to generalized linear modeling
Utilize linear algebra with NumPy: invert matrices, compute eigenvalues, solve linear equations, and find determinants
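
The four linear algebra operations named in the last topic map onto functions in `numpy.linalg`. The sketch below assumes NumPy is installed; the 2x2 matrix `A` and vector `b` are hypothetical values picked only to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical system A @ x = b, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

A_inv = np.linalg.inv(A)             # matrix inverse
eigenvalues = np.linalg.eigvals(A)   # eigenvalues of A
x = np.linalg.solve(A, b)            # solution of A @ x = b
det = np.linalg.det(A)               # determinant of A

print(A_inv, eigenvalues, x, det, sep="\n")
```

For solving a system, `np.linalg.solve` is generally preferable to multiplying by an explicitly computed inverse, since it avoids forming the inverse at all.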

Objective 3

Web Scraping & NLP

Parse webpages as trees and understand the DOM
Scrape the web with Requests, Selenium WebDriver, and Scrapy
Preprocess text and perform sentiment analysis on textual data
Parse HTML content with BeautifulSoup
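
To make the HTML parsing topics concrete, here is a minimal sketch that extracts link targets from a page. It uses the standard library's `html.parser` as a stand-in so the example runs without extra packages; this parser is event-based and does not build a tree the way BeautifulSoup does. The class name `LinkCollector` and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Hypothetical page content; a real scraper would fetch this over HTTP.
page = """
<html><body>
  <ul>
    <li><a href="/courses">Courses</a></li>
    <li><a href="/contact">Contact</a></li>
  </ul>
</body></html>
"""

collector = LinkCollector()
collector.feed(page)
links = collector.links
print(links)
```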

Objective 4

Data Structures for Analysis

Standard library objects: lists, dictionaries, and user-defined classes
Multidimensional NumPy arrays: stacking, splitting, converting, views, and masks
Pandas DataFrames and Series: aligning, aggregating, concatenating, pivoting, and appending
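
The NumPy array operations listed above (stacking, splitting, views, and masks) can be sketched in a few lines. The example assumes NumPy is installed, and the array contents are arbitrary illustration values.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]
b = a * 10                            # [[0, 10, 20], [30, 40, 50]]

stacked = np.vstack([a, b])           # new array of shape (4, 3)
top, bottom = np.vsplit(stacked, 2)   # two arrays of shape (2, 3)

view = a[:, 1:]                       # basic slicing returns a view of a
view[0, 0] = 99                       # so this write also changes a[0, 1]

mask = stacked > 25                   # boolean mask with stacked's shape
selected = stacked[mask]              # 1-D array of the matching elements

print(a, stacked, selected, sep="\n")
```

Note that `np.vstack` copies its inputs, so the later write through `view` changes `a` but leaves `stacked` untouched.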

Objective 5

Visualization

Construction of visualizations with Matplotlib and Seaborn libraries
Plotting of data in logarithmic, scatter, and box-whisker plots
Saving charts and plots for future use
Exploration of correlations and interactions between features with histograms and multivariate charts

Objective 6

Persistent Storage

Retrieval, processing, and storage of data: CSV, serialized objects, PyTables, HDF5 stores, Excel workbooks, REST services, JSON, and HTML parsing
Execution of queries on relational databases: select and joins
Leverage ORMs and database drivers: SQLAlchemy, Pony, and PyMongo
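
A select with a join, as named in the relational database topic above, can be sketched with the standard library's `sqlite3` module and an in-memory database. The `customers` and `orders` tables, their columns, and the rows are hypothetical and exist only for this example; it uses plain SQL rather than any of the listed libraries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.5), (3, 2, 12.0);
""")

# Join each order to its customer and total the spend per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.total) AS spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.close()

print(rows)
```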
