『eStat』 Statistics and Data Science


[eStat YouTube Channel]

This book introduces elementary statistical analysis and data science using 『eStat』, educational software designed for undergraduate students.
『eStat』 is free, web-based software for statistics education that can be used anytime and anywhere on a PC, tablet, or mobile phone.
Basic Operation of 『eStat』     [pdf]     [Video]

Project Leader: Professor Jung Jin Lee (Soongsil University, Korea; ADA University, Azerbaijan)
Contact: jjlee@ssu.ac.kr
This work is in the public domain, so it can be copied and reproduced without limitation. However, we would appreciate a citation where possible (http://www.estat.me).

Table of Contents

Chapter 1 Statistics and Data Science    [book]

1.1 Statistics and Data Science
1.2 Population and Sample
1.3 Variables and Data
1.4 Software for Statistical Analysis

Chapter 2 Visualization of Qualitative Data    [book]

2.1 Visualization of Qualitative Data
2.2 Visualization of Summary Data
2.3 Visualization of Raw Data
2.4 Word Cloud

Chapter 3 Visualization of Quantitative Data    [book]

3.1 Visualization of Quantitative Data
3.2 Visualization of Single Quantitative Variable
3.3 Visualization of Two Quantitative Variables

Chapter 4 Data Summary with Tables and Measures    [book]

4.1 Frequency Table for Single Variable
4.2 Contingency Table for Two Variables
4.3 Summary Measures for Quantitative Variable

Chapter 5 Probability Distribution     [book]

5.1 Definition of Probability
5.2 Calculation of Probability
5.3 Discrete Random Variable
5.4 Continuous Random Variable

Chapter 6 Sampling Distributions and Estimation    [book]

6.1 Simple Random Sampling
6.2 Sampling Distribution of Sample Means and Estimation of Population Mean
6.3 Sampling Distribution of Sample Variance and Estimation of Population Variance
6.4 Sampling Distribution of Sample Proportion and Estimation of Population Proportion
6.5 Determination of Sample Size

Chapter 7 Testing Hypothesis for Single Population Parameters    [book]

7.1 Testing Hypothesis for a Population Mean
7.2 Testing Hypothesis for a Population Variance
7.3 Testing Hypothesis for a Population Proportion
7.4 Testing Hypothesis with α and β simultaneously

Chapter 8 Testing Hypothesis for Two Population Parameters    [book]

8.1 Testing Hypothesis for Two Population Means
8.2 Testing Hypothesis for Two Population Variances
8.3 Testing Hypothesis for Two Population Proportions

Chapter 9 Testing Hypothesis for Several Population Means    [book]

9.1 Analysis of Variance for Experiments of Single Factor
9.2 Experimental Design for Sampling
9.3 Analysis of Variance for Experiments of Two Factors

Chapter 10 Nonparametric Testing Hypothesis    [book]

10.1 Nonparametric Test for Location Parameter of Single Population
10.2 Nonparametric Test for Location Parameters of Two Populations
10.3 Nonparametric Test for Location Parameters of Several Populations

Chapter 11 Testing Hypothesis for Categorical Data    [book]

11.1 Goodness of Fit Test
11.2 Testing Hypothesis for Contingency Table

Chapter 12 Correlation and Regression Analysis    [book]

12.1 Correlation Analysis
12.2 Simple Linear Regression Analysis
12.3 Multiple Linear Regression Analysis

Chapter 13 Time Series Analysis    [book]

13.1 What is Time Series Analysis?
13.2 Smoothing of Time Series
13.3 Transformation of Time Series
13.4 Regression Model and Forecasting
13.5 Exponential Smoothing Model and Forecasting
13.6 Seasonal Models and Forecasting

Authors of eBook and developers of 『eStat』

Jung Jin Lee
Emeritus Professor, Soongsil University, Korea
Professor, ADA University, Azerbaijan
B.S. and M.S., Seoul National University
Ph.D., Case Western Reserve University
President, Korean Statistical Society
Vice President, International Association for Statistical Computing
Council Member, International Statistical Institute (ISI)

Tae Rim Lee
Emeritus Professor, Korea National Open University
B.S. and M.S., Seoul National University
Ph.D., Choongang University
Vice President, Korean Statistical Society
Vice President, International Association for Statistics Education
Vice President, International Biometric Society

Geunseog Kang
Emeritus Professor, Soongsil University, Korea
B.S. and M.S., Seoul National University
Ph.D., University of Wisconsin - Madison

Sung Soo Kim
Professor, Korea National Open University
B.S., M.S., Ph.D., Seoul National University

Heon Jin Park
Professor, Inha University
B.S. and M.S., Seoul National University
Ph.D., Iowa State University
President, Korean Data Mining Society

Song Yong Sim
Hallym University
B.S. and M.S., Seoul National University
Ph.D., University of Wisconsin - Madison

Yoon Dong Lee
Professor, Sogang University
B.S. and M.S., Seoul National University
Ph.D., Iowa State University

Hyun Jo You
Professor, Seoul National University
B.S., M.S., Ph.D., Seoul National University
Ph.D., Soongsil University

Preface

Over the last half century, Computer Science has evolved at a tremendous rate, bringing about previously unimaginable changes in many areas of our society and enriching our lives. The recent merging of Computer Science with Communication Technologies has created a digital revolution, called the 4th Industrial Revolution, that will lead to further change in the future.

The 4th Industrial Revolution aims at super-connectedness, super-intelligence, and super-forecasting, and it will bring many revolutionary changes to our lives. The revolution will help us solve many problems, but it will also present new challenges at the same time. The biggest challenge is the analysis and utilization of Big Data.

The analysis of Big Data draws on multiple disciplines, such as Statistics, Computer Science, and Management; this multidisciplinary field is called Data Science. Data Science is primarily based on traditional statistical methods, which are now applied in almost all fields of study, such as natural science, engineering, medicine, agriculture, economics, business administration, and sociology.

Data Science requires a great deal of data manipulation using computer software. However, widely used statistical software packages such as R, SAS, and SPSS require some professional training. The authors of this book have spent years developing 『eStat』, which helps students at all levels learn Statistics easily and introduces them to Data Science.

This book introduces basic statistical methods that are widely used in many areas and enables users to practice the methods using 『eStat』. We hope this book will serve as a useful primer for those interested in Statistics and Data Science.

Part I (Chapters 1 through 4) describes how to visualize and summarize data using 『eStat』. This part can be easily understood and used by secondary school students. Part II (Chapters 5 and 6) describes probabilities, probability distribution functions, and estimation. Part III (Chapters 7 through 11) describes hypothesis testing for parameters of single and multiple populations. Chapters 7 through 9 describe traditional parametric hypothesis tests, Chapter 10 describes nonparametric hypothesis tests, and Chapter 11 describes hypothesis tests for categorical data. Part IV (Chapter 12) describes correlation and regression analysis.

I thank everyone who has worked together to develop 『eStat』 over the past few years. I also thank all the internet communities that helped us during the development of 『eStat』. Special thanks go to Ms. Kamala Omarova for her careful reading of this manuscript.

Spring 2020

Project Leader: Jung Jin Lee