In this chapter, we introduce graphs for visualizing qualitative data, namely the bar graph, pie chart,
band graph, and line graph, in Section 2.1.
In Section 2.2, we discuss visualization of summary data of a single categorical variable using 『eStat』. Visualization of summary data of a categorical variable by group is also discussed.
In Section 2.3, we discuss visualization of raw data of a categorical variable using 『eStat』. Visualization of raw data of a categorical variable by group is also discussed.
Data on the gender of students in a classroom, recorded as either male or female, are referred to as qualitative data.
Data on the marital status of employees in a company, recorded as either single or married, are also qualitative data.
Bar graphs, pie charts, band graphs, and line graphs are used to visualize qualitative data. These graphs are
frequently used for exploratory data analysis of qualitative data.
A bar chart (or bar graph) is a graph that presents qualitative data with rectangular bars whose heights (or lengths) are proportional to the frequencies of their categories. Therefore, the frequencies of all categories of a categorical variable can be easily compared by looking at the heights (or lengths) of the bars. We usually put some space between the bars to emphasize that they represent distinct categories of a variable.
The rectangular bars of the bar chart can be plotted either vertically or horizontally. One axis of the chart shows all categories of a variable, and the other axis represents the frequencies of each category. If the frequency of each category is represented as a vertical height of a bar drawn up and down in the bar graph, it is called a vertical bar graph. A bar can also be drawn left and right whose length is proportional to the frequency of each category and it is called a horizontal bar graph.
A bar graph can be drawn after counting frequencies of all categories of a variable. If there is another categorical variable, frequencies of all categories of the first categorical variable can be counted for each category of the second categorical variable. For example, we can count the number of single and married employees for both a male and a female category. We can draw two bar graphs of the marital status for both the male and the female categories so that both graphs have the same scale of Y-axis to compare the frequencies of the male category with the frequencies of the female category easily. This graph is called a separated bar graph of the marital status by gender variable. In this case, the gender variable is called a group variable and the marital status is called an analysis variable.
If a variable is analysed using a group variable, there are several variants of the bar graph that make it easy to compare the categories of the group variable visually. A stacked bar graph divides each bar, which represents the frequency of a category of the analysis variable, into pieces with different colors whose sizes are proportional to the frequencies of the categories of the group variable. A ratio bar graph draws the bars of all categories of the analysis variable with the same height and divides each bar into pieces with different colors whose sizes are proportional to the frequencies of the categories of the group variable. A side-by-side bar graph draws, for each category of the analysis variable, the bars of all categories of the group variable next to each other on the same scale for comparison. If the group variable has only two categories, a two-sided bar graph (or bi-lateral bar graph) can be used, which draws the bars of one category of the group variable on one side and the bars of the other category in the opposite direction. The two directions can be either the left and right sides of the Y-axis or above and below the X-axis.
A pie chart is a graph that shows the frequencies of all categories of the analysis variable by dividing a pie (circle) into pieces with different colors, where the angle of each piece is proportional to the frequency of its category. We usually draw the pieces in clockwise order starting from 12 o'clock, beginning with the largest category, so that the ratios can be compared easily.
A doughnut chart, which is a pie chart with its central circle removed, can also be used.
A band graph is similar to the ratio bar graph: it shows the frequencies of all categories of the analysis variable by dividing a rectangle into rectangular pieces with different colors whose sizes are proportional to the frequencies of the categories. It is also similar to the pie chart. The pieces can be sorted in descending order of frequency, but 『eStat』 draws them in the order of the category values of the categorical variable.
A line graph shows the frequencies (or values) of all categories of an analysis variable in a two-dimensional graph. The X-axis shows the names of the categories and the Y-axis represents the scale of the frequencies (or values). Each pair of values, the category name and its frequency, is marked as a point in a two-dimensional coordinate plane, and adjacent points are connected with a line. The line graph resembles a vertical bar graph in which only the top centers of the bars are connected. The line graph is usually used to visualize time-dependent data in order to observe the trend over time. For example, the yearly amount of exports of a country can be visualized using a line graph.
Graphs for Qualitative Data
A bar chart (or bar graph) is a graph that shows qualitative data with rectangular bars whose heights or lengths are proportional to the frequencies of their categories.
A pie chart is a graph that shows the frequencies of all categories of an analysis variable by dividing a pie (circle) into pieces with different colors, where the angle of each piece is proportional to the frequency of its category.
A band graph is similar to the ratio bar graph: it shows the frequencies of all categories of an analysis variable by dividing a rectangle into rectangular pieces with different colors whose sizes are proportional to the frequencies of the categories.
A line graph shows the frequencies (or values) of all categories of an analysis variable in a two-dimensional graph.
This chapter discusses how qualitative data are visualized using 『eStat』, treating summary data (Section 2.2) and raw data (Section 2.3) separately and, in each case, distinguishing a single analysis variable from an analysis variable with a group variable.
This data is called the raw data of the gender variable which is a categorical variable.
Suppose you count the number of ‘male’ students and ‘female’ students in the above raw data and report the result as shown in Table 2.2.1.
Table 2.2.1 Summary data of the gender in a class
Gender    Students
Male      6
Female    4
This data is called the summary data of the gender variable.
As the number of observations increases, counting the number of cases in each category from the raw data
of a categorical variable in order to make the summary data is not an easy task. One of the important
functions of a statistical package is to organize the raw data into summary data by counting the number of cases
in each category. Because generating the summary data from the raw data is laborious,
governmental institutions usually provide census statistics to the public
in the form of summary data, such as the population by gender or
the population by region. These summary data can be downloaded from government websites as Excel files.
An Excel file can be saved as a text file in CSV (comma-separated values) format (refer to <Figure A.2.6> in Appendix A), which can be loaded by 『eStat』 for data processing and analysis (refer to Appendix A).
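As a minimal sketch of what such a CSV file contains, the following Python snippet writes the summary data of Table 2.2.1 as comma-separated text and reads it back. Python's csv module is used here only for illustration and is not part of 『eStat』; the column names are taken from the header of Table 2.2.1.

```python
import csv
import io

# Summary data of Table 2.2.1: one row per category with its frequency.
rows = [("Male", 6), ("Female", 4)]

# Write the table as CSV text; the first row holds the variable names.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Gender", "Students"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the CSV text back into a list of dictionaries keyed by variable name.
# Note that every value read from a CSV file is a string.
records = list(csv.DictReader(io.StringIO(csv_text)))
print(records)
```

Each line of the file is one row of the table, and the values within a row are separated by commas.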
This section discusses visualization of the summary data of a categorical variable, which can also be found in elementary, middle and high school textbooks and in governmental publications.
Example 2.2.1(Gender Summary Data)
Enter the summary data of Table 2.2.1 into the sheet of 『eStat』 and save it as a file in CSV format. Using this data, draw a bar graph, a pie chart and a band graph with 『eStat』. Analyze the graphs and prepare a report using MS Word (or any word processor you prefer).
Answer
Enter the data of Table 2.2.1 into the sheet of 『eStat』 as in <Figure 2.2.1>, and enter the variable name ‘Gender’ for V1 and ‘Number’ for V2 using the [Edit Var] button located above the sheet (refer to Appendix A.2).
<Figure 2.2.1> Data input in 『eStat』
Click the first variable name 'Gender' and then the second variable name 'Number'.
The selected variables will appear in the 'Selected Var' box located above the sheet.
You can also select the variable ‘1: Gender’ using the 'Analysis Var' combo box
and the variable ‘2: Number’ using the 'By Group' combo box located above the sheet, as shown in <Figure 2.2.1>.
When the variables are selected, a vertical bar graph, which is the default graph of 『eStat』, is drawn as in <Figure 2.2.2>.
The height of each bar (rectangle) is proportional to the frequency of each category of the gender variable, and therefore the frequencies of the male and female categories can be easily compared by looking at the heights of the bars. The bar graph shows that the number of male students is larger than the number of female students.
A vertical bar graph, which draws bars up and down as in <Figure 2.2.2>, is widely used,
but a horizontal bar graph, which draws bars from left to right, is often used if there are many categories.
By clicking on the horizontal bar graph icon located above the Graph Area, a horizontal bar graph
as in <Figure 2.2.3> will appear in the Graph Area. By checking the ‘Frequency’ box located below the graph,
the frequency of each bar will be displayed.
<Figure 2.2.2> Vertical bar graph of the number of male and female students.
<Figure 2.2.3> Horizontal bar graph of the number of male and female students.
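The idea that the length of a bar is proportional to the frequency of its category can be sketched in a few lines of Python. This is only a text-based illustration using the frequencies of Table 2.2.1; the function name text_bar_chart and the maximum width of 30 characters are assumptions made for this sketch and have nothing to do with how 『eStat』 draws its graphs.

```python
# Frequencies from Table 2.2.1.
frequencies = {"Male": 6, "Female": 4}


def text_bar_chart(freqs, max_width=30):
    """Return one text line per category; bar length is proportional to frequency."""
    largest = max(freqs.values())
    lines = []
    for category, count in freqs.items():
        # The largest category gets max_width characters, the others are scaled down.
        bar = "#" * round(max_width * count / largest)
        lines.append(f"{category:<8}|{bar} {count}")
    return lines


for line in text_bar_chart(frequencies):
    print(line)
```

The two bars have 30 and 20 characters, which is the same 6 : 4 ratio as the frequencies.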
By clicking the ‘Graph Save’ icon located above the Graph Area, the current graph of the Graph Area is saved under the file name ‘eStatGraph.png’, which is shown at the bottom left corner of the main screen as in <Figure 2.2.4> (refer to Appendix A.4).
<Figure 2.2.4> Graph is saved by clicking the ‘Graph Save’ icon
The saved graph file is located in the download folder specified in your computer system. If you save another graph, eStatGraph(1).png will be created in the download folder. The number in parentheses in the file name increases each time you save a new graph.
You can copy this graph file from the download folder and paste it into MS Word as in <Figure 2.2.5>. You can also write comments about the graph if necessary.
<Figure 2.2.5> Copied graph file of 『eStat』 to MS Word
Click on the pie chart icon to display a pie chart as in <Figure 2.2.6>, and click on the doughnut graph icon to display a doughnut graph as in <Figure 2.2.7>, which is a pie chart with a small circle cut out of its center.
The pie chart shows the numbers of male and female students by dividing a pie (circle) into two pieces with different colors whose angles are proportional to the frequencies of the categories.
<Figure 2.2.6> Pie chart of the number of male and female students.
<Figure 2.2.7> Doughnut chart of the number of male and female students.
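The angle of each piece of the pie follows directly from the frequencies: a category with frequency f out of a total n receives 360 × f / n degrees. A minimal Python sketch of this arithmetic for Table 2.2.1 is given here; the function name pie_angles is chosen only for this illustration.

```python
# Frequencies from Table 2.2.1.
frequencies = {"Male": 6, "Female": 4}


def pie_angles(freqs):
    """Return the central angle in degrees of each category's piece."""
    total = sum(freqs.values())
    return {category: 360 * count / total for category, count in freqs.items()}


angles = pie_angles(frequencies)
print(angles)  # Male: 216 degrees, Female: 144 degrees
```

The angles always add up to 360 degrees, so the pie chart shows the share of each category rather than its absolute frequency.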
Click on the band graph icon to display a band graph as in <Figure 2.2.8>. A band graph is a variant of the pie chart
that divides a rectangle into rectangular pieces whose sizes are proportional to the frequencies of the categories. It is named after its rectangular shape with multiple pieces, which looks like a band.
<Figure 2.2.8> Band graph of the number of male and female students.
International institutions such as the UN, the OECD and the EU release their statistics to the public in the form of summary data, and these data can be downloaded as an Excel file or a text file in CSV format. The following example shows how to download a file from the OECD and how to draw graphs using this file.
Example 2.2.2(Life Expectancy at Birth : Source OECD)
From the home page of the OECD, https://www.oecd.org, download a data file of the life expectancy at birth.
Copy the country name column and the 2017 data, located in the last column, into 『eStat』 and save them as a file in CSV format.
Using this data, draw a vertical bar graph and a horizontal bar graph in descending order of the life expectancy. Analyze the graphs.
Answer
As of December 2020, the main screen of the OECD website, https://www.oecd.org, looks as in <Figure 2.2.9>.
<Figure 2.2.9> OECD home page
Select the menu Topics > Health, and the screen shown in <Figure 2.2.10> will appear.
<Figure 2.2.10> OECD ‘Topic’ > ‘Health’ menu
If you click on ‘Explore all our data on health’, the screen shown in <Figure 2.2.11> will appear.
<Figure 2.2.11> OECD Statistics for life expectancy at birth
If you click on ‘> OECD Health Statistics 2020: Frequently Requested Data’, the Excel file
OECD-Health-Statistics-2020-Frequently-Requested-Data.xls is downloaded. If you open the Excel file,
the menu shown in <Figure 2.2.12> appears.
<Figure 2.2.12> OECD Statistics for life expectancy at birth
If you click on ‘Life expectancy at birth, total population’ in Health status (Mortality), an Excel sheet as in <Figure 2.2.13> will appear on the screen.
<Figure 2.2.13> OECD Statistics for life expectancy at birth
The easiest way to make a file in CSV format is to copy the country names into the first column of the sheet of 『eStat』 and the 2017 data, located in the last column of this Excel file, into the second column of the sheet of 『eStat』 as in <Figure 2.2.14>. After you provide the variable names ‘Country’ and ‘Years’ by using [Edit Var] of 『eStat』, save the data as a file in CSV format, for example ‘EX020202_OECD_LifeExpectancy.csv’.
<Figure 2.2.14> OECD life expectancy at birth in 2017
Another way is to edit the Excel file in <Figure 2.2.13> so that it has only two columns, the country name and the 2017 data, similar to <Figure 2.2.14>, and save it as a file in CSV format. In this case, the first row should contain variable names such as ‘Country’ and ‘Number’ (refer to Appendix A.2). In order to save this file in CSV format, select the Excel menu ‘File’ > ‘Save As’, and a dialogue box as in <Figure 2.2.15> will appear. Select the option ‘CSV UTF-8’, and the file will be saved in CSV format in the download folder of your computer. Note that, if you are using a European version of Excel, you have to change the delimiter from a semicolon ‘;’ to a comma ‘,’ before you save the file (refer to the Excel options).
<Figure 2.2.15> OECD
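If a file has already been saved with semicolons as delimiters, it can also be converted afterwards. The following Python sketch shows one possible way to do this with the standard csv module; the function name semicolon_to_comma is hypothetical, and the sample text reuses the values of Table 2.2.1 rather than the OECD data.

```python
import csv
import io


def semicolon_to_comma(text):
    """Rewrite semicolon-delimited CSV text as comma-delimited CSV text."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical semicolon-delimited export containing the values of Table 2.2.1.
semicolon_text = "Gender;Number\nMale;6\nFemale;4\n"
print(semicolon_to_comma(semicolon_text))
```

A value that itself contains a comma, such as a number written with a decimal comma, is put in double quotes by the csv writer instead of being split into two columns.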
Click the variable names ‘Country’ and ‘Number’ on the sheet of 『eStat』, and a vertical bar graph of the life expectancy will appear as in <Figure 2.2.16>. If the characters of the country names are too small to read, you can enlarge the screen by holding the [Ctrl] key and scrolling the mouse wheel up. You can click on the horizontal bar graph icon located above the Graph Area to draw a horizontal bar graph as in <Figure 2.2.17>.
It is sometimes convenient to compare data using a horizontal bar graph after sorting. If you check the sorting option ‘Descending’ located below the graph, a horizontal bar graph sorted in descending order of the life expectancy at birth will appear as in <Figure 2.2.17>. It is easy to see that Japan has the longest life expectancy, Switzerland the second longest, and Latvia the shortest.
<Figure 2.2.16> Vertical bar graph of OECD life expectancy at birth in 2017
<Figure 2.2.17> Horizontal bar graph of OECD life expectancy at birth, 2017
Practice 2.2.1(Alcohol Expenditure: OECD)
Draw a bar graph using the following data in 『eStat』 system and analyze the graph.
2.2.2 Summary Data of Categorical Variable with Group
The summary data of Table 2.2.1 can be easily extended if you survey the gender of two classes in a school, as in Table 2.2.2. It is the summary data of the gender variable for two classes (groups), classes 5-1 and 5-2. In this case, we usually want to compare the summary data of the two classes (groups) using graphs, as in the following example.
Table 2.2.2 Summary data of two classes
Gender    5-1    5-2
Male      16     12
Female    14     18
Example 2.2.3(Gender Summary Data of Two Classes)
A file of the summary data in Table 2.2.2 is saved at the following location of 『eStat』 system.
[Ex] ⇨ eBook ⇨ EX020203_Summary_StudentByGender
Using this data, draw a bar graph, a pie chart and a band graph. Use 『eStat』.
Answer
If you load the data file from 『eStat』, it looks as in <Figure 2.2.18>.
<Figure 2.2.18> Load file of summary data
Click the variable names 'Gender', ‘5-1’ and ‘5-2’ sequentially, and the selected variables will appear in the 'Selected Var' box located above the sheet. You can also select the variable ‘1: Gender’ using the 'Analysis Var' combo box and the variables ‘2: 5-1’ and ‘3: 5-2’ using the 'By Group' combo box located above the sheet.
When the variables are selected, a vertical bar graph, which is the default graph of 『eStat』, is drawn using the numbers of male and female students in both classes as in <Figure 2.2.19>. A bar graph is drawn for each class, and the heights of the bars are the frequencies of male and female students. The two bar graphs have the same Y-axis scale, and therefore the frequencies of the two classes can be easily compared. This bar graph is called a separated vertical bar graph for each class.
By clicking the horizontal bar graph icon, a separated horizontal bar graph can be drawn as in <Figure 2.2.20>.
<Figure 2.2.19> Separated vertical bar graph of the gender distribution by class.
<Figure 2.2.20> Separated horizontal bar graph of the gender distribution by class.
For the summary data of two groups, there are many variants of the bar graph that make it easy to compare the two groups visually.
If you click on the stacked bar icon, either vertical or horizontal, a stacked bar graph is drawn that divides a single bar into pieces with different colors whose sizes are proportional to the frequencies of male and female students (<Figure 2.2.21> and <Figure 2.2.22>).
<Figure 2.2.21> Stacked vertical bar graph of the gender by class
<Figure 2.2.22> Stacked horizontal bar graph of the gender by class
If you click on the ratio bar graph icon, either vertical or horizontal, a ratio bar graph is drawn in which bars with the same height are divided into pieces with different colors
whose sizes are proportional to the frequencies of male and female students (<Figure 2.2.23> and <Figure 2.2.24>).
<Figure 2.2.23> Ratio vertical bar graph of the gender by class.
<Figure 2.2.24> Ratio horizontal bar graph of the gender by class.
If you click on the side-by-side icon, either vertical or horizontal, a side-by-side bar graph is drawn in which the bars of the group categories are placed next to each other for comparison (<Figure 2.2.25> and <Figure 2.2.26>).
<Figure 2.2.25> Side-by-side vertical bar graph of the gender by class.
<Figure 2.2.26> Side-by-side horizontal bar graph of the gender by class.
If there are only two categories of the group variable, as in this example, then by clicking on the bi-lateral bar icon, either vertical or horizontal, a two-sided (or bi-lateral) bar graph is drawn in which the bars of the two groups point in opposite directions, either above and below the X-axis (<Figure 2.2.27>) or to the left and right of the Y-axis (<Figure 2.2.28>).
<Figure 2.2.27> Two-sided vertical bar graph of the gender by class.
<Figure 2.2.28> Bi-lateral horizontal bar graph of the gender by class.
By clicking on the pie chart icon, a pie chart is drawn as in <Figure 2.2.29>, which consists of two pie charts for the classes ‘5-1’ and ‘5-2’.
Each pie chart shows the numbers of male and female students by dividing a pie (circle) into two pieces with different colors whose angles are proportional to the frequencies of the categories.
By clicking on the band graph icon, a band graph is drawn as in <Figure 2.2.30>, which consists of two band graphs for the classes ‘5-1’ and ‘5-2’. Each band graph shows the numbers of male and female students by dividing a rectangle into two rectangular pieces with different colors whose sizes are proportional to the frequencies of the categories.
<Figure 2.2.29> Pie charts for gender distribution in two classes.
<Figure 2.2.30> Band graphs for gender distribution in two classes.
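The positions at which each band is divided can be computed from the frequencies of Table 2.2.2. The Python sketch below expresses the boundaries of the pieces as percentages of the band width; the function name band_segments and the rounding to one decimal place are assumptions made only for this illustration.

```python
# Summary data of Table 2.2.2: frequencies of each gender in each class.
classes = {
    "5-1": {"Male": 16, "Female": 14},
    "5-2": {"Male": 12, "Female": 18},
}


def band_segments(freqs):
    """Return (category, start %, end %) for each piece of a band of width 100%."""
    total = sum(freqs.values())
    segments = []
    start = 0.0
    for category, count in freqs.items():
        end = start + 100 * count / total
        segments.append((category, round(start, 1), round(end, 1)))
        start = end  # the next piece begins where this one ends
    return segments


for name, freqs in classes.items():
    print(name, band_segments(freqs))
```

The same percentages multiplied by 3.6 give the angles of the pieces of the corresponding pie chart, which is why the two graphs convey the same information.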
Example 2.2.4(Male and Female Population by Age Groups)
The male and female populations of Korea by age group in 2015 are shown in Table 2.2.3. Using this data, draw a vertical bar graph by age group and then find appropriate graphs that make it easy to analyze the characteristics of this data.
Table 2.2.3 Male and female populations by age groups in Korea
(KOSTAT Census 2015, unit 10,000 persons)
Age Interval    2015 Male    2015 Female
00 - 04         115          109
05 - 09         116          109
10 - 14         126          116
15 - 19         166          151
20 - 24         181          158
25 - 29         158          145
30 - 34         158          176
35 - 39         193          186
40 - 44         214          207
45 - 49         215          212
50 - 54         209          205
55 - 59         192          194
60 - 64         134          141
65 - 69         102          110
70 - 74         79           97
75 - 79         55           80
80 - 84         28           54
over 85         13           39
Answer
The data of Table 2.2.3 can be loaded from 『eStat』 using the following address.
Click on the variable name of the first variable, ‘AgeInterval’, followed by the second variable,
‘2015_Male’, and the third variable, ‘2015_Female’. As shown in <Figure 2.2.31>,
you may also select the ‘AgeInterval’ variable from the 'Analysis Var' box and the ‘2015_Male’ and
‘2015_Female’ variables sequentially from the 'By Group' box. When these variables are selected,
a separated vertical bar graph
as shown in <Figure 2.2.32>, which separates the male and
female populations using the same Y-axis scale, will appear in the Graph Area.
<Figure 2.2.31> Variable selection for analysis
<Figure 2.2.32> Separated vertical bar graph of population by age group and by gender
Among the ten possible bar graphs, a side-by-side bar graph
as in <Figure 2.2.33> would be useful, because it allows the male and female populations
to be compared in each age interval. A ratio bar graph
as in <Figure 2.2.34>, which directly shows the proportions of the male and female populations in each age interval,
can also be useful. In each of these graphs, you can easily see that the female population becomes larger than
the male population from the age interval 55 - 59 onward.
<Figure 2.2.33> Side-by-side vertical bar graph of population by age and by gender
<Figure 2.2.34> Proportional horizontal bar graph of population by age and by gender
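The proportions displayed by the ratio bar graph can be verified with a short calculation. The following Python sketch uses the values exactly as they are printed in Table 2.2.3; the function name female_share is chosen only for this illustration.

```python
# Table 2.2.3 (unit: 10,000 persons): (age interval, male, female).
population = [
    ("00 - 04", 115, 109), ("05 - 09", 116, 109), ("10 - 14", 126, 116),
    ("15 - 19", 166, 151), ("20 - 24", 181, 158), ("25 - 29", 158, 145),
    ("30 - 34", 158, 176), ("35 - 39", 193, 186), ("40 - 44", 214, 207),
    ("45 - 49", 215, 212), ("50 - 54", 209, 205), ("55 - 59", 192, 194),
    ("60 - 64", 134, 141), ("65 - 69", 102, 110), ("70 - 74", 79, 97),
    ("75 - 79", 55, 80), ("80 - 84", 28, 54), ("over 85", 13, 39),
]


def female_share(male, female):
    """Percentage of females within one age interval (one bar of the ratio bar graph)."""
    return 100 * female / (male + female)


for age, male, female in population:
    share = female_share(male, female)
    print(f"{age:<8} male {100 - share:5.1f}%  female {share:5.1f}%")

# Age intervals in which the female population exceeds the male population.
female_majority = [age for age, male, female in population if female > male]
print(female_majority)
```

With the values as printed in the table, the female share exceeds 50% in every interval from 55 - 59 upward, and also in the 30 - 34 interval.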
A line graph as in <Figure 2.2.35> can also be used to see this kind of pattern.
<Figure 2.2.35> Line graph of population by age and by gender
The overall distribution of the male and female populations by age group can be observed by using
a two-sided (bi-lateral) horizontal bar graph
as in <Figure 2.2.36>, which is usually called a population pyramid. Currently, the age structure of the Korean population looks like a jar. In other words, the population in the age intervals from the 40s to the 50s is larger than the population in the age intervals of the 30s and below, which decreases gradually toward the younger ages. This could cause many problems in the future, such as a decrease in population and an increase in the medical care budget.
<Figure 2.2.36> Bi-lateral horizontal bar graph of population by age and by gender
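The jar-like shape can also be reproduced roughly as text. The Python sketch below draws the male bars to the left and the female bars to the right of the age labels, using the values of Table 2.2.3. The function name pyramid_lines, the choice of one ‘#’ per 100,000 persons and the placement of the oldest interval at the top are assumptions for this illustration and do not describe how 『eStat』 draws the graph.

```python
# Table 2.2.3 (unit: 10,000 persons): (age interval, male, female).
population = [
    ("00 - 04", 115, 109), ("05 - 09", 116, 109), ("10 - 14", 126, 116),
    ("15 - 19", 166, 151), ("20 - 24", 181, 158), ("25 - 29", 158, 145),
    ("30 - 34", 158, 176), ("35 - 39", 193, 186), ("40 - 44", 214, 207),
    ("45 - 49", 215, 212), ("50 - 54", 209, 205), ("55 - 59", 192, 194),
    ("60 - 64", 134, 141), ("65 - 69", 102, 110), ("70 - 74", 79, 97),
    ("75 - 79", 55, 80), ("80 - 84", 28, 54), ("over 85", 13, 39),
]


def pyramid_lines(rows, scale=10):
    """Return text lines with male bars on the left and female bars on the right.

    One '#' stands for `scale` table units, i.e. 100,000 persons when scale=10.
    """
    width = max(max(male, female) for _, male, female in rows) // scale
    lines = []
    # List the oldest interval first so that it appears at the top.
    for age, male, female in reversed(rows):
        left = ("#" * (male // scale)).rjust(width)  # right-aligned toward the labels
        right = "#" * (female // scale)
        lines.append(f"{left} | {age:^7} | {right}")
    return lines


print("\n".join(pyramid_lines(population)))
```

In the printed output the bars are widest around the 40s and narrow toward both the top and the bottom, which corresponds to the jar shape described above.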
Practice 2.2.3(Death rates in Virginia)
For each of five age groups (50–54, 55–59, 60–64, 65–69, 70–74), death rates were measured per 1000 population per year
in Virginia. They are cross-classified by population group, namely Rural/Male, Rural/Female, Urban/Male and
Urban/Female. These data are saved at the following location of the 『eStat』 system.
[Ex] ⇨ eBook ⇨ PR020203_Rdatasets_VADeaths.csv
Draw appropriate graphs to analyze characteristics of the data.
In general, if there are many groups (columns) in the summary data, you can compare the differences between groups for each category of the analysis variable using different kinds of graphs. If there are many groups, it is recommended that you draw several kinds of graphs, because each graph can show you different characteristics of the data.
If data are observed over time, they are called a time series, and a line graph is usually used to observe the trend over time. The X-axis shows the values of a time variable, which are equally spaced, and the Y-axis represents the scale of the time series data. Each pair of data, time and value, is marked as a point in a two-dimensional coordinate plane, and adjacent points are connected with a line.
Example 2.2.5(OECD Export – Import by Country)
In 2017, import and export data of OECD countries are stored at the following location of 『eStat』 system.
Section 2.2 describes how to visualize the summary data of a categorical variable. This section describes how to visualize qualitative data in the form of raw data. In general, the raw data are processed by counting the number of observations in each category of a categorical variable using a statistical package, and the resulting summary data are visualized as in Section 2.2. The 『eStat』 system makes this kind of work easy.
This section describes how to visualize raw qualitative data both without groups and with groups.
2.3.1 Raw Data of Categorical Variable
Example 2.3.1(Survey on Gender – Raw Data)
There are 10 students in a class and gender (male or female) data are collected as follows. Data of this kind are called raw data.
In order to use a statistical package, the raw data are usually arranged in a column as in Table 2.3.1 with numeric coding (1: male, 2: female). Whether coding is required depends on the package; the 『eStat』 system accepts raw data both with coding and without coding (character data).
Table 2.3.1 Gender Survey (1:Male, 2:Female)
Gender
1
2
1
2
1
1
1
2
1
2
Data of Table 2.3.1 are saved at the following address in 『eStat』 system.
[Ex] ⇨ eBook ⇨ EX020301_Raw_Gender.csv
Use 『eStat』 to draw a bar graph, a pie chart, and a band graph to find out characteristics of the raw data.
Answer
Enter the gender data of Table 2.3.1 on the sheet of 『eStat』 system or load the file from [Ex] ⇨ eBook ⇨ EX020301_Raw_Gender.csv.
If you enter the data, give the variable name 'Gender' using the [Edit Var] button as shown in <Figure 2.3.1> and provide value labels for the values 1 and 2 (1 for Male and 2 for Female).
If value labels have been added to the data, the data must be saved as a file in JSON format (click on the JSON Save icon) to ensure that the edited information is not lost.
To load a file in JSON format into the 『eStat』 system, you must likewise use the JSON Open icon.
<Figure 2.3.1> Value label using [Edit Var]
Click on the variable name 'Gender' to draw a vertical bar graph as shown in <Figure 2.3.2>.
A bar graph was drawn after counting frequencies of each gender category, i.e. six males and four females. A basic function of statistical packages such as 『eStat』 is to examine a frequency distribution of the raw data and draw a graph using this frequency distribution.
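The counting step that turns raw data into a frequency distribution can be sketched in Python as follows. The snippet uses the codes of Table 2.3.1 and the value labels of this example; collections.Counter is a standard Python tool and is used here only to illustrate the idea, not to describe the internals of 『eStat』.

```python
from collections import Counter

# Raw data of Table 2.3.1 (1: Male, 2: Female).
gender_codes = [1, 2, 1, 2, 1, 1, 1, 2, 1, 2]
value_labels = {1: "Male", 2: "Female"}

# Count how often each code occurs, then report the counts by value label.
code_counts = Counter(gender_codes)
summary = {value_labels[code]: code_counts[code] for code in sorted(code_counts)}
print(summary)  # {'Male': 6, 'Female': 4}
```

The result is the same summary data as in Table 2.2.1, which is why the graphs of this section look the same as those of Example 2.2.1.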
A vertical bar graph such as <Figure 2.3.2> is frequently used. If there are a large number of categories,
a horizontal bar graph as in <Figure 2.3.3> can also be used (click on the horizontal bar graph icon in 『eStat』).
<Figure 2.3.2> Vertical bar graph of gender
<Figure 2.3.3> Horizontal bar graph of gender
Click on the pie chart icon in 『eStat』 to draw a pie chart (<Figure 2.3.4>),
and click on the band graph icon to draw a band graph (<Figure 2.3.5>). Click on the doughnut graph icon to draw a doughnut graph.
<Figure 2.3.4> Pie chart of gender
<Figure 2.3.5> Band graph of gender
Practice 2.3.1(Preference of Mathematics)
In an elementary school, gender (1: male, 2: female) and math preference (1: good, 2: ordinary, 3: no) of students were surveyed and saved at the following location of 『eStat』 system.
Draw a bar graph, a pie chart, and a band graph of the math preference.
2.3.2 Raw Data of Categorical Variable with Group
Example 2.3.2(Survey on Gender and Marital Status – Raw Data with Group)
In addition to the gender data of Example 2.3.1, marital status (1: Single, 2: Married, 3: Other) was also surveyed as in Table 2.3.2. Compare the characteristics of marital status by gender (male and female) using a bar graph, a pie chart, a band graph, and a line graph. The data are saved at the following location of 『eStat』.
[Ex] ⇨ eBook ⇨ EX020302_Raw_MaritalByGender.csv
Table 2.3.2 Survey of gender and marital status
Gender (1:Male, 2:Female)    Marital Status (1:Single, 2:Married, 3:Other)
1                            1
2                            2
1                            1
2                            1
1                            2
1                            1
1                            1
2                            2
1                            3
2                            1
Answer
Enter the gender and marital status data of Table 2.3.2 on the sheet of 『eStat』 system. Use [Edit Var] button to enter a variable name as 'Gender' and its value labels as 'Male' for value 1 and 'Female' for value 2. Similarly, enter a variable name as ‘Marital’ and its value labels as ‘Single’ for 1, ‘Married’ for 2 and ‘Other’ for 3.
Data whose variable names and value labels have been edited must be saved as a file in JSON format (click on the JSON Save icon) to ensure that the information is not lost. When loading the file again, you must also use the JSON Open icon.
If you click on the ‘Marital’ and ‘Gender’ variable names sequentially, a separated vertical bar graph
of marital status by gender is drawn as in <Figure 2.3.6>.
<Figure 2.3.6> Vertical bar graph of marital status by gender
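Before such a graph can be drawn, the raw data have to be counted for every combination of gender and marital status. A minimal Python sketch of this cross-tabulation is shown here. It assumes that the values of Table 2.3.2 are read row by row as (gender, marital status) pairs; the variable names records and crosstab are chosen only for this illustration.

```python
from collections import Counter

# Raw data of Table 2.3.2, one (gender, marital status) pair per respondent.
# Assumption: the table is read row by row, gender first.
records = [(1, 1), (2, 2), (1, 1), (2, 1), (1, 2),
           (1, 1), (1, 1), (2, 2), (1, 3), (2, 1)]
gender_labels = {1: "Male", 2: "Female"}
marital_labels = {1: "Single", 2: "Married", 3: "Other"}

# Count each (gender, marital status) combination; missing combinations count as 0.
pair_counts = Counter(records)

# One row per gender (group variable), one count per marital status (analysis variable).
crosstab = {
    gender_labels[g]: {marital_labels[m]: pair_counts[(g, m)] for m in marital_labels}
    for g in gender_labels
}
for gender, row in crosstab.items():
    print(gender, row)
```

Each row of this table corresponds to one of the separated bar graphs: the male group has 4 single, 1 married and 1 other, and the female group has 2 single, 2 married and no other.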
This separated bar graph by gender can be changed into other forms using the icons below the main icons: a vertical stacked (<Figure 2.3.7>), ratio (<Figure 2.3.8>), side-by-side (<Figure 2.3.9>) or bi-lateral (<Figure 2.3.10>) bar graph, and a horizontal separated (<Figure 2.3.11>), stacked (<Figure 2.3.12>), ratio (<Figure 2.3.13>), side-by-side (<Figure 2.3.14>) or bi-lateral (<Figure 2.3.15>) bar graph.
<Figure 2.3.7> Vertical stacked bar graph of marital status by gender
<Figure 2.3.8> Vertical proportional bar graph of marital status by gender
<Figure 2.3.9> Vertical side by side bar graph of marital status by gender
<Figure 2.3.10> Vertical bi-lateral bar graph of marital status by gender
<Figure 2.3.11> Horizontal bar graph of marital status by gender
<Figure 2.3.12> Horizontal stacked bar graph of marital status by gender
<Figure 2.3.13> Horizontal proportional bar graph of marital status by gender
<Figure 2.3.14> Horizontal side by side bar graph of marital status by gender
<Figure 2.3.15> Horizontal bi-lateral bar graph of marital status by gender
Click on the pie chart and band graph icons in 『eStat』 to draw a pie chart as in <Figure 2.3.16> and a band graph as in <Figure 2.3.17>. It is easy to observe which categories in each group account for a large percentage of the total data.
<Figure 2.3.16> Pie chart of marital status by gender
<Figure 2.3.17> Band graph of marital status by gender
Practice 2.3.2(Preference of Mathematics by Gender)
In an elementary school, gender (1: male, 2: female) and math preference (1: good, 2: ordinary, 3: no) of students were surveyed and saved at the following location of 『eStat』 system.
A word cloud is a visual representation of text data. It visualizes the word frequencies of a given text as a weighted list. The importance of each word is shown by its font size or color in the word cloud. A bigger term means a greater weight.
The word cloud is useful for quickly perceiving the most prominent words and judging their relative prominence. It has recently been used to visualize the topical content of political speeches. It can also be used as a website navigation aid, in which each word is a hyperlink to the items associated with it in social software. The term keyword cloud is sometimes used in search engine marketing to refer to a group of keywords that are relevant to a specific website.
There are many algorithms for generating a word cloud. 『eStat』 adopts the algorithm of the d3 open-source software. There are also evolving approaches that construct a word cloud using word co-occurrences in documents.
Example 2.4.1(Word Cloud) Generate a word cloud on the following description on data science and analyze important words.
The development of these technologies has created massive data, simply called ‘Big Data’, that were unimaginable in the past. Typical examples of the big data include data from Google's search records, social media messages by mobile phones, web logs by internet connections, and telephone records of global telecom companies. The big data are expected to grow and increase exponentially in the future and the hyper-forecasting is also expected to be possible. The success or failure of each individual, group, company and even country would depend on how to utilize the big data efficiently.
The analysis of the big data that emerged this century is so enormous and diverse in the amount of data that can not be fully utilized just by traditional statistical approaches. For the analysis and utilization of the big data, theories of statistics, computer science, mathematics, management or related disciplines must also be applied simultaneously. Data Science is a new area of study in which statistics, mathematics, computer science and other disciplines are fused to analyze and utilize the big data that emerged this century.
Answer
Click the eStatU icon and then click [Word Cloud] in the menu, or use the QR code below.
Copy the sample text into the [Data Input] text area and click the [Execute] button. A word cloud on data science is then drawn as in
<Figure 2.4.1>.
‘Data’ is the most frequent word in the word cloud, and ‘big’ is the next.
[Word Cloud]
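The weights behind a word cloud are simply word frequencies. The Python sketch below counts the words in two sentences taken from the sample text of this example and converts each frequency into a font size. The short stop-word list, the function names and the linear scaling between 10 and 40 points are all assumptions made for this illustration; they do not describe the d3 algorithm used by 『eStat』, which also decides where each word is placed.

```python
import re
from collections import Counter

# Two sentences taken from the sample text of Example 2.4.1.
text = (
    "The big data are expected to grow and increase exponentially in the future "
    "and the hyper-forecasting is also expected to be possible. The success or "
    "failure of each individual, group, company and even country would depend on "
    "how to utilize the big data efficiently."
)

# A small, hypothetical stop-word list; real tools use much longer lists.
STOP_WORDS = {"the", "and", "to", "of", "in", "is", "are", "be", "or", "on", "how"}


def word_frequencies(text):
    """Count lower-cased alphabetic words, ignoring the stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


def font_size(count, max_count, smallest=10, largest=40):
    """Scale a frequency linearly to a font size (an illustrative choice)."""
    return smallest + (largest - smallest) * count / max_count


freqs = word_frequencies(text)
max_count = max(freqs.values())
for word, count in freqs.most_common(5):
    print(word, count, font_size(count, max_count))
```

In this excerpt ‘big’ and ‘data’ each occur twice and therefore receive the largest font size, while words that occur once are drawn smaller.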
Practice 2.4.1(Inaugural Address of US President: John F. Kennedy)
Generate a word cloud on the following part of inaugural address by US president John F. Kennedy and analyze important words.
In the long history of the world, only a few generations have been granted the role of defending freedom in its hour of maximum danger. I do not shrink from this responsibility--I welcome it. I do not believe that any of us would exchange places with any other people or any other generation. The energy, the faith, the devotion which we bring to this endeavor will light our country and all who serve it--and the glow from that fire can truly light the world.
And so, my fellow Americans: ask not what your country can do for you--ask what you can do for your country.
My fellow citizens of the world: ask not what America will do for you, but what together we can do for the freedom of man.