Chapter 9. Artificial intelligence and other applications
CHAPTER OBJECTIVES
We introduce the following in this chapter.
• Applications of data science to artificial intelligence in section 9.1.
• Text mining in section 9.2.
• Web data mining in section 9.3.
• Multimedia mining in section 9.4.
• Spatial data analysis in section 9.5.
9.1 Artificial intelligence, machine learning, and deep learning
Artificial intelligence (AI) refers to machines that imitate human intelligence and perform complex tasks the way humans do. An important method for implementing artificial intelligence is the machine learning model.
In this book, we have discussed several machine learning models, such as supervised learning in Chapters 6 and 7 and unsupervised learning in Chapter 8, but the artificial neural network model is the most important model for artificial intelligence. One method for training the neural network model is deep learning.
Therefore, artificial intelligence, machine learning, artificial neural networks, and deep learning can be expressed as a set relationship as in <Figure 9.1.1>.
<Figure 9.1.1> Relationship of artificial intelligence, machine learning, artificial neural networks, and deep learning
A. Artificial Intelligence
In 1943, the neurophysiologist Warren McCulloch and the logician Walter Pitts in the United States first proposed the idea of an artificial intelligence model by describing the operating principle of human neurons in terms of binary logic.
Nowadays, the definition of artificial intelligence varies from person to person. According to the book widely used as a textbook of artificial intelligence, 'Artificial Intelligence – A Modern Approach' by Stuart Russell and Peter Norvig, it is defined as a machine that "thinks humanly, acts humanly, thinks rationally, and acts rationally."
In the early days of artificial intelligence, efforts were made to create machines similar to humans by imitating the human brain and way of thinking. In 1951, Marvin Minsky, who later co-organized the Dartmouth workshop, built the first neural network machine, the SNARC. Later, in the Soviet Union, the computer scientist Viktor Glushkov proposed the All-Union Automated Information Processing System (OGAS). With the development of computer science, rather than implementing human intelligence itself, AI machines came to be built to solve real-world problems efficiently; such machines can be broadly divided into four levels.
Level 1: Simple control machines – washing machines, robot vacuum cleaners
Level 2: Classical artificial intelligence with diverse patterns
– numerous and sophisticated ways of establishing input–output relationships, utilizing a knowledge base
Level 3: Artificial intelligence that adopts machine learning models
– uses machine learning algorithms that learn from data
Level 4: Artificial intelligence in which, during machine learning, the machine learns the features of the input data directly, without a person specifying them
At the early stage of AI, a single-layer neural network, the perceptron, was used.
However, due to the limitations of its information processing capability, the applications of the perceptron
were limited, and its popularity waned for a while.
In 1974, Paul Werbos proposed the backpropagation algorithm, which could train multilayer neural networks, and research on AI using multilayer neural network models was actively conducted, producing visible results such as character recognition and speech recognition. Still, such systems were closer to an answering machine than to a conversation partner. In 2006, Geoffrey Hinton announced the deep belief network, which made unsupervised learning practical where it had been considered impossible, and as a result the methodology called deep learning came to stand in for the broader concept of the artificial neural network and became the dominant methodology.
In particular, in 2012, Hinton's students Alex Krizhevsky and Ilya Sutskever built AlexNet, a convolutional neural network architecture, and won the computer vision competition called "ILSVRC" with overwhelming performance; deep learning then became the dominant trend, surpassing the existing methodologies. In 2016, Google DeepMind's AlphaGo popularized deep learning methods and showed results that surpassed human levels in several fields.
AI research has continued to pursue innovative work on problems that previously only humans could handle, such as natural language processing and complex mathematical problems, using computers. Awareness spread that AI could quickly surpass human capabilities in the field of weak artificial intelligence.
Artificial intelligence began to change significantly in 2022 with the emergence of generative AI. OpenAI's ChatGPT and image-generating AI, which are representative of generative AI, have finally begun to be applied to personal hobbies and actual work, and the practical application of AI, which had seemed like a dream, has finally begun. The technical background of this generative AI is the transformer architecture. Generative AI has also led to active discussion about AI, and cautionary views and warnings about its threats have begun to emerge as well.
Technologies of AI
Most current artificial intelligence is made up of artificial neural network models in which numerous nodes each perform their own calculations. The artificial intelligence runs a large number of simulations on a neural network with many hidden layers, trying various weights and learning methods, until it produces a successful result.
When a question (signal) is given to the artificial intelligence, each node responds to the question and transmits a signal to the next node. Each node that receives the signal filters it according to its own given criterion, the bias, and passes it on. The sum of the filtered signals becomes the 'answer' that the artificial intelligence delivers to us.
For example, suppose we give the artificial intelligence a picture of a partially obscured puppy and ask whether the object in the image is a puppy. The artificial intelligence filters the signal for the picture according to its own criteria, namely the characteristics of a puppy remembered by each node. If the puppy's eyes are obscured in the picture, the node that learned the puppy's eyes judges the picture to be an incorrect image (signal) because there are no puppy eyes, and outputs 0, meaning false. However, nodes that have learned other parts of the dog, such as the nose, mouth, ears, and legs, judge that the photo is a correct image and output 1, meaning true. Since the parts other than the eyes produce more 1s than 0s, the AI ultimately returns the answer 'puppy' for the photo.
This characteristic of AI helps it automatically ignore errors and produce the correct answer, just as a human does, even if there are typos or incorrect words in the question. That is why some experts call AI's answers probabilistic answers.
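As a toy illustration of this majority-style aggregation, the following Python sketch lets hypothetical part 'detectors' vote; the detectors and their outputs are made up for illustration and are not how a real neural network is implemented.

```python
# Toy illustration of nodes "voting" on whether an image shows a puppy.
# The detectors and their 0/1 outputs are purely hypothetical.

def classify(detector_outputs):
    """Return 'puppy' if more detectors fire (1) than not (0)."""
    votes_for = sum(detector_outputs)
    votes_against = len(detector_outputs) - votes_for
    return "puppy" if votes_for > votes_against else "not a puppy"

# The eyes are obscured, so the eye detector outputs 0; the others output 1.
outputs = {"eyes": 0, "nose": 1, "mouth": 1, "ears": 1, "legs": 1}
print(classify(list(outputs.values())))   # -> puppy
```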
Today, research on probabilistic and randomized algorithms is the most active. In general, problems that can be stated as "if A, then B" can be handled relatively easily by computer programs. However, when multiple answers can exist, for example when a word such as 'art' can mean either 'fine art' or 'technology', the surrounding circumstances, that is, the 'context', must be considered. It is difficult to give a clear rule such as "if these words appear before and after, then it means 'fine art', otherwise it means 'technology'." This type of problem is solved using mathematics that deals with statistics and probability. In fact, modern artificial intelligence research is said to proceed by assigning categories to each word and interpreting the meaning of the sentence as a whole using those categories. As an extremely simple example, given the sentence 'Music is an art', the system guesses the category 'art', which covers the two meaningful words in the sentence, music and art, and interprets the sentence according to the context. AlphaGo also belongs to this approach.
In artificial intelligence, if a given problem can be solved, all techniques and technologies are used without distinction. If the quality of the results is excellent, even techniques without a firm theoretical background are applied and accepted. Below is a list of only some of the well-known technologies and techniques.
Maze exploration algorithm: This is the most basic artificial intelligence algorithm, which allows robots (micromice) or self-driving cars to recognize adjacent terrain features and find their way to a specific destination. It is an algorithm that can operate without machine learning; naturally, more complex machine learning algorithms are used in commercial products. A minimal search sketch is given below.
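A minimal sketch of such a search, assuming a toy grid maze: breadth-first search over open cells, with no machine learning involved. The maze layout, start, and goal below are made-up examples.

```python
from collections import deque

# Minimal breadth-first maze search: 0 = open cell, 1 = wall.
def solve_maze(maze, start, goal):
    rows, cols = len(maze), len(maze[0])
    queue = deque([start])
    came_from = {start: None}          # remembers how each cell was reached
    while queue:
        cell = queue.popleft()
        if cell == goal:               # reconstruct the path backwards
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
               and maze[nr][nc] == 0 and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                        # no route exists

maze = [[0, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 0]]
print(solve_maze(maze, (0, 0), (0, 3)))
```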
Fuzzy theory: A framework that expresses ambiguous states in nature, such as the vagueness of natural language, as quantitative values, or, conversely, converts quantitative values back into such vague terms. For example, the range of temperatures at which a human feels "cool" is determined and used as a numeric degree, as sketched below.
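A minimal sketch of a fuzzy membership function for the vague notion "cool"; the temperature breakpoints (15 °C and 25 °C) are illustrative assumptions, not standard values.

```python
# Degree (0..1) to which a temperature feels "cool".
def membership_cool(temp_c):
    if temp_c <= 15:
        return 1.0                      # definitely cool
    if temp_c >= 25:
        return 0.0                      # definitely not cool
    return (25 - temp_c) / 10           # linear ramp between the breakpoints

for t in (10, 18, 22, 30):
    print(t, membership_cool(t))
```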
Pattern recognition: This refers to finding specific patterns in various linear and nonlinear data such as images, audio, and text. Put simply, it is a technology that allows computers to judge data similarly to humans and to distinguish what kind of data it is.
Machine learning: As the name suggests, it is a field that studies how to give computers artificial intelligence that can learn.
Artificial neural network: One of the learning algorithms studied in the field of machine learning. It is mainly used for pattern recognition, and it reproduces the connections between the neurons and synapses of the human brain as a program. Put simply, it can be seen as 'simulating' 'virtual neurons' (it is not exactly the same as the operating structure of actual neurons). Generally, a neural network structure is created and then 'trained' so that appropriate functions are obtained. Since the human brain has the best performance among the intelligent systems discovered so far, artificial neural networks that imitate the brain can be seen as a discipline pursuing a fairly ultimate goal. For more information, refer to the machine learning literature.
In the 2020s, the computing power of computers has grown at a startling rate, and the amount and variety of data being poured in has increased accordingly, so artificial neural network technology, which has an excellent ability to process unstructured data [13], is receiving the most attention among all artificial intelligence technologies and is expected to receive even more attention in the future. By now, readers interested in artificial intelligence will have noticed that a deep neural network is an artificial neural network made by connecting numerous artificial neurons and stacking them in layers, and this is what we commonly know as deep learning.
Genetic algorithm: Finds an appropriate answer to a specific problem by repeating generations through a process modeled on natural evolution, that is, crossover and mutation within the population that makes up each generation. While most algorithms express the problem as a formula and find the maximum/minimum through differentiation, the purpose of a genetic algorithm is to find a sufficiently good answer, not the exact answer, to a problem that is difficult to differentiate; a minimal sketch follows.
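A minimal genetic algorithm sketch on the toy "OneMax" problem (evolve a bit string toward all 1s). The population size, mutation rate, and generation count are illustrative choices, not tuned values.

```python
import random

def fitness(bits):
    return sum(bits)                       # number of 1s

def crossover(a, b):
    cut = random.randrange(1, len(a))      # one-point crossover
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.02):
    return [1 - b if random.random() < rate else b for b in bits]

def genetic_algorithm(length=20, pop_size=30, generations=50):
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]        # selection: keep the fittest half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = genetic_algorithm()
print(best, fitness(best))
```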
Energy and computer architecture issues
A deep learning program for an artificial neural network is a type of simulation. Running computer simulations of a large number of cases for a long time consumes a great deal of energy. In fact, the cost of electricity was not a big issue until just before 2020, when artificial neural networks began to develop in earnest. However, as various weights and learning methods were introduced into artificial neural networks and the number of parameters increased, the demand for parallel processing devices, which are much more expensive than central processing units (CPUs), and the corresponding electricity consumption rose steeply to improve performance, which became an issue.
The fundamental problem is that the components of the current von Neumann computer architecture were not originally designed for artificial neural networks. In current computers, the CPU, a serial processing device, is separated from the data storage, the data paths, and the auxiliary processing devices. In particular, the auxiliary processing device here refers to the graphics processing unit (GPU), which performs parallel processing. Since artificial neural networks are based on matrix mathematics, calculations must be performed simultaneously in each node (computational unit, a coded neuron) that makes up the network, and for this the parallel processing device must be the main device. This inefficiency is directly linked to the problem of heat generation, so there is a limit to improving performance by simply increasing the number of transistors through process miniaturization without changing the architecture.
For this reason, many semiconductor companies are currently jumping into the development of AI chips
to reduce power consumption and increase computational efficiency. The industry predicts that the landscape
of AI development will change again when AI chips completely replace GPUs in the future. However,
even if an AI chip is developed, it will take a considerable amount of time to establish a software ecosystem
that can utilize the chip.
B. Machine Learning
Teaching a machine so that the artificial intelligence acquires good intelligence is called machine learning.
Machine learning is a type of program that gives artificial intelligence the ability to learn without being explicitly programmed. A conventional computer program can be described simply as: "if the input data is A and condition (rule) B is given, compute the answer C." Machine learning, in contrast, is: "given input data A and the correct answer C, train the machine to learn the condition B, and then use it to conclude C." The comparison between a computer program and machine learning is as follows.
Table 9.1.1 Comparison between a computer program and machine learning

|          | Computer program | Machine learning                        |
|----------|------------------|-----------------------------------------|
| Input    | A = 3, B = 2     | coefficients 3, 2, 1, 8, 3, 5            |
| Program  | C = A * B        | 3 * X + 2 * Y = 1,  8 * X + 3 * Y = 5    |
| Output   | C = 3 * 2 = 6    | X = 1, Y = -1                            |
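The table can be mirrored in code. The sketch below is illustrative only: the "program" side applies a rule written by the programmer, while the "learning" side recovers the unknown coefficients X and Y from the given data, here simply by solving the linear system from the table. In a real machine learning model the learning step would be an iterative optimization over many examples rather than an exact solve.

```python
import numpy as np

# Computer program: the rule C = A * B is written by the programmer.
A, B = 3, 2
C = A * B
print("program output:", C)               # 6

# "Learning" in the spirit of Table 9.1.1: recover X and Y from
#   3*X + 2*Y = 1
#   8*X + 3*Y = 5
coeffs = np.array([[3, 2], [8, 3]])
rhs = np.array([1, 5])
X, Y = np.linalg.solve(coeffs, rhs)
print("learned parameters:", X, Y)        # 1.0, -1.0
```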
There are various machine learning methods for artificial intelligence, and models such as supervised learning (decision trees, Bayes classification, kNN, support vector machines, etc.) and unsupervised learning (k-means clustering, etc.) studied in Chapters 6 to 8 of this book are widely used. Recently, the reinforcement learning method, which gives rewards instead of target output values and learns to select actions, has also been introduced, based on Markov Decision Process (MDP) theory. In reinforcement learning, a reward is given every time an action is taken in the current state, and learning proceeds in the direction of maximizing this reward. The video below shows how to teach a robotic arm to play table tennis using reinforcement learning.
https://www.youtube.com/watch?v=SH3bADiB7uQ
<Figure 9.1.2> Robotic arm playing table tennis using reinforcement learning
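A minimal tabular Q-learning sketch on a made-up five-state corridor (reward 1 only for reaching the rightmost state). The learning rate, discount factor, exploration rate, and episode count are illustrative choices.

```python
import random

# States 0..4, actions 0 = left, 1 = right, reward 1 for reaching state 4.
N_STATES, ACTIONS = 5, (0, 1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def greedy(state):
    # action with the largest Q value, ties broken at random
    return max(ACTIONS, key=lambda a: (Q[(state, a)], random.random()))

alpha, gamma, epsilon = 0.1, 0.9, 0.1
for _ in range(2000):                                  # episodes
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS) if random.random() < epsilon else greedy(s)
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS) - Q[(s, a)])
        s = s2

print({s: greedy(s) for s in range(N_STATES - 1)})     # learned policy: always move right
```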
C. Deep learning
The multilayer neural network model studied in Chapter 7 can handle a wide variety of data among machine learning models, but its application is limited because the back-propagation algorithm cannot find an accurate solution to the complex nonlinear optimization problem it must solve. When there are many hidden layers, the back-propagation algorithm suffers from problems such as vanishing gradients, where the gradient signal fades and learning does not proceed well, and it also has limitations in handling new data even though it handles the learned content well.
However, in 2006, Professor Geoffrey Hinton of the University of Toronto, Canada, alleviated the vanishing gradient problem through pretraining of neural networks and dropout, and the neural networks applying these algorithms came to be called deep learning. The recent popularity of deep learning is due to the development of algorithms that overcome the limitations of the existing back-propagation, the increase in training material due to the accumulation of large amounts of data, and the development of graphics processing unit (GPU) hardware suitable for neural network training and computation. Deep learning has been popular since 2010, and several important events led to this popularity.
1) The error rate of Microsoft's speech recognition program was over 10% until 2010, but after the introduction of deep learning algorithms, the error rate dropped drastically to about 4%.
2) ImageNet (ILSVRC), a competition in the field of image recognition in which systems must recognize the objects in photos, had error rates close to 26% until 2011. However, in 2012, the SuperVision team of the University of Toronto in Canada recorded a remarkable error rate of 16.4% with AlexNet, which applied a deep convolutional neural network. After that, ImageNet participants reduced the error rate to 3.6% using deep learning methods, a figure that surpasses the human error rate.
3) In 2012, Google succeeded in recognizing cats and humans by learning from 10 million frames (200×200 pixels) captured from YouTube videos. This was an important event in that it trained machines using an unsupervised learning method rather than the existing supervised learning methods. The project was led by Professor Andrew Ng, and the result was obtained by training a 9-layer neural network with 1 billion parameters on 16,000 CPU cores for 3 days.
<Figure 9.1.3> Pattern recognition of humans and cats using unsupervised learning by Google
These three major events confirmed the performance of deep learning and provided the opportunity for it to be actively used in various fields of society.
9.2 Text mining
With the development of computer technology, a large amount of useful information is being accumulated, and a considerable portion of it is text data. For example, useful information collected from various sources, such as books and research papers in digital libraries, news from media outlets, email messages, and web pages, is rapidly being stored in text databases. The methods for searching for useful information in such text databases more efficiently are called text mining. Unlike conventional databases, text databases are unstructured in the form and size of their documents, and therefore require analysis methods different from the existing ones.
A traditional method of searching for useful information in text databases is the keyword-based information retrieval method used in libraries. Libraries classify books and reference materials by subject or author and build a database system. When users query with the keywords of the information they are looking for, the computer system searches for documents or books containing these words. Keyword search is also used when searching web documents on portal sites such as Google. However, as the amount of digital documents increases, keyword search can leave users confused about which documents are useful, since the number of web documents retrieved for a given query can reach thousands. Therefore, it became necessary to create measures of the importance or relevance of the retrieved documents and to prioritize the retrieved information. This type of research is an example of text mining. Let us discuss various issues in text data mining.
A. Preprocessing of data for text mining
In order to perform text mining, preprocessing of the data is required. First, stop words that are not directly needed for text mining are removed from each document. For example, articles, prepositions, and verb endings changed by tense are removed, and prefixes and suffixes are stripped from nouns so that only the stem is extracted. A document is then expressed as a set of terms consisting of such stems and nouns.
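A rough preprocessing sketch: lowercasing, removing a small stop-word list, and stripping a few common English suffixes. The stop-word list and suffix rules below are toy assumptions, not a full stemming algorithm.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "was", "were", "to", "and"}
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    # strip the first matching suffix, keeping at least a 3-letter stem
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

def preprocess(document):
    tokens = re.findall(r"[a-z]+", document.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The libraries are indexing the digital documents."))
# -> ['librari', 'index', 'digital', 'document']
```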
B. Measure of the appropriateness of information retrieval
Let \(R\) be the set of relevant documents in a text database for a given query and \(n(R)\) the number of relevant documents. Let \(T\) be the set of documents retrieved by a system, \(n(T)\) the number of retrieved documents, and \(n(R ∩ T)\) the number of relevant documents among the retrieved documents. To evaluate the documents retrieved for a given query by an information retrieval system, two measures, precision and recall, are defined as follows.
$$
\begin{multline}
\shoveleft \qquad \text{precision} = \frac{n(R ∩ T)}{n(T)} \\
\shoveleft \qquad \text{recall} = \frac{n(R ∩ T)}{n(R)} \\
\end{multline}
$$
Precision is the proportion of relevant documents in the retrieved document set, and recall is the proportion of
documents retrieved among the relevant documents. The higher both measures are, the better the retrieval system is.
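A small sketch of computing the two measures from document-ID sets; the relevant and retrieved sets below are made-up examples.

```python
# Precision and recall for a single query, from document-ID sets.
relevant = {1, 2, 3, 5, 8}          # R: documents actually relevant to the query
retrieved = {2, 3, 4, 8, 9, 10}     # T: documents returned by the system

hits = relevant & retrieved          # R ∩ T
precision = len(hits) / len(retrieved)
recall = len(hits) / len(relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")   # 0.50, 0.60
```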
C. Information retrieval method
Information retrieval methods can be broadly divided into
keyword-based and
similarity-based search.
Keyword-based search represents each document as a set of keywords, and when a user queries with multiple keywords, the system finds appropriate documents using the queried keywords. The difficult problem with this method is how to handle keywords that have synonyms or multiple meanings.
Similarity-based search uses the frequency and proximity of keywords to find similar documents. The following cosine distance measure is often used to measure the similarity of documents. Represent a document as a set of terms excluding stop words. If the terms appearing in the entire document set are \(t_{1}, t_{2}, \dots, t_{m}\), then each document can be represented as a binary data vector indicating whether each term appears ('1') or not ('0'), for example, \(\boldsymbol x_{1} = (1, 0, \dots, 1)\).
The cosine distance between two document vectors \(\boldsymbol x_{1}\) and \(\boldsymbol x_{2}\) is as follows.
$$
d(\boldsymbol x_{1}, \boldsymbol x_{2}) \;=\; \frac{\boldsymbol x_{1} \cdot \boldsymbol x_{2}}{||\boldsymbol x_{1}|| ||\boldsymbol x_{2}||}
$$
Here, \(\boldsymbol x_{1} \cdot \boldsymbol x_{2}\) is the inner product of the two vectors, which equals the number of terms that both documents contain, and \(||\boldsymbol x_{1}||\) is the norm of \(\boldsymbol x_{1}\), i.e., the square root of the number of terms in document \(\boldsymbol x_{1}\). In other words, the cosine distance is the ratio of the number of terms shared by both documents to the geometric mean of the numbers of terms in each document; larger values indicate more similar documents.
A document can be represented as a binary vector indicating the occurrence of terms, as a term-frequency vector indicating how many times each word appears, or as a vector of relative frequencies over all terms, and similarity can be analyzed accordingly. For more information, please refer to the references.
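A minimal sketch of the cosine measure on two made-up binary term vectors, following the formula above.

```python
import math

def cosine(x1, x2):
    dot = sum(a * b for a, b in zip(x1, x2))        # number of shared terms
    norm1 = math.sqrt(sum(a * a for a in x1))       # sqrt(# terms in document 1)
    norm2 = math.sqrt(sum(b * b for b in x2))
    return dot / (norm1 * norm2)

doc1 = [1, 0, 1, 1, 0, 1]
doc2 = [1, 1, 1, 0, 0, 1]
print(round(cosine(doc1, doc2), 3))                 # 0.75
```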
D. Association analysis and classification analysis
Using the binary vector data of documents for keyword-based retrieval, the relevance of terms can be studied through association analysis. Terms that frequently occur together in documents are likely to form phrases that are closely related to each other, and association analysis can be used to find association rules among interesting terms. Recently, as the amount of newly created documents increases, the problem of classifying these documents into appropriate document sets has been gaining attention. To this end, a set of pre-classified documents is used as training data, and new documents are classified using classification analysis, as sketched below.
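As a sketch of such classification, a small k-nearest-neighbour classifier over binary term vectors; the training vectors, topic labels, query vector, and the choice k = 3 are made-up assumptions.

```python
from collections import Counter
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Pre-classified documents as (binary term vector, topic) pairs.
training = [
    ([1, 1, 0, 0, 1], "sports"),
    ([1, 0, 1, 0, 1], "sports"),
    ([0, 0, 1, 1, 0], "finance"),
    ([0, 1, 1, 1, 0], "finance"),
]

def classify(query, k=3):
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    labels = [label for _, label in ranked[:k]]     # topics of the k most similar documents
    return Counter(labels).most_common(1)[0][0]     # majority vote

print(classify([1, 1, 1, 0, 1]))   # -> sports
```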
9.3 Web data mining
The development of network technology has led individuals, companies, and organizations to build their knowledge and information in the form of web pages, allowing such information to be shared in real time. The web, which provides useful information and convenience in daily life, also presents many problems that must be solved.
• The number of individuals and organizations providing information through the web is constantly increasing, and the information they provide is frequently updated, so there is the problem of managing this information efficiently.
• The web can be thought of as a huge digital library, but each web page has a non-standardized structure and takes various creative forms. In this huge and diverse digital library, the information that a user wants is a very small portion, less than 0.1%, so there is the problem of searching efficiently for the desired information.
• Web users are diverse people with different backgrounds, interests, and purposes. Most users do not have detailed network or search skills and can easily get lost on the web, so there is the problem of guiding them easily.
All methods studied to solve such problems are called web data mining. Portal sites such as Google have appeared that organize web pages on behalf of users and search for the web pages they need, competing with one another through their own search engines. The search method used is mainly the keyword-based search method used in text mining. As the web grows larger, a common keyword query may retrieve thousands of web pages with low relevance, while web pages containing only synonyms of the keyword may not be retrieved at all. Web content mining, which searches for websites with the good information that users want in web databases, which have a more complex structure than text databases, is therefore an important research field. In addition, there are various research areas such as web document classification and web log mining.
A. Web content mining
Studying whether web pages retrieved from a web database are useful can be viewed as a concept similar to text mining. However, the web contains not only page text but also additional information, including the hyperlinks that connect a page to other pages. Although text documents may have references, web links contain more reliable information. In particular, web pages called hubs, which collect links to reliable web pages on a single topic, do not stand out on their own but provide useful information on common topics. Many methods have been developed that use hyperlinks or hub pages to search for reliable web content; here we introduce a useful search method called HITS (Hyperlink-Induced Topic Search).
1) Search web pages for a given query to form a root set, and form a base set including all pages linked to or from the pages in the root set. The pages in the base set are denoted {1, 2, ..., \(n\)}, and the link structure among them is denoted by an \(n\) × \(n\) matrix \(A = \{a_{ij}\}\). The element \(a_{ij}\) of the matrix is 1 if page \(i\) links to page \(j\), and 0 otherwise.
2) Each page in the base set is given an authority weight \(\boldsymbol w = (w_{1}, w_{2}, \dots, w_{n})\) and a hub weight \(\boldsymbol h = (h_{1}, h_{2}, \dots, h_{n})\), all initialized to 1 and normalized so that the sum of the squared weights is 1. The relationship between the two weights is defined as follows.
$$
\begin{multline}
\shoveleft \qquad \boldsymbol h = A \cdot \boldsymbol w \\
\shoveleft \qquad \boldsymbol w = A' \boldsymbol h \\
\end{multline}
$$
The first equation means that if a page points to many authoritative pages, its hub weight should increase. The second equation means that if a page is pointed to by many good hub pages, its authority weight should increase.
3) The above two equations are applied repeatedly \(l\) times, which can be mathematically expressed as follows.
$$
\begin{multline}
\shoveleft \qquad \boldsymbol h = A \cdot \boldsymbol w = (AA') \cdot \boldsymbol h = (AA')^2 \cdot \boldsymbol h = \cdots = (AA')^l \cdot \boldsymbol h \\
\shoveleft \qquad \boldsymbol w = A' \cdot \boldsymbol h = (A'A) \cdot \boldsymbol w = (A'A)^2 \cdot \boldsymbol w = \cdots = (A'A)^l \cdot \boldsymbol w\\
\end{multline}
$$
That is, after many iterations, \(\boldsymbol h\) converges to the principal eigenvector of \(AA'\) and \(\boldsymbol w\) to the principal eigenvector of \(A'A\).
4) Pages with large authority weights and large hub weights are presented first in the search results.
The HITS search method has the disadvantage of relying only on links, but it is useful for web page search. Related link-analysis ideas (such as PageRank) underlie Google's search engine, and link-based methods are reported to obtain better search results than purely keyword-based engines.
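A small sketch of the HITS iteration on a made-up four-page link matrix (A[i, j] = 1 if page i links to page j), normalizing each step so that the sum of squared weights stays 1.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = A.shape[0]
w = np.ones(n)                      # authority weights
h = np.ones(n)                      # hub weights
for _ in range(50):                 # l iterations of h = A w, w = A' h
    h = A @ w
    w = A.T @ h
    h /= np.linalg.norm(h)          # keep the sum of squared weights at 1
    w /= np.linalg.norm(w)

print("authority:", np.round(w, 3))
print("hub:      ", np.round(h, 3))
```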
B. Web page classification
We have seen that classification analysis of documents is a major research topic in text mining. There are many studies on the problem of classifying new web pages using training pages that have already been classified into specific topics. For example, portal sites such as Google and Yahoo classify web pages into frequently used topics such as 'movies', 'stocks', 'books', and 'cooking' for the convenience of users, and when a new web document appears, it is classified into one of the existing topics. The term-based classification method used in text mining also gives good results in the classification of web documents. In addition, the hyperlinks included in web documents can be used effectively in the classification.
C. Web log mining
When a user accesses a web page and searches for information, the web server, for example an e-commerce server, stores basic information such as the user's IP address, the requested URL, and the request time. This is called web log data. The amount of recorded web log data can easily reach hundreds of terabytes (a terabyte is \(10^{12}\) bytes), and analyzing this data is called web log mining. By studying users' access patterns for web pages, it is possible to design web pages efficiently, identify potential customers for e-commerce, develop marketing strategies that consider customers' purchasing tendencies, and deliver customized information to users.
In order to analyze web log data, preprocessing steps such as the cleaning, simplification, and conversion of data studied in Chapter 3 are necessary. Then a multidimensional analysis of IP addresses, URLs, times, and web page content is performed to find potential customers. By studying association rules, sequential patterns, and web access trends, we can understand users' reactions and motivations and use the results of these analyses to improve the system, for example by redesigning pages, building personalized web services for users, and ranking web pages. A minimal log-counting sketch is given below.
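A minimal sketch of web log preprocessing and counting; the regular-expression pattern (roughly Common Log Format) and the sample log lines are made-up assumptions.

```python
import re
from collections import Counter

LOG_PATTERN = re.compile(r'(?P<ip>\S+) .*"(?:GET|POST) (?P<url>\S+) [^"]*"')

sample_logs = [
    '192.168.0.7 - - [10/Mar/2024:10:12:01] "GET /products/42 HTTP/1.1" 200 5120',
    '192.168.0.7 - - [10/Mar/2024:10:12:40] "GET /cart HTTP/1.1" 200 830',
    '10.0.0.3 - - [10/Mar/2024:10:13:02] "GET /products/42 HTTP/1.1" 200 5120',
]

page_counts = Counter()
visits_per_ip = Counter()
for line in sample_logs:
    m = LOG_PATTERN.match(line)
    if m:                                   # skip malformed lines (data cleaning)
        page_counts[m.group("url")] += 1
        visits_per_ip[m.group("ip")] += 1

print(page_counts.most_common(1))           # most requested page
print(visits_per_ip)                        # requests per visitor IP
```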
9.4 Multimedia data mining
The popularization of audio and video equipment, CD-ROMs, and the Internet has led to the emergence of multimedia databases that contain voice, images, and video in addition to text. Examples include NASA's Earth Observation System, the Human Chromosome Database, and the audio-video databases of broadcasting stations. Multimedia data mining is the exploration of interesting patterns within these multimedia databases. It has some similarities to text data mining and web data mining, but it also involves analysis unique to image data. Here, we will look at similarity search, classification, and association mining of image data.
A. Similarity Search for Image Data
Similarity search for image data can be divided into description-based search and content-based search. Description-based search looks for similar images based on image descriptions such as keywords, size, and creation time, and is similar to text mining. However, search based only on image descriptions is generally vague and arbitrary, and the results are not good.
Content-based search retrieves images based on visual characteristics such as the color, texture, and shape of the image, and is widely applied in medical diagnosis, weather forecasting, and e-commerce. Content-based search can be divided into a method of finding similar image data in a database using sample images and a method of finding data with similar characteristics by describing the characteristics of the image in detail; the latter is more commonly used. The following methods are available for representing the characteristics of images:
1) Color histogram-based features - A histogram is drawn over all the colors in an image, which captures the overall color distribution, but it can be a weak feature because it contains no information about the shape, location, or texture of the image (a small sketch follows this list).
2) Multifeature-composed features - These combine the color histogram with the shape, location, and texture of the image. A distance function is also introduced for each feature.
3) Wavelet-based features - Shape, texture, and location information is expressed using the wavelet transform of the image. However, this method may fail in searches for local image regions because it uses a single wavelet transform for the entire image, so one image is often also represented using multiple local wavelet transforms.
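A minimal sketch of a coarse color histogram for item 1 above: each RGB channel is quantized into 4 bins (4 × 4 × 4 = 64 color buckets) and pixels are counted per bucket. The tiny 2 × 3 "image" is a made-up list of RGB values.

```python
from collections import Counter

image = [[(250, 10, 10), (248, 12, 9), (30, 200, 40)],
         [(28, 198, 35), (25, 30, 220), (26, 33, 215)]]

def color_histogram(pixels, bins=4):
    step = 256 // bins
    hist = Counter()
    for row in pixels:
        for r, g, b in row:
            hist[(r // step, g // step, b // step)] += 1   # quantized color bucket
    return hist

print(color_histogram(image))
# -> {(3, 0, 0): 2, (0, 3, 0): 2, (0, 0, 3): 2}: two reddish, two greenish,
#    two bluish pixels; the spatial layout is lost, as noted above.
```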
B. Classification of image data
Classification of image data is widely used in scientific research such as astronomy and geology.
For example, in astronomy, models are built to recognize planets using properties such as size, density,
and direction of movement of stars. Since the size of image data is usually large, data transformation,
decision tree models, or complex statistical classification models are introduced for classification.
C. Association analysis of image data
Since image data has many features such as color, shape, texture, text, and spatial location, various association analyses are possible, which can be broadly divided into the following three types.
1) Association analysis of words that can characterize the content of an image
For example, an association rule such as ‘If more than 30% of an image is blue → sky’ can be given.
2) Association analysis of other images related to the content of a space in an image
For example, an association rule such as ‘If there is green grain in a field → there is likely to be a forest nearby’
can be given.
3) Association analysis of other images with the content of an image that is not related to spatial relationships
For example, an association rule such as ‘If there are two circles → there is likely to be a square’ can be given.
In image association analysis, resolution can also be important. That is, there are cases where features appear
to be the same up to a certain resolution but differ at a finer resolution. In this case, association analysis
is first performed at a low resolution, and then association analysis is performed while gradually increasing
the resolution for patterns that satisfy the minimum support criterion.
9.5 Spatial data analysis
Spatial data analysis is the process of finding interesting patterns in a database containing spatial information and can be used to understand spatial data, discover spatial relationships, and discover relationships between spatial and non-spatial data. Such spatial data analysis is widely used in medicine, transportation, the environment, the management of multinational companies, and geographic information systems. Spatial data often exists in various data formats and in databases spread across different regions. In such cases, it is recommended to organize a data warehouse by subject, time, or other classifications and to build a spatial data cube for analysis. Multidimensional analysis can then be used to facilitate spatial data analysis.
Spatial data analysis includes description, association analysis, classification, clustering, and spatial trend analysis of spatial data. For example, spatial association analysis may find an association rule such as 'if an area has a high concentration of apartments and an average monthly income of 3 million won or more, then it has many sports centers.' Spatial cluster analysis generalizes detailed geographical points into cluster areas such as commercial, residential, industrial, and agricultural areas according to land use. Spatial classification analysis can, for example, classify regions into high-income, low-income, and so on, according to the average income per household; such spatial classification is usually carried out in relation to spatial objects such as administrative districts, rivers, and highways. Spatial trend analysis studies changes in spatial data over time when the spatial data includes a time variable, and can examine, for example, population density according to economic development.