Detecting Similarities in Posts Using Vector Space and Matrix

This study discusses the application of two topics from linear algebra, vector spaces and matrices, to written text such as articles and books. The texts examined are example sentences written by the author. Vector spaces and matrices are used to analyze whether one piece of writing is similar to another. The results show that vector spaces and matrices can be used to detect similarities in a text.


Introduction
Linear algebra is widely used in several fields, such as programming, optimization, industry, graph theory, and software libraries (Psarras et al., 2019; Luo et al., 2018; Kirby and Mitchell, 2018). In programming, linear algebra is commonly used to initialize quantities with algebraic variables (Phothilimthana et al., 2019). In graph theory, the study of graphs associated with algebraic structures began with the idea of the zero-divisor graph of a commutative ring with unity (Das, 2017). Graph theory also helps to characterize various algebraic structures by studying the graphs associated with them (Das, 2016b; Dörfler et al., 2018; Sanderson et al., 2019; Zhang and Chen, 2018). For libraries, linear algebra computations are translated into efficient sequences of library calls (Solomonik et al., 2017; Bousse et al., 2018). In this study, linear algebra is used to detect similarities in writing, where the writing can be a paper, a thesis, or another document. This application can be useful for research purposes and can make writing easier.
In general, linear algebra can be used to detect similarities in writing. The detection relies on the same linear algebra concepts that underlie search engines, namely vector spaces and matrices. One of the methods used is the vector space model. It works by representing a document as a matrix of word frequencies, so that each document corresponds to a vector, and the similarity between two documents is expressed in terms of the angle between their vectors. First, the frequency of occurrence of each word in the document is determined; then the similarity with the document being compared is calculated (Sentosa, 2016).
A vector is a geometric object that has both a magnitude and a direction (Sentosa, 2016). The vector spaces considered here are finite dimensional over a field, with $\dim(V) = n$ (Das, 2016a). In addition to vector spaces, this study uses cosine similarity, which measures the similarity between two vectors. The study shows that vector spaces and matrices can be used to detect similarities in a document or piece of writing. This agrees with the earlier research by Sentosa (2016) and can be developed in further research.

Materials
The texts examined in this study are example sentences written by the author, which are compared with a sentence taken from a published paper. The first sentence, taken from the paper, is:
"There are several papers both on interval matrices and on partial matrices" (Rubei, 2020).
It is compared with the second and third sentences, written by the author:
"There are many papers on interval matrices and other matrices to study"
"There is one papers both on interval matrices and on partial matrices"
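As a minimal sketch of how the three sentences might be represented as a matrix, the following Python code builds a word-frequency matrix with one row per sentence and one column per distinct word. The names `tokenize` and `term_matrix` are hypothetical, and the tokenizer (lowercasing and splitting on whitespace) is an assumption; the paper does not state how words were separated or weighted.

```python
from collections import Counter

# The three example sentences from the Materials section.
sentences = [
    "There are several papers both on interval matrices and on partial matrices",
    "There are many papers on interval matrices and other matrices to study",
    "There is one papers both on interval matrices and on partial matrices",
]


def tokenize(text):
    """Lowercase the text and split on whitespace (assumed tokenizer)."""
    return text.lower().split()


def term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is the sorted set of all distinct words in all documents.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that do not occur in a document.
    matrix = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, matrix


vocabulary, matrix = term_matrix(sentences)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row of `matrix` is the vector of one sentence, so comparing two sentences reduces to comparing two rows.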

Methods
Vector space, matrix, and cosine similarity are used to detect similarities in a text. The following is an explanation of the method to be used.

Vector
A vector is a geometric object that has a magnitude and a direction. Each vector can be represented geometrically as a directed line segment in a plane or in space. When drawn, the vector is denoted by an arrow ($\rightarrow$). The magnitude of the vector is proportional to the length of the arrow, and its direction coincides with the direction of the arrow. Vectors are often written with an arrow above the symbol, as in $\vec{v}$. The elements of a vector are written in sequence, as a one-column matrix, or in terms of the unit vectors $\hat{i}, \hat{j}, \hat{k}$ (Sentosa, 2016).
A vector that has a unit length is called a unit vector. Usually the unit vector is used to define direction. To form a unit vector, a vector is divided by the length of the vector.
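The normalization described above can be sketched in a few lines of Python. The function names `norm` and `unit_vector` are illustrative choices, and vectors are assumed to be plain lists of numbers.

```python
import math


def norm(v):
    """Euclidean length of a vector given as a list of numbers."""
    return math.sqrt(sum(x * x for x in v))


def unit_vector(v):
    """Divide a vector by its length to obtain a vector of length 1."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return [x / length for x in v]


u = unit_vector([3.0, 4.0])
print(u)        # [0.6, 0.8]
print(norm(u))  # length of the result, equal to 1 up to rounding
```

The zero vector is rejected because it has no direction, so no unit vector can be formed from it.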

Vector Space
A vector space is a mathematical structure formed by a set of vectors, that is, objects that can be added together and multiplied by numbers called scalars. An example is the space of Euclidean vectors, which are often used to represent physical quantities such as forces. The vector space model is a basic technique in information retrieval. It can be used to assess the relevance of documents to search keywords (queries) in search engines, and for document classification, document grouping, information retrieval systems, and other tasks (Sentosa, 2016).

Cosine Similarity
Cosine similarity measures the similarity between two vectors as the cosine of the angle between them. For two vectors $A$ and $B$, it can be formulated as follows (Sentosa, 2016):
$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} \qquad (1)$$
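The cosine of the angle between two vectors is their dot product divided by the product of their lengths. The following Python sketch computes it for vectors given as equal-length lists of numbers. The function names `cosine_similarity` and `angle_degrees` are hypothetical, and reporting the angle in degrees is an assumption based on the size of the angles reported in this study.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    return dot / (norm_a * norm_b)


def angle_degrees(a, b):
    """Angle between two vectors, in degrees."""
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cosine))


print(cosine_similarity([1, 2, 0], [2, 4, 0]))  # parallel vectors
print(angle_degrees([1, 0], [0, 1]))            # perpendicular vectors
```

Parallel vectors give a cosine close to 1 and an angle close to 0, while perpendicular vectors, which share no non-zero components, give a cosine of 0.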

Results and Discussion
For the sample sentence taken from the paper, the total number of words and the number of words it shares with the compared sentence are counted; the results can be seen in Table 1. Table 2 and Table 3 are then constructed in the same way for the other examples. The counts are then substituted into equation (1).

 
These calculations show that the two sentences compared with the sentence taken from the paper each form a different angle with it. The angle between the first sentence and the second sentence is 29.5919 degrees, while the angle between the first sentence and the third sentence is 20.7056 degrees. A small angle between the vectors means that the two sentences are highly similar; a large angle means that their similarity is low (Sentosa, 2016).
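The rule that a smaller angle means higher similarity can be illustrated by ranking the two author-written sentences by their angle to the reference sentence. The Python sketch below assumes plain word counts and a whitespace tokenizer; the helper names `word_counts` and `angle_between` are hypothetical. Because the weighting behind the tables in this study is not reproduced here, the angles it prints need not coincide with the reported values, although the ordering of the two sentences comes out the same.

```python
import math
from collections import Counter


def word_counts(text):
    """Word-frequency vector of a text, stored sparsely as a Counter."""
    return Counter(text.lower().split())


def angle_between(text_a, text_b):
    """Angle in degrees between the word-count vectors of two non-empty texts."""
    a, b = word_counts(text_a), word_counts(text_b)
    if not a or not b:
        raise ValueError("both texts must contain at least one word")
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    cosine = min(1.0, dot / (norm_a * norm_b))
    return math.degrees(math.acos(cosine))


reference = "There are several papers both on interval matrices and on partial matrices"
candidates = {
    "second sentence": "There are many papers on interval matrices and other matrices to study",
    "third sentence": "There is one papers both on interval matrices and on partial matrices",
}

# Sort from smallest angle (most similar) to largest angle (least similar).
ranked = sorted(candidates, key=lambda name: angle_between(reference, candidates[name]))
for name in ranked:
    print(name, round(angle_between(reference, candidates[name]), 4))
```

Storing each vector as a `Counter` avoids building a full vocabulary, since words that are absent from either sentence contribute nothing to the dot product.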

Conclusion
In this study, a trial calculation was carried out on a sentence taken from a paper, which was compared with two sentences written by the author. The calculation involves linear algebra, using vectors and vector spaces. Each of the two sentences forms a different angle with the sentence taken from the paper. The angle between the first sentence and the second sentence is 29.5919 degrees, while the angle between the first sentence and the third sentence is 20.7056 degrees. The first and second sentences therefore form a larger angle than the first and third sentences, which means that the first and third sentences are more similar than the first and second. This agrees with a manual count of the words the sentences share: the first and second sentences have only eight words in common, while the first and third sentences have nine. Vector spaces and matrices can therefore be used to detect the similarity between one sentence and another.