Non-perfect maze generation using Kruskal algorithm

A non-perfect maze is a maze that contains a loop or cycle and has no isolated cell. It offers an alternative when the requirements of a maze cannot be met by a perfect maze. This paper discusses non-perfect maze generation with two kinds of bias, namely horizontal and vertical wall bias and cycle bias. In this research, a maze was modeled as a graph in order to generate a non-perfect maze using a modified Kruskal algorithm. The modified algorithm used the Fisher-Yates algorithm to obtain a random edge sequence and a disjoint-set data structure to reduce the processing time of the algorithm. The modifications were adding edges randomly while taking the edge's orientation into account, and adding further edges after a spanning tree is formed. The algorithm designed in this research constructs an $m \times n$ non-perfect maze with complexity $O(E \log V)$, where $V$ and $E$ denote the vertex and edge sets of an $m \times n$ grid graph, respectively. Several biased non-perfect mazes are shown in this research by varying their dimension, wall bias and cycle bias.


INTRODUCTION
A maze is a puzzle consisting of several passages structured in an intricate manner. Solving a maze requires the player to find a route that connects two points inside the maze [1]. With computer technology, a maze can also serve as the arena of a game [2,3]. In addition, mazes are used in several fields, especially for cognitive exercise and education. For example, mazes have been used to observe animal behavior [4,5] and as cognitive exercises for children [6,7].
Generally, a maze is a rectangular object resembling a grid, with entry and exit points located on opposite sides of the maze [8]. However, a maze can also be circular [8,9]. Pullen classified mazes into seven categories, one of which is routing [10]. The routing category classifies a maze based on its passage pattern and structure; one such class is the perfect maze. A perfect maze is a maze that contains no loop or cycle, has no isolated point or cell, and has exactly one path connecting any two points [1]. On the other hand, a non-perfect maze is a maze that contains no isolated point and has two points connected by more than one path [8]. In other words, a non-perfect maze is a maze that contains a cycle [11]. It was also noted in [10] that a maze can be classified by its texture. For example, a maze containing relatively more horizontal paths than vertical paths is said to have a horizontally biased texture.

Figure 1. A maze represented in a graph
A maze can be viewed as a graph, as seen in Figure 1. A grid-shaped maze is composed of a collection of cells, in which a cell denotes a vertex and the absence of a wall between two cells denotes an edge between the associated vertices [1]. Furthermore, a maze can be seen as a subgraph of a grid graph. A perfect maze can then be viewed as a spanning tree of a grid graph. Like a spanning tree, a perfect maze does not contain any isolated cell and has every pair of its cells connected by exactly one path. This is consistent with the nature of a spanning tree, which is connected and has exactly one path between any two vertices [1]. A non-perfect maze is a perfect maze with at least one wall removed. Removing a wall is equivalent to adding an edge to the graph. In the context of graphs, adding an edge to a spanning tree creates a cycle, which is identical to a non-perfect maze. Because of this structural similarity, graph-based algorithms can be used for the maze generation problem.
*Corresponding Author: marwan.math@unsyiah.ac.id
Maze generation has been studied for several decades. For example, Walter Pullen used a computer to generate the largest maze in 1987. His maze measures 23 x 11 feet and requires 688 sheets of paper to print [12]. Several methods have been developed to generate a maze, including computer-based ones. These methods can accommodate several characteristics of a maze [11], including its geometrical shape, perfectness and bias. A perfect maze can be generated using several algorithms, such as DFS, Growing Tree, Recursive Backtracking, Prim and Kruskal.
As mentioned above, a non-perfect maze can be obtained by deleting a wall of a perfect maze. As a result, any two cells in a non-perfect maze can be connected by one or more paths. Another consequence is that it is possible to find 4 adjacent cells that form a cycle in a non-perfect maze. If a non-perfect maze is used as a game arena, then it is possible to place an object (e.g., a bomb) inside such a 4-cell cycle without blocking the characters' path. In short, a non-perfect maze can be an alternative if a perfect maze cannot accommodate certain rules or gameplay in a maze game. This paper aims to introduce an algorithm to generate a biased non-perfect maze by modifying the Kruskal algorithm. Two kinds of bias are accommodated in this paper, namely horizontal/vertical wall bias and cycle bias. A maze with a high horizontal bias has relatively more horizontal paths than vertical ones, and vice versa. In addition, the cycle bias regulates the number of cycles contained in the maze. Afterwards, the complexity of the algorithm is analyzed. This paper is structured as follows: the introduction is followed by related works on perfect and non-perfect mazes, then the research methodology, then the results and discussion, and finally the conclusion.

Related works
Several studies on maze generation have been carried out before. Dubey and Sarita [13] used the Depth First Search (DFS) algorithm to generate a perfect maze. Their research used a stack data structure to remove walls in a grid to create a perfect maze. Singh et al. [14] optimized perfect maze generation using a stack and a disjoint-set data structure, generating a perfect maze in a shorter amount of time than the general disjoint-set method.
Another method to generate a perfect maze is the Wilson algorithm [11]. That research compared several maze solution-finding methods such as genetic algorithm, DFS and BFS. Its results showed that the genetic algorithm worked better on a maze with a high level of openness. Hendrawan [15] used the Growing Tree algorithm to develop an Android game based on a maze. Perfect maze generation can also be done using recursive methods such as recursive backtracking [16] and recursive division [17]. Perhaps the closest maze generation methods to this study are the Prim and Kruskal algorithms. Shah et al. [18] used the DFS, Prim and Kruskal algorithms, which represent three conceptually different approaches for generating a maze.
Based on the research mentioned above, it can be seen that maze generation, especially perfect maze generation, is a problem that has been widely studied and has led to several methods using graph-based algorithms. However, most of these studies did not consider bias and created a random perfect or non-perfect maze. We view that having a controlled bias in a maze can be beneficial and practical in some use cases (e.g., in a game, mazes with more open areas are used in low levels while those with fewer open areas are used in high levels). Using the same general idea and the openness (or cycle) bias defined in [11], a perfect maze generation algorithm (i.e., the Kruskal algorithm) can be extended to a non-perfect maze generation algorithm. The extension allows the algorithm to create a non-perfect maze with a certain level of openness. Moreover, the algorithm can also create a perfect maze by setting the openness level to its lowest value. We also add horizontal/vertical bias to allow more control over the mazes. In short, this research modifies the Kruskal algorithm to produce a non-perfect maze that satisfies a horizontal/vertical wall bias and a cycle bias in order to adjust the characteristics of the maze.

METHODOLOGY
The structural similarity between a perfect maze and a spanning tree allows graph-based algorithms to be used as perfect maze generation methods. One of the algorithms used to generate a minimum spanning tree is the Kruskal algorithm. The algorithm works on a weighted graph, which is a graph whose edges have weights. The Kruskal algorithm works iteratively by picking an edge from the edge set and adding it to the set of selected edges to create a minimum spanning tree. Edges are selected in nondecreasing order of weight. Cormen et al. [19] implemented the Kruskal algorithm with a disjoint-set data structure, resulting in an overall complexity of $O(E \log V)$. The algorithm uses the MAKE-SET, FIND-SET and UNION functions. MAKE-SET creates a distinct disjoint set for each vertex. FIND-SET finds the set that contains a vertex, while UNION joins the two disjoint sets containing two given vertices [19]. The procedure of the Kruskal algorithm is described in Algorithm 1.

Algorithm 1: Kruskal Algorithm
1: A = ∅
2: for each vertex v ∈ G.V
3:     MAKE-SET(v)
4: sort the edges of G.E into nondecreasing order by weight w
5: for each edge (u, v) ∈ G.E, taken in nondecreasing order by weight
6:     if FIND-SET(u) != FIND-SET(v)
7:         A = A ∪ {(u, v)}
8:         UNION(u, v)
9: return A

The set A in Algorithm 1 is the set of edges contained in the spanning tree. At first, A is initialized as an empty set, which means no edge has been selected yet. In line 2-3, the MAKE-SET function initializes a disjoint set for each vertex in G.V. In line 4, the edges are sorted by weight to obtain a nondecreasing edge sequence, so that the selection process starts from the first element. In line 5-8, the FIND-SET function checks whether the vertices u and v, which are both incident to edge (u, v), belong to the same disjoint set. If they do, the edge is not added to A, since it would create a cycle. Otherwise, the UNION function joins the two disjoint sets that contain u and v, and edge (u, v) is added to A. The Kruskal algorithm ends after a minimum spanning tree is obtained.
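The procedure described above can be sketched in Python as follows. This is a minimal illustration, not the implementation from [19]: the function name `kruskal`, the `(weight, u, v)` edge format and the small example graph are assumptions made for the sketch, and the disjoint set is reduced to a plain parent list.

```python
def kruskal(num_vertices, weighted_edges):
    """Return the selected edges as (u, v, weight) triples.

    weighted_edges holds (weight, u, v) with 0 <= u, v < num_vertices.
    """
    parent = list(range(num_vertices))  # MAKE-SET for every vertex

    def find_set(x):
        # Walk up to the representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    selected = []  # plays the role of the set A
    for weight, u, v in sorted(weighted_edges):  # nondecreasing weight
        root_u, root_v = find_set(u), find_set(v)
        if root_u != root_v:         # different sets, so no cycle is created
            parent[root_u] = root_v  # UNION
            selected.append((u, v, weight))
    return selected


# Hypothetical example: a 4-cycle 0-1-2-3 plus one diagonal.
edges = [(1, 0, 1), (4, 1, 2), (3, 2, 3), (2, 3, 0), (5, 0, 2)]
mst = kruskal(4, edges)
total_weight = sum(weight for _, _, weight in mst)
```

For this example graph the three lightest edges are selected and the two heaviest are rejected because their endpoints already share a set.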
In this research, a non-perfect maze was formed by first creating a perfect maze using the Kruskal algorithm and then removing walls of the maze. This process was done by modeling a maze as a graph. If a maze $M$ has size $m \times n$, where $m$ is the number of rows and $n$ is the number of columns, then its equivalent graph is a subgraph of the $m \times n$ grid graph $G$. The number of cells in a maze corresponds to the number of vertices, and the number of walls depends on the number of edges. The number of vertices of $G$ is $mn$, while the number of possible edges is $m(n-1) + n(m-1) = 2mn - m - n$. A spanning tree uses $mn - 1$ of these edges, so a perfect maze has $mn - m - n + 1$ walls. Since a non-perfect maze has at least one of these walls removed, its number of walls $w$ must satisfy the inequalities $0 \le w \le mn - m - n$. This result was used in the cycle bias calculation discussed in the next section.
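The counts in this paragraph can be checked with a short Python sketch. The vertex numbering (cell in row r and column c becomes vertex r * n + c) and the name `grid_edges` are assumptions chosen for illustration; here a "horizontal" edge joins two neighbouring columns and a "vertical" edge joins two neighbouring rows.

```python
def grid_edges(m, n):
    """Return (horizontal, vertical) edge lists of the m x n grid graph."""
    horizontal = [(r * n + c, r * n + c + 1)
                  for r in range(m) for c in range(n - 1)]
    vertical = [(r * n + c, (r + 1) * n + c)
                for r in range(m - 1) for c in range(n)]
    return horizontal, vertical


m, n = 4, 5
horizontal, vertical = grid_edges(m, n)
num_vertices = m * n
num_edges = len(horizontal) + len(vertical)
# A spanning tree keeps num_vertices - 1 edges; every other edge is a wall.
walls_in_perfect_maze = num_edges - (num_vertices - 1)
```

For a 4 x 5 maze this gives 16 horizontal and 15 vertical edges, 31 in total, and 12 walls in any perfect maze, so a non-perfect maze of that size has at most 11 walls.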
The Kruskal algorithm was modified for non-perfect maze generation by adding some parts and changing some processes in the original algorithm. The first modification is adding additional edges after a spanning tree is formed. This process was done by adding as yet unselected edges to the set $A$. In this study, such edges were collected in the set $R = G.E - T.E$, where $T$ is the spanning tree. This creates a connected graph with cycles, which is equivalent to a non-perfect maze.
The next step is introducing the bias of a maze. As mentioned earlier, the two biases referred to here are horizontal/vertical wall bias and cycle bias. The horizontal/vertical wall bias was implemented by calculating the number of active walls of the maze ($w$). If $b_h$ and $b_v$ ($b_v = 1 - b_h$) are the ratio of horizontal walls and the ratio of vertical walls relative to the number of walls, respectively, then the number of horizontal walls $w_h$ and the number of vertical walls $w_v$ are given by equation (1), in which a round function is applied so that $w_h$ and $w_v$ are both integers. If $e_h$ and $e_v$ denote, respectively, the number of horizontal edges and vertical ones, then equations (2) and (3) relate them to these wall counts, so that the numbers of horizontal and vertical walls of $M$ that satisfy the wall ratios $b_h$ and $b_v$ can be calculated.
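One plausible reading of this calculation is sketched below in Python. It is not a reproduction of equations (1) to (3): the sketch assumes that the horizontal wall count is the rounded product of the horizontal ratio and the total wall count, that the vertical count is the remainder, and that a horizontal wall blocks an edge between two rows while a vertical wall blocks an edge between two columns. The function names are hypothetical.

```python
def wall_counts(total_walls, horizontal_ratio):
    """Split total_walls into (horizontal, vertical) integer counts."""
    if not 0.0 <= horizontal_ratio <= 1.0:
        raise ValueError("horizontal_ratio must lie in [0, 1]")
    # Python's round() rounds exact halves to the nearest even integer.
    horizontal = round(horizontal_ratio * total_walls)
    return horizontal, total_walls - horizontal


def edge_counts(m, n, horizontal_walls, vertical_walls):
    """Edges left open in an m x n maze, under the blocking assumption above."""
    vertical_edges = n * (m - 1) - horizontal_walls    # row-to-row passages
    horizontal_edges = m * (n - 1) - vertical_walls    # column-to-column passages
    return horizontal_edges, vertical_edges


# A 4 x 5 maze with 10 walls, split evenly between the two orientations.
h_walls, v_walls = wall_counts(10, 0.5)
h_edges, v_edges = edge_counts(4, 5, h_walls, v_walls)
```

With these inputs the maze keeps 11 horizontal and 10 vertical edges, 21 in total, which is the 19 edges of a spanning tree plus 2 extra edges.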
The next step is introducing the cycle bias of the maze. The cycle bias determines the number of walls removed, which is equal to the number of edges added. This value is denoted by $k$. The addition of edges to a spanning tree is equivalent to creating cycle bases. Here, a cycle basis is a cycle that cannot be formed using other cycles. In this study, the cycle bias is determined by first calculating the number of possible cycle bases in the graph $G$. In a connected graph $G = (V, E)$, the number of possible cycle bases is $|E| - |V| + 1$. As a result, the maximal number of cycle bases of $G$ is $2mn - m - n - (mn) + 1 = mn - m - n + 1$, which corresponds to the maximal value of $k$. Consequently, if $c$ denotes the ratio of cycle bases relative to the maximum number of cycle bases, then:
$$k = \lceil c \, (mn - m - n + 1) \rceil \qquad (4)$$
where $0 < c \le 1$. The ceiling function in equation (4) was intended so that $k$ is a positive integer, that is, $k > 0$.
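As a worked illustration of the cycle bias calculation, the following Python sketch computes the number of additional edges from the maze size and the cycle ratio. The function name `extra_edge_count` is hypothetical. The text restricts the ratio to values greater than 0; the sketch also accepts 0, which is an assumption and simply yields no additional edges.

```python
import math


def extra_edge_count(m, n, cycle_ratio):
    """Number of edges to add on top of a spanning tree of the m x n grid."""
    if not 0.0 <= cycle_ratio <= 1.0:
        raise ValueError("cycle_ratio must lie in [0, 1]")
    max_cycles = m * n - m - n + 1  # |E| - |V| + 1 for the full grid graph
    return math.ceil(cycle_ratio * max_cycles)


few = extra_edge_count(4, 5, 0.1)   # ceil(1.2) = 2
all_ = extra_edge_count(4, 5, 1.0)  # 12, every wall is removed
none = extra_edge_count(4, 5, 0.0)  # 0, the maze stays a spanning tree
```

For a 4 x 5 maze the maximum is 12, so a 10% cycle ratio already adds 2 edges because of the ceiling.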
The Kruskal algorithm includes an edge sorting process based on the edges' weights. Edges are then added in the order obtained after the sorting. This is possible because the graph the Kruskal algorithm works on is a weighted graph. However, in this study, $G$ is not a weighted graph, so a different way of ordering the edges is needed. The technique used in this research was shuffling the edges of $G$ using the Fisher-Yates algorithm. The Fisher-Yates algorithm shuffles a sequence with complexity $O(N)$, where $N$ is the number of elements to be shuffled. From the shuffled sequence of edges, we select $|V| - 1$ edges just as in the Kruskal algorithm, and select $k$ additional edges by taking the bias into account, so that we obtain a connected graph with $k$ cycle bases.
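The shuffling step can be sketched in Python as below. The function name `fisher_yates_shuffle`, the optional `rng` parameter and the use of integer edge indices are assumptions made for the example; the loop itself is the standard in-place form of the Fisher-Yates shuffle.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Shuffle items in place with len(items) - 1 swaps and return the list."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform index in [0, i], both ends included
        items[i], items[j] = items[j], items[i]
    return items


# Shuffle the indices of the 31 edges of a 4 x 5 grid graph.
edge_sequence = fisher_yates_shuffle(list(range(31)), random.Random(0))
```

Each position is visited once and receives one swap, which is where the linear running time comes from.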
In this article, edges were added by following their order in the shuffled sequence while taking into account the number of horizontal and vertical walls that must be present in the maze. However, not every shuffled sequence can be used as a reference for adding edges, because some sequences produce a disconnected graph. This case is illustrated in Figure 2, where the added edges did not produce a connected graph. Some columns of the graph have not been connected even though the graph satisfies the required number of horizontal edges. The addition of $e_0$, $e_2$ and $e_4$ satisfies the number of horizontal edges (3 edges), but these three edges only connect the first and second columns of the graph. Therefore, a method is needed to solve this problem. In this study, the algorithm added edges between every two adjacent columns and rows to ensure that each shuffled sequence of edges can produce a connected graph. The edges that connect adjacent rows and columns are selected and added to the graph before the sequence-based addition of edges starts.
In general, the non-perfect maze generation algorithm produced in this study was formed by replacing the sorting process with edge shuffling. Just as in the Kruskal algorithm, each edge $(u, v)$ was added if $u$ and $v$ belong to different disjoint sets. At this point, an additional condition was introduced to determine whether an edge is valid to be added. An edge whose endpoints belong to different disjoint sets would only be added if the addition does not violate the number of horizontal and vertical edges needed to satisfy the wall bias. These additions of edges eventually formed a spanning tree $T$. The next step was to add a number of edges to form cycles. At this stage, adding an edge did not take into account the disjoint set partition of the endpoints, but only the number of horizontal and vertical walls. The overview of the algorithm can be found in Figure 3.
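The overall procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm as published: the function name `generate_maze`, the vertex numbering r * n + c, the rounding of the wall split, the convention that a horizontal wall blocks a row-to-row edge, and the choice of the first shuffled edge across each row and column boundary as the required edge are all assumptions.

```python
import math
import random


def generate_maze(m, n, horizontal_wall_ratio, cycle_ratio, seed=None):
    """Return the passages of an m x n maze as a set of (u, v) vertex pairs."""
    rng = random.Random(seed)
    max_cycles = (m - 1) * (n - 1)               # equals mn - m - n + 1
    extra = math.ceil(cycle_ratio * max_cycles)  # edges added after the tree
    walls = max_cycles - extra
    h_walls = round(horizontal_wall_ratio * walls)
    v_walls = walls - h_walls
    # Assumption: a horizontal wall blocks a vertical edge and vice versa.
    quota = {"h": m * (n - 1) - v_walls, "v": n * (m - 1) - h_walls}

    edges = [("h", r * n + c, r * n + c + 1)
             for r in range(m) for c in range(n - 1)]
    edges += [("v", r * n + c, (r + 1) * n + c)
              for r in range(m - 1) for c in range(n)]
    rng.shuffle(edges)  # stand-in for the edge shuffling step

    parent = list(range(m * n))

    def find_set(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = set()

    def add(edge):
        kind, u, v = edge
        chosen.add(edge)
        quota[kind] -= 1
        parent[find_set(u)] = find_set(v)  # UNION (no effect if already joined)

    # Required edges: first shuffled edge across each column and row boundary.
    linked = set()
    for edge in edges:
        kind, u, _ = edge
        boundary = (kind, u % n if kind == "h" else u // n)
        if boundary not in linked:
            linked.add(boundary)
            add(edge)

    # Spanning-tree pass: join different sets while the orientation quota allows.
    for edge in edges:
        kind, u, v = edge
        if edge not in chosen and quota[kind] > 0 and find_set(u) != find_set(v):
            add(edge)

    # Cycle pass: ignore the disjoint sets, respect only the remaining quotas.
    for edge in edges:
        if edge not in chosen and quota[edge[0]] > 0:
            add(edge)

    return {(u, v) for _, u, v in chosen}


passages = generate_maze(4, 5, 0.5, 0.1, seed=1)
```

With one required edge per boundary, the required edges cannot close a cycle, and once one orientation's quota runs out the other orientation can still finish the tree, which is why a single pass over the shuffled sequence suffices in this sketch.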

RESULTS AND DISCUSSION
The algorithm and the results are discussed in this section. This section also includes several categories of non-perfect mazes obtained by varying the values of $m$, $n$ and the bias. For the first category, non-perfect mazes are generated by varying the values of $m$ and $n$ while keeping the bias values fixed. The second category shows mazes with fixed values of $m$, $n$ and cycle bias while varying the wall bias. Finally, the third category shows mazes with fixed values of $m$, $n$ and wall bias while varying the cycle bias.
The algorithm produced based on the methodology described earlier can be seen in Algorithm 2. In the algorithm, three helper functions were used, namely SHUFFLE-EDGE-SEQUENCE, ADD-REQUIRED-EDGE and CHECK-EDGE. SHUFFLE-EDGE-SEQUENCE shuffles the sequence of edges using the Fisher-Yates algorithm. The addition of edges between adjacent columns and rows is done by ADD-REQUIRED-EDGE, while the iterative addition of edges is done in CHECK-EDGE. Line 1-9 is the initialization and calculation of several desired maze parameters. The for loop in line 10-11 initializes a disjoint set for each vertex. After edge shuffling in line 12, we add edges between every two adjacent columns and rows in line 13. The for loop starting in line 14 forms a spanning tree. In this loop, each edge is checked using FIND-SET to see whether its endpoints belong to different disjoint sets. If they do, the edge is checked again using CHECK-EDGE. An edge is only added if its addition does not violate the bias. UNION is used to join the two disjoint sets that contain the endpoints of an edge. After the spanning tree is formed, we add more edges so that the graph contains cycles. This addition of edges is done by the for loop in line 17-18 using CHECK-EDGE. After the loop is done, the set $A$ contains all edges belonging to $M$ and the algorithm terminates.
FIND-SET and UNION in a disjoint-set data structure with union by rank both have complexity $O(\log N)$, where $N$ is the number of elements in the disjoint set. In this algorithm, FIND-SET and UNION therefore both have complexity $O(\log V)$, where $V$ is the number of vertices. Consequently, CHECK-EDGE has complexity $O(\log V)$ and ADD-REQUIRED-EDGE has complexity $O((m + n) \log V)$. The parameter calculation in line 1-9 takes $O(1)$. The disjoint set initialization, done $V$ times in line 10-11, takes $O(V)$. SHUFFLE-EDGE-SEQUENCE with the Fisher-Yates algorithm takes $O(E)$. The loop in line 14-16 runs $E$ times, so the total complexity of FIND-SET is $O(E \log V)$, and CHECK-EDGE is called $V - 1$ times, so its total complexity is $O(V \log V)$. The addition of edges in the loop in line 17-18 takes $O(k \log V)$. Thus, the overall complexity of the algorithm is $O(1 + V + E + (m + n) \log V + E \log V + V \log V + k \log V) = O(E \log V + V \log V)$.
Since $|E| > |V|$, the complexity of the algorithm is $O(E \log V)$.
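The logarithmic bound on FIND-SET and UNION comes from union by rank, which keeps every tree in the disjoint set shallow. The Python sketch below illustrates this with a hypothetical `DisjointSet` class that deliberately omits path compression, so that the cost of FIND-SET is exactly the height of the tree; the merge order in the example is chosen to make the trees as tall as union by rank allows.

```python
class DisjointSet:
    """Disjoint sets over 0..size-1 with union by rank, no path compression."""

    def __init__(self, size):
        self.parent = list(range(size))  # MAKE-SET for every element
        self.rank = [0] * size           # upper bound on each tree's height

    def find_set(self, x):
        while self.parent[x] != x:       # cost is the depth of x in its tree
            x = self.parent[x]
        return x

    def union(self, x, y):
        root_x, root_y = self.find_set(x), self.find_set(y)
        if root_x == root_y:
            return False
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x     # hang the shallower tree below
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1       # height grows only on a tie
        return True


# Merge 16 singletons pairwise, always joining two trees of equal rank.
ds = DisjointSet(16)
for step in (1, 2, 4, 8):
    for start in range(0, 16, 2 * step):
        ds.union(start, start + step)
tallest = max(ds.rank)
```

Even in this worst-case merge order the tallest tree over 16 elements has rank 4, which equals the base-2 logarithm of 16.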
Several mazes produced by the algorithm are shown below. Table 1 shows non-perfect mazes obtained by varying the dimension and fixing the bias. The bias chosen here was 50% horizontal walls and 50% vertical walls with a cycle bias of 10%. Based on Table 1, it can be inferred that the non-perfect mazes generated have more passages as their size increases. The increase in size gives a maze more walls, so that the wall ratio gets closer to the desired value.
Furthermore, several mazes were generated by varying the wall bias. In this case, we varied the wall bias from 0% to 100% in increments of 10%. All the mazes have a fixed dimension of 4 × 5 and a fixed cycle bias of 10%. The mazes generated with these variations are shown in Table 2. Table 2 shows that a low horizontal bias generated a maze with a low number of horizontal walls, and vice versa. A 0% horizontal bias generated a maze with no horizontal walls, while a 100% horizontal bias generated a maze with no vertical walls.
The last parameter variation was the cycle bias. The mazes were generated by varying the cycle bias and fixing the dimension and wall bias. The dimension we chose was 4 × 5 with 50% horizontal walls and 50% vertical walls. The cycle bias was varied in increments of 10%. The results are shown in Table 3. From Table 3, it can be seen that a cycle bias of 0% generated a perfect maze, which is identical to a spanning tree $T$, while a cycle bias of 100% generated a maze with no walls, which is identical to the grid graph $G$. We can also see that a greater cycle bias resulted in more cycles in $M$. In general, the value of the cycle bias determines the level of openness described in the introduction section.
The three tables above only show the result of varying each parameter of a non-perfect maze separately. However, these three parameters can be combined to produce the desired non-perfect maze.

CONCLUSION
Non-perfect maze generation is an interesting research topic when it includes additional characteristics such as the size, the percentage of horizontal/vertical walls and the percentage of cycles. This study introduced an algorithm that generates an $m \times n$ non-perfect maze under the influence of horizontal/vertical wall bias and cycle bias, with an overall complexity of $O(E \log V)$, where $V$ and $E$ denote, respectively, the sets of vertices and edges of the $m \times n$ grid graph. Possible further work includes reducing the time complexity, using another algorithm, or introducing other biases and characteristics in a maze.