Floorplans begin life as vector files: designers and architects typically produce them in vector-based applications such as SketchUp, HomeStyler, and AutoCAD. However, gathering client input during the design and approval process requires rasterizing these vector files by printing or publishing them.
The Motivation for the Study
While the prints help clients visualize the plans, there's a downside. Rasterization discards the vital, structured geometric and semantic information that made the model possible to analyze, synthesize, or modify. Once the floorplan exists only as a raster image, even its creator can no longer post-process it.
These issues are compounded by the difficulty of recovering the lost information from the raster floorplan image, a problem that has long plagued the architectural industry. The issue also poses two additional challenges. First, a floorplan must satisfy high-level semantic and geometric constraints; for example, the walls defining the external boundary, or those enclosing certain rooms, must form a closed 1D loop. Second, these constraints vary from one drawing to another because houses have different numbers of rooms. Solving either challenge depends on recovering the lost information in the first place, which was long a tall order.
Thus, in their study, Chen Liu, Jiajun Wu, Pushmeet Kohli, and Yasutaka Furukawa proposed a solution to the underlying data-loss problem, which would in turn resolve the resulting conundrum. They acknowledged that earlier techniques had attempted raster-to-vector floorplan conversion using combinations of low-level image-processing heuristics, but those methods could not solve the problem fully.
Instead of hand-tuned heuristics, Liu and colleagues employed a learning-based method. The results showed that their novel methodology was a step in the right direction: their model outperformed existing approaches, achieving roughly 90% precision and recall. These scores mean that the wall junctions, rooms, icon primitives, and opening primitives (windows and doors) their model produced deviated only minimally from annotations made by production-level conversion tools and human subjects.
As the research article’s title “Raster-to-Vector: Revisiting Floorplan Transformation” suggests, the researchers revisited vectorization, tackling the problems created by rasterizing vector floorplans. Let’s see how they did it.
Stratified Floorplan Representation
Liu et al. approached the raster-to-vector conversion one step at a time, as shown in the image below. First, a convolutional neural network (CNN), a deep learning architecture well suited to detecting objects within an image, converted the input raster floorplan into a junction layer. Integer programming (IP) then converted the junction layer into a primitive layer. But what are these layers in the first place?
The Junction Layer
A house is made up of walls that meet at different orientations. At a corner, two walls meet in an L shape; where a wall partitions a space into two separate rooms, a T-shaped junction forms. The junction layer therefore comprises I-, L-, T-, and X-shaped wall junctions, as shown in the image below.
This layer also includes representations for icons such as toilets, washbasins, cooking counters, and bathtubs, each represented by an axis-aligned bounding box. To represent openings, the layer includes opening junctions at their two end-points; in the Stratified Floorplan Representation image above, these junctions appear as two arrows pointing towards each other.
Additionally, the junction layer contains two per-pixel probability distribution maps (per-pixel classifications): one over room types and one over icon types such as toilets and bathtubs.
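As a concrete sketch, the junction layer described above could be stored as a set of heatmaps and label maps. The container below is illustrative only: the field names, the 13-channel wall-junction count (4 I + 4 L + 4 T + 1 X orientations), and the array shapes are my own assumptions, not taken verbatim from the paper.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical container for the junction layer. Channel counts and
# shapes are illustrative assumptions, not the paper's exact layout.
@dataclass
class JunctionLayer:
    wall_heatmaps: np.ndarray    # (13, H, W) wall-junction probability maps
    opening_corners: np.ndarray  # (4, H, W) opening end-point maps
    icon_corners: np.ndarray     # (4, H, W) icon bounding-box corner maps
    room_semantics: np.ndarray   # (H, W) per-pixel room-type labels
    icon_semantics: np.ndarray   # (H, W) per-pixel icon-type labels

H, W = 256, 256
layer = JunctionLayer(
    wall_heatmaps=np.zeros((13, H, W)),
    opening_corners=np.zeros((4, H, W)),
    icon_corners=np.zeros((4, H, W)),
    room_semantics=np.zeros((H, W), dtype=int),
    icon_semantics=np.zeros((H, W), dtype=int),
)
print(layer.wall_heatmaps.shape)  # (13, 256, 256)
```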
The Primitive Layer
The second layer encodes the walls, openings, and icons as primitives. It relies on the integer programming (IP) process to connect the junctions into these primitives: walls and openings become line segments, displayed as lines with an arrow at each end, while each icon becomes a box.
These primitives must satisfy certain constraints that ultimately ease the post-processing step, the step that produces the final vector floorplan. For instance, the walls of a bedroom must form a closed loop, and each must carry the same label/annotation on one side; the latter requirement applies to every room in the floorplan.
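The primitives and the closed-loop requirement above can be sketched in code. The record types and field names here are hypothetical illustrations, not the paper's data structures:

```python
from dataclasses import dataclass

# Hypothetical records for the primitive layer: walls and openings are
# line segments between two junction points; icons are axis-aligned boxes.
@dataclass
class WallPrimitive:
    start: tuple        # (x, y) of one junction
    end: tuple          # (x, y) of the other junction
    left_label: str     # room type on one side of the line
    right_label: str    # room type on the other side

@dataclass
class IconPrimitive:
    box: tuple          # (x1, y1, x2, y2) axis-aligned bounding box
    kind: str           # e.g. "toilet", "bathtub"

# Four walls of a bedroom: each carries the same room label on its inner
# side, satisfying the "same label on one side" requirement from the text.
bedroom = [
    WallPrimitive((0, 0), (4, 0), "bedroom", "outside"),
    WallPrimitive((4, 0), (4, 3), "bedroom", "outside"),
    WallPrimitive((4, 3), (0, 3), "bedroom", "outside"),
    WallPrimitive((0, 3), (0, 0), "bedroom", "outside"),
]
# The walls form a closed loop: each segment ends where the next begins.
closed = all(bedroom[i].end == bedroom[(i + 1) % 4].start for i in range(4))
print(closed)  # True
```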
Raster to Vector Conversion
The vectorization process that Liu and colleagues used consists of three steps:
- A convolutional neural network (CNN) that converted the raster image into junctions
- An integer programming (IP) process that combined the junctions into primitives
- A post-processing step that corrected residual errors and produced the final vector floorplan
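The overall flow can be sketched as a pipeline of three stages. All function bodies below are placeholders of my own, standing in for a trained CNN, an integer-programming solver, and a geometric clean-up pass:

```python
# Placeholder pipeline mirroring the three stages described above.
def detect_junctions(raster_image):
    """CNN stage: raster image -> junction candidates and semantic maps."""
    return {"junctions": [], "semantics": None}

def solve_primitives(junction_layer):
    """IP stage: pick a consistent subset of junctions and connect them."""
    return {"walls": [], "openings": [], "icons": []}

def post_process(primitives):
    """Final stage: snap coordinates and emit the vector floorplan."""
    return {"vector_floorplan": primitives}

def raster_to_vector(raster_image):
    junctions = detect_junctions(raster_image)
    primitives = solve_primitives(junctions)
    return post_process(primitives)

result = raster_to_vector(raster_image=None)
print(sorted(result["vector_floorplan"].keys()))  # ['icons', 'openings', 'walls']
```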
Convolutional Neural Network (CNN)
The researchers’ junction-layer representation and per-pixel classification of the floorplan’s elements made the problem a natural fit for a CNN. The network, which they borrowed from a previous study, produced pixel-level heatmaps whose high values concentrated in the regions of the floorplan where the junctions lay.
Liu et al. applied random cropping and color jittering to the images as data augmentation. They then thresholded the heatmaps at 0.4, a value chosen slightly below 0.5 so that borderline candidates would survive and the IP process could pick out the exact positions of the junctions.
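The thresholding step can be sketched as follows: keep heatmap pixels above 0.4 and suppress non-maxima so that each junction yields a single point. This is a minimal illustration of the idea, not the paper's exact extraction procedure:

```python
import numpy as np

def extract_junctions(heatmap, threshold=0.4):
    """Keep pixels above the threshold, then suppress non-maxima in a
    3x3 neighbourhood so each junction yields one (row, col) point."""
    points = []
    h, w = heatmap.shape
    for r in range(h):
        for c in range(w):
            v = heatmap[r, c]
            if v < threshold:
                continue
            window = heatmap[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if v >= window.max():  # local maximum wins
                points.append((r, c))
    return points

# Toy heatmap with one strong peak and one sub-threshold bump.
hm = np.zeros((5, 5))
hm[1, 1] = 0.9   # clear junction
hm[3, 3] = 0.3   # below the 0.4 threshold, discarded
print(extract_junctions(hm))  # [(1, 1)]
```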
The CNN made excellent predictions about the junctions’ positions, but it sometimes misclassified junction types, for instance labeling an L-shaped corner as T-shaped. To compensate, the researchers designed the network to propose more than one junction type at each detected location: where it saw a T-shaped junction, it would also report two L-shaped junctions and a single X-shaped junction at the same spot. The IP process, which was programmed to keep just one junction type per location, then resolved this deliberate ambiguity.
Integer Programming (IP)
The deep learning stage produced an accurate set of candidate junctions from which primitives could then be extracted. Extracting those primitives was the job of the integer programming process. The IP applied several semantic and geometric constraints, filtering out spurious lines and ensuring that the floorplan’s structural properties were preserved throughout the conversion.
For instance, a bedroom must consist of walls that enclose a space, forming a closed 1D loop, and its room type must be identified consistently from its position relative to the wall primitives. It is the combination of these constraints that eliminated the extra junction types the CNN had ‘intentionally’ proposed.
The IP filtered the primitives using five constraints:
- Connectivity constraint
- One-hot encoding constraint
- Mutual exclusion constraint
- Opening constraint
- Loop constraint
The loop constraint ensured that the walls of each room in the floorplan formed a closed loop. The one-hot encoding constraint represented each candidate primitive with a binary variable: one denoted that a wall or icon primitive was present, zero that it was not. The mutual exclusion constraint prevented two extremely close, near-duplicate primitives from both being selected, which kept the model from choosing fake junctions.
The connectivity constraint governed the relationship between primitives and junctions: the number of primitives meeting at a junction had to match that junction’s type (two for an L-shaped junction, three for a T-shaped one, and so on). Like the mutual exclusion constraint, this fourth parameter eliminated fake intersections. Lastly, the opening constraint required every opening (door or window) to lie on a wall.
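A toy version of this selection problem can illustrate how binary variables and constraints interact. The candidates, confidences, and brute-force search below are invented for illustration; the paper uses a real integer-programming solver and its full constraint set:

```python
from itertools import product

# Each candidate wall gets a binary variable; we maximize total confidence
# subject to a mutual-exclusion constraint on near-duplicate candidates.
candidates = [
    {"name": "wall_A",     "confidence": 0.9},
    {"name": "wall_A_dup", "confidence": 0.6},  # near-duplicate of wall_A
    {"name": "wall_B",     "confidence": 0.8},
]
mutually_exclusive = [(0, 1)]  # wall_A and wall_A_dup can't both be chosen

best_choice, best_score = None, -1.0
for choice in product([0, 1], repeat=len(candidates)):
    if any(choice[i] + choice[j] > 1 for i, j in mutually_exclusive):
        continue  # violates mutual exclusion
    score = sum(x * c["confidence"] for x, c in zip(choice, candidates))
    if score > best_score:
        best_choice, best_score = choice, score

selected = [c["name"] for x, c in zip(best_choice, candidates) if x]
print(selected)  # ['wall_A', 'wall_B']
```

At this scale brute force over all 2^3 assignments suffices; a real floorplan has far more variables, which is why an IP solver is needed.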
Post-Processing
This last step corrected the remaining errors. Liu et al. deliberately allowed some of them, such as small coordinate errors, to survive in the IP output so that they could be fixed at this stage. The correction process moved primitives to their correct positions, producing the closed polygons that define each room’s extent.
Recall the requirement that the walls of each room must carry the same annotation on one side? The researchers used those labels to name the rooms. If all the walls defining a closed polygon shared the same label, that label became the room’s name. Where a wall carried several labels, they split the polygon with a horizontal or vertical line and named each resulting sub-region after the label its walls now shared.
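The naming rule above is simple enough to sketch directly. This is an illustration of the rule as described, not the paper's implementation:

```python
def name_room(wall_labels):
    """If every wall around a closed polygon carries the same label, that
    label names the room; otherwise the polygon must be split further."""
    labels = set(wall_labels)
    if len(labels) == 1:
        return labels.pop()
    return None  # mixed labels: split the polygon and retry per sub-region

print(name_room(["bedroom"] * 4))              # bedroom
print(name_room(["bedroom", "bedroom",
                 "kitchen", "bedroom"]))       # None -> needs splitting
```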
The Dataset
Liu et al. used a total of 870 floorplan images. They first selected 1,000 images at random from a dataset of about 5 million raster floorplan images, then asked human subjects to annotate each image with geometric and semantic information: labeling the rooms, drawing lines to represent walls and openings, and drawing rectangles to represent objects in the floorplan.
The researchers then converted these annotations into inputs that their algorithms could recognize and verified them manually, eliminating 130 poor-quality images. Of the remaining 870 images, they used 770 to train the deep learning models and the remaining 100 to test them. Importantly, they evaluated every component of their pipeline, including the CNN and the IP; for the latter, they based the assessment on the five constraints detailed above.
The IP improved precision significantly compared to conversion methods that did not use it. Here precision measures how closely the elements of the researchers’ vectorized floorplans (walls, openings, icons, and rooms) matched the positions obtained from production-level conversion tools and human subjects. The figures ranged from 84% (icons) to 94.7% (wall junctions). The algorithms Liu and colleagues developed came remarkably close to production-level models.
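Evaluating detections like this typically means matching predicted points to ground-truth points within some distance tolerance and reporting precision and recall. The greedy matching rule and the tolerance value below are my own illustrative assumptions, not the paper's exact evaluation protocol:

```python
import math

def precision_recall(predicted, ground_truth, tolerance=5.0):
    """Match each predicted junction to an unused ground-truth junction
    within `tolerance` pixels, then report (precision, recall)."""
    unused = list(ground_truth)
    matched = 0
    for p in predicted:
        for g in unused:
            if math.dist(p, g) <= tolerance:
                unused.remove(g)  # each ground-truth point matches once
                matched += 1
                break
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(ground_truth) if ground_truth else 0.0
    return precision, recall

pred = [(10, 10), (50, 52), (90, 90)]    # (90, 90) is a false positive
truth = [(11, 11), (50, 50), (200, 200)]  # (200, 200) is missed
p, r = precision_recall(pred, truth)
print(round(p, 2), round(r, 2))  # 0.67 0.67
```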
Takeaways
An analysis of the research article offers a few pointers on raster-to-vector floorplan conversion and, by extension, vectorization in general. First, raster image quality is a vital determinant of whether the vectorization process goes smoothly; quality mattered so much that Liu et al. had to discard 130 raster images from their initial sample of 1,000.
Secondly, rasterizing vector files destroys vital semantic and geometric information, which makes vectorizing already-rasterized files a challenge. The methods proposed in studies predating Liu and colleagues’ work fell far short of what human annotators could recover.
Thus, in their study, Liu and colleagues proposed a new model that achieved a 90% precision score, demonstrating that their approach comes close to production-level conversion tools. Nonetheless, further work is needed before such a model reaches 100% precision. In the meantime, you can use Scan2CAD, our production-level vectorization software, which handles even low-resolution images and converts architectural drawings in addition to other types of vector images.