While there are some great packages available for the task, such as Elk or DynaMaps, sometimes we need data that cannot be obtained with them. In this post I am going to show a method for extracting the necessary data using Python.
To parse the XML file I am going to use xml.etree.ElementTree, a module from the Python standard library. In the example I will extract building boundaries and their floor counts. OSM files sometimes contain precise height data for buildings, but I have found that the floor count is recorded more often.
A typical OSM section that describes a building looks like this:
Sections for objects, such as buildings, are enclosed within a
Another important element of the building section is a
So, in the same XML file we can find a
Important attributes here are “lat” and “lon”, describing geographical coordinates of the point. To make them useful for Dynamo we have to convert them to x,y coordinates. I am not going to elaborate on that part, but a necessary function is included in the final code at the bottom of the post.
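To give a rough idea of what such a conversion can look like, here is a minimal sketch using an equirectangular approximation around a reference point. It is not the function from the final code: the name ll_to_xy_sketch, the origin parameters and the Earth radius constant are assumptions made for illustration, and the approximation is only reasonable over small areas such as a single neighbourhood.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumed constant)


def ll_to_xy_sketch(lat, lon, origin_lat, origin_lon):
    """Approximate x, y offsets in metres from a reference point.

    Hypothetical helper: uses an equirectangular approximation, which is
    only suitable for small extents around the origin.
    """
    # East-west distance shrinks with the cosine of the latitude
    x = math.radians(lon - origin_lon) * EARTH_RADIUS_M * math.cos(math.radians(origin_lat))
    # North-south distance is proportional to the latitude difference
    y = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    return x, y


# A point 0.001 degrees north-east of the origin
x, y = ll_to_xy_sketch(52.001, 21.001, 52.0, 21.0)
print(round(x, 1), round(y, 1))
```

Using a local origin keeps the resulting coordinates small, which tends to be more convenient when placing geometry in Dynamo.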
The last element of the building section, that is of importance to us, is a
So, knowing all the above, we can start scripting. Because we will need node data while parsing buildings, we first create a node lookup (a dictionary rather than a list). This way we will be able to locate specific node coordinates easily.
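    import xml.etree.ElementTree as xml

    root = xml.parse(file_path).getroot()
    # list(root) replaces root.getchildren(), which was removed in Python 3.9
    children = list(root)
    nodes = {}
    for c in children:
        if c.tag == "node":
            # key: node id, value: [lat, lon] (still strings at this point)
            nodes[c.attrib["id"]] = [c.attrib["lat"], c.attrib["lon"]]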
The above code creates a “nodes” dictionary in which each node “id” is a key. Each value is a list holding the node's latitude and longitude, which are still strings at this point.
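To see the lookup in isolation, here is a small self-contained sketch that builds the same kind of dictionary from an XML string instead of a file. The sample data (ids and coordinates) is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample data in the shape of an OSM file
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="52.2300" lon="21.0100"/>
  <node id="2" lat="52.2301" lon="21.0102"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
</osm>"""

root = ET.fromstring(SAMPLE_OSM)

nodes = {}
for child in root:
    if child.tag == "node":
        # Attribute values are strings; they are converted to floats later
        nodes[child.attrib["id"]] = [child.attrib["lat"], child.attrib["lon"]]

print(nodes["2"])  # ['52.2301', '21.0102']
```

A dictionary keyed by id means each node reference found inside a building can be resolved directly, without scanning the whole node collection again.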
Around this part of the function (in the complete code) I am also parsing a
Finally we can get to the proper part of the code, which extracts building data:
    output = []
    for c in children:
        if c.tag == "way":
            outline_nodes = []
            is_building = False
            levels = 0
            # Step 1: collect building tags and node references
            for t in c:
                if t.tag == "tag":
                    if t.attrib["k"] == "building":
                        is_building = True
                    if t.attrib["k"] == "building:levels":
                        levels = int(t.attrib["v"])
                if t.tag == "nd":
                    outline_nodes.append(t.attrib["ref"])
            # Step 2: for buildings, turn node references into Dynamo points
            if is_building:
                outline = []
                for on in outline_nodes:
                    lat = float(nodes[on][0])
                    lon = float(nodes[on][1])
                    xy = ll_to_xy(lat, lon)
                    p = Point.ByCoordinates(xy[0], xy[1])
                    outline.append(p)
                output.append([outline, levels])
There are two steps in this loop. Firstly, for each
If the section proves to be a building, the second part creates a data structure describing it: a list containing the boundary points and the floor count (0 when no floor count is tagged). This list is then appended to the output variable.
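The two steps can also be tried outside Dynamo. The following is a minimal sketch, not the code from this post: the function name extract_buildings and the sample data are made up, outlines are kept as plain (lat, lon) tuples instead of Dynamo points, and, as an extra assumption, a floor count that is not a whole number falls back to 0 instead of raising an error.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: one building and one road
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="52.2300" lon="21.0100"/>
  <node id="2" lat="52.2300" lon="21.0102"/>
  <node id="3" lat="52.2301" lon="21.0102"/>
  <way id="10">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="1"/>
    <tag k="building" v="yes"/>
    <tag k="building:levels" v="4"/>
  </way>
  <way id="11">
    <nd ref="1"/><nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def extract_buildings(root):
    """Return a list of [outline, levels] entries, one per building."""
    # Node lookup: id -> (lat, lon) as floats
    nodes = {}
    for n in root.iter("node"):
        nodes[n.attrib["id"]] = (float(n.attrib["lat"]), float(n.attrib["lon"]))

    buildings = []
    for way in root.iter("way"):
        # Step 1: read the tags and decide whether this is a building
        tags = {t.attrib["k"]: t.attrib["v"] for t in way.findall("tag")}
        if "building" not in tags:
            continue
        try:
            levels = int(tags.get("building:levels", 0))
        except ValueError:
            levels = 0  # assumption: ignore values that are not whole numbers

        # Step 2: resolve node references into coordinates
        outline = [nodes[nd.attrib["ref"]]
                   for nd in way.findall("nd")
                   if nd.attrib["ref"] in nodes]
        buildings.append([outline, levels])
    return buildings


buildings = extract_buildings(ET.fromstring(SAMPLE_OSM))
for outline, levels in buildings:
    print(len(outline), "points,", levels, "levels")
```

In this sample the outline repeats its first node at the end, which is how the closed boundary is represented here; the road is skipped because it carries no building tag.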
The entire Python code is presented below. It takes an OSM file path as input and outputs a list of buildings, each containing the data extracted for it.