aio_overpass

Async client for the Overpass API.

Release Notes

Examples

Usage

There are three basic steps to fetch the spatial data you need:

1. Formulate a query
   - Either write your own custom query, f.e. Query("node(5369192667); out;"),
   - or use one of the Query subclasses, f.e. SingleRouteQuery(relation_id=1643324).
2. Call the Overpass API
   - Prepare your client with client = Client(user_agent=...).
   - Use await client.run_query(query) to fetch the result set.
3. Collect results
   - Either access the raw result dictionaries with query.result_set,
   - or use a collector, f.e. collect_elements(query) to get a list of typed Elements.
   - Collectors are often specific to queries - collect_routes requires a RouteQuery, for instance.

Example: looking up a building in Hamburg

a) Results as Dictionaries

You may use the .result_set property to get a list of all query results without any extra processing:

from aio_overpass import Client, Query

query = Query('way["addr:housename"=Elbphilharmonie]; out geom;')

client = Client()

await client.run_query(query)

query.result_set

[
      {
          "type": "way",
          "id": 24981342,
          # ...
          "tags": {
              "addr:city": "Hamburg",
              "addr:country": "DE",
              "addr:housename": "Elbphilharmonie",
              # ...
          },
      }
]

b) Results as Objects

Collecting the results as elements gives you a user-friendly Python interface for nodes, ways, and relations. Here we use the .tags property:

from aio_overpass.element import collect_elements

elems = collect_elements(query)

elems[0].tags

{
    "addr:city": "Hamburg",
    "addr:country": "DE",
    "addr:housename": "Elbphilharmonie",
    # ...
}

c) Results as GeoJSON

The processed elements can also easily be converted to GeoJSON:

import json

json.dumps(elems[0].geojson, indent=4)

{
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    9.9832434,
                    53.5415472
                ],
                ...
            ]
        ]
    },
    "properties": {
        "id": 24981342,
        "type": "way",
        "tags": {
            "addr:city": "Hamburg",
            "addr:country": "DE",
            "addr:housename": "Elbphilharmonie",
            ...
        },
        ...
    },
    "bbox": [
        9.9832434,
        53.540877,
        9.9849674,
        53.5416212
    ]
}
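
The "bbox" member in the output above lists the corners as west, south, east, north, which matches GeoJSON's lon/lat position order. The following is a minimal sketch of how such a box could be derived from a ring of positions using only the standard library. It is not the library's implementation; the helper name bbox_of is hypothetical, and only the first vertex is taken from the output above (the others are made up for illustration).

```python
# A closed ring in GeoJSON order: each position is [lon, lat].
# Only the first vertex comes from the example output; the rest are illustrative.
ring = [
    [9.9832434, 53.5415472],
    [9.9849674, 53.5416212],
    [9.9840000, 53.5408770],
    [9.9832434, 53.5415472],
]


def bbox_of(positions):
    """Return [west, south, east, north] for a list of [lon, lat] positions."""
    lons = [lon for lon, _ in positions]
    lats = [lat for _, lat in positions]
    return [min(lons), min(lats), max(lons), max(lats)]


print(bbox_of(ring))
```

This simple min/max approach assumes the geometry does not cross the 180th meridian.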

Choosing Extras

This library can be installed with a number of optional extras.

- Install no extras if you're fine with dict result sets.
- Install the shapely extra if you would like the convenience of typed OSM elements. It is also useful if you are interested in elements' geometries, and either already use Shapely, or want a simple way to export GeoJSON.
  - This includes the pt module to make it easier to interact with public transportation routes. Something seemingly trivial like listing the stops of a route can have unexpected pitfalls, since stops can have multiple route members, and may have a range of different tags and roles. This submodule will clean up the relation data for you.
- Install the networkx extra to enable the pt_ordered module if you want a route's path as a simple line from A to B. It is hard to do this consistently, mainly because ways are not always ordered, and stop positions might be missing. You can benefit from this submodule if you wish to
  - render a route's path between any two stops
  - measure the route's travelled distance between any two stops
  - validate the order of ways in the relation
  - check if the route relation has gaps
- Install the joblib extra to speed up pt_ordered.collect_ordered_routes(), which can benefit greatly from parallelization.

Coordinates

Geographic point locations are expressed by latitude (lat) and longitude (lon) coordinates.
- Latitude is given as an angle that ranges from −90° at the south pole to +90° at the north pole, with 0° at the Equator.
- Longitude is given as an angle ranging from 0° at the Prime Meridian (the line that divides the globe into Eastern and Western hemispheres), to +180° eastward and −180° westward.
- lat/lon values are floats that are exactly those degrees, just without the ° sign.
This might help you remember which coordinate is which:
- If you think of a world map, usually it’s a rectangle.
- The long side (the largest side) is the longitude.
- Longitude is the x-axis, and latitude is the y-axis.
Be wary of coordinate order:
- The Overpass API explicitly names the coordinates: { "lat": 50.2726005, "lon": 10.9521885 }
- Shapely geometries returned by this library use lat/lon order, which is the order stated by ISO 6709, and seems like the most common order.
- GeoJSON, on the other hand, uses lon/lat order.
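
Because the two orders are easy to mix up, it can help to do the swap in exactly one place. The sketch below converts a named lat/lon pair, as returned by the Overpass API, into a GeoJSON position. The function name is hypothetical, and the range check only catches some swapped inputs (a latitude outside ±90°), not all of them.

```python
def latlon_to_geojson_position(lat: float, lon: float) -> list:
    """Turn a (lat, lon) pair into a GeoJSON position, which is [lon, lat]."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return [lon, lat]


# The node coordinates from the Overpass API example above.
node = {"lat": 50.2726005, "lon": 10.9521885}
position = latlon_to_geojson_position(node["lat"], node["lon"])
print(position)
```
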
OpenStreetMap uses the WGS84 spatial reference system used by the Global Positioning System (GPS).
OpenStreetMap node coordinates have seven decimal places, which gives them centimetric precision. However, the position accuracy of GPS data is only about 10 m. A reasonable display accuracy could be five decimal places, which is precise to about 1.1 metres at the equator.
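
These precision figures can be checked with a little arithmetic. The sketch below assumes the WGS84 semi-major axis of 6,378,137 m and treats a parallel of latitude as a circle, so it is an approximation; the function name is hypothetical.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6_378_137.0  # equatorial radius in metres


def metres_per_decimal_place(places: int, lat_deg: float = 0.0) -> float:
    """Approximate east-west ground length of one unit in the given decimal place."""
    # Length of one degree of longitude shrinks with the cosine of the latitude.
    circumference = 2 * math.pi * WGS84_SEMI_MAJOR_AXIS_M * math.cos(math.radians(lat_deg))
    metres_per_degree = circumference / 360
    return metres_per_degree * 10 ** -places


print(f"5 places at the equator: {metres_per_decimal_place(5):.2f} m")
print(f"7 places at the equator: {metres_per_decimal_place(7) * 100:.2f} cm")
```
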
Spatial features that cross the 180th meridian are problematic, since you go from longitude 180.0 to -180.0. Such features usually have their geometries split up, like the area of Russia.
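
One rough way to spot such features is to look for a jump of more than 180° between consecutive longitudes. This is only a heuristic sketch: it assumes that neighbouring vertices are always less than 180° apart on the ground, and the function name is hypothetical.

```python
def crosses_antimeridian(lons):
    """Heuristic: True if any two consecutive longitudes differ by more than 180 degrees."""
    return any(abs(b - a) > 180.0 for a, b in zip(lons, lons[1:]))


print(crosses_antimeridian([179.5, 179.9, -179.8]))  # jumps from 179.9 to -179.8
print(crosses_antimeridian([9.9832434, 9.9849674]))  # stays within Hamburg
```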

"""
Async client for the Overpass API.

[Release Notes](https://github.com/timwie/aio-overpass/blob/main/RELEASES.md)

[Examples](https://github.com/timwie/aio-overpass/tree/main/examples)
"""

import importlib.metadata
from pathlib import Path


__version__: str = importlib.metadata.version("aio-overpass")

# we add this to all modules for pdoc;
# see https://pdoc.dev/docs/pdoc.html#use-numpydoc-or-google-docstrings
__docformat__ = "google"

# we also use __all__ in all modules for pdoc; this lets us control the order
__all__ = (
    "__version__",
    "Client",
    "ClientError",
    "Query",
    "client",
    "element",  # pyright: ignore[reportUnsupportedDunderAll]
    "error",
    "pt",  # pyright: ignore[reportUnsupportedDunderAll]
    "pt_ordered",  # pyright: ignore[reportUnsupportedDunderAll]
    "ql",  # pyright: ignore[reportUnsupportedDunderAll]
    "query",
)

from .client import Client
from .error import ClientError
from .query import Query


# extend the module's docstring
for filename in ("usage.md", "extras.md", "coordinates.md"):
    __doc__ += "\n<br>\n"
    __doc__ += (Path(__file__).parent / "doc" / filename).read_text()

__version__: str = '0.13.1'

class Client:

class Client:
    """
    A client for the Overpass API.

    Requests are rate-limited according to the configured number of slots per IP for the specified
    API server. By default, queries are retried whenever the server is too busy, or the rate limit
    was exceeded. Custom query runners can be used to implement your own retry strategy.

    Args:
        url: The url of an Overpass API instance. Defaults to the main Overpass API instance.
        user_agent: A string used for the User-Agent header. It is good practice to provide a string
                    that identifies your application, and includes a way to contact you (f.e. an
                    e-mail, or a link to a repository). This is important if you make too many
                    requests, or queries that require a lot of resources.
        concurrency: The maximum number of simultaneous connections. In practice the amount
                     of concurrent queries may be limited by the number of slots it provides for
                     each IP.
        status_timeout_secs: If set, status requests to the Overpass API will time out after
                             this duration in seconds. Defaults to no timeout.
        runner: You can provide another query runner if you want to implement your own retry
                strategy.

    References:
        - https://wiki.openstreetmap.org/wiki/Overpass_API#Public_Overpass_API_instances
    """

    __slots__ = (
        "_concurrency",
        "_maybe_session",
        "_runner",
        "_status_timeout_secs",
        "_url",
        "_user_agent",
    )

    def __init__(
        self,
        url: str = DEFAULT_INSTANCE,
        user_agent: str = DEFAULT_USER_AGENT,
        concurrency: int = 32,
        status_timeout_secs: float | None = None,
        runner: QueryRunner | None = None,
    ) -> None:
        if concurrency <= 0:
            msg = "'concurrency' must be > 0"
            raise ValueError(msg)
        if status_timeout_secs is not None and status_timeout_secs <= 0.0:
            msg = "'status_timeout_secs' must be > 0"
            raise ValueError(msg)

        self._url = url
        self._user_agent = user_agent
        self._concurrency = concurrency
        self._status_timeout_secs = status_timeout_secs
        self._runner = runner or DefaultQueryRunner()

        self._maybe_session: aiohttp.ClientSession | None = None

    def _session(self) -> aiohttp.ClientSession:
        """The session used for all requests of this client."""
        if not self._maybe_session or self._maybe_session.closed:
            headers = {"User-Agent": self._user_agent}
            connector = aiohttp.TCPConnector(limit=self._concurrency)
            self._maybe_session = aiohttp.ClientSession(headers=headers, connector=connector)

        return self._maybe_session

    async def close(self) -> None:
        """Cancel all running queries and close the underlying session."""
        if self._maybe_session and not self._maybe_session.closed:
            # do not care if this fails
            with suppress(CallError):
                _ = await self.cancel_queries()

            # is raised when there are still active queries. that's ok
            with suppress(aiohttp.ServerDisconnectedError):
                await self._maybe_session.close()

    async def _status(self, timeout: ClientTimeout | None = None) -> "Status":
        endpoint = urljoin(self._url, "status")
        timeout = timeout or aiohttp.ClientTimeout(total=self._status_timeout_secs)
        async with (
            _map_request_error(timeout),
            self._session().get(url=endpoint, timeout=timeout) as response,
        ):
            return await _parse_status(response)

    async def status(self) -> Status:
        """
        Check the current API status.

        The timeout of this request is configured with the ``status_timeout_secs`` argument.

        Raises:
            ClientError: if the status could not be looked up
        """
        return await self._status()

    async def cancel_queries(self, timeout_secs: float | None = None) -> int:
        """
        Cancel all running queries.

        This can be used to terminate runaway queries that prevent you from sending new ones.

        Returns:
            the number of terminated queries

        Raises:
            ClientError: if the request to cancel queries failed
        """
        timeout = aiohttp.ClientTimeout(total=timeout_secs) if timeout_secs else None
        headers = {"User-Agent": self._user_agent}
        endpoint = urljoin(self._url, "kill_my_queries")

        # use a new session here to get around our concurrency limit
        async with (
            aiohttp.ClientSession(headers=headers) as session,
            _map_request_error(timeout),
            session.get(endpoint, timeout=timeout) as response,
        ):
            body = await response.text()
            killed_pids = re.findall("\\(pid (\\d+)\\)", body)
            return len(set(killed_pids))

    async def run_query(self, query: Query, *, raise_on_failure: bool = True) -> None:
        """
        Send a query to the API, and await its completion.

        "Running" the query entails acquiring a connection from the pool, the query requests
        themselves (which may be retried), status requests when the server is busy,
        and cooldown periods.

        The query runner is invoked before every try, and once after the last try.

        To run multiple queries concurrently, wrap the returned coroutines in an ``asyncio`` task,
        f.e. with ``asyncio.create_task()`` and subsequent ``asyncio.gather()``.

        Args:
            query: the query to run on this API instance
            raise_on_failure: if ``True``, raises ``query.error`` if the query failed

        Raises:
            ClientError: when query or status requests fail. If the query was retried, the error
                         of the last try will be raised. The same exception is also captured in
                         ``query.error``. Raising can be prevented by setting ``raise_on_failure``
                         to ``False``.
            RunnerError: when a call to the query runner raises. This exception is raised
                         even if ``raise_on_failure`` is ``False``, since it is likely an error
                         that is not just specific to this query.
        """
        if query.done:
            return  # nothing to do

        if query.nb_tries > 0:
            query.reset()  # reset failed queries

        # query runner is invoked before every try, and once after the last try
        while True:
            await self._invoke_runner(query, raise_on_failure=raise_on_failure)
            if query.done:
                return
            await self._try_query_once(query)

    async def _invoke_runner(self, query: Query, *, raise_on_failure: bool) -> None:
        """
        Invoke the query runner.

        Raises:
            ClientError: if the runner raises ``query.error``
            ValueError: if the runner raises a different ``ClientError`` than ``query.error``
            RunnerError: if the runner raises any other exception (which it shouldn't)
        """
        try:
            await self._runner(query)
        except ClientError as err:
            if err is not query.error:
                msg = "query runner raised a ClientError other than 'query.error'"
                raise ValueError(msg) from err
            if raise_on_failure:
                raise
        except AssertionError:
            raise
        except BaseException as err:
            raise RunnerError(cause=err) from err

    async def _try_query_once(self, query: Query) -> None:
        """A single iteration of running a query."""
        query_mut = query._mutator()
        query_mut.begin_try()

        try:
            await self._cooldown(query)

            req_timeout = _next_query_req_timeout(query)

            if req_timeout.total and req_timeout.total <= 0.0:
                assert query.run_duration_secs
                raise GiveupError(kwargs=query.kwargs, after_secs=query.run_duration_secs)

            query_mut.begin_request()

            query.logger.info(f"call api for {query}")

            async with (
                _map_request_error(req_timeout),
                self._session().post(
                    url=urljoin(self._url, "interpreter"),
                    data=query._code(),
                    timeout=req_timeout,
                ) as response,
            ):
                query_mut.succeed_try(
                    response=await _result_or_raise(response, query.kwargs, query.logger),
                    response_bytes=response.content.total_bytes,
                )

        except CallTimeoutError as err:
            fail_with: ClientError = err
            if query.run_timeout_elapsed:
                assert query.run_duration_secs is not None
                fail_with = GiveupError(kwargs=query.kwargs, after_secs=query.run_duration_secs)
            query_mut.fail_try(fail_with)

        except ClientError as err:
            query_mut.fail_try(err)

        finally:
            query_mut.end_try()

    async def _cooldown(self, query: Query) -> None:
        """
        If the given query failed with ``TOO_MANY_QUERIES``, check for a cooldown period.

        Raises:
            ClientError: if the status request to find out the cooldown period fails
            GiveupError: if the cooldown is longer than the remaining run duration
        """
        logger = query.logger

        if not is_too_many_queries(query.error):
            return

        # If this client is running too many queries, we can check the status for a
        # cooldown period. This request failing is a bit of an edge case.
        # 'query.error' will be overwritten, which means we will not check for a
        # cooldown in the next iteration.
        status = await self._status(timeout=self._next_status_req_timeout(query))

        if not status.cooldown_secs:
            return

        run_duration = query.run_duration_secs
        assert run_duration

        if run_timeout := query.run_timeout_secs:
            remaining = run_timeout - run_duration

            if status.cooldown_secs > remaining:
                logger.error(f"give up on {query} due to {status.cooldown_secs:.1f}s cooldown")
                raise GiveupError(kwargs=query.kwargs, after_secs=run_duration)

        logger.info(f"{query} has cooldown for {status.cooldown_secs:.1f}s")
        await asyncio.sleep(status.cooldown_secs)

    def _next_status_req_timeout(self, query: Query) -> aiohttp.ClientTimeout:
        """Status request timeout; possibly limited by either the run or status timeout settings."""
        remaining = None

        run_duration = query.run_duration_secs
        assert run_duration

        if run_timeout := query.run_timeout_secs:
            remaining = run_timeout - run_duration

            if remaining <= 0.0:
                raise GiveupError(kwargs=query.kwargs, after_secs=run_duration)

            if self._status_timeout_secs:
                remaining = min(remaining, self._status_timeout_secs)  # cap timeout if configured

        return aiohttp.ClientTimeout(total=remaining)

A client for the Overpass API.

Requests are rate-limited according to the configured number of slots per IP for the specified API server. By default, queries are retried whenever the server is too busy, or the rate limit was exceeded. Custom query runners can be used to implement your own retry strategy.

Arguments:

url: The url of an Overpass API instance. Defaults to the main Overpass API instance.
user_agent: A string used for the User-Agent header. It is good practice to provide a string that identifies your application, and includes a way to contact you (f.e. an e-mail, or a link to a repository). This is important if you make too many requests, or queries that require a lot of resources.
concurrency: The maximum number of simultaneous connections. In practice, the number of concurrent queries may be limited by the number of slots the server provides for each IP.
status_timeout_secs: If set, status requests to the Overpass API will time out after this duration in seconds. Defaults to no timeout.
runner: You can provide another query runner if you want to implement your own retry strategy.

References:

https://wiki.openstreetmap.org/wiki/Overpass_API#Public_Overpass_API_instances

Client(url: str = 'https://overpass-api.de/api/', user_agent: str = 'aio-overpass/0.13.1 (https://github.com/timwie/aio-overpass)', concurrency: int = 32, status_timeout_secs: float | None = None, runner: aio_overpass.query.QueryRunner | None = None)

async def close(self) -> None:

Cancel all running queries and close the underlying session.

async def status(self) -> aio_overpass.client.Status:

Check the current API status.

The timeout of this request is configured with the status_timeout_secs argument.

Raises:

ClientError: if the status could not be looked up

async def cancel_queries(self, timeout_secs: float | None = None) -> int:

Cancel all running queries.

This can be used to terminate runaway queries that prevent you from sending new ones.

Returns:

the number of terminated queries

Raises:

ClientError: if the request to cancel queries failed
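
The count returned by cancel_queries() comes from matching "(pid …)" markers in the response text and de-duplicating the process IDs, as the class source shows. The sketch below isolates that step with the same regular expression. The sample response body is hypothetical and only meant to exercise the pattern; the real server output may look different.

```python
import re


def count_killed(body: str) -> int:
    """Count distinct process IDs that appear as "(pid N)" in the text."""
    killed_pids = re.findall(r"\(pid (\d+)\)", body)
    return len(set(killed_pids))


# Hypothetical response text; pid 101 is listed twice on purpose.
sample_body = "killed (pid 101)\nkilled (pid 102)\nkilled (pid 101)\n"
print(count_killed(sample_body))
```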

async def run_query(self, query: Query, *, raise_on_failure: bool = True) -> None:

Send a query to the API, and await its completion.

"Running" the query entails acquiring a connection from the pool, the query requests themselves (which may be retried), status requests when the server is busy, and cooldown periods.

The query runner is invoked before every try, and once after the last try.

To run multiple queries concurrently, wrap the returned coroutines in an asyncio task, f.e. with asyncio.create_task() and subsequent asyncio.gather().

Arguments:

query: the query to run on this API instance
raise_on_failure: if True, raises query.error if the query failed

Raises:

ClientError: when query or status requests fail. If the query was retried, the error of the last try will be raised. The same exception is also captured in query.error. Raising can be prevented by setting raise_on_failure to False.
RunnerError: when a call to the query runner raises. This exception is raised even if raise_on_failure is False, since it is likely an error that is not just specific to this query.
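
The concurrency pattern mentioned above (asyncio.create_task() followed by asyncio.gather()) can be sketched without network access. Here fake_run_query is a hypothetical stand-in for client.run_query(query) that merely sleeps; unlike the real method, it returns a label so that the effect is visible.

```python
import asyncio


async def fake_run_query(name: str, delay_secs: float) -> str:
    """Hypothetical stand-in for `client.run_query(query)`; it only sleeps."""
    await asyncio.sleep(delay_secs)
    return name


async def main() -> list:
    # Wrapping each coroutine in a task schedules it right away,
    # so both "queries" are in flight at the same time.
    tasks = [
        asyncio.create_task(fake_run_query("a", 0.02)),
        asyncio.create_task(fake_run_query("b", 0.01)),
    ]
    # gather() returns results in the order of its arguments,
    # not in the order of completion.
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
print(results)
```

With the real client, run_query() returns None, so results would instead be read from each query object once gather() completes.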

class ClientError(builtins.Exception):

class ClientError(Exception):
    """Base exception for failed Overpass API requests and queries."""

    @property
    def should_retry(self) -> bool:
        """Returns ``True`` if it's worth retrying when encountering this error."""
        return False

Base exception for failed Overpass API requests and queries.

should_retry: bool

Returns True if it's worth retrying when encountering this error.
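
To illustrate how such a property can drive a retry decision, here is a minimal, self-contained sketch. It uses a local copy of the base class shown above; BusyError and attempt_with_retries are hypothetical names invented for this example, and the loop is not the library's query runner.

```python
class ClientError(Exception):
    """Local copy of the base class shown above, so the sketch is self-contained."""

    @property
    def should_retry(self) -> bool:
        return False


class BusyError(ClientError):
    """Hypothetical subclass for a transient failure that is worth retrying."""

    @property
    def should_retry(self) -> bool:
        return True


def attempt_with_retries(operation, max_tries: int = 3):
    """Call `operation` until it succeeds, a non-retryable error occurs, or tries run out."""
    last_error = None
    for _ in range(max_tries):
        try:
            return operation()
        except ClientError as err:
            last_error = err
            if not err.should_retry:
                raise
    raise last_error


calls = []


def flaky():
    """Fails twice with a retryable error, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise BusyError("server busy")
    return "ok"


outcome = attempt_with_retries(flaky)
print(outcome, len(calls))
```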

Inherited Members

builtins.Exception: Exception
builtins.BaseException: with_traceback; args