Data retrieval from a Web API with Python

news
Author

Jumbong Junior

Published

April 24, 2024

Introduction

In today’s digital era, accessing and exploiting data has become essential for many organizations. Web APIs (Application Programming Interfaces) offer a standardized interface for accessing data on the web. Thanks to its robust libraries, Python greatly simplifies the process of retrieving that data. In this article, we will explore how to retrieve data from a web API using Python.

What is a Web API?

A web API is a method for requesting and sending data between a client and a server. The client can be a web browser, a mobile application, or any other device that can access the internet. The server is a computer that hosts the data and processes the requests. There are many web APIs for accessing different types of data:

  • Spotify API for music data
  • Twitter API for social media data
  • Google Maps API for location data
  • CDS API for climate data

To access data from a web API using Python, we need to follow these steps:

  1. Find the API documentation
  2. Install the necessary libraries
  3. Make an HTTP request to the API to retrieve the data
  4. Transform the data into a Python object that is easy to manipulate

Most web APIs provide data in JSON format. JSON is easy to read and manipulate in Python because its structure closely resembles Python dictionaries and lists. Let’s see how HTTP requests work in general and how to retrieve data from a web API using Python.
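To make the link between JSON and Python objects concrete, here is a minimal sketch that uses only the standard json module. The JSON text is a hand-written sample shaped like the task records used later in this article, not a live API response.

```python
import json

# A JSON document as it might arrive from a web API (plain text).
raw = '{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}'

# json.loads turns the JSON text into native Python objects.
task = json.loads(raw)

print(type(task))         # <class 'dict'>
print(task["title"])      # delectus aut autem
print(task["completed"])  # False (JSON false becomes Python False)
```

Once parsed, the data behaves like any other dictionary: keys are looked up with square brackets and JSON booleans become Python booleans.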

HTTP Request in Python

HTTP Request format

First, let’s study the format of an HTTP request, such as the dozens you make every day through your web browser. Suppose you enter the following URL in your browser’s address bar:

https://jumbong.github.io/ensai/posts/ClimateScenario/substainablefinance.html/

Your browser will send a request to the server concerned (this request contains not only the target URL but also other information that we will not dwell on here). In the previous URL, we can distinguish three sub-parts:

  • https://: Indicates the protocol to use to make the request (in this case HTTPS). In this article, we are only interested in the HTTP and HTTPS protocols (HTTPS being the secure version of HTTP).
  • jumbong.github.io: The domain name of the server to which the request is addressed.
  • /ensai/posts/ClimateScenario/substainablefinance.html/: The path to the resource on the server.
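The same decomposition can be checked in Python. The sketch below uses urlsplit from the standard urllib.parse module to split the URL above into the three parts just described.

```python
from urllib.parse import urlsplit

url = "https://jumbong.github.io/ensai/posts/ClimateScenario/substainablefinance.html/"

# urlsplit separates the URL into its components.
parts = urlsplit(url)

print(parts.scheme)  # https
print(parts.netloc)  # jumbong.github.io
print(parts.path)    # /ensai/posts/ClimateScenario/substainablefinance.html/
```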

Similarly, when calling a web API, we specify the protocol to use, the machine to contact, the path to the desired resource, and a number of parameters that describe our request. Here is an example of a request to a web API (the Google Maps Directions API in this case):

https://maps.googleapis.com/maps/api/directions/json?origin=Toronto&destination=Montreal

You can copy/paste this URL into your browser’s address bar and observe what you get in return. Note that the result of this request is in JSON format. In fact, if you study the URL more closely, you’ll see that we’ve asked to get the result in this format. Additionally, we’ve specified in the URL that we want to get the route information from Toronto (origin parameter) to Montreal (destination parameter).
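To see how the parameters are encoded in that URL, one option is to parse its query string with the standard library. This is only an illustration of the URL format; nothing is sent to the Google Maps API.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://maps.googleapis.com/maps/api/directions/json?origin=Toronto&destination=Montreal"

parts = urlsplit(url)

# Everything after the "?" is the query string: name=value pairs joined by "&".
params = parse_qs(parts.query)

print(parts.path)  # /maps/api/directions/json
print(params)      # {'origin': ['Toronto'], 'destination': ['Montreal']}
```

parse_qs returns a list for each parameter because the same name may legally appear several times in a query string.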

You should also notice that, in response to this request, the Google Maps API actually returns an error message. Indeed, to be authorized to use this API, you need to have an API key and provide this key in the form of an additional parameter (named key in the Google Maps APIs for example). So the previous request would become:

https://maps.googleapis.com/maps/api/directions/json?origin=Toronto&destination=Montreal&key=YOUR_KEY

Here you will have to replace YOUR_KEY with a key that you have previously generated and that allows you to use the web service in an authenticated manner. To create an API key, you need to go to the developer interface of the API concerned (that of the Google Maps Directions API, for example).
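In a script, it is usually safer not to hard-code the key. The sketch below builds the authenticated URL with the standard library and reads the key from an environment variable. The variable name GOOGLE_MAPS_API_KEY and the helper build_directions_url are assumptions made for this illustration, not names defined by the Google Maps API.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/directions/json"


def build_directions_url(origin, destination, api_key):
    """Return the request URL with the key added as an extra query parameter."""
    query = urlencode({"origin": origin, "destination": destination, "key": api_key})
    return f"{BASE_URL}?{query}"


# Hypothetical environment variable; falls back to the placeholder used above.
api_key = os.environ.get("GOOGLE_MAPS_API_KEY", "YOUR_KEY")

print(build_directions_url("Toronto", "Montreal", api_key))
```

Keeping the key outside the source code makes it harder to publish it by accident, for example when sharing a notebook or pushing to a public repository.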

Retrieving data from a web API using the python requests module

HTTP requests in (very) brief

In the HTTP protocol, there are several types of requests to perform the exchange between the client and the server. In particular, GET requests are widely used when the client requests a resource from the server. This is a request to download a document. It is possible to transmit parameters to filter the response; in this case, the parameters will be transferred “in clear” (in the URL used for the request).

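POST requests send data from the client to the server in the body of the request rather than in the URL. They are typically used to submit or update data on the server, and the server can still return a document in response. Because the parameters are not part of the URL, they are less visible, but they are only protected in transit when HTTPS is used.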
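The difference between the two request types can be inspected without contacting any server. The sketch below builds one GET and one POST request with the standard urllib.request module and prints where the parameters end up. The address https://api.example.com/tasks and the payload are hypothetical. With the requests module used later in this article, the equivalent calls would be requests.get(url, params=...) and requests.post(url, json=...).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only to build the requests (nothing is sent).
url = "https://api.example.com/tasks"

# GET: the parameters are part of the URL.
get_request = Request(f"{url}?{urlencode({'userId': 3})}", method="GET")

# POST: the data travels in the request body, not in the URL.
payload = json.dumps({"title": "write article", "completed": False}).encode("utf-8")
post_request = Request(
    url,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url)
print(post_request.data)
```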

There are other HTTP requests that we will not detail here.

The previous section provided a refresher on the format of HTTP requests, and you were asked to perform HTTP requests using your browser. If you now want to automatically retrieve the result of an HTTP request to manipulate it in Python, the most convenient way is to perform the HTTP request from within Python. To do this, we use the requests module. This module includes a get function that allows you to perform GET-type HTTP requests (I’ll let you guess the name of the function that performs HTTP POST requests):

import requests

url = "http://my-json-server.typicode.com/rtavenar/fake_api/tasks"

reponse = requests.get(url)
print(reponse)
<Response [200]>

The get function returns a Response object. This object contains the response to the request, including the status code (200 if the request was successful).

Status codes
  • 2xx: the request was successful
    • Example: 200 OK
  • 4xx: error due to the client
    • Example: 404 Not Found
  • 5xx: error due to the server
    • Example: 504 Gateway Timeout
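The families of status codes listed above can be told apart by their first digit. Here is a minimal sketch using the standard http module; the helper describe_status is a name introduced for this illustration. On a requests Response object, the code itself is available as the status_code attribute, and raise_for_status() raises an exception for client and server errors.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label for the family an HTTP status code belongs to."""
    family = code // 100
    if family == 2:
        return "success"
    if family == 4:
        return "client error"
    if family == 5:
        return "server error"
    return "other"


for code in (200, 404, 504):
    # HTTPStatus gives the standard reason phrase for a known code.
    print(code, HTTPStatus(code).phrase, "->", describe_status(code))
```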

You can obtain the result of the request in two forms: the raw text of the result, which is stored in reponse.text, and the parsed version (in the form of a dictionary or list), which you can obtain via reponse.json().

contenu_txt = reponse.text
print(type(contenu_txt))
<class 'str'>
contenu = reponse.json()
print(type(contenu))
<class 'list'>
print(contenu)
[{'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}, {'userId': 1, 'id': 2, 'title': 'quis ut nam facilis et officia qui', 'completed': False}, {'userId': 1, 'id': 3, 'title': 'fugiat veniam minus', 'completed': False}, {'userId': 1, 'id': 4, 'title': 'et porro tempora', 'completed': True}, {'userId': 1, 'id': 8, 'title': 'quo adipisci enim quam ut ab', 'completed': True}, {'userId': 3, 'id': 44, 'title': 'cum debitis quis accusamus doloremque ipsa natus sapiente omnis', 'completed': True}, {'userId': 3, 'id': 45, 'title': 'velit soluta adipisci molestias reiciendis harum', 'completed': False}, {'userId': 3, 'id': 46, 'title': 'vel voluptatem repellat nihil placeat corporis', 'completed': False}]
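Since the parsed result is an ordinary list of dictionaries, the usual Python tools apply to it. The sketch below works on a hand-copied subset of the records shown above, so that it runs without a network connection.

```python
from collections import Counter

# A subset of the records returned by the API, copied by hand.
contenu = [
    {"userId": 1, "id": 1, "title": "delectus aut autem", "completed": False},
    {"userId": 1, "id": 4, "title": "et porro tempora", "completed": True},
    {"userId": 3, "id": 44, "title": "cum debitis quis accusamus doloremque ipsa natus sapiente omnis", "completed": True},
    {"userId": 3, "id": 45, "title": "velit soluta adipisci molestias reiciendis harum", "completed": False},
]

# Keep only the identifiers of completed tasks.
completed_ids = [task["id"] for task in contenu if task["completed"]]
print(completed_ids)  # [4, 44]

# Count how many tasks each user has.
tasks_per_user = dict(Counter(task["userId"] for task in contenu))
print(tasks_per_user)  # {1: 2, 3: 2}
```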

Furthermore, if you want to pass parameters to the HTTP request (what came after the ? symbol in the URLs above), you can do so when calling requests.get:

import requests

url = "http://my-json-server.typicode.com/rtavenar/fake_api/tasks"

reponse = requests.get(url, params={"userId": 3})
contenu = reponse.json()
print(contenu)
[{'userId': 3, 'id': 44, 'title': 'cum debitis quis accusamus doloremque ipsa natus sapiente omnis', 'completed': True}, {'userId': 3, 'id': 45, 'title': 'velit soluta adipisci molestias reiciendis harum', 'completed': False}, {'userId': 3, 'id': 46, 'title': 'vel voluptatem repellat nihil placeat corporis', 'completed': False}]

The code above corresponds to what you would get in your browser by entering the URL http://my-json-server.typicode.com/rtavenar/fake_api/tasks?userId=3.
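To check which URL requests builds from the params argument, one option is to prepare the request without sending it. The sketch below passes the parameters as a dictionary and also wraps the call in a small helper with a timeout and an error check. The helper name fetch_tasks and the 10-second timeout are choices made for this illustration.

```python
import requests

url = "http://my-json-server.typicode.com/rtavenar/fake_api/tasks"

# Build the request without sending it, to inspect the final URL.
prepared = requests.Request("GET", url, params={"userId": 3}).prepare()
print(prepared.url)
# http://my-json-server.typicode.com/rtavenar/fake_api/tasks?userId=3


def fetch_tasks(user_id, timeout=10):
    """Return the tasks of one user as a list of dictionaries."""
    reponse = requests.get(url, params={"userId": user_id}, timeout=timeout)
    # Raise an exception for 4xx and 5xx responses instead of parsing an error page.
    reponse.raise_for_status()
    return reponse.json()
```

Calling fetch_tasks(3) performs the actual request. Setting a timeout avoids waiting indefinitely if the server does not answer.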

In practice, in many cases, Python modules exist to allow the use of public APIs without having to manage HTTP requests directly. This is the case, for example, with the tweepy module (for the Twitter API) or the graphh module (which allows access to the GraphHopper API, which is a free alternative to Google Maps).

Conclusion

Thank you for reading. If you have any suggestions, don’t hesitate to share them; I will take them into consideration.
