Mastering Web Scraping with Python: JSON Data Handling

Chapter 1: Introduction to JSON Web Scraping

In this article, we will delve into the organization of JSON response objects obtained through web scraping with Python's requests library, building on my previous articles.

Before diving in, I recommend checking out my earlier pieces, especially the one that explores various scraping techniques. Today, we'll employ a straightforward method by executing a GET request to an API endpoint. Our focus will be on scraping live football matches from PaddyPower, similar to the previous tutorial.

First, we send a GET request to the endpoint, as shown below, and parse the JSON response body into Python objects.

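import requests
from datetime import datetime  # used later to timestamp each scraped row

# Replace 'your_endpoint_url' with the actual URL you want to call
endpoint_url = 'your_endpoint_url'

# A browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'}

# Make a GET request and parse the JSON body into Python dicts and lists
response = requests.get(endpoint_url, headers=headers)
json_data = response.json()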

Next, we create the lists that will hold each column of our data:

event_ids = []
match_ids = []
fixtures = []
odds_matches = []
competitions = []
market_types = []
market_names = []
match_dates_and_time = []
time_scraped = []

Now, we move on to the more intricate part of the tutorial: navigating through the JSON data. If you're not familiar with JSON, its structure mirrors JavaScript object literals. It consists of key-value pairs, which look simple but can become complex once objects and arrays are nested inside one another. In Python, response.json() converts these into dictionaries and lists.
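To make the nesting concrete, here is a minimal sketch using a made-up payload. The keys "attachments", "events", and "name" mirror the ones discussed in this article, but the IDs, the "tags" field, and all values are invented for illustration. It shows that JSON objects become Python dictionaries, arrays become lists, and that chained .get() calls return a default rather than raising a KeyError when a key is missing.

```python
import json

# Hypothetical payload: the structure is illustrative, the values are invented.
raw = '''
{
  "attachments": {
    "events": {
      "101": {"name": "Team A v Team B", "tags": ["live", "football"]},
      "102": {"name": "Team C v Team D", "tags": []}
    }
  }
}
'''

parsed = json.loads(raw)

# JSON objects become dicts, JSON arrays become lists.
events = parsed["attachments"]["events"]
print(type(events).__name__)                 # dict
print(type(events["101"]["tags"]).__name__)  # list

# Iterating over a dict yields its keys; JSON object keys are always strings.
event_keys = list(events)
print(event_keys)

# Chained .get() calls fall back to a default instead of raising KeyError.
missing = parsed.get("attachments", {}).get("competitions", {})
print(missing)
```

Because JSON object keys are always strings, IDs collected this way can be concatenated straight into a URL without converting them first.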

Inspecting the response, we can see that the "attachments" object contains the key information, such as competition names and IDs. Nested inside it, the "events" object holds match IDs and fixture names.

Let's store the event data in a variable called data (a more descriptive name would be better). The events object is keyed by match ID, so iterating over it yields the IDs, which we append to our list.

data = json_data['attachments']['events']

# Iterating over a dict yields its keys, here the match IDs
for i in data:
    event_ids.append(i)

We will then construct the request URLs to retrieve market and odds data for each match:

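# Replace 'your_event_url_prefix' with the part of the event URL that precedes the event ID
start_of_url = 'your_event_url_prefix'
end_of_url = '&exchangeLocale=en_GB&includeBettingOpportunities=true&includePrices=true&includeSeoCards=true&includeSeoFooter=true&language=en&loggedIn=false&priceHistory=1&regionCode=UK'

for i in event_ids:
    url = start_of_url + i + end_of_url
    print(url)

    # Fetch the market and odds data for this match
    match_response = requests.get(url, headers=headers)
    match_data = match_response.json()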
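As an alternative to concatenating strings by hand, the query string can be built from a dictionary, which keeps each parameter separate and percent-encodes the values. The sketch below is only an illustration: the base URL and the "eventId" parameter name are hypothetical placeholders, not the real endpoint, while the remaining parameters are the ones from the query string used in this article.

```python
from urllib.parse import urlencode

# Hypothetical base URL, for illustration only (not the real endpoint).
BASE_URL = "https://example.invalid/event-page"

# Fixed query parameters taken from the query string used in this article.
COMMON_PARAMS = {
    "exchangeLocale": "en_GB",
    "includeBettingOpportunities": "true",
    "includePrices": "true",
    "includeSeoCards": "true",
    "includeSeoFooter": "true",
    "language": "en",
    "loggedIn": "false",
    "priceHistory": "1",
    "regionCode": "UK",
}


def build_event_url(event_id, base_url=BASE_URL, id_param="eventId"):
    """Return the URL for one event; `id_param` is a hypothetical name."""
    params = {id_param: str(event_id), **COMMON_PARAMS}
    return f"{base_url}?{urlencode(params)}"


url = build_event_url("101")
print(url)
```

With requests, the same effect is available by passing the dictionary through the params argument of requests.get, which builds the query string for you.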

Next, still inside the loop, we extract the match name (fixture), its start time, and the competition name:

    # The events object is keyed by event ID; take its first entry
    match_name = match_data['attachments']['events']
    keys_list = list(match_name)
    event_data = match_name.get(keys_list[0], {})
    event_name = event_data.get('name', 'N/A')
    event_time = event_data.get('openDate', 'N/A')

    # The competitions object is keyed by competition ID in the same way
    competition_name = match_data['attachments']['competitions']
    keys_list = list(competition_name)
    competition_data = competition_name.get(keys_list[0], {})
    competition_name_new = competition_data.get('name', 'N/A')
    competition_id = competition_data.get('competitionId', 'N/A')
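One caveat with the keys_list[0] lookup is that it raises an IndexError if the object comes back empty. A small helper is one way to guard against that. The following is a sketch only: first_value is a hypothetical helper name, and the sample response fragment is invented.

```python
def first_value(mapping, default=None):
    """Return the first value of a dict, or `default` if it is empty or not a dict."""
    if not isinstance(mapping, dict):
        return default
    return next(iter(mapping.values()), default)


# Hypothetical response fragment: IDs and names are invented.
match_data = {
    "attachments": {
        "events": {
            "101": {"name": "Team A v Team B", "openDate": "2024-01-01T15:00:00.000Z"}
        },
        "competitions": {},  # empty on purpose, to show the fallback
    }
}

event = first_value(match_data["attachments"]["events"], default={})
competition = first_value(match_data["attachments"]["competitions"], default={})

event_name = event.get("name", "N/A")
competition_name = competition.get("name", "N/A")
print(event_name, "|", competition_name)
```

Here the empty competitions object falls back to 'N/A' instead of stopping the whole scrape.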

We then loop over each market and its selections (runners), appending one entry per selection to every list:

    # datetime.now() requires `from datetime import datetime`
    markets = match_data['attachments']['markets']

    for j in markets:
        markets_new = markets[j]['marketName']
        selections = markets[j]['runners']

        for k in selections:
            fixtures.append(event_name)
            competitions.append(competition_name_new)
            market_types.append(markets_new)
            match_dates_and_time.append(event_time)
            match_ids.append(i)

            name = k['runnerName']
            market_names.append(name)

            current_time = datetime.now()
            time_scraped.append(current_time)
            print(name)

            # Some selections have no odds; record a placeholder instead of failing
            try:
                odds = k['winRunnerOdds']['trueOdds']['decimalOdds']['decimalOdds']
            except (KeyError, TypeError):
                odds = 'Issue with Odds'

            odds_matches.append(odds)
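The same flattening can also be written so that each selection becomes one dictionary, which keeps all of its fields together. The sketch below is one possible arrangement, not the approach used in this article: dig and rows_for_match are hypothetical helper names, and the sample markets fragment is invented. The key names ("marketName", "runners", "runnerName", and the odds path) are the ones used in the code above.

```python
from datetime import datetime


def dig(obj, *keys, default=None):
    """Walk nested dicts by key, returning `default` as soon as a step fails."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def rows_for_match(match_id, markets, scraped_at):
    """Flatten a markets dict into one row (a dict) per runner."""
    rows = []
    for market in markets.values():
        for runner in market.get("runners", []):
            rows.append({
                "Match ID": match_id,
                "Market Type": market.get("marketName", "N/A"),
                "Market Name": runner.get("runnerName", "N/A"),
                "Market Odds": dig(
                    runner, "winRunnerOdds", "trueOdds", "decimalOdds", "decimalOdds",
                    default="Issue with Odds",
                ),
                "Time Scraped": scraped_at,
            })
    return rows


# Hypothetical markets fragment: values are invented.
markets = {
    "m1": {
        "marketName": "Match Odds",
        "runners": [
            {"runnerName": "Team A",
             "winRunnerOdds": {"trueOdds": {"decimalOdds": {"decimalOdds": 2.5}}}},
            {"runnerName": "Draw"},  # no odds published for this runner
        ],
    }
}

rows = rows_for_match("101", markets, datetime.now())
for row in rows:
    print(row["Market Name"], row["Market Odds"])
```

Because every row is built in one place, the columns cannot drift out of alignment the way parallel lists can, and pd.DataFrame(rows) accepts such a list of dictionaries directly.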

Finally, we will create a DataFrame to store all the collected data:

import pandas as pd

# Build the DataFrame with one column per list; all lists have the same length
new_dataframe = pd.DataFrame({
    'Match ID': match_ids,
    'Fixture': fixtures,
    'Match Time and Date': match_dates_and_time,
    'Competition': competitions,
    'Market Type': market_types,
    'Market Name': market_names,
    'Market Odds': odds_matches,
    'Time Scraped': time_scraped,
})

# In a notebook, this displays the DataFrame
new_dataframe

The resulting DataFrame contains one row per selection, with the eight columns defined above.

As always, feel free to reach out with any questions or feedback on this article. If you found it helpful, please give it a clap and follow for more!

Chapter 2: Video Resources

Explore how to scrape JSON data embedded in SCRIPT tags in this tutorial.

Learn to scrape live scores without the need for BeautifulSoup or Selenium.
