Exploring Data Analysis Tools: NumPy and pandas as Core Libraries

It’s been a while since I last posted here, and I believe I owe you an explanation for the brief hiatus. I was away from blogging for a bit because I had to focus on a project titled “Predicting electricity access in Sub-Saharan Africa using Machine learning models”. The project stressed the hell out of me, especially because I am still new to this and had to do tons of research, but I’m back and eager to write again.

From the title above, I believe you already know what I want to write about. If you’re learning data science or machine learning, two libraries you’ll use constantly are NumPy and Pandas. These tools make it easier to handle large datasets, perform numerical operations, and explore your data efficiently, all within Python.

In this post, we’ll walk through practical exercises that cover essential NumPy and Pandas operations. Each example is short, hands-on, and beginner-friendly, perfect for reinforcing your core skills.

What is NumPy

NumPy, short for "Numerical Python," is a free, open-source, well-optimized library for the Python programming language. It adds support for large, multi-dimensional arrays (a two-dimensional array is often called a matrix, and higher-dimensional ones are sometimes called tensors) and is widely used in scientific computing and data analysis.

Now let’s look at a few examples.

Example 1

Write a NumPy program to test element-wise for NaN in a given array.

import numpy as np

# Sample array
arr = np.array([1, np.nan, 3, np.nan, 5])

# Test element-wise for NaN
result = np.isnan(arr)
print("NaN check:\n", result)
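Once you have the boolean mask, a common next step is to count or drop the missing values. Here is a minimal sketch of that, reusing the same sample array. The variable names (mask, nan_count, clean) are my own and only for illustration.

```python
import numpy as np

arr = np.array([1, np.nan, 3, np.nan, 5])

mask = np.isnan(arr)                  # True where the value is NaN
nan_count = int(mask.sum())           # each True counts as 1
clean = arr[~mask]                    # keep only the non-NaN values
mean_ignoring_nan = np.nanmean(arr)   # mean that skips NaN entries

print("Number of NaNs:", nan_count)
print("Without NaNs:", clean)
print("Mean ignoring NaNs:", mean_ignoring_nan)
```

A plain arr.mean() on this array would return nan, which is why np.nanmean is used here.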

Example 2

Write a NumPy program to test whether two arrays are element-wise equal within a tolerance.

# Sample arrays
a = np.array([1.0, 1.5, 3.0])
b = np.array([1.0, 1.51, 2.99])

# Element-wise equal within tolerance
equal_within_tol = np.allclose(a, b, atol=0.02)
print("Equal within tolerance:", equal_within_tol)
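Note that np.allclose returns a single True or False for the whole pair of arrays. If you want one result per element, np.isclose does that. Both use the same rule: two values count as close when |a - b| <= atol + rtol * |b|, where rtol defaults to 1e-05. The sketch below uses the same sample arrays; the tighter tolerance of 0.005 is a value I picked just to show a failing case.

```python
import numpy as np

a = np.array([1.0, 1.5, 3.0])
b = np.array([1.0, 1.51, 2.99])

# One boolean per element
per_element = np.isclose(a, b, atol=0.02)

# A single boolean for the whole comparison
overall = np.allclose(a, b, atol=0.02)

# A tighter (illustrative) tolerance makes the last two pairs fail
strict = np.isclose(a, b, atol=0.005)

print("Per element:", per_element)
print("Overall:", overall)
print("Strict:", strict)
```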

I also got to work on a dataset, WHO POP TB all.csv, where I made use of the NumPy library for analysis. I will add an example below so you can get better context. Note that the snippet assumes the dataset has already been loaded into a Pandas DataFrame named df.

Example 3

In the code cell below, select and display the first eight rows from the 'Country' and 'TB deaths' columns.

df.loc[:7, ['Country', 'TB deaths']]
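The reason :7 gives eight rows is that .loc slices by label and includes the end label, unlike normal Python slicing. Here is a small sketch comparing it with position-based alternatives. The DataFrame below is a made-up stand-in with placeholder values, not the real WHO data, and it assumes the default 0, 1, 2, ... index.

```python
import pandas as pd

# Made-up stand-in for the real dataset: 10 rows with the default 0..9 index
df = pd.DataFrame({
    "Country": [f"Country {i}" for i in range(10)],
    "TB deaths": [i * 100 for i in range(10)],
})

# Label-based slice: the end label 7 is included, so this is 8 rows
by_label = df.loc[:7, ["Country", "TB deaths"]]

# Position-based slice: the end position 8 is excluded, so this is also 8 rows
by_position = df.iloc[:8][["Country", "TB deaths"]]

# head(8) is another way to ask for the first eight rows
first_eight = df[["Country", "TB deaths"]].head(8)

print(len(by_label), len(by_position), len(first_eight))
```

If the index is not the default one (for example after filtering or sorting), .loc[:7] may no longer mean "the first eight rows", so .iloc or .head is the safer choice there.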

What is Pandas

Pandas is an open-source Python library used for data manipulation and analysis. It provides data structures, mainly the Series and the DataFrame, along with functions for performing efficient operations on data.

Here are a few examples

Example 4

Write a Python program to add, subtract, multiply and divide two Pandas Series. Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 9]

import pandas as pd

a = pd.Series([2, 4, 6, 8, 10])
b = pd.Series([1, 3, 5, 7, 9])

print("Addition:\n", a + b)
print("Subtraction:\n", a - b)
print("Multiplication:\n", a * b)
print("Division:\n", (a / b).round(2))
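One thing worth knowing about Series arithmetic is that Pandas matches values by index label, not by position. The two sample Series above share the same default index, so everything lines up. The sketch below uses two small Series with different labels (my own illustrative values) to show what happens when they don't.

```python
import pandas as pd

a = pd.Series([2, 4, 6], index=["x", "y", "z"])
b = pd.Series([1, 3], index=["x", "y"])

# Labels are aligned first; "z" has no partner in b, so the result is NaN
plain_sum = a + b

# The method form lets you supply a stand-in for the missing side
filled_sum = a.add(b, fill_value=0)

print(plain_sum)
print(filled_sum)
```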

Example 5

Write a Python program to convert a NumPy array to a Pandas Series.

Sample output:

NumPy array:
[10 20 30 40 50]
Converted Pandas series:
0    10
1    20
2    30
3    40
4    50
dtype: int64

import numpy as np
import pandas as pd

arr = np.array([10, 20, 30, 40, 50])
s = pd.Series(arr)

print("Converted Pandas Series:")
print(s)
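The conversion also works in the other direction, and you can attach your own labels while converting. This is a small sketch of both; the index labels and the name "values" are arbitrary choices of mine.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30, 40, 50])

# Same conversion, but with custom labels and a name for the Series
s = pd.Series(arr, index=["a", "b", "c", "d", "e"], name="values")

# Going back from a Series to a NumPy array
back = s.to_numpy()

print(s["c"])
print(back)
```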

Example 6

Write a Pandas program to add some data to an existing Series.

Sample Output:

Original Data Series:
0       100
1       200
2    python
3    300.12
4       400
dtype: object
Data Series after adding some data:
0       100
1       200
2    python
3    300.12
4       400
0       500
1       php
dtype: object

import pandas as pd

s1 = pd.Series([100, 200, 'python', 300.12, 400])
print("Original Data Series:")
print(s1)

s2 = pd.Series([500, 'php'])

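# Join the two Series. ignore_index=True renumbers the result from 0 to 6;
# without it, the new items keep their own labels 0 and 1, as in the sample output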
combined = pd.concat([s1, s2], ignore_index=True)

print("\nData Series after adding some data:")
print(combined)
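To see what ignore_index actually changes, here is a short sketch that runs pd.concat both ways on the same two Series and compares the resulting index labels.

```python
import pandas as pd

s1 = pd.Series([100, 200, "python", 300.12, 400])
s2 = pd.Series([500, "php"])

# Default behaviour: each Series keeps its own labels, so 0 and 1 repeat
kept = pd.concat([s1, s2])

# ignore_index=True throws the old labels away and numbers the rows afresh
renumbered = pd.concat([s1, s2], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```

With repeated labels, kept.loc[0] returns two values (100 and 500) rather than one, which is usually a good reason to renumber.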

The Project: Climate Exploratory data analysis across 6 Cities

Problem Statement

The goal of this project is to analyse climate data from 6 different cities (Beijing, Delhi, Moscow, London, Cape Town and Brasilia) to understand variations in weather patterns such as temperature, rainfall, and humidity.

Data Methodology

Step 1: Load each CSV (comma-separated values) file from your computer into Python as a DataFrame and inspect it. I attached links to all the data files, so you can follow along (Beijing_PEK_2014.csv, Brasilia_BSB_2014.csv, CapeTown_CPT_2014.csv, Delhi_DEL_2014.csv, London_2014.csv, Moscow_SVO_2014.csv)

df1 = pd.read_csv(r"c:\Users\PC\dataraflow-cohort-1\Week-6 (Numpy & Pandas-1)\3-Data Analysis-Pandas-1\Beijing_PEK_2014.csv")
df2 = pd.read_csv(r"c:\Users\PC\dataraflow-cohort-1\Week-6 (Numpy & Pandas-1)\3-Data Analysis-Pandas-1\Brasilia_BSB_2014.csv")
df3 = pd.read_csv(r"c:\Users\PC\dataraflow-cohort-1\Week-6 (Numpy & Pandas-1)\3-Data Analysis-Pandas-1\CapeTown_CPT_2014.csv")
df4 = pd.read_csv(r"c:\Users\PC\dataraflow-cohort-1\Week-6 (Numpy & Pandas-1)\3-Data Analysis-Pandas-1\Delhi_DEL_2014.csv")
df5 = pd.read_csv(r"c:\Users\PC\dataraflow-cohort-1\Week-6 (Numpy & Pandas-1)\3-Data Analysis-Pandas-1\London_2014.csv")
df6 = pd.read_csv(r"c:\Users\PC\dataraflow-cohort-1\Week-6 (Numpy & Pandas-1)\3-Data Analysis-Pandas-1\Moscow_SVO_2014.csv")

I used df1.head() to inspect the first few rows of the df1 DataFrame.

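Step 2: Data cleaning and exploration, to count the number of null values and duplicates

# Number of missing values in each column
df1.isnull().sum()

# Number of rows that are exact duplicates of an earlier row
df1.duplicated().sum()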

Step 3: Generate summary statistics of the DataFrame

df1.describe()

Step 4: Analysis

# Convert 'Date' to datetime (values that cannot be parsed become NaT instead of raising an error)
df1['Date'] = pd.to_datetime(df1['Date'], errors='coerce')

# Now extract the month name
df1['Month'] = df1['Date'].dt.month_name()

# Group by month and compute average temperature
monthly_avg_temp = df1.groupby('Month')['Mean TemperatureC'].mean().sort_values(ascending=False)

print("Average Temperature per Month:")
print(monthly_avg_temp)
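The code above ranks the months from hottest to coldest. If you would rather read them in calendar order, one option is to reindex the result with a list of month names. The sketch below shows this on a tiny made-up table (the dates and temperatures are placeholders, not values from the Beijing file), and it also drops the rows that errors='coerce' turned into NaT.

```python
import pandas as pd

# Tiny made-up sample, including one value that cannot be parsed as a date
df = pd.DataFrame({
    "Date": ["2014-01-15", "2014-01-20", "2014-07-01", "not a date", "2014-03-10"],
    "Mean TemperatureC": [-2.0, 0.0, 27.0, 15.0, 8.0],
})

df["Date"] = pd.to_datetime(df["Date"], errors="coerce")  # bad value becomes NaT
df = df.dropna(subset=["Date"]).copy()                    # drop the NaT row
df["Month"] = df["Date"].dt.month_name()

monthly = df.groupby("Month")["Mean TemperatureC"].mean()

# January..December, then keep only the months that are present in the data
all_months = pd.date_range("2014-01-01", periods=12, freq="MS").month_name().tolist()
month_order = [m for m in all_months if m in monthly.index]
monthly_in_calendar_order = monthly.reindex(month_order)

print(monthly_in_calendar_order)
```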

Here are my Key Findings

Beijing has the highest maximum temperature, 42 °C, which occurred in May, and also the highest mean temperature, 31 °C.

Moscow has the lowest minimum temperature, -26 °C, and also the lowest mean temperature, -21 °C.

July had the highest mean temperature in every city except Brasilia and Cape Town, which are in the Southern Hemisphere, where July falls in winter.

Beijing also recorded the highest precipitation of all the cities, about 75.95 mm.

There was no record of rainfall in Moscow.

London has the highest mean humidity at 96%, followed by Delhi at 94%, while Beijing has the lowest humidity at 8%.
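For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how per-city figures could be computed with groupby once the data sits in a single table. It is not the exact code I ran. The table below uses two placeholder cities and made-up numbers, and the "City" column is an assumption, since the original files do not necessarily have one.

```python
import pandas as pd

# Made-up numbers for two placeholder cities, purely to show the mechanics
combined = pd.DataFrame({
    "City": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2014-01-01", "2014-05-01", "2014-07-01"] * 2),
    "Mean TemperatureC": [1.0, 20.0, 25.0, -10.0, 12.0, 18.0],
})

# Minimum, maximum and average of the column for each city
summary = combined.groupby("City")["Mean TemperatureC"].agg(["min", "max", "mean"])

# City whose maximum is the largest
hottest_city = summary["max"].idxmax()

# For each city, the full row where the column peaks (gives the date too)
peak_rows = combined.loc[combined.groupby("City")["Mean TemperatureC"].idxmax()]

print(summary)
print("Hottest city:", hottest_city)
print(peak_rows)
```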

Challenges

One of the biggest hurdles I faced was right at the start: loading and combining all the data files in my notebook so that I wouldn’t have a separate DataFrame for each city. I tried all I could but still ended up with individual DataFrames. The solution I later found would have required moving everything to Google Colab and, tbh, time wasn’t on my side.
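For what it’s worth, here is one possible way to do the combining locally, sketched after the fact. It assumes all six CSV files sit in one folder and share the same columns. The function name load_city_files and the added "City" column are my own inventions for illustration, and the city label is simply the part of the file name before the first underscore.

```python
from pathlib import Path

import pandas as pd


def load_city_files(folder):
    """Read every CSV file in `folder` and stack them into one DataFrame."""
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        frame = pd.read_csv(path)
        # File names look like Beijing_PEK_2014.csv, so the text before
        # the first underscore is used as the city label (an assumption)
        frame["City"] = path.stem.split("_")[0]
        frames.append(frame)
    # Stack the per-city tables on top of each other with a fresh index
    return pd.concat(frames, ignore_index=True)


# Example call (replace the placeholder with your own folder):
# combined = load_city_files("path/to/your/csv/folder")
```

With a "City" column in place, a single groupby("City") can replace six separate rounds of the same analysis.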

The next challenge was getting accustomed to the terms in the dataset (I mean the column headers) so I could understand them better, though this wasn’t much of a challenge.

Another one is missing values, especially in the Max Gust, Cloud cover and Events columns. If they are not handled properly, they could distort the analysis and reduce data quality.
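As a rough idea of how those gaps could be inspected and handled, here is a small sketch on made-up data. The column names below are written the way I refer to them in this post and may not match the exact headers in the CSV files. The fill choices (a "None" label for days with no event, the median for cloud cover) are just one option, not a recommendation for every case.

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps in the three columns mentioned above
df = pd.DataFrame({
    "Max Gust": [np.nan, 40.0, np.nan, 55.0],
    "Cloud cover": [3.0, np.nan, 5.0, 6.0],
    "Events": [np.nan, "Rain", np.nan, "Fog"],
})

# Fraction of missing values in each column
missing_share = df.isnull().mean()

cleaned = df.copy()
# Treat a missing event as "nothing happened that day" (an assumption)
cleaned["Events"] = cleaned["Events"].fillna("None")
# Fill missing cloud cover with the column median
cleaned["Cloud cover"] = cleaned["Cloud cover"].fillna(cleaned["Cloud cover"].median())
# "Max Gust" is left untouched here because half of it is missing

print(missing_share)
print(cleaned)
```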

Conclusion

This analysis not only sharpened my technical skills but also deepened my appreciation of how data can tell powerful environmental stories. To be honest, there was still so much more to uncover, but having a clear scope helped me narrow my analysis down to what I should focus on.

I’m looking forward to discovering what the next set of tasks will bring and how they’ll help me grow further.

Connect with me on: GitHub
