Data Science with Python, Part 1: Introduction to Basics
This is the first post in the Data Science with Python series.
Python 3 is used for all Python-related articles on my blog. Blog posts for this series are created using Jupyter Notebook.
Introduction ¶
Just like R, you can use Python as a calculator. But Python is a versatile programming language that can do so much more!
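As a quick illustration of the calculator idea, here is a minimal sketch of the basic arithmetic operators. The variable names are made up for this example.

```python
# Basic arithmetic, as you would do on a calculator
total = 20 + 5        # addition
difference = 20 - 5   # subtraction
product = 20 * 5      # multiplication
quotient = 20 / 5     # division always returns a float
power = 2 ** 3        # exponentiation
remainder = 20 % 3    # modulo (remainder of the division)

print(total, difference, product, quotient, power, remainder)
# 25 15 100 4.0 8 2
```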
Assigning Value to a Variable ¶
In R, we usually use <- to assign a value to a variable, but in Python it is simply =.
# Assign 20 to x
x = 20
# Print x
x
Python Data Types ¶
Let's learn about the data types in Python.
- Integer - for example 20, it has no fractional part
- Float - for example 20.50, it has an integer and fractional parts
- String - for example 'my dogs'; strings are text and need to be wrapped in single or double quotation marks (known as character in R)
- Boolean - True or False (in R, these are spelled with all capital letters)
- List
To check the data type of a value, use the type() function.
a = 20.50
b = 'my dogs'
c = True
# Check data type of x
type(x)
# Check data type of a
type(a)
# Check data type of b
type(b)
# Check data type of c
type(c)
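For reference, here is a small sketch of what these checks return when printed. It redefines the same variables so it can run on its own; isinstance() is an extra built-in not used above that tests a value against a given type.

```python
x = 20
a = 20.50
b = 'my dogs'
c = True

# type() returns the class of the object
print(type(x))  # <class 'int'>
print(type(a))  # <class 'float'>
print(type(b))  # <class 'str'>
print(type(c))  # <class 'bool'>

# isinstance() checks whether a value is of a given type
print(isinstance(a, float))  # True
print(isinstance(b, int))    # False
```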
Data Conversion ¶
In Python, you can do data conversion easily by using these functions: int(), float(), str() and bool().
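Conversion is useful, for example, when you want to print a message containing data of different types. Adding a float to a string directly raises a TypeError.
# This raises a TypeError: a string and a float cannot be concatenated
'A trip to the vet with my dog will cost me €' + 20.50
I can easily fix this error by converting the float into a string.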
'A trip to the vet with my dog will cost me €' + str(20.50)
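To see all four conversion functions in action, here is a minimal sketch. The variable names price and message are made up for this example, and the f-string at the end is an alternative to str() that the text above does not use.

```python
price = 20.50

# float -> int drops the fractional part (it does not round)
print(int(price))       # 20
# string -> float
print(float('20.50'))   # 20.5
# float -> string
print(str(price))       # 20.5
# bool() is False for 0 and for an empty string, True otherwise
print(bool(0), bool(''), bool(price))  # False False True

# An f-string formats the number without an explicit str() call;
# .2f keeps two decimal places
message = f'A trip to the vet with my dog will cost me €{price:.2f}'
print(message)  # A trip to the vet with my dog will cost me €20.50
```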
List ¶
A list is another data type in Python. Just like in R, a list can store different types of data. A list can also contain other lists.
# Create a list
doggies = ['Jodi', 'Loki', 'Ru', 'Pip', 'Bear']
# Create sublists in list
doggies = [['Jodi', 3], ['Loki', 2], ['Ru', 1], ['Pip', 2], ['Bear', 4]]
Subsetting List ¶
Unlike R, Python indexing starts at 0, not 1.
To access an element in a list, use []. Our doggies list has 5 sublists.
# Access first sublist
doggies[0]
# Access second element in the first sublist
doggies[0][1]
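Here is a short sketch of what these lookups return. It rebuilds the doggies list so it runs on its own. The negative index at the end is not covered above; it is added here because it is a handy way to reach the last element.

```python
doggies = [['Jodi', 3], ['Loki', 2], ['Ru', 1], ['Pip', 2], ['Bear', 4]]

# Index 0 is the first sublist
first_dog = doggies[0]
print(first_dog)       # ['Jodi', 3]

# Chain two indexes to reach an element inside a sublist
first_dog_age = doggies[0][1]
print(first_dog_age)   # 3

# A negative index counts from the end of the list
last_dog = doggies[-1]
print(last_dog)        # ['Bear', 4]
```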
Slicing List ¶
Let's say we want to get the first, second and third sublists. We can slice the list like this: list[start:end].
Important: In Python, the index specified for end is not inclusive.
# Get first to third sublist
doggies[0:3]
Notice that index 3 refers to ['Pip', 2], but because the end index is not inclusive, it is not included in the result.
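The exclusive end index is easier to see with the output next to it. The sketch below also shows that start or end can be left out, which is not covered above: a missing start means "from the beginning" and a missing end means "to the end".

```python
doggies = [['Jodi', 3], ['Loki', 2], ['Ru', 1], ['Pip', 2], ['Bear', 4]]

# Indexes 0, 1 and 2 are returned; index 3 is excluded
first_three = doggies[0:3]
print(first_three)   # [['Jodi', 3], ['Loki', 2], ['Ru', 1]]

# Leaving out start is the same as starting at 0
same_three = doggies[:3]
print(same_three == first_three)   # True

# Leaving out end slices up to the end of the list
rest = doggies[3:]
print(rest)          # [['Pip', 2], ['Bear', 4]]
```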
Changing Elements in List ¶
We noticed two errors in Pip's entry: she is 3, not 2, and her name is spelled with an extra p! Let's change this in our list.
# Change Pip's name and age
doggies[3] = ['Pipp', 3]
# View updated doggies list
doggies
And we also noticed Bear is 4, when he is only a pupper!
# Change Bear's age
doggies[4][1] = 1
# View updated doggies list
doggies
Adding and Deleting Elements in List ¶
We can easily add an element to a list by using the + sign.
# Add another doggo into the list as a sublist (double bracket)
doggies = doggies + [['Fudge', 2]]
# View updated doggies list
doggies
Note that as we are adding another sublist, we need to use double brackets instead of single.
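To see why the double brackets matter, here is a small sketch comparing both forms on a shortened list. The names with_sublist and flattened are made up for this example.

```python
doggies = [['Jodi', 3], ['Loki', 2]]

# Double brackets: the right-hand side is a list holding one sublist
with_sublist = doggies + [['Fudge', 2]]
print(with_sublist)   # [['Jodi', 3], ['Loki', 2], ['Fudge', 2]]

# Single brackets: the name and the age are added as two separate elements
flattened = doggies + ['Fudge', 2]
print(flattened)      # [['Jodi', 3], ['Loki', 2], 'Fudge', 2]
```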
If we want to remove one of the sublists, we can use the del statement.
# Remove Pipp
del doggies[3]
# View updated doggies list
doggies
Copying a List ¶
There are two simple ways to copy a list (its elements).
# Using list()
doggies_copy1 = list(doggies)
# Using slicing with [:]
doggies_copy2 = doggies[:]
Important: Do not copy a list like this: doggies_copy1 = doggies. This only copies the reference to the list's location in memory, so both names point to the same list and a change made through one shows up in the other.
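The sketch below compares a plain assignment with a real copy. One caveat worth knowing: list() and [:] make a shallow copy, so for a list of sublists like doggies the outer list is new but the sublists are still shared. copy.deepcopy() from the standard library is not mentioned above; it is shown here as one way to get a fully independent copy. The names alias, shallow and deep are made up for this example.

```python
import copy

doggies = [['Jodi', 3], ['Loki', 2]]

alias = doggies                 # same list object, not a copy
shallow = list(doggies)         # new outer list, sublists still shared
deep = copy.deepcopy(doggies)   # new outer list and new sublists

doggies.append(['Ru', 1])       # change the outer list
doggies[0][1] = 99              # change a value inside a sublist

print(alias is doggies)   # True
print(len(alias))         # 3, the alias sees the appended sublist
print(len(shallow))       # 2, the shallow copy has its own outer list
print(shallow[0][1])      # 99, but it shares the inner sublists
print(deep[0][1])         # 3, the deep copy is unaffected
```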
Functions and Methods ¶
Difference between Function and Method ¶
A function is a 'black box' of code that takes an input and produces an output. In Python, everything is an object: strings, integers and Booleans are all objects. A function is independent of any particular object and can be called on different types of object.
Unlike a function, a method belongs to an object and is called on it with dot notation. Each type of object has its own set of methods, so a method can only be called on objects of a type that defines it.
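A minimal sketch of the difference, using a string and a list made up for this example. hasattr() is a built-in used here only to check whether an object has a given method.

```python
wine = 'pinot noir'
wine_list = ['pinot noir', 'malbec']

# len() is a function: it works on both a string and a list
print(len(wine))        # 10
print(len(wine_list))   # 2

# upper() is a string method: it is called on the string itself
print(wine.upper())     # PINOT NOIR

# Lists do not have an upper() method
print(hasattr(wine_list, 'upper'))   # False

# Both types have a count() method, but each type defines its own version
print(wine.count('n'))            # 2
print(wine_list.count('malbec'))  # 1
```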
Let's look at a few useful functions and methods for every data type in Python.
Functions ¶
- len() - find the length
# Find length of doggies
len(doggies)
- ? - to get help with any function in Jupyter Notebook, put ? before the function name, like this: ?type (in plain Python, use help(type) instead)
- sorted() - sort data
# Sort data alphabetically
sorted(doggies)
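Two details about sorted() are easy to miss: it returns a new list and leaves the original untouched, and a list of sublists is compared element by element, so the names decide the order here. The sketch below uses a shortened, shuffled doggies list; the key argument for sorting by age is an extra that is not covered above.

```python
doggies = [['Loki', 2], ['Jodi', 3], ['Ru', 1]]

# Sublists are compared by their first element (the name) first
by_name = sorted(doggies)
print(by_name)   # [['Jodi', 3], ['Loki', 2], ['Ru', 1]]

# key tells sorted() which value to sort on, here the age at index 1
by_age = sorted(doggies, key=lambda dog: dog[1])
print(by_age)    # [['Ru', 1], ['Loki', 2], ['Jodi', 3]]

# The original list is unchanged
print(doggies)   # [['Loki', 2], ['Jodi', 3], ['Ru', 1]]
```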
String Methods ¶
# Create a variable containing a string
wine = 'pinot noir'
# Convert all letters to uppercase
wine.upper()
# Count number of n
wine.count('n')
List Methods ¶
- index() - find the index of the first matching element in the list
- count() - count how many times an element appears in the list
- append() - add an element to the end of the list
- remove() - delete the first element that matches the input
- reverse() - reverse the order of elements in the list
# Create a list containing strings
wine_list = ['pinot noir', 'cabernet sauvignon', 'prosecco', 'malbec', 'cabernet sauvignon']
# Find the index of the first cabernet sauvignon
wine_list.index('cabernet sauvignon')
# Count the number of cabernet sauvignon entries in the list
wine_list.count('cabernet sauvignon')
# Add sauvignon blanc
wine_list.append('sauvignon blanc')
# View updated wine_list
wine_list
# Delete first matched element: cabernet sauvignon
wine_list.remove('cabernet sauvignon')
# View updated wine_list
wine_list
# Reverse order
wine_list.reverse()
# View updated wine_list
wine_list
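One thing to watch with these methods: append(), remove() and reverse() change the list in place and return None, so their result should not be assigned back to the list. index() and remove() also raise a ValueError when the element is missing. A minimal sketch, with a shortened wine_list and a made-up result variable:

```python
wine_list = ['pinot noir', 'cabernet sauvignon', 'prosecco']

# append() modifies the list in place and returns None
result = wine_list.append('malbec')
print(result)      # None
print(wine_list)   # ['pinot noir', 'cabernet sauvignon', 'prosecco', 'malbec']

# remove() raises a ValueError if the element is missing,
# so check with `in` first
if 'merlot' in wine_list:
    wine_list.remove('merlot')
else:
    print('merlot is not in the list')
```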
Importing Packages ¶
Python has a lot of useful packages that can help you with your data analysis, for example NumPy and pandas. To use these packages, you need to import them.
Things to note when importing packages:
- You can import the whole package or import parts of the package
- You can abbreviate the name of the packages
# Import whole packages and abbreviate
import numpy as np
import pandas as pd
# Import part of a package
from matplotlib import style
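Once imported, a package is used through the name you gave it. The sketch below uses the math module from Python's standard library instead of NumPy or pandas, only so that it runs without installing anything. The alias m is an arbitrary choice for this example.

```python
# Import a whole module and abbreviate its name
import math as m
# Import only one function from the module
from math import sqrt

# With a whole-module import, prefix each call with the alias
print(m.floor(20.50))   # 20
print(m.pi)

# A function imported directly is called without a prefix
print(sqrt(16))         # 4.0
```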
In the second part of the Data Science with Python series, we will look at two packages in detail: NumPy and pandas.