Vector operations with numpy
Goals of this lecture
Today we’re going to discuss using the numpy package in Python. numpy can be used for efficient vector operations, which are very useful for statistics and data analysis, and more broadly for computational social science.
This lecture will cover:
What kinds of tools are involved in computational social science? How is this similar to, and different from, what we’ve discussed already?
An introduction to numpy specifically.
Working with vectors.
What is numpy?
numpy is a package for scientific computing. Specifically, it enables fast computation with vectors and matrices, along with a number of important mathematical operations.
Because numpy is a package rather than part of core Python, it must be imported before use. By convention, it is imported under the short alias np.
# Import statement
import numpy as np
What can I use numpy for?
numpy allows you to work with homogeneous arrays. A homogeneous array is an array whose elements all have the same type, e.g., all int or all bool.
The benefit of this is that computations can be done very efficiently, because numpy can apply an operation to every element in compiled code instead of in a Python loop.
No more need to loop!
Enables more advanced mathematical operations.
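As a small illustration (a sketch that is not part of the original lecture; the variable names are hypothetical), the snippet below checks that every element of an array shares one dtype. It then applies a single arithmetic expression to all elements without writing a loop.

```python
import numpy as np

# Every element of an ndarray shares a single dtype
ints = np.array([1, 2, 3])
bools = np.array([True, False, True])
print(ints.dtype.kind)   # i (signed integer; the exact width is platform-dependent)
print(bools.dtype)       # bool

# One expression acts on every element; no explicit for loop is needed
doubled = ints * 2
print(doubled.tolist())  # [2, 4, 6]
```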
Note: numpy is a key part of many advanced machine learning packages!
Creating a numpy.ndarray
The basic data type of numpy is the ndarray.
ndarray = N-dimensional array.
A simple way to create an ndarray is np.arange (“a range”).
# Works like range(start, stop): counts from 1 up to, but not including, 4
np.arange(1, 4)
array([1, 2, 3])
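np.arange in detail
By default, np.arange(start, stop) returns evenly spaced values from start up to, but not including, stop, using a step of 1 (so integer inputs give an array of integers).
The step parameter sets the spacing between consecutive values, i.e., the granularity of how you “step” from start toward stop.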
## step size = 2
np.arange(1, 4, step = 2)
array([1, 3])
## step size = .5
np.arange(1, 4, step = .5)
array([1. , 1.5, 2. , 2.5, 3. , 3.5])
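Two details of the behavior described above are worth checking directly. The stop value is excluded even when it lands exactly on the step grid, and a non-integer step produces a float array. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

# stop is never included, even when it falls exactly on the step grid
evens = np.arange(0, 10, step=2)
print(evens.tolist())   # [0, 2, 4, 6, 8]

# A float step produces a float array
halves = np.arange(0, 2, step=0.5)
print(halves.dtype)     # float64
print(halves.tolist())  # [0.0, 0.5, 1.0, 1.5]
```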
Check-in
How would you create an array ranging from 1 to 20, incrementing with a step size of .5? How long would this array be?
### Your code here
Solution
np_range = np.arange(1, 20, step = .5)
len(np_range)
38
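One way to sanity-check this answer is to count the steps from start toward stop and round up. This sketch applies that counting rule directly. For steps that are not exactly representable in floating point, rounding can occasionally shift the count by one, so treat the formula as a rule of thumb.

```python
import math
import numpy as np

start, stop, step = 1, 20, 0.5
np_range = np.arange(start, stop, step=step)

# Number of values = ceil((stop - start) / step) = ceil(19 / 0.5) = 38
expected_len = math.ceil((stop - start) / step)
print(len(np_range), expected_len)  # 38 38

# The last value is 19.5, because stop (20) is excluded
print(np_range[-1])                 # 19.5
```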
Turning a list into an ndarray
Another way to create an ndarray is to pass a list into the np.array(...) function.
og_list = [1, 2, 3]
type(og_list)
list
np_array = np.array(og_list)
print(type(np_array))
<class 'numpy.ndarray'>
np_array
array([1, 2, 3])
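The conversion also works in the other direction: the ndarray method tolist() turns an array back into a plain Python list. A minimal sketch:

```python
import numpy as np

og_list = [1, 2, 3]
np_array = np.array(og_list)

# Convert the ndarray back into a regular Python list
back_to_list = np_array.tolist()
print(type(back_to_list))       # <class 'list'>
print(back_to_list == og_list)  # True
```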
Check-in
How would you create a numpy array with the elements [5, 6, 7]?
### Your code here
Solution
np_array = np.array([5, 6, 7])
np_array
array([5, 6, 7])
Check-in
Why is this code throwing an error?
test_array = np.array(1, 2, 3)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 test_array = np.array(1, 2, 3)
TypeError: array() takes from 1 to 2 positional arguments but 3 were given
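Solution
np.array expects the elements as a single sequence, so np.array(1, 2, 3) passes three separate positional arguments instead of one list. Wrap the elements in [] so they are passed as a single list.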
test_array = np.array([1, 2, 3])
test_array
array([1, 2, 3])
Indexing into a one-dimensional array
Indexing works just like it does for lists.
np_array[0]
5
np_array[1]
6
np_array[2]
7
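Other list-style indexing also carries over, such as negative indices and slices. One difference is worth noting: slicing an ndarray returns another ndarray rather than a list. A short sketch:

```python
import numpy as np

np_array = np.array([5, 6, 7])

print(np_array[-1])         # 7 (the last element)
first_two = np_array[0:2]   # elements at positions 0 and 1
print(first_two)            # [5 6]
print(type(first_two))      # <class 'numpy.ndarray'>
```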
Multi-dimensional arrays
So far, we’ve just been looking at 1-dimensional arrays.
But numpy is excellent at storing multi-dimensional arrays.
Checking attributes of an array
The shape attribute tells you the dimensions of an array.
Check-in
What is the dimensionality of md_array?
## Define a 2-dimensional array (a list of lists)
md_array = np.array([[1, 2], [3, 4]])
md_array
array([[1, 2],
       [3, 4]])
Solution
You can check this using md_array.shape.
md_array.shape
(2, 2)
Check-in
What about md_array2?
md_array2 = np.array([[1, 2, 3], [4, 5, 6]])
Solution
You can check this using md_array2.shape.
md_array2.shape
(2, 3)
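Besides shape, an array also reports its number of dimensions (ndim) and its total number of elements (size). Elements of a 2-dimensional array can be indexed with a [row, column] pair. A sketch using the same md_array2:

```python
import numpy as np

md_array2 = np.array([[1, 2, 3], [4, 5, 6]])

print(md_array2.ndim)    # 2 (rows and columns)
print(md_array2.size)    # 6 (2 * 3 elements in total)
print(md_array2[1, 2])   # 6 (row 1, column 2, counting from 0)
print(md_array2[0])      # [1 2 3] (the entire first row)
```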
Checking attributes of an array (pt. 2)
The dtype attribute tells you the type of data in the array.
md_array2.dtype
dtype('int64')
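The dtype depends on the elements. Whole numbers give an integer dtype (int64 on most platforms), and numbers with a decimal point give float64. np.array also accepts a dtype argument to request a type explicitly. A sketch:

```python
import numpy as np

floats = np.array([1.5, 2.0, 3.25])
print(floats.dtype)          # float64

# Request a float array explicitly from integer input
as_floats = np.array([1, 2, 3], dtype=float)
print(as_floats.dtype)       # float64
print(as_floats.tolist())    # [1.0, 2.0, 3.0]
```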
Homogeneous data
As noted earlier, an array is meant to store homogeneous elements.
This means that np.array will try to convert any heterogeneous elements to a common type.
## Note what happens to 5 and 7!
arr3 = np.array(["a", 5, 7])
arr3
array(['a', '5', '7'], dtype='<U21')
arr3.dtype
dtype('<U21')
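The same kind of conversion happens with mixed numeric types. np.array picks a type that can represent every element, so mixing integers and floats yields a float array. A sketch:

```python
import numpy as np

mixed = np.array([1, 2.5, 3])
print(mixed.dtype)       # float64
print(mixed.tolist())    # [1.0, 2.5, 3.0] (the integers became floats)
```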
Interim summary
numpy is a package that forms the foundation of scientific computing in Python.
numpy arrays are the cornerstone of numpy.
A numpy array is like a list, with a few differences:
It requires homogeneous elements.
It is better at representing multi-dimensional arrays.
It can be used for vector operations (coming up!).
Working with vectors (intro)
numpy arrays make it easier to do all sorts of operations, such as arithmetic. There is no more need to use for loops: we can do vector arithmetic the same way we multiply individual numbers.
The old way: arithmetic with for loops and lists
Adding one list to another, element by element, requires using a for loop.
list1 = [1, 2, 3]
list2 = [2, 3, 4]
## The "+" operator just concatenates them
list1 + list2
[1, 2, 3, 2, 3, 4]
## To add them element by element, we must use a for loop
sum_list = []
for index, item in enumerate(list1):
    sum_list.append(item + list2[index])
sum_list
[3, 5, 7]
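A slightly more compact version of the same loop pairs up elements with zip() inside a list comprehension. It is still an element-by-element Python loop under the hood, which is what numpy lets us avoid. A sketch:

```python
list1 = [1, 2, 3]
list2 = [2, 3, 4]

# zip pairs up elements at the same position: (1, 2), (2, 3), (3, 4)
sum_list = [x + y for x, y in zip(list1, list2)]
print(sum_list)  # [3, 5, 7]
```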
The new way: arithmetic with numpy
numpy makes it much easier to do arithmetic operations with vectors.
## First, define some vectors (1-dimensional arrays)
arr1 = np.array(list1)
arr2 = np.array(list2)
## Can just use "+"!
arr1 + arr2
array([3, 5, 7])
Other arithmetic operations
arr1 - arr2
array([-1, -1, -1])
arr1 * arr2
array([ 2,  6, 12])
arr1 / arr2
array([0.5       , 0.66666667, 0.75      ])
Vectors vs. scalars
A vector is a list of numbers; a scalar is a single number.
We can multiply (or add, subtract, etc.) an entire vector by a single number.
arr1
array([1, 2, 3])
## Multiply all elements by 100
arr1 * 100
array([100, 200, 300])
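Other arithmetic operators work with a scalar in the same way: each one is applied to every element. A short sketch with illustrative values:

```python
import numpy as np

arr1 = np.array([1, 2, 3])

print((arr1 + 10).tolist())   # [11, 12, 13]
print((arr1 - 1).tolist())    # [0, 1, 2]
print((arr1 / 2).tolist())    # [0.5, 1.0, 1.5]
print((arr1 ** 2).tolist())   # [1, 4, 9]
```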
Check-in
What would multiplying the two arrays below return?
a = np.array([2, 4, 5])
b = np.array([2, 2, 3])
a * b
### Your code here
Solution
a = np.array([2, 4, 5])
b = np.array([2, 2, 3])
a * b
array([ 4,  8, 15])
Check-in
What would happen if we ran the code below?
a = np.array([2, 4, 5])
b = np.array([2, 2])
a * b
### Your code here
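Solution
Element-wise operations require arrays with compatible shapes. Here a has shape (3,) and b has shape (2,), so numpy cannot pair up their elements and raises a ValueError.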
a = np.array([2, 4, 5])
b = np.array([2, 2])
a * b
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [130], in <cell line: 3>()
1 a = np.array([2, 4, 5])
2 b = np.array([2, 2])
----> 3 a * b
ValueError: operands could not be broadcast together with shapes (3,) (2,)
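For comparison, the sketch below shows shapes that are compatible. Two 1-dimensional arrays of equal length combine element by element. An array of length 1 behaves like a scalar and is stretched across the other array, which numpy calls broadcasting.

```python
import numpy as np

a = np.array([2, 4, 5])

same_length = np.array([2, 2, 3])
print((a * same_length).tolist())  # [4, 8, 15]

length_one = np.array([2])         # shape (1,) broadcasts against shape (3,)
print((a * length_one).tolist())   # [4, 8, 10]
```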
Thinking with vectors
Using numpy for vector arithmetic can sometimes involve a “cognitive shift”. We’re used to multiplying individual elements; now we have to transition to thinking about multiplying entire vectors.
But it’s much more efficient!
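To make the shift concrete, here is a sketch with made-up data that centers a set of numbers by subtracting their mean. It does this first with a loop and then as a single vector expression. Both give the same result. The vectorized version is shorter and, for large arrays, typically much faster.

```python
import numpy as np

values = [4.0, 8.0, 6.0, 2.0]   # hypothetical data

# Loop version: handle one element at a time
mean = sum(values) / len(values)
centered_loop = []
for v in values:
    centered_loop.append(v - mean)

# Vector version: subtract the mean from every element at once
arr = np.array(values)
centered_vec = arr - arr.mean()

print(centered_loop)            # [-1.0, 3.0, 1.0, -3.0]
print(centered_vec.tolist())    # [-1.0, 3.0, 1.0, -3.0]
```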
Conclusion
This was a brief introduction to numpy.
numpy is a powerful package used in scientific computing. The cornerstone of numpy is the numpy.ndarray.
numpy arrays can be used for efficient vector computations.
Next time, we’ll dive deeper into more advanced numpy operations.