Vector operations with numpy#
Goals of this lecture#
Today we’re going to discuss using the numpy package in Python. numpy can be used for efficient vector operations, which are very useful for statistics and data analysis and, more broadly, for computational social science.
Broadly, this will involve:
- What kinds of tools are involved in computational social science? How is this similar to and different from what we’ve discussed already?
- An introduction to numpy specifically.
- Working with vectors.
What is numpy?#
numpy is a package for scientific computing; specifically, it enables fast computation with vectors and matrices, along with a number of important mathematical operations.
Because numpy is a package, it must be imported.
# Import statement
import numpy as np
What can I use numpy for?#
- numpy allows you to work with homogenous arrays.
  - A homogenous array is an array with objects all of the same type.
  - E.g., all int, or all bool, etc.
- The benefit of this is that you can do computations very efficiently.
  - No more need to loop!
- Enables more advanced mathematical operations.
Note: numpy is a key part of many advanced machine learning packages!
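As a small illustration of the "mathematical operations" point, here is a minimal sketch using np.sqrt and the mean method on an array. The data and variable names are made up for this example.

```python
import numpy as np

values = np.array([1, 4, 9, 16])   # hypothetical data

roots = np.sqrt(values)            # square root of every element, no loop needed
average = values.mean()            # arithmetic mean of the whole array

print(roots)     # [1. 2. 3. 4.]
print(average)   # 7.5
```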
Creating a numpy.ndarray#
The basic data type of numpy is an ndarray.
- ndarray = N-dimensional array.
A simple way to create an ndarray is np.arange (“a range”).
# Works similarly to range(start, stop)
np.arange(1, 4)
array([1, 2, 3])
np.arange in detail#
- By default, np.arange(start, stop) returns an array of integers from start up to, but not including, stop.
- The step parameter sets the spacing between consecutive values as you “step” from start toward stop.
## step size = 2
np.arange(1, 4, step = 2)
array([1, 3])
## step size = .5
np.arange(1, 4, step = .5)
array([1. , 1.5, 2. , 2.5, 3. , 3.5])
Check-in#
How would you create an array starting at 1 and stopping just short of 20, incrementing with a step size of .5? How long would this array be?
### Your code here
Solution#
np_range = np.arange(1, 20, step = .5)
len(np_range)
38
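The length is 38 rather than 39 because np.arange leaves out the stop value, so the last element is 19.5. The sketch below shows this, along with one possible way to include 20 itself. Pushing stop out by one step is an assumption made for illustration, not something the lecture prescribes.

```python
import numpy as np

# np.arange excludes the stop value, so this array ends at 19.5
without_end = np.arange(1, 20, step=0.5)

# Assumption: moving stop one step past 20 is one way to include 20 itself
with_end = np.arange(1, 20 + 0.5, step=0.5)

print(without_end[-1], len(without_end))  # 19.5 38
print(with_end[-1], len(with_end))        # 20.0 39
```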
Turning a list into an ndarray#
Another way to create an ndarray is to pass a list into the np.array(...) function.
og_list = [1, 2, 3]
type(og_list)
list
np_array = np.array(og_list)
print(type(np_array))
<class 'numpy.ndarray'>
np_array
array([1, 2, 3])
Check-in#
How would you create a numpy array with the elements [5, 6, 7]?
### Your code here
Solution#
np_array = np.array([5, 6, 7])
np_array
array([5, 6, 7])
Check-in#
Why is this code throwing an error?
test_array = np.array(1, 2, 3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 test_array = np.array(1, 2, 3)
TypeError: array() takes from 1 to 2 positional arguments but 3 were given
Solution#
np.array expects the elements as a single list (or other sequence), not as separate arguments, so make sure you wrap them in [].
test_array = np.array([1, 2, 3])
test_array
array([1, 2, 3])
Indexing into a one-dimensional array#
Indexing works just like it does for lists.
np_array[0]
5
np_array[1]
6
np_array[2]
7
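Since indexing follows the same rules as lists, negative indices and slices should work too. A minimal sketch, reusing the [5, 6, 7] array from above (the names last and first_two are illustrative):

```python
import numpy as np

np_array = np.array([5, 6, 7])

last = np_array[-1]        # negative indices count from the end, as with lists
first_two = np_array[0:2]  # a slice gives back an ndarray, not a list

print(last)        # 7
print(first_two)   # [5 6]
```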
Multi-dimensional arrays#
- So far, we’ve just been looking at 1-dimensional arrays.
- But numpy is excellent at storing multi-dimensional arrays.
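One common way to build a multi-dimensional array is to pass a list of lists to np.array, where each inner list becomes a row. The sketch below uses a hypothetical array called grid to show this, along with row and column indexing.

```python
import numpy as np

# Hypothetical example data: 2 rows, 3 columns
grid = np.array([[10, 20, 30],
                 [40, 50, 60]])

first_row = grid[0]        # indexing with one number selects a row
element = grid[1, 2]       # row index 1, column index 2
first_col = grid[:, 0]     # ":" means "every row"; 0 picks the first column

print(first_row)   # [10 20 30]
print(element)     # 60
print(first_col)   # [10 40]
```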
Checking attributes of an array#
The shape attribute tells you the dimensions of an array.
Check-in#
What is the dimensionality of md_array?
## Create a 2-D array from a list of lists
md_array = np.array([[1, 2], [3, 4]])
md_array
array([[1, 2],
       [3, 4]])
Solution#
You can check this using md_array.shape.
md_array.shape
(2, 2)
Check-in#
What about md_array2?
## 2x3 array
md_array2 = np.array([[1, 2, 3], [4, 5, 6]])
md_array2.shape
(2, 3)
Solution#
You can check this using md_array2.shape.
md_array2.shape
(2, 3)
Checking attributes of an array (pt. 2)#
The dtype attribute tells you the type of data in the array.
md_array2.dtype
dtype('int64')
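The dtype depends on what the array contains. Here is a minimal sketch with three small arrays (names are illustrative). The exact integer width, such as int64, can vary by platform, so the check below only looks at the general kind of integer type.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.0, 3.0])
bools = np.array([True, False, True])

print(ints.dtype)     # an integer type, e.g. int64 (width may vary by platform)
print(floats.dtype)   # float64
print(bools.dtype)    # bool
```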
Homogenous data#
As noted earlier, an array is meant to store homogenous elements.
- This means that np.array will try to convert any heterogenous elements to a common type.
## Note what happens to 5 and 7!
arr3 = np.array(["a", 5, 7])
arr3
array(['a', '5', '7'], dtype='<U21')
arr3.dtype
dtype('<U21')
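The same conversion happens with numbers: mixing ints and floats gives an array of floats. The sketch below shows that, and also one way to turn numeric-looking strings back into integers with astype. The variable names are illustrative, and astype(int) only works when every string actually looks like an integer.

```python
import numpy as np

# Ints and a float are converted to a common float type
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers)         # [1.  2.5 3. ]
print(mixed_numbers.dtype)   # float64

# Strings that look like integers can be converted back explicitly
digits = np.array(["5", "7"]).astype(int)
print(digits + 1)            # [6 8]
```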
Interim summary#
- numpy is a package that forms the foundation of scientific computing in Python.
- numpy arrays are the cornerstone of numpy.
- A numpy array is like a list, with a couple of differences:
  - Requires homogenous elements.
  - Better at representing multi-dimensional arrays.
  - Can be used for vector operations (coming up!).
Working with vectors (intro)#
- numpy vectors make it easier to do all sorts of operations, such as arithmetic operations.
- No more need to use for loops; we can do vector arithmetic the same way we multiply individual numbers.
The old way: arithmetic with for loops and lists#
Adding one list to another element by element requires using a for loop.
list1 = [1, 2, 3]
list2 = [2, 3, 4]
## The "+" operator just combines them
list1 + list2
[1, 2, 3, 2, 3, 4]
## To add them, we must use a for loop
sum_list = []
for index, item in enumerate(list1):
    sum_list.append(item + list2[index])
sum_list
[3, 5, 7]
The new way: arithmetic with numpy#
numpy makes it much easier to do arithmetic operations with vectors.
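## First, define some vectors
## Note: wrapping list1 in [] creates a 2-D array with one row (shape (1, 3)),
## which is why the outputs in this section have double brackets
arr1 = np.array([list1])
arr2 = np.array([list2])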
## Can just use "+"!
arr1 + arr2
array([[3, 5, 7]])
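Because arr1 and arr2 were built from [list1] and [list2] (a list wrapped inside another list), they are 2-D arrays with a single row. As a minimal sketch, passing the lists in directly gives plain 1-D vectors, and "+" works the same way. The names vec1, vec2, and row1 are illustrative.

```python
import numpy as np

list1 = [1, 2, 3]
list2 = [2, 3, 4]

vec1 = np.array(list1)      # 1-D array, shape (3,)
vec2 = np.array(list2)
row1 = np.array([list1])    # 2-D array with one row, shape (1, 3)

vec_sum = vec1 + vec2       # element-wise addition

print(vec_sum)                  # [3 5 7]
print(vec1.shape, row1.shape)   # (3,) (1, 3)
```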
Other arithmetic operations#
arr1 -  arr2
array([[-1, -1, -1]])
arr1 *  arr2
array([[ 2,  6, 12]])
arr1 / arr2
array([[0.5       , 0.66666667, 0.75      ]])
Vectors vs. scalars#
- A vector is a list of numbers; a scalar is a single number. 
- We can combine an entire vector with a single number using multiplication, addition, subtraction, etc.; the operation is applied to every element.
arr1
array([[1, 2, 3]])
## Multiply all elements by 100
arr1 * 100
array([[100, 200, 300]])
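Scalar operations come up constantly in data analysis. The sketch below uses made-up scores to show two typical cases: subtracting the mean from every element, and rescaling by dividing by a constant.

```python
import numpy as np

scores = np.array([70, 80, 90])   # hypothetical data

# Subtracting a scalar (here, the mean) applies to every element
centered = scores - scores.mean()

# Dividing by a scalar works the same way
proportions = scores / 100

print(centered)      # [-10.   0.  10.]
print(proportions)   # [0.7 0.8 0.9]
```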
Check-in#
What would multiplying the two arrays below return?
a = np.array([2, 4, 5])
b = np.array([2, 2, 3])
a * b
### Your code here
Solution#
a = np.array([2, 4, 5])
b = np.array([2, 2, 3])
a * b
array([ 4,  8, 15])
Check-in#
What would happen if we ran the code below?
a = np.array([2, 4, 5])
b = np.array([2, 2])
a * b
### Your code here
Solution#
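Vectors/matrices must have compatible shapes. Here a has 3 elements and b has only 2, so numpy cannot pair them up element by element and raises an error.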
a = np.array([2, 4, 5])
b = np.array([2, 2])
a * b
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [130], in <cell line: 3>()
      1 a = np.array([2, 4, 5])
      2 b = np.array([2, 2])
----> 3 a * b
ValueError: operands could not be broadcast together with shapes (3,) (2,) 
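One way to diagnose this kind of error is to print the shapes before operating. The sketch below does that and also catches the ValueError so the script keeps running. The try/except pattern is just one possible approach, and the product variable is illustrative.

```python
import numpy as np

a = np.array([2, 4, 5])
b = np.array([2, 2])

# Comparing shapes first shows the mismatch directly
print(a.shape, b.shape)   # (3,) (2,)

try:
    product = a * b
except ValueError as err:
    # numpy raises ValueError when the shapes cannot be combined
    product = None
    print("Shapes are incompatible:", err)
```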
Thinking with vectors#
- Using numpy for vector arithmetic can sometimes involve a “cognitive shift”.
- We’re used to multiplying individual elements; now we have to transition to thinking about multiplying entire vectors.
- But it’s much more efficient!
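To make the shift concrete, here is a minimal sketch that computes the same quantity both ways: a sum of squared deviations from the mean. The data are made up; the point is the contrast between visiting each element and describing one operation on the whole array.

```python
import numpy as np

values = [2, 4, 6, 8]              # hypothetical data
mean = sum(values) / len(values)

# Element-by-element thinking: visit each number in turn
total = 0
for v in values:
    total += (v - mean) ** 2

# Vector thinking: describe the operation on the whole array at once
arr = np.array(values)
total_vec = ((arr - arr.mean()) ** 2).sum()

print(total, total_vec)   # 20.0 20.0
```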
Conclusion#
This was a brief introduction to numpy.
- numpy is a powerful package used in scientific computing.
- The cornerstone of numpy is numpy.ndarray.
- numpy arrays can be used for efficient vector computations.
Next time, we’ll dive deeper into more advanced numpy operations.
