{ "cells": [ { "cell_type": "markdown", "id": "357a45b3", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Other useful Python packages" ] }, { "cell_type": "markdown", "id": "9fdaec8e", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Announcements\n", "\n", "- CAPES and survey available!\n", " - Survey: https://forms.gle/xox1KgV6FwCYoESX6\n", " - CAPES: https://cape.ucsd.edu/student/instructions.html\n" ] }, { "cell_type": "markdown", "id": "08a31438", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Goals of this lecture\n", "\n", "- Review of course: what we've learned this quarter.\n", "- Overview of other useful Python **packages**:\n", " - `seaborn`: easily and quickly make **data visualizations**. \n", " - `scipy`: tools for **statistical analyses**. \n", " - `nltk`: tools for **Natural Language Processing**, like *sentiment analysis*." ] }, { "cell_type": "markdown", "id": "979d6a04", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What we've learned\n", "\n", "- Reflect on the very first day of class: most of you had never programmed before!\n", "- Now you know:\n", " - How to use **Jupyter notebooks**. \n", " - How to write `if` statements and `for` loops.\n", " - How to create **custom functions**. \n", " - How to **read in files** of various types. \n", " - How to work with **tabular data** using `pandas`.\n", "\n", "That's a lot for ten weeks!" ] }, { "cell_type": "markdown", "id": "c58b6107", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Data visualization with `seaborn`" ] }, { "cell_type": "markdown", "id": "372cb677", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### What is data visualization?\n", "\n", "[Data visualization](https://en.wikipedia.org/wiki/Data_visualization) refers to the process (and result) of representing data graphically.\n", "\n", "- **CSS 2** will dedicate much more time to this. 
\n", "- Today: introduction to `seaborn`." ] }, { "cell_type": "code", "execution_count": 3, "id": "09c4961f", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import matplotlib.pyplot as plt # conventional alias\n", "import pandas as pd\n", "import seaborn as sns\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'retina'" ] }, { "cell_type": "markdown", "id": "996c7dfc", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example dataset" ] }, { "cell_type": "code", "execution_count": 4, "id": "b82526bf", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "6433" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Load taxis dataset\n", "df_taxis = sns.load_dataset(\"taxis\")\n", "len(df_taxis)" ] }, { "cell_type": "code", "execution_count": 6, "id": "73ca5014", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n",
"| | pickup | dropoff | passengers | distance | fare | tip | tolls | total | color | payment | pickup_zone | dropoff_zone | pickup_borough | dropoff_borough |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 2019-03-23 20:21:09 | 2019-03-23 20:27:24 | 1 | 1.60 | 7.0 | 2.15 | 0.0 | 12.95 | yellow | credit card | Lenox Hill West | UN/Turtle Bay South | Manhattan | Manhattan |\n",
"| 1 | 2019-03-04 16:11:55 | 2019-03-04 16:19:00 | 1 | 0.79 | 5.0 | 0.00 | 0.0 | 9.30 | yellow | cash | Upper West Side South | Upper West Side South | Manhattan | Manhattan |\n", "