Working with Text Files
Goals of this lecture
In programming, and especially in Computational Social Science (CSS), you will often need to read and write files.
This lecture will cover the basics of that process.
The why: why would we want to read in a file?
The what: what is a file? Where is it stored? What is a filepath?
The how: how do you read in a file? How do you write a file?
This lecture will focus specifically on text files, though there are a number of other file types (.json, .csv, etc.).
Why read and write files?
Fundamentally, a file is just a way to store data.
This data could take many forms:
Unstructured text.
JSON, i.e., data organized much like a Python dict.
.csv, i.e., tabular data like an Excel file.
An executable file, like a Python script (.py).
Computational Social Science centers around working with data. Thus, it’s important to understand how to read and write these files.
Some common use cases
In CSS research, reading and writing files is pretty much unavoidable. It happens almost anytime you want to work with data.
Examples:
Reading in a text corpus of Tweets on a particular topic to perform sentiment analysis.
Reading in a corpus of song lyrics to perform analyses about vocabulary, rhythm, and more.
Reading in tabular data about economics to correlate Economic Connectedness with Social Mobility.
So what is a file?
A file is a set of bytes used to store some kind of data.
The format of this data depends on what you’re using it for, but at some level, it is translated into binary bits (1s and 0s).
The file format is usually specified in the file extension:
.csv: comma-separated values.
.txt: a plain text file.
.py: an executable Python file.
.png: a portable network graphic file (i.e., an image).
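If you ever need to check a file’s extension from inside Python, the standard-library pathlib module can pull it out of a filename. The filenames below are hypothetical, and nothing here touches the disk; this is only a small illustration of the extensions listed above.

```python
from pathlib import Path

# Example filenames (hypothetical; they don't need to exist on disk)
filenames = ["tweets.csv", "notes.txt", "analysis.py", "figure.png"]

for name in filenames:
    # .suffix gives the file extension, including the leading dot
    print(name, "->", Path(name).suffix)
```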
Where are files?
Files are stored somewhere on your computer (or on a server, etc.), typically in a folder (also called a directory). Thus, each file has its own location.
We call this location of a file its path.
File paths can be either absolute or relative.
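Absolute file paths
An absolute file path specifies the location of a file starting from the root directory of the file system (/ on Mac/Linux, or a drive such as C:\ on Windows).
On my computer, absolute paths to my own files typically begin with:
/Users/sttrott/...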
If a file is called my_file.txt, the absolute file path would include every directory leading up to that file, starting from the root.
On Mac/Linux, each directory/folder is separated by the / character.
On Windows, they are separated by the \ character.
Example: /Users/sttrott/CSS/css1/my_file.txt
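One way to poke at an absolute path without needing the file to exist is pathlib’s "pure" path classes, which only manipulate the path text. This is a sketch: the Mac path is the example above, while the Windows path is a hypothetical equivalent made up for comparison.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths let us inspect path strings without touching the disk
mac_path = PurePosixPath("/Users/sttrott/CSS/css1/my_file.txt")
print(mac_path.is_absolute())  # True: it starts at the root, "/"
print(mac_path.parts)          # the root, each directory along the way, then the file name

# A hypothetical Windows-style absolute path, separated by "\"
win_path = PureWindowsPath(r"C:\Users\sttrott\CSS\css1\my_file.txt")
print(win_path.is_absolute())  # True: it starts at the drive root, "C:\"
```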
Relative file paths
A relative file path specifies the location of a file relative to the current directory (i.e., the one you’re in right now).
For example, say our current directory is CSS. If a file is called my_file.txt, the relative file path would tell the computer how to get to my_file.txt from CSS.
As with absolute paths, directories are separated by / on Mac/Linux and by \ on Windows.
Example: css1/my_file.txt
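To see how the relative and absolute versions fit together, here is a small sketch using pathlib’s relative_to, assuming (as in the example) that the current directory is /Users/sttrott/CSS. Again, nothing here requires the file to exist.

```python
from pathlib import PurePosixPath

# Absolute path to the file, and to our (assumed) current directory
file_path = PurePosixPath("/Users/sttrott/CSS/css1/my_file.txt")
current_dir = PurePosixPath("/Users/sttrott/CSS")

# The relative path is what remains after removing the current directory
relative = file_path.relative_to(current_dir)
print(relative)  # css1/my_file.txt

# Joining the current directory and the relative path gets us back to the file
print(current_dir / relative)
```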
The .. syntax
If your target file (e.g., my_file.txt) is not stored within your current directory, you’ll need to use the .. syntax.
This tells your computer to “go up a level”.
For example, if we’re currently in css1/lectures/week6, but we want to get to css1/my_file.txt, we’ll need to use this notation: ../../my_file.txt.
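Python’s posixpath module (the Mac/Linux flavor of os.path) can check this kind of reasoning for us. The sketch below treats both paths as relative to the same parent folder; relpath computes the route from week6 to the file, and normpath resolves the .. steps to confirm where we land.

```python
import posixpath

# Starting directory and target file from the example above
start = "css1/lectures/week6"
target = "css1/my_file.txt"

# Each ".." climbs one level: week6 -> lectures -> css1
print(posixpath.relpath(target, start=start))  # ../../my_file.txt

# normpath resolves the ".." steps, confirming we end up at the target
print(posixpath.normpath(posixpath.join(start, "../../my_file.txt")))  # css1/my_file.txt
```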
Check-in
Suppose we want to access a file called notes.txt. This is the absolute path leading to that file:
/Users/sttrott/css/lectures
How would we write the full absolute path, including the file name?
### Your response here
Check-in
Suppose we want to access a file called notes.txt. This is the absolute path leading to that file:
/Users/sttrott/css/lectures
However, we’re currently in the labs directory, which is also in the css folder.
How would we write the relative path leading from our current directory to lectures/notes.txt?
### Your response here
Solution
Suppose we want to access a file called notes.txt. This is the absolute path leading to that file:
/Users/sttrott/css/lectures
Full absolute path: /Users/sttrott/css/lectures/notes.txt
Relative path from css/labs: ../lectures/notes.txt
File paths: wrap-up
File paths can be one of the hardest things to get right.
Even as a more experienced programmer, I mess file paths up all the time (including for this class!).
A helpful command is pwd (available in Jupyter notebooks and in the terminal), which reminds us where we are: i.e., what our current directory is.
pwd
'/Users/seantrott/Dropbox/UCSD/Teaching/CSS/css1/css1_book/lectures'
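Outside a notebook, pwd isn’t available as Python code, but the os module offers a plain-Python equivalent. This sketch also shows that a relative name like my_file.txt (used here only as an example; the file need not exist) is interpreted starting from that current directory.

```python
import os

# The plain-Python equivalent of `pwd`: the current working directory
current_dir = os.getcwd()
print(current_dir)

# Relative paths are interpreted starting from this directory,
# so "my_file.txt" means the current directory joined with "my_file.txt"
print(os.path.join(current_dir, "my_file.txt"))
print(os.path.abspath("my_file.txt"))  # same result
```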
The how: interacting with files
Once you’ve located a file, you probably want to either read or write it in some way. Both modes of interacting with a file require the built-in open function.
In turn, you can open a file in one of several modes:
w: writing to that file (this creates the file if needed and replaces anything already in it).
r: reading that file (i.e., reading what’s already in it). This is the default mode.
a: appending to what’s already in the file.
Let’s take these step by step.
Writing a file
The syntax to open a file in the writing mode is as follows:
open("filename.txt", "w")
Often, we’ll use the with keyword as in the code block below, which allows us to open that file and assign it immediately to a variable.
Then, we can call var_name.write("TEXT TO ADD TO FILE").
The advantage of with is that it will automatically close the file once we’re done with the with block.
### Open up a file called `test.txt`
with open("test.txt", "w") as f:
    ### Write string to file
    f.write("This is a file.")
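You can call write as many times as you like inside the with block. The sketch below (writelines_demo.txt is a hypothetical scratch file) also shows two details worth knowing: write returns the number of characters it wrote, and the related writelines method writes a list of strings exactly as given, without inserting newlines for you.

```python
# Hypothetical scratch file, separate from test.txt
lines = ["second line\n", "third line\n"]

with open("writelines_demo.txt", "w") as f:
    n_written = f.write("first line\n")  # write() returns the number of characters written
    f.writelines(lines)                  # writelines() writes each string as-is, adding no newlines

with open("writelines_demo.txt", "r") as f:
    print(f.read())
```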
Things to be aware of
filename.txt doesn’t have to exist when you open a file for writing; it will be created when you call open("filename.txt", "w").
If filename.txt does already exist, then by default you’ll overwrite what’s there. If you want to just add to the file, use the a (append) mode instead.
To separate lines in this file, use the \n character (newline).
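The overwriting behavior is easy to trip over, so here is a short demonstration using a hypothetical scratch file, overwrite_demo.txt. The first with block creates the file and uses \n to separate two lines; the second opens it again in w mode, which throws those lines away.

```python
# Hypothetical scratch file used only for this demonstration
path = "overwrite_demo.txt"

# First write: "w" creates the file, and \n separates the two lines
with open(path, "w") as f:
    f.write("first line\n")
    f.write("second line\n")

# Second write in "w" mode: the previous contents are replaced, not extended
with open(path, "w") as f:
    f.write("only line\n")

with open(path, "r") as f:
    print(f.read())
```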
Reading a file
The syntax to open a file in the reading mode is as follows:
open("filename.txt", "r")
Once we’ve opened the file, we can read the contents. The read method will return the contents as a str.
### Open up a file called `test.txt`
with open("test.txt", "r") as f:
    ### Read the contents
    contents = f.read()
### print out contents
print(contents)
This is a file.
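Unlike w, the r mode will not create a missing file; Python raises a FileNotFoundError instead. A minimal sketch, assuming a hypothetical filename that is not on disk, shows how you might catch that error rather than have your program stop.

```python
# "does_not_exist.txt" is a hypothetical name assumed not to be on disk
found = True
try:
    with open("does_not_exist.txt", "r") as f:
        contents = f.read()
except FileNotFoundError as err:
    # "r" mode requires the file to exist already
    found = False
    print("Could not open the file:", err)
```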
Check-in
Use the open function to create and write a new file called my_first_file.txt. Once you’ve opened it, write a series of lines to that file:
The first line should read: My name is {NAME}\n.
The next 4 lines should read: This is line {i} of the file.\n, where i refers to the specific line number.
Hint: Remember to use the newline character to separate each line.
### Your code here
Solution
with open("my_first_file.txt", "w") as f:
    f.write("My name is Sean.\n")
    # i runs from 0 to 3, so i + 2 gives line numbers 2 through 5
    for i in range(0, 4):
        f.write("This is line {i} of the file.\n".format(i=i + 2))
Check-in
Now use the open function to open my_first_file.txt. Once you’ve opened it, read the contents of that file into a new variable called file_contents.
### Your code here
Solution
with open("my_first_file.txt", "r") as f:
    file_contents = f.read()
print(file_contents)
My name is Sean.
This is line 2 of the file.
This is line 3 of the file.
This is line 4 of the file.
This is line 5 of the file.
File reading, continued
Before, we read in the entire file as one big str. There are several other ways to interact with and read a file, however:
.read(n), where n refers to the number of characters you want to read.
.readlines(), which returns a list of each line in the file.
.read(n)
The read method can take an optional argument n, which tells Python how many characters of the file to read.
with open("my_first_file.txt", "r") as f:
    n_characters = f.read(10)
print(n_characters)
My name is
with open("my_first_file.txt", "r") as f:
    n_characters = f.read(15)
print(n_characters)
My name is Sean
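One detail of .read(n): the file object remembers how far it has read, so successive calls continue from where the last one stopped. The sketch writes a hypothetical read_demo.txt with the same first two lines as my_first_file.txt, so it runs on its own.

```python
# Hypothetical demo file with the same first lines as my_first_file.txt
with open("read_demo.txt", "w") as f:
    f.write("My name is Sean.\nThis is line 2 of the file.\n")

# Each .read(n) call continues from where the previous call stopped,
# because the file object keeps track of its current position
with open("read_demo.txt", "r") as f:
    first_chunk = f.read(10)   # the first 10 characters
    second_chunk = f.read(6)   # the next 6 characters
    rest = f.read()            # no n: read everything that is left

print(repr(first_chunk))   # 'My name is'
print(repr(second_chunk))  # ' Sean.'
print(repr(rest))          # '\nThis is line 2 of the file.\n'
```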
.readlines()
The readlines method returns a list, where each element in the list corresponds to a line in the file.
Lines are defined as being separated by a \n character, and each line keeps its trailing \n.
with open("my_first_file.txt", "r") as f:
    all_lines = f.readlines()
all_lines
['My name is Sean.\n',
'This is line 2 of the file.\n',
'This is line 3 of the file.\n',
'This is line 4 of the file.\n',
'This is line 5 of the file.\n']
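A closely related pattern is looping over the file object itself, which hands you one line at a time instead of building the whole list up front; this can matter for very large corpora. The sketch uses a hypothetical lines_demo.txt built to match my_first_file.txt, and the result matches what readlines returns.

```python
# Hypothetical demo file with the same structure as my_first_file.txt
with open("lines_demo.txt", "w") as f:
    f.write("My name is Sean.\n")
    for i in range(2, 6):
        f.write(f"This is line {i} of the file.\n")

# Looping over the file object yields one line at a time (newline included)
lines_seen = []
with open("lines_demo.txt", "r") as f:
    for line in f:
        lines_seen.append(line)

print(repr(lines_seen[0]))  # 'My name is Sean.\n'
```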
Check-in
Use the readlines method to read in all lines from my_first_file.txt.
Then, use a for loop to iterate through each line.
For each line, replace the \n character with an empty string (i.e., "").
Then, print out the line.
### Your code here
Solution
with open("my_first_file.txt", "r") as f:
    all_lines = f.readlines()
for line in all_lines:
    # Remove the newline character before printing
    clean_line = line.replace("\n", "")
    print(clean_line)
My name is Sean.
This is line 2 of the file.
This is line 3 of the file.
This is line 4 of the file.
This is line 5 of the file.
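replace works here, but two other string methods are common for the same job: rstrip("\n") removes only a trailing newline, and splitlines() splits a whole string into lines with the line endings dropped. A sketch using a hypothetical strip_demo.txt; both approaches give the same list.

```python
# Hypothetical demo file, so this example runs on its own
with open("strip_demo.txt", "w") as f:
    f.write("My name is Sean.\nThis is line 2 of the file.\n")

with open("strip_demo.txt", "r") as f:
    all_lines = f.readlines()

# rstrip("\n") removes only a trailing newline from each line
stripped = [line.rstrip("\n") for line in all_lines]

# splitlines() splits the whole text into lines and drops the line endings
with open("strip_demo.txt", "r") as f:
    split = f.read().splitlines()

print(stripped)
print(split)
```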
Appending a file
If you open a pre-existing file in the w mode, you will overwrite all of its existing content.
If you wish to simply add to that file, you can instead open it in the a mode: open("filename.txt", "a")
## Open in append mode
with open("my_first_file.txt", "a") as f:
    ## Syntax to write is the same.
    f.write("This is new text I'm adding.")
## Now let's check if it worked...
with open("my_first_file.txt", "r") as f:
    file_contents = f.read()
print(file_contents)
My name is Sean.
This is line 2 of the file.
This is line 3 of the file.
This is line 4 of the file.
This is line 5 of the file.
This is new text I'm adding.
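Append mode only adds characters at the end of the file; it does not start a new line for you. The new text above landed on its own line only because the previous line already ended in \n. A sketch with a hypothetical append_demo.txt, adding entries one at a time the way you might log results, includes the \n explicitly. (Like w, the a mode also creates the file if it does not exist yet.)

```python
# Hypothetical log file; start from a clean slate with "w"
with open("append_demo.txt", "w") as f:
    f.write("first entry\n")

# Each "a" call adds to the end; include \n yourself to start a new line
for entry in ["second entry", "third entry"]:
    with open("append_demo.txt", "a") as f:
        f.write(entry + "\n")

with open("append_demo.txt", "r") as f:
    print(f.read().splitlines())  # ['first entry', 'second entry', 'third entry']
```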
Closing a file
Technically, it is good practice to always close a file once you’ve opened it.
If you’re using the with keyword, it’ll automatically close the file once you finish the with block.
But if you’re not, you can close a file using var_name.close().
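To see the difference, the sketch below opens a hypothetical scratch file, close_demo.txt, without with and closes it by hand inside try/finally, so the close still happens if the write fails. The closed attribute on a file object reports whether it has been closed, which lets us confirm that with does this for us at the end of its block.

```python
# Without `with`, we are responsible for closing the file ourselves
f = open("close_demo.txt", "w")   # hypothetical scratch file
try:
    f.write("Closed by hand.\n")
finally:
    f.close()                      # runs even if the write raises an error

print(f.closed)  # True

# With `with`, the file is closed automatically at the end of the block
with open("close_demo.txt", "r") as g:
    print(g.closed)  # False: still open inside the block
print(g.closed)      # True: closed once the block ends
```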
Conclusion
There’s lots more to working with files (including text files), but this sets the foundation. Now you should feel a little more comfortable:
Understanding how to navigate your computer’s directory structure, e.g., knowing “where” a file is located.
Knowing how to open a file in Python.
Knowing how to read or write that file.
This will form the basis of working with future file types, such as .csv (a very common format for representing tabular data).