<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" />
<title>Introductory Programming in Python: Files</title>
<link rel='stylesheet' type='text/css' href='style.css' />
<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />
<script src="animation.js" type="text/javascript">
</script>
</head>
<body onload="animate_loop()">
<div class="page">
<h1>Introductory Programming in Python: Lesson 18<br />
Files for Input and Output</h1>
<div class="centered">
[<a href="random.html">Prev: Random Numbers</a>] [<a href="index.html">Course Outline</a>] [<a href="regexp.html">Next: Regular Expressions</a>]
</div>
<h2>Filesystems, Directories, and Files</h2>
<p>The concept of a file is fairly intuitive, but as with all things
programming, intuition is not enough. Let us briefly explore exactly
what a file is. Microsoft has served, as always, to confuse and
obfuscate the concept of a file. A file is interchangeably called a
document, a file, a song, a movie, a spreadsheet, etc. It is
important to understand that the multiple names used in common parlance
for a file are in fact references to the <em>use</em> to which that
file is put, rather than the idea of what a file actually is. So what
is a file? Simply put, a file is an ordered collection of data
associated with a name and location by a filesystem on a particular
device. The device might be a hard drive, a flash disk, a CD-ROM, or
even your cellphone. The same file may be used for many different
things; e.g. I can read my .mp3 file as text. It makes very little
sense, but it is possible. Alternatively, one could attempt to play the
contents of a large spreadsheet through one's sound card. Again, it
won't make much sense (in fact, it'll just sound like a burst of
static), but it is possible.</p>
<div class="centered">
<strong>A file is a name associated with an ordered collection of data on some storage medium</strong>
</div>
<p>Another concept that the GUIs of the 1990s onwards have eroded
significantly is the idea of the filesystem. When we look at files as
they sit on a physical storage device, we encounter a number of
problems. Firstly, a single file need not be stored in a single contiguous
area; it can be <em>fragmented</em>. Secondly, there's no apparent
structure or order to where files are stored relative to one another.
Your word processing documents might be stored right next to, or even
intermeshed with, files belonging to the operating system, your music
collection, or your applications. Clearly we need a way of imposing a
logical structure onto a collection of files. So we introduce a method
of grouping arbitrary files together, namely the
<strong>directory</strong>, which you may know of as a folder.
Generally, any file can be put into any directory, but cannot exist in
multiple directories at once. Thus we have given files a
<strong>location</strong>. However, directories can also contain other
directories, introducing a hierarchical structure of files within
directories, within other directories, within ... oh hell. Where does
it end? It does end, or rather it <em>starts</em> somewhere!</p>
<div class="centered">
<strong>A directory is an arbitrary unordered grouping of files</strong>
</div>
<p>Every filesystem has a starting point called the
<strong>root</strong>. In MS-DOS, Windows, Symbian and some other
cellphone OSes, each filesystem is assigned its own root using a
letter of the alphabet, as in <code>C:\</code>. Note the backslash!
In Linux, Unix and any other POSIX-compliant OS, the root is simply
called <code>/</code>, and other filesystems can be placed inside the
root, much like directories can be placed inside one another. So now we
have a way to specify a particular file exactly: even if two files
have the same name, no two files can have exactly the same
location. The location of a file is always specified from the root down
through each directory to the file, including the name of the file,
e.g.</p>
<pre class='listing'>C:\Documents and Settings\James\Desktop\todo.txt
/home/james/Desktop/todo.txt</pre>
<p>Note how in both cases we start with the root, and name each
directory successively, zeroing in on the directory containing the file
of interest. We separate the directory names with <code>\</code> in the
case of Windows/MS-DOS, or <code>/</code> in the case of Linux/POSIX.
The explicit sequence from root to name is known as the <strong>full
path</strong> of the file, as we have followed the full path from root
through each directory to the file.</p>
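<p>Python's standard library knows both sets of rules. As a small sketch, the <code>ntpath</code> and <code>posixpath</code> modules (the Windows and POSIX versions of <code>os.path</code>, both importable on any platform) can assemble the two example paths above from their parts, each using its own separator. The directory names are simply those from the listing; <code>print</code> is called with a single parenthesised argument so that the sketch should run under both Python 2.6 and Python 3.</p>
<pre class='listing'>import ntpath     # Windows/MS-DOS path rules, available on every platform
import posixpath  # Linux/POSIX path rules, available on every platform

# Build each full path from the root down, one directory at a time.
windows_path = ntpath.join("C:\\", "Documents and Settings", "James", "Desktop", "todo.txt")
posix_path = posixpath.join("/", "home", "james", "Desktop", "todo.txt")

# Each module inserts its own separator between the components.
print(windows_path)   # C:\Documents and Settings\James\Desktop\todo.txt
print(posix_path)     # /home/james/Desktop/todo.txt</pre>
<p>In a real program you would normally use <code>os.path.join</code>, which follows the rules of whichever operating system the program is running on.</p>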
<h2>Relative Paths and the Working Directory</h2>
<p>Of course specifying the full path of every file every time we wish
to use it is inconvenient. Many operating systems thus include the idea
of a <strong>working directory</strong>. Working directories are tied
to login sessions (and to each running program), and are not readily
apparent in GUIs. When a user logs in to a system, their working
directory for that login session is usually set to their home
directory. Various OS commands can change or print out the working
directory: <code>cd</code> is used to change the working directory to
another one, and on Linux <code>pwd</code> prints it. Whenever a file or
directory is specified, and the specification is not a full path, the
file is considered relative to the current working directory. For
example, specifying only a name for a file implies the file we are
looking for is in the working directory. When a program runs, it
inherits the working directory from the login session from which it is
run. In the example below, the full path to the working directory is
displayed in the prompt.</p>
<pre class='listing'>/home/james $ ls
todo.txt
/home/james $ cd /home
/home $ ls
james
/home $ cat james/todo.txt
I have nothing to do!
/home $ </pre>
<p>Note how the final command specifies an incomplete path
'james/todo.txt', and not the full path. Because a full path was not
specified, the working directory ('/home' at the time) is prepended to
the name, yielding '/home/james/todo.txt'. Thus we are able to specify
files and directories lower down in the directory hierarchy in a
relative manner.</p>
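<p>A Python program can inspect and change its own working directory with <code>os.getcwd()</code> and <code>os.chdir()</code>, roughly the equivalents of <code>pwd</code> and <code>cd</code>. The sketch below assumes it may create a throwaway directory with <code>tempfile.mkdtemp()</code>; it recreates the 'james/todo.txt' example there and shows a relative path being resolved against the working directory, with <code>os.path.abspath</code> performing the same prepending step explicitly. It uses <code>open</code>, which is covered later in this lesson.</p>
<pre class='listing'>import os
import shutil
import tempfile

# A throwaway directory holding james/todo.txt (the names are borrowed
# from the shell session above; any names would do).
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "james"))
out = open(os.path.join(base, "james", "todo.txt"), "w")
out.write("I have nothing to do!\n")
out.close()

original_cwd = os.getcwd()    # the working directory this program inherited
os.chdir(base)                # the equivalent of "cd" in the shell
try:
    cwd_during = os.getcwd()  # like "pwd": the full path of the working directory
    relative = os.path.join("james", "todo.txt")
    # abspath() prepends the working directory to a relative path...
    resolved = os.path.abspath(relative)
    # ...which is exactly how open() interprets a relative path.
    f = open(relative, "r")
    contents = f.read()
    f.close()
finally:
    os.chdir(original_cwd)    # restore the inherited working directory
    shutil.rmtree(base)       # tidy up the throwaway directory

print(resolved)
print(contents)</pre>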
<p>Of course, if a file is in a directory somewhere above, or rather
outside, the working directory, it would seem we must still use the full
path to specify it, but there are a few shortcuts we can take.</p>
<ul>
<li><code>./</code> indicates the working directory; not much help to us, really...</li>
<li><code>../</code> indicates the directory above the directory specified in the path so far.</li>
</ul>
<p>For example, if the working directory were '/home/james/IntroPython':</p>
<ul>
<li><code>index.html</code> would have the full path <code>/home/james/IntroPython/index.html</code></li>
<li><code>./index.html</code> would have the full path <code>/home/james/IntroPython/index.html</code></li>
<li><code>data/testinput.txt</code> would have the full path <code>/home/james/IntroPython/data/testinput.txt</code></li>
<li><code>../todo.txt</code> would have the full path <code>/home/james/todo.txt</code></li>
<li><code>../amusement/phdcomics.tar.gz</code> would have the full path <code>/home/james/amusement/phdcomics.tar.gz</code></li>
<li><code>../../../bin/ls</code> would have the full path <code>/bin/ls</code></li>
<li><code>/home/james/../../bin/ls</code> would have the full path <code>/bin/ls</code></li>
</ul>
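<p>The resolutions in the list above can be checked with <code>posixpath.join</code> followed by <code>posixpath.normpath</code>, which collapses the <code>.</code> and <code>..</code> components. This sketch simply treats '/home/james/IntroPython' as the working directory; none of the files need to exist.</p>
<pre class='listing'>import posixpath

working_dir = "/home/james/IntroPython"

# The paths from the list above (the last one is already a full path).
examples = [
    "index.html",
    "./index.html",
    "data/testinput.txt",
    "../todo.txt",
    "../amusement/phdcomics.tar.gz",
    "../../../bin/ls",
    "/home/james/../../bin/ls",
]

full_paths = []
for path in examples:
    # join() prepends the working directory unless path is already absolute;
    # normpath() then removes "." and resolves ".." textually.
    full_paths.append(posixpath.normpath(posixpath.join(working_dir, path)))
    print(path.ljust(32) + full_paths[-1])</pre>
<p>Note that <code>normpath</code> works purely on the text of the path and never looks at the disk, so if one of the directories is a symbolic link, <code>..</code> may not lead where the text suggests.</p>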
<h2>Files for Input and Output</h2>
<p>And all this has been leading up to the idea that programs need not
limit themselves to key strokes and the screen as input and output
methods respectively. Files can be used as both input and output. Files
allow us to conveniently input large quantities of data, and similarly
output large quantities of data for later review. When we want to work
with files, we must clearly be able to specify the file we wish to work
with, using its location and name, i.e. a valid full or relative
path.</p>
<h2>Opening Files</h2>
<p>Opening a file in python is simple. We use the <a class="doclink"
href='http://docs.python.org/lib/built-in-funcs.html'>open
function</a>. The open function returns a <a class="doclink"
href='http://docs.python.org/lib/bltin-file-objects.html'>file
object</a>, given a path to a file in the form of a string, and a string
specifying whether to open the file for reading, writing or appending.</p>
<p>Unless opened in binary mode (by adding "b" to the mode string, as
in "rb"), files in Python are treated as text files, meaning they are
broken up into units called lines. However, there is also a concept
called a <strong>file pointer</strong>. A file pointer is like a cursor
in a word processor, which sits between characters in a file. When we
read from a file, we read from the file pointer onwards to the right.
Similarly, all writing done to the file will be from the file pointer
onwards, overwriting any data already to the right of the file
pointer, and extending the size of the file as necessary. Both reading
and writing a file reposition the file pointer
at the end of the sequence read or written.</p>
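<p>A short sketch of the file pointer at work, using a throwaway file in a temporary directory (the file name is made up). It relies on the open function described next: successive reads continue from wherever the previous one stopped, and a write in "r+" mode overwrites characters in place rather than inserting them.</p>
<pre class='listing'>import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pointer.txt")   # throwaway file

out = open(path, "w")
out.write("Hello world\n")
out.close()

# Each read starts at the file pointer and leaves it just after what was read.
f = open(path, "r")
first = f.read(5)     # "Hello": the pointer now sits between "o" and " "
second = f.read(6)    # " world": carries on from where the first read stopped
f.close()

# "r+" opens with the pointer at the start, so a write overwrites in place.
f = open(path, "r+")
f.write("J")          # replaces the "H"; everything after it is untouched
f.close()

f = open(path, "r")
after = f.read()      # "Jello world\n"
f.close()</pre>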
<pre class='listing'>Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open("input.txt","r")
>>> f
&lt;open file 'input.txt', mode 'r' at 0xb7dac...&gt;
>>>
</pre>
<p>Here we see the open function in action. It is most commonly used on
the right-hand side of an assignment statement but, being a function,
it can be used anywhere an expression is valid. The open function
takes two parameters: firstly, the path of the file to open, and
secondly, a string specifying the mode (read, write, or append) in
which to open the file. The mode may be left out, in which case it
defaults to "r". Valid strings for the mode are:</p>
<ul>
<li>"r" opens the file in read only mode. The file must exist prior
to opening. A plus suffix ("r+") means the file is opened for both
reading and writing. In either case the file is opened with the
file pointer at the beginning of the file.</li>
<li>"w" opens the file in write only mode. If the file already exists
it is truncated to zero bytes (its old contents are lost), otherwise
it will be created. A plus suffix ("w+") means the file is opened for
both reading and writing, but the file is still truncated to zero
bytes if it exists already. Obviously this means the file pointer
starts at the beginning of the file. Writing beyond the end of a file
simply enlarges the file to accommodate whatever is written.</li>
<li>"a" opens the file in append mode. The file can only be written
to, and the file pointer starts at the <strong>end</strong> of the
file, meaning all data subsequently written will be added to the
end of the file. If the file doesn't already exist, it will be
created. A plus suffix ("a+") means the file is opened for both
reading and appending. It is <em>not</em> equivalent to "r+": the
file is created if it does not exist, and every write goes to the
end of the file, no matter where the file pointer has been moved.
Where the file pointer starts for reading depends on the platform
and Python version, so it is safest to move it explicitly before
reading.</li>
</ul>
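<p>The differences between the modes can be seen in a sketch like the following, again on a hypothetical throwaway file. The final step with "a+" illustrates that moving the file pointer does not stop a write in append mode from landing at the end of the file.</p>
<pre class='listing'>import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")   # does not exist yet

# "r" needs the file to exist already.
try:
    open(path, "r")
    missing_raised = False
except IOError:   # in Python 3, IOError is OSError; FileNotFoundError is a subclass
    missing_raised = True

f = open(path, "w")        # "w" creates the file...
f.write("first\n")
f.close()
f = open(path, "w")        # ...and empties it again if it already exists
f.write("second\n")
f.close()

f = open(path, "a")        # "a" adds to the end of what is already there
f.write("third\n")
f.close()

f = open(path, "a+")
f.seek(0)                  # move the pointer back to the beginning...
f.write("fourth\n")        # ...but in append mode the write still goes to the end
f.seek(0)                  # reposition before reading
contents = f.read()        # "second\nthird\nfourth\n"
f.close()</pre>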
<h2>Reading From Files</h2>
<p>Once we have an open file object, we can use its methods to both
read from and write to the file it represents. When a file is opened
for reading, the file pointer is positioned at the beginning of the
file (position 0) which is just before the first character in the file.
File objects provide a variety of methods to read from files...</p>
<ul>
<li><code>&lt;file object&gt;.read(&lt;count&gt;)</code> reads
'count' characters from the file object starting at the file
pointer, and returns the read characters as a string. If fewer than
'count' characters remain to be read, then a string is still
returned, but it will be shorter than count characters. If count is
left out, everything up to the end of the file is read. An empty
string is returned if the file pointer is at the end of the
file.</li>
<li><code>&lt;file object&gt;.readline()</code> reads from the file
pointer onwards up to and including the first newline ('\n')
character, and returns the read characters as a string. An empty
string is returned if the file pointer is at the end of the
file.</li>
<li><code>&lt;file object&gt;.readlines()</code> reads from the
file pointer until the end of file, returning a list of lines as
strings, each including its trailing newline character (only the
last line may lack one). An empty list is returned if the file
pointer is already at the end of the file.</li>
</ul>
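<p>The end-of-file behaviour described above can be sketched on a tiny four-character file (a made-up example). Because a blank line still contains its newline, an empty string can only mean the end of the file, which is what makes the <code>readline</code> loop at the end of the sketch work.</p>
<pre class='listing'>import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "short.txt")
out = open(path, "w")
out.write("abc\n")                 # four characters in total
out.close()

f = open(path, "r")
chunk = f.read(10)                 # asks for 10, gets the 4 that remain: "abc\n"
empty_read = f.read(10)            # pointer is at the end: ""
empty_readline = f.readline()      # also ""
empty_readlines = f.readlines()    # an empty list: []
f.close()

# A blank line still contains "\n", so "" can only mean end of file.
f = open(path, "r")
lines = []
while True:
    line = f.readline()
    if not line:                   # "" means the end of the file
        break
    lines.append(line)
f.close()</pre>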
<p>Using the following file as an example</p>
<pre class='listing'>This is a simple file
Containing only three lines
of text</pre>
<p>We can demonstrate the use of the various file reading methods.</p>
<pre class='listing'>>>> f.read(3)
'Thi'
>>> f.readline()
's is a simple file\n'
>>> f.readlines()
['Containing only three lines\n', 'of text\n']
>>></pre>
<p>Note how using the simple 'read' method, we get only three
characters (being the count we specified), and the 'readline' method
continues from where 'read' finished. This illustrates the file pointer
in action. The file pointer is positioned between the 'i' and 's' of
'This' on the first line of the file after the 'read' method is
executed. After the 'readline' it is between the end of line one and
the first character of line two ('C'). Thus, 'readlines' has two
complete lines left to read when it is called.</p>
<p>Often a more useful way to read the lines of a file in sequence is
a for loop over the file object. When used in a for loop, a file
object acts as a sequence of lines, as in</p>
<pre class='listing'>>>> f = open("input.txt","r")
>>> for line in f:
... print line.rstrip()
...
This is a simple file
Containing only three lines
of text
>>></pre>
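<p>As a slightly larger sketch, here is a hypothetical helper, <code>count_lines_and_words</code>, that uses a for loop over a file to count lines and whitespace-separated words, applied to a recreated copy of the three-line example file in a temporary directory. Reading one line at a time like this means the whole file never has to be held in memory at once.</p>
<pre class='listing'>import os
import tempfile

def count_lines_and_words(path):
    """Hypothetical helper: return (line count, word count) for a text file."""
    lines = 0
    words = 0
    f = open(path, "r")
    for line in f:                   # each iteration yields the next line
        lines += 1
        words += len(line.split())   # split() breaks on any whitespace
    f.close()
    return lines, words

# Recreate the three-line example file and count it.
sample = os.path.join(tempfile.mkdtemp(), "input.txt")
out = open(sample, "w")
out.write("This is a simple file\nContaining only three lines\nof text\n")
out.close()
print(count_lines_and_words(sample))   # (3, 11)</pre>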
<h2>Writing To Files</h2>
<p>Writing files uses methods very similar to those used to read from
files, except that writing is usually buffered in memory, meaning the
file on disk is only guaranteed to be updated when the buffer fills up,
when the buffer is explicitly flushed with the file object's
<code>flush()</code> method, or when the file object is closed.</p>
<ul>
<li><code>&lt;file object&gt;.write(&lt;string&gt;)</code> writes
the contents of 'string' to the file, starting at the file pointer
and overwriting data in the file, or enlarging the file as
necessary. Note that no newline is added, so multiple successive
write calls without any newlines in their respective strings
produce only one line of text in the file.</li>
<li><code>&lt;file object&gt;.writelines(&lt;list&gt;)</code>
writes the elements of the list, which must all be strings, in
order to the file. Newlines are not added, so if the strings have
no newlines, only one line of output will be written.</li>
</ul>
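<p>The buffering described above can be observed with two file objects open on the same throwaway file: once <code>flush()</code> has been called on the writer, the data is visible to an independent reader even though the writer has not been closed. This is a sketch; whether the reader sees anything <em>before</em> the flush depends on buffer sizes, so it deliberately checks only the state after flushing.</p>
<pre class='listing'>import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffered.txt")

w = open(path, "w")
w.write("Some text\n")   # may still be sitting in w's buffer in memory
# At this point a second reader may or may not see the text, depending on
# buffering, so nothing is assumed about it here.
w.flush()                # push the buffered data out to the file now

# A second, independent file object now sees the data, even though w is
# still open.
r = open(path, "r")
seen_after_flush = r.read()
r.close()
w.close()</pre>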
<p>As an example, let's create a new file, and write some text out to
it.</p>
<pre class='listing'>>>> w = open("newfile.txt","w")
>>> w.write("This is the first line of text ")
>>> w.write("This is still on the first line\n")
>>> w.writelines(["The second line\n", "The Salmon Mousse"])
>>> w.close()
>>> </pre>
<p>Looking at 'newfile.txt', we see</p>
<pre class='listing'>This is the first line of text This is still on the first line
The second line
The Salmon Mousse</pre>
<p>We see from the first two calls to 'write' that we have to supply
our own newline characters to force line breaks in the output. Finally,
closing our files is a good idea, especially when they are opened for
writing, since buffered data is only guaranteed to reach the file once
it has been flushed or the file has been closed.</p>
<ul>
<li><code>&lt;file object&gt;.close()</code> closes the file,
disallowing further read or write operations on the file. Any
pending written data in the file buffer is flushed to disk.
Technically, close is called automatically when a file object
is garbage collected, but it is good practice to
explicitly close the files your program opens, as the operating
system has a limit to the number of files that may be open at one
time.</li>
</ul>
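<p>One way to make sure a file really does get closed, even when something goes wrong part-way through writing, is a <code>try</code>/<code>finally</code> block. The helper <code>write_lines</code> below is hypothetical, written only for this sketch; the sketch also shows that a closed file object raises <code>ValueError</code> if you try to use it again.</p>
<pre class='listing'>import os
import tempfile

def write_lines(path, lines):
    """Hypothetical helper: write each string in lines to path, one per line.

    The try/finally block guarantees that close() runs, flushing the
    buffer and releasing the operating system's file handle, even if
    one of the writes raises an exception.
    """
    out = open(path, "w")
    try:
        for line in lines:
            out.write(line + "\n")
    finally:
        out.close()

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
write_lines(path, ["The second line", "The Salmon Mousse"])

f = open(path, "r")
text = f.read()
f.close()

# Once closed, a file object refuses any further reading or writing.
try:
    f.read()
    closed_error = False
except ValueError:
    closed_error = True</pre>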
<h2>Moving the File Pointer Manually</h2>
<p>Occasionally we may want to move the file pointer manually, for
example when reading a file of a format that allows us to skip or
ignore large sections to get to the part of the file we want. Python
provides two methods of file objects to do this, the first to tell us
where the pointer is currently, and the second to move it. Both treat
the file as a one-dimensional stream of characters (strictly speaking, bytes), much like a string.
The file pointer is an integer index into this 'string' specifying the
first character that would be read or written in the next
operation.</p>
<ul>
<li><code>&lt;file object&gt;.tell()</code> returns the index of
the file pointer for the file.</li>
<li>
<code>&lt;file object&gt;.seek(&lt;position&gt;[,
&lt;whence&gt;])</code> moves the file pointer to position
relative to a position indicated by whence. Whence can take on
one of three values, defaulting to 0, with the following meanings:
<ol start="0">
<li>Set position relative to the beginning of the
file.</li>
<li>Set position relative to the current position, i.e.
negative positions will be before the current position, and
positive positions will be after it.</li>
<li>Set position relative to the end of the file, i.e. only
negative positions will give us a position inside the
file.</li>
</ol>
</li>
</ul>
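<p>A sketch of seek and tell on a ten-byte throwaway file. It opens the file in binary mode ("rb") so that positions are plain byte offsets; in Python 3, text-mode files only allow a whence of 1 or 2 together with a position of 0. The <code>os</code> module also provides the names <code>os.SEEK_SET</code>, <code>os.SEEK_CUR</code> and <code>os.SEEK_END</code> for the three whence values.</p>
<pre class='listing'>import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "digits.bin")
out = open(path, "wb")
out.write(b"0123456789")
out.close()

f = open(path, "rb")
f.seek(3)                  # whence defaults to 0: 3 bytes from the start
from_start = f.read(2)     # b"34"
pos = f.tell()             # 5: the pointer moved past the two bytes read
f.seek(-2, 1)              # whence 1: 2 bytes back from the current position
from_current = f.read(1)   # b"3"
f.seek(-3, 2)              # whence 2: 3 bytes before the end of the file
from_end = f.read()        # b"789"
f.close()</pre>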
<h2>Truncating a File</h2>
<p>Now that we can move the file pointer manually, we can do some funky
things. We could for example seek to somewhere in the middle of a file,
and overwrite the data there, without affecting the rest of the file.
We might even want to remove information from a file, where we seek to
the start of the information, and overwrite it with... well, spaces. Oh
dear, we have a problem here! We can't actually <em>remove</em>
information from a file, or can we? The truth is that we can only
shorten a file, which is more formally known as truncating it.
What this means is that to delete the information contained in the file
from some position (x) up to some position (y), we must read
everything from y to the end of the file, then seek back to x and
write what we read over the old data. Finally, we truncate the file
at the point where our writing ended.</p>
<ul>
<li><code>&lt;file object&gt;.truncate([size])</code> truncates the
file to be a maximum of size bytes in length. If size is not given,
the file is truncated at the current file pointer position.</li>
</ul>
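<p>The recipe for deleting a section of a file can be sketched as a small hypothetical helper, <code>delete_range</code>, which removes the bytes from position x up to (but not including) position y. It works in binary mode ("r+b") so that the positions are byte offsets, and it seeks between reading and writing, which Python 2's file objects (built on the C library) require when switching direction on the same file.</p>
<pre class='listing'>import os
import tempfile

def delete_range(path, start, stop):
    """Hypothetical helper: remove the bytes from start up to stop - 1."""
    f = open(path, "r+b")      # binary mode, so positions are byte offsets
    try:
        f.seek(stop)
        rest = f.read()        # everything from stop to the end of the file
        f.seek(start)          # seek between reading and writing
        f.write(rest)          # slide the tail down over the unwanted bytes
        f.truncate()           # cut the file off at the current file pointer
    finally:
        f.close()

path = os.path.join(tempfile.mkdtemp(), "greeting.txt")
out = open(path, "wb")
out.write(b"Hello, cruel world")
out.close()

delete_range(path, 7, 13)      # remove "cruel " (bytes 7 to 12)

f = open(path, "rb")
result = f.read()              # b"Hello, world"
f.close()</pre>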
<div class="centered">
[<a href="random.html">Prev: Random Numbers</a>] [<a href="index.html">Course Outline</a>] [<a href="regexp.html">Next: Regular Expressions</a>]
</div>
</div>
<div class="pagefooter">
Copyright © James Dominy 2007-2008; Released under the <a href="http://www.gnu.org/copyleft/fdl.html">GNU Free Documentation License</a><br />
<a href="intropython.tar.gz">Download the tarball</a>
</div>
</body>
</html>