
 

9.1 PERSISTENCE, FILES

 

 

INTRODUCTION

 

Computer-stored information, called data, plays a central role in today's high-tech society. Although they are comparable to written text, there are several important differences:

  • Data can only be read, saved, and processed with a computer system.
  • Data are always coded as 0/1 values (bits). They only carry information and make sense when they are correctly interpreted (decoded).
  • Data possess a certain life span. Temporary data exist as local variables for a short time in a certain program block or as global variables for the entire duration of the program. Persistent data, however, survive the duration of the program and can later be retrieved.
  • Data have a visibility (availability). While certain data, such as personal data, can be read by anyone in a social network, there are also private data or other data that can be found on storage devices that are not generally accessible to the public.
  • Data can be protected. Protection can be achieved by encryption or restrictions on access (access and password protection).
  • Data can easily be transported on digital communication channels.

Computer programs write and read persistent data in the form of files on physical storage devices (commonly a hard disk (HD), solid-state drive (SSD), memory card, or USB stick).

Files consist of storage areas with a certain structure and a specific file format. Since the transfer of data, even over large distances, has become fast and cheap, files are stored more and more frequently on remote media (the cloud).

Files are managed on the computer in a hierarchical directory structure, i.e. a specific directory can hold not only files, but also sub-directories. Files are accessible through their file path (short 'path') which contains the names of the directories and the file. However, the file system is dependent on the operating system, and therefore there are certain differences between Windows, Mac, and Linux systems.
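
Because the path separator differs between operating systems, it can help to let Python assemble the path for you. The following is a minimal sketch using the standard module os.path; the directory and file names are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical directory and file names, chosen only for illustration
path = os.path.join("projects", "wordgames", "words.txt")

print(path)                    # the separator depends on the operating system
print(os.path.dirname(path))   # the directory part of the path
print(os.path.basename(path))  # the file name: words.txt
```

The same program then builds a valid path on Windows, Mac, and Linux without any changes.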

PROGRAMMING CONCEPTS: Encoding, life span, visibility of data, file

 

 

READING AND WRITING TEXT FILES

 

You have already learned how to read text files in the chapter Internet. In text files, characters are stored sequentially, but you get a line structure similar to that of a sheet of paper by inserting end-of-line symbols. It is exactly here that the operating systems differ: While Mac and Linux use the ASCII character <line feed> as end of line (EOL), Windows uses the combination <carriage return><line feed>. In Python, <carriage return> is coded as \r and <line feed> as \n. When you read a text file or a line into a string under Windows, the <cr> is removed automatically, and when you write a string to a file under Windows, the <cr> is added automatically. This gives you a high degree of platform independence.
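
The difference between the two end-of-line conventions can be made visible with repr(). The sketch below uses a hypothetical helper stripEol() (not part of the original programs) that removes the end of line regardless of the platform the file came from.

```python
unixLine = "radar\n"        # line as stored on Mac/Linux
windowsLine = "radar\r\n"   # line as stored on Windows

def stripEol(line):
    # Remove any trailing <cr> and <lf> characters, whatever the platform
    return line.rstrip("\r\n")

print(repr(unixLine))       # repr() shows the otherwise invisible characters
print(repr(windowsLine))
print(stripEol(unixLine) == stripEol(windowsLine))
```

Unlike a slice that always cuts off the last character, rstrip() leaves a line untouched if it has no end-of-line symbol, which may be the case for the last line of a file.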

You will use an English dictionary in your program, which is available as a text file. You can download it (tfwordlist.zip) and unzip it in any directory. Copy the file words-1$.txt into the directory where your program is located.

You will now take a look at the interesting question of which words are palindromes. A palindrome is a word that reads the same forward or backward, without considering the case of the letters.

With open() you receive a file object f that provides you with access to the file. You can then run through all the lines with a simple for loop. Keep in mind that each line ends with an end-of-line symbol that must be cut off with a slice operation before you read the word backwards. In addition, you should convert all characters to lowercase using lower().

Reversing a string is somewhat tricky in Python. The slice operation also allows negative indices, in which case the index counting begins at the end of the string. If you select a step parameter of -1, the string is run through backwards.
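
The following short sketch shows the slice operations on their own, using the arbitrarily chosen example word "tiger":

```python
word = "tiger"

print(word[-1])     # last character: r
print(word[-2:])    # last two characters: er
print(word[:-1])    # everything except the last character: tige
print(word[::-1])   # step -1 runs through the string backwards: regit
```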

def isPalindrom(a):
    return a == a[::-1]

f = open("words-1$.txt")

print "Searching for palindromes..."
for word in f:
    word = word[:-1] # remove trailing \n
    word = word.lower() # make lowercase
    if isPalindrom(word):
        print word
f.close()
print "All done" 

With the method readline() you can also read line by line. You can imagine a line pointer that is advanced at each call. Once you have reached the end of the file, the method returns an empty string. In the next program, you save the result in a file named palindrom.txt. In order to write to the file, you must first create it with open(), passing the parameter "w" (for write). Then you can write to it using the method write(). Do not forget to call the method close() at the end, so that all characters are definitely written to the file and the operating system resources are released again.

def isPalindrom(a):
    return a == a[::-1]

fInp = open("words-1$.txt")
fOut = open("palindrom.txt", "w")

print "Searching for palindromes..."
while True:
    word = fInp.readline()
    if word == "":
        break
    word = word[:-1] # remove trailing \n
    word = word.lower() # make lowercase
    if isPalindrom(word):
        print word
        fOut.write(word + "\n")
fInp.close()        
fOut.close()
print "All done" 
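
If your Python version supports the with statement (Python 2.6 or later, which is an assumption about your environment), you can let Python close the file for you. The sketch below is an alternative to the explicit close() calls above, not the method used in this chapter; it writes to a hypothetical file in a temporary directory so that no real data is touched.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, so no real data is touched
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# The file is closed automatically when the with block is left
with open(path, "w") as fOut:
    fOut.write("level\n")
    fOut.write("tiger\n")

with open(path) as fInp:
    words = [line.rstrip("\r\n") for line in fInp]

print(words)
```

The file is closed even if an error occurs inside the with block, so you cannot forget to release the resources.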

 

 

MEMO

 

When you open text files using open(path, mode), the access mode is specified with the parameter mode.

Mode          Description                    Comment
"r" (read)    Read only                      File must already exist. Parameter can be omitted
"w" (write)   Create and write file          An existing file is deleted first
"a" (append)  Attach at the end of the file  Create the file if it does not already exist
"r+"          Read and write                 File must already exist

Once you have read all the lines of a file and want to read it again, you have to either close the file and reopen it or simply call the method seek(0) of the file object. You can also read the entire contents of the text file into a string using

text = f.read()

and then close the file. You can create a list with the line strings (without end of line) using

textList = text.splitlines()
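
Put together, these calls could be used as in the following sketch. The demo file and its contents are hypothetical and are created in a temporary directory first, so that the example is self-contained.

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
f = open(path, "w")
f.write("anna\notto\n")
f.close()

f = open(path)
firstPass = [line for line in f]   # reads up to the end of the file
f.seek(0)                          # move back to the beginning
text = f.read()                    # whole content as one string
f.close()

textList = text.splitlines()       # line strings without end of line
print(firstPass)
print(textList)
```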

Other important file operations:

import os
os.path.isfile(path)

Returns True, if the file exists

import os
os.remove(path)

Deletes a file
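
A minimal sketch of how the two functions can work together, using a hypothetical scratch file in a temporary directory:

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "scratch.txt")

f = open(path, "w")
f.write("temporary content\n")
f.close()

existedBefore = os.path.isfile(path)   # True: the file was just created

# Check first, because os.remove() raises an error for a missing file
if os.path.isfile(path):
    os.remove(path)

existsAfter = os.path.isfile(path)     # False: the file has been deleted
print(existedBefore)
print(existsAfter)
```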

 

 

SAVING AND RETRIEVING OPTIONS OR GAME FILES

 

Files are often used to save information so that it can be retrieved again during the next execution of the program, for example program settings (options) which are made by the user to customize their program. Maybe you would also like to save the current (game) state of a game so that you are able to continue playing in exactly the same situation.

Options and states can usually be saved conveniently as key-value pairs, where the key is an identifier for the value. For example, there are certain configuration values (setup parameters) for the TigerJython IDE:

Key          Value
"autosave"   True
"language"   "de"

As you learned in chapter 6.3, you can save such key-value pairs in a Python dictionary that you can very easily save and retrieve as (binary) files with the module pickle. In the following example, you save the current position and direction of the lobsters and also the position of the simulation cycle regulator before closing the game window. The saved values will then be restored at the next start.
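
As a minimal sketch of the saving and restoring step alone, independent of any game library, a dictionary can be written with pickle.dump() and read back with pickle.load(). The file name gamestate.dat and the keys and values of the dictionary are assumptions made for illustration; they are not taken from the game program.

```python
import os
import pickle
import tempfile

# Hypothetical file name and state values, chosen only for illustration
path = os.path.join(tempfile.mkdtemp(), "gamestate.dat")
state = {"x": 3, "y": 7, "direction": 90, "autosave": True}

# Pickle files are binary, so open them with "wb" and "rb"
f = open(path, "wb")
pickle.dump(state, f)
f.close()

f = open(path, "rb")
restored = pickle.load(f)
f.close()

print(restored == state)
```

In a real program you would typically call the saving part when the window is closed and the restoring part at startup, after checking with os.path.isfile() that the file already exists.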


 

 

MEMO

 

A dictionary can be saved in a file with the method pickle.dump(). It will be a binary file that you are not able to edit directly.

 

 

EXERCISES

 

1.


Search for anagrams in the file words-1$.txt. (Anagrams are two words with the same letters, but in a different order. You can ignore the case of the letters.) Write the anagrams that you found to a file anagram.txt.


2.


The text below was encrypted by anagramming, i.e. the original words were substituted by those with permuted letters.
IINFHS SOLOHC AEMK SIWREKOFR

Try to decrypt the text with the help of the word list (words-1$.txt). Anagrams were used by Galileo to keep scientific findings secret.

3.


Create your own ciphertext that can be unambiguously decrypted with the word lists.