Comma Separated Value Files

A CSV file is nothing more than a text file in which each line is a sequence of data items separated by commas.  Such files are easily processed in Python and also work well with Excel.

For example, every line in the file could look like the two lines shown below.  Generally no spaces are allowed on either side of the commas, although this is not always the case.  See the above link.

Mustang,Ford,1965,Red
Corvette,Chevy,1980,Blue

Here each line gives 4 attributes of an automobile.  Generally all lines in a CSV file have the same layout, like the two above.  It is possible to have files where the lines differ.  In these cases the first attribute would be a type name that specifies what the remaining attributes are.
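As a rough sketch of the mixed-line case, the following assumes a made-up layout in which the first item on each line is a type name, either car or owner.  The layouts, the FIELDS table and the function parse_line are all hypothetical, chosen only for illustration.  The string method split breaks a line into a list of items at each comma.

```python
# Hypothetical layouts: the type name in the first item decides
# how the remaining items on the line are interpreted.
FIELDS = {
    'car': ['model', 'make', 'year', 'color'],
    'owner': ['name', 'city'],
}

def parse_line(line):
    """Split one line and label its items according to its type name."""
    items = line.rstrip('\n').split(',')
    type_name = items[0]
    names = FIELDS[type_name]   # raises KeyError for an unknown type name
    return type_name, dict(zip(names, items[1:]))

# Sample lines typed in directly so the example stands alone.
lines = ['car,Mustang,Ford,1965,Red\n', 'owner,Pat,Denver\n']
records = [parse_line(line) for line in lines]
for type_name, record in records:
    print(type_name, record)
```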

First example problem: Suppose we have a file containing a long list of cars in the above format, and suppose that we would like to read the file and process each line, counting the number of cars that are Fords.   This is really easy: open the file, read in every line, and split each one on commas.  This creates a list of the attributes. For example, the program

filehd = open('data.csv','r')
for line in filehd:
   attribs=line.split(',')
   print(attribs)

will read in a file called data.csv containing the above two lines of car data and print out the following.   Note that attribs is a list, and that its last item still ends with the newline character \n from the end of the line.

['Mustang', 'Ford', '1965', 'Red\n']
['Corvette', 'Chevy', '1980', 'Blue\n']
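The '\n' on the last item is the newline that ends each line of the file.  Here is a small sketch of one way to drop it, applying the string method strip to each item; this also removes any spaces around the commas.  The sample lines are typed in directly rather than read from data.csv so that the example stands alone, and the spaces in the second line are added on purpose.

```python
# Two sample lines as they would arrive from the file (newline included).
lines = ['Mustang,Ford,1965,Red\n', 'Corvette , Chevy, 1980 ,Blue\n']

cleaned = []
for line in lines:
    # strip() removes leading and trailing whitespace, including '\n'
    attribs = [item.strip() for item in line.split(',')]
    cleaned.append(attribs)
    print(attribs)
```

Both lines come out as clean four-item lists, with no newline on the color and no stray spaces.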

To count the number of cars that are Fords, we can do the following.

filehd = open('data.csv','r')
ford_ct=0
for line in filehd:
   attribs=line.split(',')
   # attribs[1] is the make of the car
   if attribs[1]=='Ford':
       ford_ct=ford_ct+1
filehd.close()
print('The number of fords in this file is',ford_ct)
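The same count can be wrapped in a function that uses a with statement, which closes the file automatically.  This is only a sketch: the name count_make is made up here, and the example first writes a small data.csv of its own into a temporary directory so that it can run anywhere.

```python
import os
import tempfile

def count_make(filename, make):
    """Count the lines whose second item equals make."""
    count = 0
    with open(filename, 'r') as filehd:   # closed when the block ends
        for line in filehd:
            attribs = line.rstrip('\n').split(',')
            # skip blank or short lines that have no second item
            if len(attribs) > 1 and attribs[1] == make:
                count += 1
    return count

# Build a small sample file so the example is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), 'data.csv')
with open(sample_path, 'w') as f:
    f.write('Mustang,Ford,1965,Red\nCorvette,Chevy,1980,Blue\n')

ford_ct = count_make(sample_path, 'Ford')
print('The number of fords in this file is', ford_ct)
```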

Second example problem: Suppose that you would like to change the order of attributes in the file.  We would do this by reading the file, extracting the attributes from each line, and then writing the reordered attributes, separated by commas, to a new file.  For example, we could load the attribute list as before and do something like the following

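filein = open('aves.csv','r')
fileout = open('avesout.csv','w')
for line in filein:
    # drop the trailing newline so it cannot end up inside an attribute
    attribs=line.rstrip('\n').split(',')
    newline=attribs[5]+','+attribs[4]+','+attribs[6]
    print(newline,file=fileout)
filein.close()
fileout.close()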

The output could also have been written with a single print, for example

print(attribs[5],attribs[4],attribs[6],sep=',',file=fileout)

By the way, there is a special module, csv, that you can import to support CSV file I/O.  The main reason for its existence is that it takes care of the majority of the variations of the CSV file format that occur on the web.
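As a brief sketch of that module (the standard csv module), the reordering example might look like the following.  csv.reader and csv.writer are real parts of the module, but the column layout and the sample rows are invented here, and io.StringIO objects stand in for the real input and output files.

```python
import csv
import io

# Invented sample data standing in for an input file.  Note the quoted
# field containing a comma, which a plain split(',') would break apart.
filein = io.StringIO('id,name,note\n1,"Jay, Blue",common\n2,Robin,common\n')
fileout = io.StringIO()   # stands in for the output file

reader = csv.reader(filein)
writer = csv.writer(fileout, lineterminator='\n')
for attribs in reader:
    # attribs is already a list, with no trailing newline on the last item
    writer.writerow([attribs[1], attribs[0], attribs[2]])

result = fileout.getvalue()
print(result)
```

When working with real files instead of StringIO objects, the csv documentation recommends opening them with newline='' so that the module can handle line endings itself.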
