Lecture 5 - Input/output

Reading numeric data

All reading and writing is done with text objects, i.e. reading a ‘7’ gives a str with a single character rather than a numeric value.
To get numbers from a file for use in computations, we have to do a type conversion (typically with int or float).
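
As a small illustration of the conversion step, the sketch below converts text, as it might arrive from a file, into numbers. The example strings are made up for illustration.

```python
# Text as it would arrive from a file: everything is a str
raw_int = "7"
raw_float = "3.25\n"   # lines read from a file usually end with a newline

# Without conversion, + concatenates strings instead of adding numbers
concatenated = raw_int + raw_int

# int() and float() convert the text; surrounding whitespace is ignored
number_int = int(raw_int)
number_float = float(raw_float)

print(type(raw_int), concatenated)
print(number_int + number_int, number_float * 2)
```

Note that the trailing newline does not have to be removed before calling float() or int().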

Files are opened with the following syntax:

with open(filename, mode) as file:
    # do stuff

Here file is a file object that can be iterated over, with each element being a line in the file.
The with statement simplifies the management of files because it defines the scope of file and closes the object automatically when leaving the with block.

In other words, the with syntax replaces the following:

file = open(filename, mode)
# do stuff
file.close()
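
Strictly speaking, the with statement does a little more than the three lines above: it also closes the file if an error occurs inside the block. The sketch below shows a rough equivalent written with try/finally. It uses a throwaway file in a temporary directory, and the file name demo.txt is made up for this illustration.

```python
import os
import tempfile

# Create a small throwaway file to read from
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "demo.txt")
with open(filename, "w") as file:
    file.write("hello\n")

# Version 1: with statement
with open(filename, "r") as file:
    first = file.read()
closed_after_with = file.closed    # True: closed when the block ended

# Version 2: roughly what with does behind the scenes
file = open(filename, "r")
try:
    second = file.read()
finally:
    file.close()                   # runs even if read() raises an error
closed_after_finally = file.closed

print(first == second, closed_after_with, closed_after_finally)
```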

Some possible values for mode:

'r' - open for reading
'w' - open for writing - this mode implies that a completely new file is written (an existing file with the same name is overwritten)
'a' - open for writing in append mode - writes to the end of the file (i.e., if the file already exists, it will append to it, otherwise it will create a new file)

For more access modes, see: https://docs.python.org/3/library/functions.html#open
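
To make the difference between 'w' and 'a' concrete, here is a minimal sketch. It writes to a throwaway file in a temporary directory; the file name modes_demo.txt is made up for this illustration.

```python
import os
import tempfile

filename = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'w' creates the file (or empties an existing one) before writing
with open(filename, "w") as file:
    file.write("first\n")

# 'a' keeps the existing content and adds to the end
with open(filename, "a") as file:
    file.write("second\n")

with open(filename, "r") as file:
    after_append = file.read()

# Opening with 'w' again discards everything written so far
with open(filename, "w") as file:
    file.write("third\n")

with open(filename, "r") as file:
    after_rewrite = file.read()

print(repr(after_append))
print(repr(after_rewrite))
```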

It is also possible to read binary file formats, but we will not use them in this course. As always, feel free to check it out on your own.

Reading data from text file

file.readline() – reads a line from the opened file into a string
file.readlines() – reads all lines from the opened file into a list of strings
file.read() - reads the entire opened file into a string
for line in file: - using the TextIOWrapper as iterable
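
The four ways of reading listed above can be compared side by side. The sketch below first writes a small three-line file (the name three_lines.txt is made up for this illustration) and then reads it back with each approach.

```python
import os
import tempfile

filename = os.path.join(tempfile.mkdtemp(), "three_lines.txt")
with open(filename, "w") as file:
    file.write("1\n2\n3\n")

with open(filename, "r") as file:
    one_line = file.readline()      # first line only; the newline is kept
    rest = file.readlines()         # continues where readline() stopped

with open(filename, "r") as file:
    everything = file.read()        # the whole file as one string

with open(filename, "r") as file:
    stripped = [line.strip() for line in file]   # iterate line by line

print(repr(one_line), rest, repr(everything), stripped)
```

Iterating over the file object reads one line at a time, which is usually the better choice for large files because the whole file never has to be held in memory.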

Writing data to a text file

file.write() - writes its input to the opened file without adding any extra characters, such as line-break characters (\n)
file.writelines() - takes an iterable of strings (typically a list of strings) and writes the elements one after another to the opened file. No line break characters (\n) are added, so each string must already end with \n if it should end up on its own line
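
The following sketch illustrates that neither write() nor writelines() adds line breaks on its own. The file name write_demo.txt is made up for this illustration.

```python
import os
import tempfile

filename = os.path.join(tempfile.mkdtemp(), "write_demo.txt")

with open(filename, "w") as file:
    file.write("a")
    file.write("b\n")                    # we add the line break ourselves
    file.writelines(["c", "d\n"])        # no separators inserted between items
    file.writelines(f"{i}\n" for i in range(2))   # any iterable of str works

with open(filename, "r") as file:
    content = file.read()

print(repr(content))
```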

# First, let us check the type of file in the two different open syntaxes
# As an example, we will use the 'numbers_one_to_ten.txt' file which on subsequent lines contains the numbers from 1 to 10
# Run the code below and check in the output of help() that file is an iterator (it defines both __iter__ and __next__)

# with statement
with open('numbers_one_to_ten.txt', 'r') as file:
    help(file)
    
# open/close statement
file = open('numbers_one_to_ten.txt', 'r')
help(file)
file.close()

Help on TextIOWrapper object:

class TextIOWrapper(_TextIOBase)
 |  TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False, write_through=False)
 |  
 |  Character and line based layer over a BufferedIOBase object, buffer.
 |  
 |  encoding gives the name of the encoding that the stream will be
 |  decoded or encoded with. It defaults to locale.getpreferredencoding(False).
 |  
 |  errors determines the strictness of encoding and decoding (see
 |  help(codecs.Codec) or the documentation for codecs.register) and
 |  defaults to "strict".
 |  
 |  newline controls how line endings are handled. It can be None, '',
 |  '\n', '\r', and '\r\n'.  It works as follows:
 |  
 |  * On input, if newline is None, universal newlines mode is
 |    enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
 |    these are translated into '\n' before being returned to the
 |    caller. If it is '', universal newline mode is enabled, but line
 |    endings are returned to the caller untranslated. If it has any of
 |    the other legal values, input lines are only terminated by the given
 |    string, and the line ending is returned to the caller untranslated.
 |  
 |  * On output, if newline is None, any '\n' characters written are
 |    translated to the system default line separator, os.linesep. If
 |    newline is '' or '\n', no translation takes place. If newline is any
 |    of the other legal values, any '\n' characters written are translated
 |    to the given string.
 |  
 |  If line_buffering is True, a call to flush is implied when a call to
 |  write contains a newline character.
 |  
 |  Method resolution order:
 |      TextIOWrapper
 |      _TextIOBase
 |      _IOBase
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  close(self, /)
 |      Flush and close the IO object.
 |      
 |      This method has no effect if the file is already closed.
 |  
 |  detach(self, /)
 |      Separate the underlying buffer from the TextIOBase and return it.
 |      
 |      After the underlying buffer has been detached, the TextIO is in an
 |      unusable state.
 |  
 |  fileno(self, /)
 |      Returns underlying file descriptor if one exists.
 |      
 |      OSError is raised if the IO object does not use a file descriptor.
 |  
 |  flush(self, /)
 |      Flush write buffers, if applicable.
 |      
 |      This is not implemented for read-only and non-blocking streams.
 |  
 |  isatty(self, /)
 |      Return whether this is an 'interactive' stream.
 |      
 |      Return False if it can't be determined.
 |  
 |  read(self, size=-1, /)
 |      Read at most n characters from stream.
 |      
 |      Read from underlying buffer until we have n characters or we hit EOF.
 |      If n is negative or omitted, read until EOF.
 |  
 |  readable(self, /)
 |      Return whether object was opened for reading.
 |      
 |      If False, read() will raise OSError.
 |  
 |  readline(self, size=-1, /)
 |      Read until newline or EOF.
 |      
 |      Returns an empty string if EOF is hit immediately.
 |  
 |  reconfigure(self, /, *, encoding=None, errors=None, newline=None, line_buffering=None, write_through=None)
 |      Reconfigure the text stream with new parameters.
 |      
 |      This also does an implicit stream flush.
 |  
 |  seek(self, cookie, whence=0, /)
 |      Change stream position.
 |      
 |      Change the stream position to the given byte offset. The offset is
 |      interpreted relative to the position indicated by whence.  Values
 |      for whence are:
 |      
 |      * 0 -- start of stream (the default); offset should be zero or positive
 |      * 1 -- current stream position; offset may be negative
 |      * 2 -- end of stream; offset is usually negative
 |      
 |      Return the new absolute position.
 |  
 |  seekable(self, /)
 |      Return whether object supports random access.
 |      
 |      If False, seek(), tell() and truncate() will raise OSError.
 |      This method may need to do a test seek().
 |  
 |  tell(self, /)
 |      Return current stream position.
 |  
 |  truncate(self, pos=None, /)
 |      Truncate file to size bytes.
 |      
 |      File pointer is left unchanged.  Size defaults to the current IO
 |      position as reported by tell().  Returns the new size.
 |  
 |  writable(self, /)
 |      Return whether object was opened for writing.
 |      
 |      If False, write() will raise OSError.
 |  
 |  write(self, text, /)
 |      Write string to stream.
 |      Returns the number of characters written (which is always equal to
 |      the length of the string).
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  buffer
 |  
 |  closed
 |  
 |  encoding
 |      Encoding of the text stream.
 |      
 |      Subclasses should override.
 |  
 |  errors
 |      The error setting of the decoder or encoder.
 |      
 |      Subclasses should override.
 |  
 |  line_buffering
 |  
 |  name
 |  
 |  newlines
 |      Line endings translated so far.
 |      
 |      Only line endings translated during reading are considered.
 |      
 |      Subclasses should override.
 |  
 |  write_through
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from _IOBase:
 |  
 |  __del__(...)
 |  
 |  __enter__(...)
 |  
 |  __exit__(...)
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  readlines(self, hint=-1, /)
 |      Return a list of lines from the stream.
 |      
 |      hint can be specified to control the number of lines read: no more
 |      lines will be read if the total size (in bytes/characters) of all
 |      lines so far exceeds hint.
 |  
 |  writelines(self, lines, /)
 |      Write a list of lines to stream.
 |      
 |      Line separators are not added, so it is usual for each of the
 |      lines provided to have a line separator at the end.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from _IOBase:
 |  
 |  __dict__

Reading text files

# Write code that can read in the file 'numbers_one_to_ten.txt'
# Then compute and print the sum of the numbers in the variable total
# (we avoid the name sum because it would shadow Python's built-in sum() function)

total = 0.0
with open('numbers_one_to_ten.txt', 'r') as file:

    for line in file:
        total += float(line)

print(total)

55.0

assert total == 55, error_message(total, 55)

# Let us check what happens if the file you want to read does not exist
# It should throw a FileNotFoundError

open("file_that_does_not_exist.txt", 'r')

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[7], line 4
      1 # Let us check what happens if the file you want to read does not exist
      2 # It should throw a FileNotFoundError
----> 4 open("file_that_does_not_exist.txt", 'r')

File ~/opt/miniconda3/envs/bb1000_ht23/lib/python3.9/site-packages/IPython/core/interactiveshell.py:284, in _modified_open(file, *args, **kwargs)
    277 if file in {0, 1, 2}:
    278     raise ValueError(
    279         f"IPython won't let you open fd={file} by default "
    280         "as it is likely to crash IPython. If you know what you are doing, "
    281         "you can use builtins' open."
    282     )
--> 284 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: 'file_that_does_not_exist.txt'
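
In a script you usually do not want the program to stop with a traceback like the one above. One possible way to handle the situation is to catch the exception with try/except, as in the sketch below. The function name read_text_or_none and the printed message are made up for this illustration; the missing file is looked up in an empty temporary directory so that it is guaranteed not to exist.

```python
import os
import tempfile

def read_text_or_none(filename):
    """Return the file content as a string, or None if the file is missing."""
    try:
        with open(filename, "r") as file:
            return file.read()
    except FileNotFoundError:
        print(f"Could not find '{filename}', please check the name and path")
        return None

# A path inside a fresh, empty directory: the file cannot exist
missing = os.path.join(tempfile.mkdtemp(), "file_that_does_not_exist.txt")
result = read_text_or_none(missing)
print(result)
```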

We will now move to a slightly more complex file format.

Specifically, we will consider the xyz file format, which is a format used to specify the geometry of molecules.

The format of this file type is as follows:

line 1: Number of atoms
line 2: Title line
line 3: Atom1 coord_x coord_y coord_z
line 4: Atom2 coord_x coord_y coord_z
... continue with atom entries for all atoms in the molecule

They have the following data types

line 1: int
line 2: str
line 3: str float float float
line 4: str float float float
... continue with atom entries for all atoms in the molecule

Your task: Read the file and store the atom labels in a list atom_list and the coordinates in a list of lists atom_coords, where the coordinates of each atom form one inner list.

Remember to convert the coordinate values to float, which is their appropriate type.
In this case, you cannot simply convert the whole line to a float because the line now contains multiple values.
One way to separate the values is to call the split() method on the line (which is a string).

# First, let us try it out with an example line from the file
line = "H1       1.2194     -0.1652      2.1600"

print(line.split())
# So line.split() returns a list of strings generated by splitting line at whitespace (the default when split is called without arguments); consecutive spaces count as a single separator.
# We could define an arbitrary separator: 

# such as "-" 
print(line.split("-"))

# Or at "."
print(line.split("."))

# Or even "0.1652"
print(line.split("0.1652"))

# Reading the entire file into memory: 
# Read the file "benzene.xyz" using readlines() and fill in atom_list and atom_coords

atom_list = []
atom_coords = []

with open("benzene.xyz", 'r') as file:
    
    lines = file.readlines()
    
for idx, line in enumerate(lines):
    if idx < 2:
        continue
    else:
        string = line.split()
        atom_list.append(string[0])
        atom_coords.append([float(string[1]), float(string[2]), float(string[3])])

print(atom_list)
print(atom_coords)

assert atom_list[0] == "H1", error_message(atom_list[0], "H1")
assert atom_coords[0] == [1.2194, -0.1652, 2.16], error_message(atom_coords[0], [1.2194, -0.1652, 2.16])
assert atom_list[5] == "H6", error_message(atom_list[5], "H6")
assert atom_coords[5] == [-2.4836, 0.1021, -0.0204] , error_message(atom_coords[5], [-2.4836, 0.1021, -0.0204])

# Reading the file line-by-line:
# Read in the same file but now line-by-line and fill in atom_list and atom_coords

atom_list = []
atom_coords = []

with open("benzene.xyz", "r") as file:
    
    # Read the two first lines without doing anything
    file.readline()
    file.readline()
    
    for line in file:
        string = line.split()
        atom_list.append(string[0])
        atom_coords.append([float(string[1]), float(string[2]), float(string[3])])

print(atom_list, atom_coords)

assert atom_list[0] == "H1", error_message(atom_list[0], "H1")
assert atom_coords[0] == [1.2194, -0.1652, 2.16], error_message(atom_coords[0], [1.2194, -0.1652, 2.16])
assert atom_list[5] == "H6", error_message(atom_list[5], "H6")
assert atom_coords[5] == [-2.4836, 0.1021, -0.0204] , error_message(atom_coords[5], [-2.4836, 0.1021, -0.0204])

# Now convert your read xyz code block to a read_xyz function that takes a filename as input and returns the atom_list and atom_coords lists
# Check that it can properly read in the benzene.xyz file

def read_xyz(filename: str)->tuple:
    """Reads an xyz file and returns the list of atoms
    and the coordinates as list of lists

    Args:
        filename (str): name of the xyz file

    Returns:
        tuple: atom list, coordinates
    """
    atom_list = []
    atom_coords = []
    
    with open(filename, "r") as file:
    
        # Read the two first lines without doing anything
        file.readline()
        file.readline()
        
        for line in file:
            string = line.split()
            atom_list.append(string[0])
            atom_coords.append([float(string[1]), float(string[2]), float(string[3])])

    return atom_list, atom_coords

atom_list, atom_coords = read_xyz('benzene.xyz')
print(atom_list)
print(atom_coords)
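
The first line of an xyz file states how many atoms follow, which gives a cheap consistency check that the reader above does not use. The sketch below shows one possible variant. The function name read_xyz_checked is made up for this illustration, and the small water file it reads is written on the spot with illustrative coordinates only.

```python
import os
import tempfile

def read_xyz_checked(filename):
    """Read an xyz file and verify the atom count given on line 1."""
    atom_list = []
    atom_coords = []
    with open(filename, "r") as file:
        expected = int(file.readline())   # line 1: number of atoms
        file.readline()                   # line 2: title, not needed here
        for line in file:
            fields = line.split()
            if not fields:                # skip blank lines, e.g. at the end
                continue
            atom_list.append(fields[0])
            atom_coords.append([float(x) for x in fields[1:4]])
    if len(atom_list) != expected:
        raise ValueError(f"expected {expected} atoms, found {len(atom_list)}")
    return atom_list, atom_coords

# Small test file; the coordinates are illustrative only
filename = os.path.join(tempfile.mkdtemp(), "water.xyz")
with open(filename, "w") as file:
    file.write("3\nwater\nO1 0.0 0.0 0.0\nH2 0.0 0.8 0.6\nH3 0.0 -0.8 0.6\n\n")

atoms, coords = read_xyz_checked(filename)
print(atoms, coords)
```

Skipping blank lines also makes the reader tolerant of a trailing empty line, which would otherwise cause an IndexError when indexing the result of split().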

Next, we will consider something more complicated.

We have a file that contains a lot of data, and we are only interested in a very specific part of it

The file we will be dealing with is named molcas.out, which is an output from OpenMolcas (a quantum chemistry program).

In other words, it is an output containing quantum-mechanical information about a molecule.

We are specifically interested in knowing the electronic energies of the molecule - the number printed in the following line starting with:

"::    XMS-CASPT2 Root  1     Total energy:"

But there can be more than one of these lines, and we are interested in all of them.

In this particular case, there are nine in total. The relevant part of the output looks like this, but it is “hidden” inside a very long file:

       Total XMS-CASPT2 energies:
::    XMS-CASPT2 Root  1     Total energy:  -7664.02011873
::    XMS-CASPT2 Root  2     Total energy:  -7663.86538263
::    XMS-CASPT2 Root  3     Total energy:  -7663.83034100
::    XMS-CASPT2 Root  4     Total energy:  -7663.82724336
::    XMS-CASPT2 Root  5     Total energy:  -7663.82401120
::    XMS-CASPT2 Root  6     Total energy:  -7663.82386506
::    XMS-CASPT2 Root  7     Total energy:  -7663.79797706
::    XMS-CASPT2 Root  8     Total energy:  -7663.78384205
::    XMS-CASPT2 Root  9     Total energy:  -7663.76782714

Your task

Write a function get_openmolcas_energies that takes the filename of a molcas file and reads the relevant energies
The function should return these energies as a dictionary named energy_dict with key: value pairs as root_number (int 1-9) : energy (float)

def get_openmolcas_energies(filename: str) -> dict[int, float]:
    """Reads OpenMolcas file and returns a 
    dictionary with key: value as root: XMS-CASPT2 energy

    Args:
        filename (str): OpenMolcas file to read

    Returns:
        dict[int, float]: key=root number, value=XMS-CASPT2 energy
    """    
    energy_dict = {}
    
    with open(filename, "r") as file:
        
        for line in file:
            if "XMS-CASPT2 Root" in line:
                line_split = line.split()
                root_number = int(line_split[3])
                energy = float(line_split[6])
                energy_dict[root_number] = energy
        
    return energy_dict

energy_dict = get_openmolcas_energies("molcas.out")
print(energy_dict)       

assert type(energy_dict) == dict, error_message(type(energy_dict), dict)
assert energy_dict[1] == -7664.02011873, error_message(energy_dict[1], -7664.02011873)
assert energy_dict[5] == -7663.8240112, error_message(energy_dict[5], -7663.8240112)

Writing files

# Write a file analogous to numbers_one_to_ten.txt
# Use the name numbers_one_to_ten_NEW.txt
# Remember to change to write mode by using 'w'
# Also remember to insert new line character "\n" 

with open("numbers_one_to_ten_NEW.txt", 'w') as file:
    
    for i in range(1,11):
        file.write(str(i)+'\n')

# You already heard about the xyz file format. Let us now try to write a file of xyz format

# Write a function write_xyz that:
# - takes as input a filename, a title line, a list of atoms (atom_list) and a list of lists of coordinates (atom_coords)
# - and writes the coordinates to file in xyz format.
#
# As an example, use the atom_list and atom_coords for caffeine (the compound giving the awakening effect of coffee) given below. Let the file be named "caffeine.xyz"
atom_list = ["N1", "C2", "N3", "C4", "C5", "C6", "N7", "C8", "N9", "C10", "O11", "O12", "C13", "C14", "H15", "H16", "H17", "H18", "H19", "H20", "H21", "H22", "H23", "H24"]
atom_coords = [[ 1.5808, 0.7027,-0.2279], [ 1.7062,-0.7374,-0.2126], [ 0.5340,-1.5671,-0.3503], [ 0.3231, 1.3600, 0.0274], [-0.8123, 0.4553, 0.0817], [-0.6967,-0.9322,-0.0662], [-2.1886, 0.6990, 0.2783], [-2.8512,-0.5205, 0.2532], [-1.9537,-1.5188, 0.0426], [ 0.6568,-3.0274,-0.1675], [ 2.8136,-1.2558,-0.1693], [ 0.2849, 2.5744, 0.1591], [-2.8096, 2.0031, 0.5032], [ 2.8301, 1.5004,-0.1968], [-3.9271,-0.6787, 0.3762], [ 1.4823,-3.4046,-0.7865], [-0.2708,-3.5204,-0.4868], [ 0.8567,-3.2990, 0.8788], [-2.4123, 2.7478,-0.2017], [-2.6042, 2.3621, 1.5221], [-3.8973, 1.9344, 0.3695], [ 3.5959, 1.0333,-0.8314], [ 3.2249, 1.5791, 0.8255], [ 2.6431, 2.5130,-0.5793]] 

def write_xyz(filename: str, title: str, atom_list: list[str], atom_coords: list[list[float]]) -> None:
    """Writes an xyz file named filename for the molecule defined
    by the atom_list and atom_coords

    Args:
        filename (str): name of the xyz file to be written
        title (str): title line to be printed in the header
        atom_list (list[str]): list of atoms
        atom_coords (list[list[float]]): list of lists of atom coordinates
    """
    with open(filename, 'w') as file:
        
        number_of_atoms = len(atom_list)
        file.write(str(number_of_atoms)+'\n')
        file.write(title + '\n')
        
        string = ''
        for atom, coords in zip(atom_list, atom_coords):
            
            string += atom + " " + str(coords[0]) + " " + str(coords[1]) + " " + str(coords[2]) + "\n"
    
        file.write(string)

# You should now be able to run and get the caffeine.xyz file written 
write_xyz("caffeine.xyz", "caffeine", atom_list, atom_coords)

Remember to always check your output (here the file). One way to do that is to read it back in and print it.
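
A minimal sketch of such a check is given below. It writes a tiny two-atom file with made-up coordinates to a temporary directory (the file name check_me.xyz is an assumption for this illustration), reads it back, prints it, and compares the atom count on the first line with the number of atom lines.

```python
import os
import tempfile

filename = os.path.join(tempfile.mkdtemp(), "check_me.xyz")

# Write a tiny two-atom file (made-up coordinates)
with open(filename, "w") as file:
    file.write("2\ntest molecule\nH1 0.0 0.0 0.0\nH2 0.0 0.0 0.74\n")

# Read it back and print it exactly as stored
with open(filename, "r") as file:
    content = file.read()
print(content)

# A simple automatic check: the atom count on line 1 matches the atom lines
lines = content.splitlines()
n_atoms = int(lines[0])
n_atom_lines = len(lines) - 2
print(n_atoms == n_atom_lines)
```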

Formatted strings

With the simplest implementation, the “caffeine.xyz” file will look very unordered.
While this may not be a problem for the read function (sometimes it is – it depends on the format assumptions made in the read function), it is very unpleasing to look at.
So, let us now touch a bit on how to format a string in a nicer way.

NOTE: The formatting style you will learn below (f-strings) requires Python 3.6 or later.

An f in front of a string literal makes it a formatted string literal (an f-string).
This means that you can use variable names directly in the string expression by enclosing them in {}

Take the following formatted string:

variable1 = "Hello"
variable2 = 4

string = f"I am variable1 {variable1} and I am variable2 {variable2}"
print(string)

which gives

I am variable1 Hello and I am variable2 4

We can format even more inside the {}. For example, if I have a variable of type float and want to print it in a field that is 12 characters wide with 6 decimal places, I can do the following:

PI = 3.14159265359

string = f"PI is approx. {PI:12.6f}"

print(string)

which gives

PI is approx.     3.141593

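Here the : inside {} indicates that a format specification for the variable follows.

Floats: :<total field width>.<number of decimal places>f
Integers: :<total field width>d
Strings: :<total field width>s

The field width counts every character, including the sign and the decimal point. By default, numbers are right-aligned and strings are left-aligned within the field.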

There are many, many more options. You need to study these yourself. Start by reading section “Format Specification Mini-Language”

https://docs.python.org/3/library/string.html#format-string-syntax
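
To give a taste of the additional options, the sketch below shows explicit alignment, a forced sign, scientific notation and zero padding. The example values are made up for illustration.

```python
value = 1234.56789
name = "H1"

left = f"[{name:<6s}]"       # left-aligned (the default for strings)
right = f"[{name:>6s}]"      # right-aligned
center = f"[{name:^6s}]"     # centered
signed = f"{value:+.2f}"     # always show the sign
scientific = f"{value:.3e}"  # scientific notation with 3 decimal places
padded = f"{42:05d}"         # pad an integer with leading zeros to width 5

print(left, right, center, signed, scientific, padded)
```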

# Time for you to try it out

# Example data
fish_string = "fish"
number_int = 4
atom_str = "H1"
number_float1 = 4.65432853
number_float2 = 1.54639524
number_float3 = 0.53856743

# Define a string (named string) containing the fish_string twice each taking a 10 character space
string = f"{fish_string:10s}{fish_string:10s}"
print(string) 

assert string == "fish      fish      ", error_message(string, "fish      fish      ")

# Define a string named string containing three times number_int taking up 4 digit space
string = f"{number_int:4d}{number_int:4d}{number_int:4d}"
print(string)

assert string == "   4   4   4", error_message(string, "   4   4   4")

# Define a string named string that contains
# - the atom_str with 4 character spaces 
# - and then the three number_float variables with 10 digits and 5 decimal places

# This will be all you need to get a nice output for your xyz writer

string = f"{atom_str:4s}{number_float1:10.5f}{number_float2:10.5f}{number_float3:10.5f}"
print(string)

assert string == "H1     4.65433   1.54640   0.53857", error_message(string, "H1     4.65433   1.54640   0.53857")

Your task:

Modify your write_xyz function from above to write in a nicely formatted output
The output xyz file should be named caffeine_NICE.xyz

You can for instance use {:4s}{:20.10f}{:20.10f}{:20.10f} which should give you the following:

24
caffeine
N1          1.5808000000        0.7027000000       -0.2279000000
C2          1.7062000000       -0.7374000000       -0.2126000000
N3          0.5340000000       -1.5671000000       -0.3503000000
C4          0.3231000000        1.3600000000        0.0274000000
C5         -0.8123000000        0.4553000000        0.0817000000
C6         -0.6967000000       -0.9322000000       -0.0662000000
N7         -2.1886000000        0.6990000000        0.2783000000
C8         -2.8512000000       -0.5205000000        0.2532000000
N9         -1.9537000000       -1.5188000000        0.0426000000
C10         0.6568000000       -3.0274000000       -0.1675000000
O11         2.8136000000       -1.2558000000       -0.1693000000
O12         0.2849000000        2.5744000000        0.1591000000
C13        -2.8096000000        2.0031000000        0.5032000000
C14         2.8301000000        1.5004000000       -0.1968000000
H15        -3.9271000000       -0.6787000000        0.3762000000
H16         1.4823000000       -3.4046000000       -0.7865000000
H17        -0.2708000000       -3.5204000000       -0.4868000000
H18         0.8567000000       -3.2990000000        0.8788000000
H19        -2.4123000000        2.7478000000       -0.2017000000
H20        -2.6042000000        2.3621000000        1.5221000000
H21        -3.8973000000        1.9344000000        0.3695000000
H22         3.5959000000        1.0333000000       -0.8314000000
H23         3.2249000000        1.5791000000        0.8255000000
H24         2.6431000000        2.5130000000       -0.5793000000

atom_list = ["N1", "C2", "N3", "C4", "C5", "C6", "N7", "C8", "N9", "C10", "O11", "O12", "C13", "C14", "H15", "H16", "H17", "H18", "H19", "H20", "H21", "H22", "H23", "H24"]
atom_coords = [[ 1.5808, 0.7027,-0.2279], [ 1.7062,-0.7374,-0.2126], [ 0.5340,-1.5671,-0.3503], [ 0.3231, 1.3600, 0.0274], [-0.8123, 0.4553, 0.0817], [-0.6967,-0.9322,-0.0662], [-2.1886, 0.6990, 0.2783], [-2.8512,-0.5205, 0.2532], [-1.9537,-1.5188, 0.0426], [ 0.6568,-3.0274,-0.1675], [ 2.8136,-1.2558,-0.1693], [ 0.2849, 2.5744, 0.1591], [-2.8096, 2.0031, 0.5032], [ 2.8301, 1.5004,-0.1968], [-3.9271,-0.6787, 0.3762], [ 1.4823,-3.4046,-0.7865], [-0.2708,-3.5204,-0.4868], [ 0.8567,-3.2990, 0.8788], [-2.4123, 2.7478,-0.2017], [-2.6042, 2.3621, 1.5221], [-3.8973, 1.9344, 0.3695], [ 3.5959, 1.0333,-0.8314], [ 3.2249, 1.5791, 0.8255], [ 2.6431, 2.5130,-0.5793]] 

def write_xyz(filename: str, title: str, atom_list: list[str], atom_coords: list[list[float]]) -> None:
    """Writes an xyz file named filename for the molecule defined
    by the atom_list and atom_coords

    Args:
        filename (str): name of the xyz file to be written
        title (str): title line to be printed in the header
        atom_list (list[str]): list of atoms
        atom_coords (list[list[float]]): list of lists of atom coordinates
    """
    with open(filename, 'w') as file:
        
        number_of_atoms = len(atom_list)
        file.write(str(number_of_atoms)+'\n')
        file.write(title + '\n')
        
        string = ''
        for atom, coords in zip(atom_list, atom_coords):
            
            string += f"{atom:4s}{coords[0]:20.10f}{coords[1]:20.10f}{coords[2]:20.10f}\n"
    
        file.write(string)

write_xyz("caffeine_NICE.xyz", "caffeine", atom_list, atom_coords)

# Again you should always check the output

Reading and writing standard file formats

As with many other things, there are a wealth of Python modules that can deal with read/write of standard file formats.

For example:

CSV files (.csv) – import csv
JSON files (.json) – import json
YAML files (.yaml) – import yaml (provided by the third-party PyYAML package, which is not part of the standard library)

We will not go through these, but I recommend that you have a look: using a standard file format is preferable when possible.
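
As a brief taste of the two standard-library modules, the sketch below writes and reads back a small JSON file and a small CSV file. The file names and the data are made up for this illustration.

```python
import csv
import json
import os
import tempfile

tmp_dir = tempfile.mkdtemp()

# JSON: stores nested lists/dictionaries and keeps numbers as numbers
molecule = {"title": "water", "atoms": ["O1", "H2", "H3"], "charge": 0}
json_file = os.path.join(tmp_dir, "molecule.json")
with open(json_file, "w") as file:
    json.dump(molecule, file, indent=2)
with open(json_file, "r") as file:
    molecule_read = json.load(file)

# CSV: rows of values; everything comes back as str, so convert as needed
rows = [["atom", "x"], ["O1", 0.0], ["H2", 0.8]]
csv_file = os.path.join(tmp_dir, "atoms.csv")
with open(csv_file, "w", newline="") as file:
    csv.writer(file).writerows(rows)
with open(csv_file, "r", newline="") as file:
    rows_read = list(csv.reader(file))

print(molecule_read)
print(rows_read)
```

Note the difference: JSON returns the integer 0 as an int, whereas CSV returns every value as a string, just like the plain text files earlier in this lecture.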

Command-line parsing with argparse

The argparse module is intended for writing user-friendly command-line interfaces to Python programs.

It may come in handy in one of the project assignments.

The module is built around an instance of the class argparse.ArgumentParser, which is essentially a container for input argument specifications.

First, we need to talk a bit about arguments.

An argument is a value that is passed to a Python function or, in this case, to the parser.
Inside the function (or in the result returned by the parser) it is available as an ordinary Python object.

Two different types of arguments

Positional – required arguments that are identified by their position (in a Python function call they must come before keyword arguments; on the command line it is conventional to list them first).
Optional – not required, because the parameter underlying the argument has a default value (if it is not set explicitly in argparse using the default keyword, it is None). If the argument is not given, the program will use the default.

# We start by importing the module
import argparse

# Then we initialize an instance of the argparse.ArgumentParser class

parser = argparse.ArgumentParser(prog='<name of your program>',
                    description='What the program does',
                    epilog='Text at the bottom of help')

# Step 1 is to define the input arguments

# First, we put any positional arguments
parser.add_argument("filename", help="The input file", type = str)

# Then any optional arguments (-X single-character argument, --xxxxxxx multicharacter argument)
parser.add_argument("-n", "--nstudents", help="The number of students", type = int)

# We then need to run the parse_args() method of the object to get the parser object to correctly interpret the extracted data
# args = parser.parse_args() <------------- THIS IS HOW TO RUN IT IN A PYTHON SCRIPT FILE

args = parser.parse_args(["test.csv"]) #<------------To make this work in a Jupyter Notebook (where we cannot provide arguments as usual), we provide a list of input strings instead

# The object returned by parse_args() stores the inputs in the filename and nstudents attributes
print(args.filename, args.nstudents)

# Each command-line token is passed as its own list element, just as the shell would split them
args = parser.parse_args(["test.csv", "-n", "5"])

# Again the inputs are stored in the filename and nstudents attributes. This time nstudents is not None
print(args.filename, args.nstudents)
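
The default keyword mentioned earlier can be sketched as follows. The default value 30 and the extra --verbose flag are made up for this illustration and are not part of the example above.

```python
import argparse

parser = argparse.ArgumentParser(description="Demo of default values")
parser.add_argument("filename", type=str, help="The input file")
# default= replaces None when the option is not given
parser.add_argument("-n", "--nstudents", type=int, default=30,
                    help="The number of students (default: 30)")
# action='store_true' makes a flag that takes no value
parser.add_argument("-v", "--verbose", action="store_true",
                    help="Print extra information")

args_default = parser.parse_args(["test.csv"])
args_given = parser.parse_args(["test.csv", "--nstudents", "5", "-v"])

print(args_default.filename, args_default.nstudents, args_default.verbose)
print(args_given.filename, args_given.nstudents, args_given.verbose)
```

In a script file you would call parser.parse_args() without a list, and running the script with -h or --help prints the help texts given above.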