Noweb literate programming file for Star grammar and parser specification. We are using Amit Patel's excellent context-sensitive Yapps2 parser. ' This was chosen because it enables us to process long semicolon delimited strings without running into Python recursion limits. In the original kjParsing implementation, it was impossible to get the lexer to return a single line of text within the semicolon-delimited string as that re would have matched a single line of text anywhere in the file. The resulting very long match expression only worked for text strings less than about 9000 characters in length. For further information about Yapps2, see http://theory.stanford.edu/ amitp/Yapps/

Several standards are available, of which four are implemented: 1.0, 1.1, CIF2 and STAR2. CIF2 differs from STAR2 in that lists have comma separators and no nested save frames are allowed. Note that 1.0,1.1 and CIF2/STAR2 differ in their treatment of unquoted data values beginning with brackets.

<1.0_syntax>=
<Python2-3 compatibility>
<Helper functions>
%%
parser StarParser:
    <Regular expressions 1.0>
    <Grammar specification 1.1>
%%

<1.1_syntax>=
<Python2-3 compatibility>
<Helper functions>
%%
parser StarParser:
    <Regular expressions 1.1>
    <Grammar specification 1.1>
%%

The following two recipes produce CIF2 and STAR2 syntax.

<CIF2_syntax>=
<Python2-3 compatibility>
<Helper functions>
%%
parser StarParser:
    <Regular expressions CIF2>
    <Grammar specification CIF2>
%%

The STAR2 syntax

<STAR2_syntax>=
<Python2-3 compatibility>
<Helper functions>
%%
parser StarParser:
    <Regular expressions STAR2>
    <Grammar specification STAR2>
%%

Helper functions.

We have a monitor function which we can call to save the last parsed value (and print, if we are debugging). We also have functions for stripping off delimiters from strings. Finally, we match up our loops after reading them in. Note that we have function stripextras, which is only for semicolon strings, and stripstring, which is for getting rid of the inverted commas.

<Helper functions>= (<-U <-U <-U <-U)
# An alternative specification for the Cif Parser, based on Yapps2
# by Amit Patel (http://theory.stanford.edu/~amitp/Yapps)
#
# helper code: we define our match tokens
lastval = ''
def monitor(location,value):
    global lastval
    #print 'At %s: %s' % (location,repr(value))
    lastval = repr(value)
    return value

# Strip extras gets rid of leading and trailing whitespace, and
# semicolons.
def stripextras(value):
     from .StarFile import remove_line_folding, remove_line_prefix
     # we get rid of semicolons and leading/trailing terminators etc.
     import re
     jj = re.compile("[\n\r\f \t\v]*")
     semis = re.compile("[\n\r\f \t\v]*[\n\r\f]\n*;")
     cut = semis.match(value)
     if cut:        #we have a semicolon-delimited string
          nv = value[cut.end():len(value)-2]
          try:
             if nv[-1]=='\r': nv = nv[:-1]
          except IndexError:    #empty data value
             pass
          # apply protocols
          nv = remove_line_prefix(nv)
          nv = remove_line_folding(nv)
          return nv
     else:
          cut = jj.match(value)
          if cut:
               return stripstring(value[cut.end():])
          return value

# helper function to get rid of inverted commas etc.

def stripstring(value):
     if value:
         if value[0]== '\'' and value[-1]=='\'':
           return value[1:-1]
         if value[0]=='"' and value[-1]=='"':
           return value[1:-1]
     return value

# helper function to get rid of triple quotes
def striptriple(value):
    if value:
        if value[:3] == '"""' and value[-3:] == '"""':
            return value[3:-3]
        if value[:3] == "'''" and value[-3:] == "'''":
            return value[3:-3]
    return value

# helper function to populate a StarBlock given a list of names
# and values .
#
# Note that there may be an empty list at the very end of our itemlists,
# so we remove that if necessary.
#

def makeloop(target_block,loopdata):
    loop_seq,itemlists = loopdata
    if itemlists[-1] == []: itemlists.pop(-1)
    # print('Making loop with %s' % repr(itemlists))
    step_size = len(loop_seq)
    for col_no in range(step_size):
       target_block.AddItem(loop_seq[col_no], itemlists[col_no::step_size],precheck=True)
    # now construct the loop
    try:
        target_block.CreateLoop(loop_seq)  #will raise ValueError on problem
    except ValueError:
        error_string =  'Incorrect number of loop values for loop containing %s' % repr(loop_seq)
        print(error_string, file=sys.stderr)
        raise ValueError(error_string)

# return an object with the appropriate amount of nesting
def make_empty(nestlevel):
    gd = []
    for i in range(1,nestlevel):
        gd = [gd]
    return gd

# this function updates a dictionary first checking for name collisions,
# which imply that the CIF is invalid.  We need case insensitivity for
# names.

# Unfortunately we cannot check loop item contents against non-loop contents
# in a non-messy way during parsing, as we may not have easy access to previous
# key value pairs in the context of our call (unlike our built-in access to all
# previous loops).
# For this reason, we don't waste time checking looped items against non-looped
# names during parsing of a data block.  This would only match a subset of the
# final items.   We do check against ordinary items, however.
#
# Note the following situations:
# (1) new_dict is empty -> we have just added a loop; do no checking
# (2) new_dict is not empty -> we have some new key-value pairs
#
def cif_update(old_dict,new_dict,loops):
    old_keys = map(lambda a:a.lower(),old_dict.keys())
    if new_dict != {}:    # otherwise we have a new loop
        #print 'Comparing %s to %s' % (repr(old_keys),repr(new_dict.keys()))
        for new_key in new_dict.keys():
            if new_key.lower() in old_keys:
                raise CifError("Duplicate dataname or blockname %s in input file" % new_key)
            old_dict[new_key] = new_dict[new_key]
#
# this takes two lines, so we couldn't fit it into a one line execution statement...
def order_update(order_array,new_name):
    order_array.append(new_name)
    return new_name

# and finally...turn a sequence into a python dict (thanks to Stackoverflow)
def pairwise(iterable):
    try:
        it = iter(iterable)
        while 1:
            yield next(it), next(it)
    except StopIteration:
        return

We can simplify the BNC specification of Nick Spadaccini. First of all, we do not have to have type I and type II strings, which are distinguished by the presence or absence of a line feed directly preceding them and thus by being allowed a semicolon at the front or not. We take care of this by treating as whitespace all terminators except for those with a following semicolon, so that a carriage-return-semicolon sequence matches the start_sc_line uniquely.

We include reserved words and save frames. The other reserved words have no rules defined, so will flag a syntax error. However, as yapps is a context-sensitive parser, it will by default make any word found starting with our reserved words into a data value if it occurs in the expected position, so we explicity exclude stuff starting with our words in the definition of data_value_1.

The syntax rules below correspond to the current STAR2 paper. Commas are not allowed in non-delimited data values so that they can be used to separate list items.

Note that we do not recognise characters outside the Unicode basic multilingual plane in datanames, data headings and save headings. This is due to a limitation of Python 2 unicode strings and will be removed when PyCIFRW is ported to Python 3.

<Regular expressions STAR2>= (<-U)
<STAR2-CIF2 common regular expressions block 1>
<STAR2-CIF2 common regular expressions block 2>
token data_value_1: "((?!(((S|s)(A|a)(V|v)(E|e)_[^\s]*)|((G|g)(L|l)(O|o)(B|b)(A|a)(L|l)_[^\s]*)|((S|s)(T|t)(O|o)(P|p)_[^\s]*)|((D|d)(A|a)(T|t)(A|a)_[^\s]*)))[^\s\"#$',_\{\}\[\]][^\s,\{\}\[\]]*)"

<STAR2-CIF2 common regular expressions block 1>= (<-U U->)
# first handle whitespace and comments, keeping whitespace
# before a semicolon
ignore: "([ \t\n\r](?!;))|[ \t]"
ignore: "(#.*[\n\r](?!;))|(#.*)"
# now the tokens
token LBLOCK:  "(L|l)(O|o)(O|o)(P|p)_"        # loop_
token GLOBAL: "(G|g)(L|l)(O|o)(B|b)(A|a)(L|l)_"
token STOP: "(S|s)(T|t)(O|o)(P|p)_"
token save_heading: u"(S|s)(A|a)(V|v)(E|e)_[][!%&\(\)*+,./:<=>?@0-9A-Za-z\\\\^`{}\|~\"#$';_\u00A0-\uD7FF\uE000-\uFDCF\uFDF0-\uFFFD\U00010000-\U0001FFFD\U00020000-\U0002FFFD\U00030000-\U0003FFFD\U00040000-\U0004FFFD\U00050000-\U0005FFFD\U00060000-\U0006FFFD\U00070000-\U0007FFFD\U00080000-\U0008FFFD\U00090000-\U0009FFFD\U000A0000-\U000AFFFD\U000B0000-\U000BFFFD\U000C0000-\U000CFFFD\U000D0000-\U000DFFFD\U000E0000-\U000EFFFD\U000F0000-\U000FFFFD\U00100000-\U0010FFFD-]+"
token save_end: "(S|s)(A|a)(V|v)(E|e)_"
token data_name: u"_[][!%&\(\)*+,./:<=>?@0-9A-Za-z\\\\^`{}\|~\"#$';_\u00A0-\uD7FF\uE000-\uFDCF\uFDF0-\uFFFD\U00010000-\U0001FFFD\U00020000-\U0002FFFD\U00030000-\U0003FFFD\U00040000-\U0004FFFD\U00050000-\U0005FFFD\U00060000-\U0006FFFD\U00070000-\U0007FFFD\U00080000-\U0008FFFD\U00090000-\U0009FFFD\U000A0000-\U000AFFFD\U000B0000-\U000BFFFD\U000C0000-\U000CFFFD\U000D0000-\U000DFFFD\U000E0000-\U000EFFFD\U000F0000-\U000FFFFD\U00100000-\U0010FFFD-]+" #_followed by stuff
token data_heading: u"(D|d)(A|a)(T|t)(A|a)_[][!%&\(\)*+,./:<=>?@0-9A-Za-z\\\\^`{}\|~\"#$';_\u00A0-\uD7FF\uE000-\uFDCF\uFDF0-\uFFFD\U00010000-\U0001FFFD\U00020000-\U0002FFFD\U00030000-\U0003FFFD\U00040000-\U0004FFFD\U00050000-\U0005FFFD\U00060000-\U0006FFFD\U00070000-\U0007FFFD\U00080000-\U0008FFFD\U00090000-\U0009FFFD\U000A0000-\U000AFFFD\U000B0000-\U000BFFFD\U000C0000-\U000CFFFD\U000D0000-\U000DFFFD\U000E0000-\U000EFFFD\U000F0000-\U000FFFFD\U00100000-\U0010FFFD-]+"
token start_sc_line: "(\n|\r\n);([^\n\r])*(\r\n|\r|\n)+"
token sc_line_of_text: "[^;\r\n]([^\r\n])*(\r\n|\r|\n)+"
token end_sc_line: ";"
token c_c_b: "\}"
token o_c_b: "\{"
token c_s_b: "\]"
token o_s_b: "\["
#token dat_val_nocomma_nosq: "([^\s\"#$,'_\(\{\[\]][^\s,\[\]]*)|'(('(?![\s,]))|([^\n\r\f']))*'+|\"((\"(?![\s,]))|([^\n\r\"]))*\"+"
token dat_val_internal_sq: "\[([^\s\[\]]*)\]"
# token dat_val_nocomma_nocurl: "([^\s\"#$,'_\(\{\[\]][^\s,}]*)|'(('(?![\s,]))|([^\n\r\f']))*'+|\"([^\n\r\"])*\"+"
# For tests of new DDLm syntax - no quotes or apostrophes in strings, no commas, braces or square brackets in undelimited data values
# This token for triple-quote delimited strings must come before single-quote delimited strings to avoid the opening quotes being
# interpreted as a single-quote delimited string
token triple_quote_data_value: "(?s)'''.*?'''|\"\"\".*?\"\"\""
token single_quote_data_value: "'([^\n\r\f'])*'+|\"([^\n\r\"])*\"+"

Currently just a single line but we allow a whole block just in case.

<STAR2-CIF2 common regular expressions block 2>= (<-U U->)
token END: '$'

CIF 2.0 uses spaces instead of commas to separate list values so commas are allowed in data values

<Regular expressions CIF2>= (<-U)
<STAR2-CIF2 common regular expressions block 1>
token data_value_1: "((?!(((S|s)(A|a)(V|v)(E|e)_[^\s]*)|((G|g)(L|l)(O|o)(B|b)(A|a)(L|l)_[^\s]*)|((S|s)(T|t)(O|o)(P|p)_[^\s]*)|((D|d)(A|a)(T|t)(A|a)_[^\s]*)))[^\s\"#$'_\{\}\[\]][^\s\{\}\[\]]*)"
<STAR2-CIF2 common regular expressions block 2>

CIF 1.1 does not allow unquoted data values to begin with a bracket character, but does not have bracket expressions as such.

<Regular expressions 1.1>= (<-U)
# first handle whitespace and comments, keeping whitespace
# before a semicolon
ignore: "([ \t\n\r](?!;))|[ \t]"
ignore: "(#.*[\n\r](?!;))|(#.*)"
# now the tokens
token LBLOCK:  "(L|l)(O|o)(O|o)(P|p)_"        # loop_
token GLOBAL: "(G|g)(L|l)(O|o)(B|b)(A|a)(L|l)_"
token STOP: "(S|s)(T|t)(O|o)(P|p)_"
token save_heading: "(S|s)(A|a)(V|v)(E|e)_[][!%&\(\)*+,./:<=>?@0-9A-Za-z\\\\^`{}\|~\"#$';_-]+"
token save_end: "(S|s)(A|a)(V|v)(E|e)_"
token data_name: "_[][!%&\(\)*+,./:<=>?@0-9A-Za-z\\\\^`{}\|~\"#$';_-]+" #_followed by stuff
token data_heading: "(D|d)(A|a)(T|t)(A|a)_[][!%&\(\)*+,./:<=>?@0-9A-Za-z\\\\^`{}\|~\"#$';_-]+"
token start_sc_line: "(\n|\r\n);([^\n\r])*(\r\n|\r|\n)+"
token sc_line_of_text: "[^;\r\n]([^\r\n])*(\r\n|\r|\n)+"
token end_sc_line: ";"
token data_value_1: "((?!(((S|s)(A|a)(V|v)(E|e)_[^\s]*)|((G|g)(L|l)(O|o)(B|b)(A|a)(L|l)_[^\s]*)|((S|s)(T|t)(O|o)(P|p)_[^\s]*)|((D|d)(A|a)(T|t)(A|a)_[^\s]*)))[^\s\"#$'_\{\[\]][^\s]*)|'(('(?=\S))|([^\n\r\f']))*'+|\"((\"(?=\S))|([^\n\r\"]))*\"+"
token END: '$'

The original CIF specification allowed brackets to begin data values, even if not quoted. That is the only difference.

<Regular expressions 1.0>= (<-U)
# first handle whitespace and comments, keeping whitespace
# before a semicolon
ignore: "([ \t\n\r](?!;))|[ \t]"
ignore: "(#.*[\n\r](?!;))|(#.*)"
# now the tokens
token LBLOCK:  "(L|l)(O|o)(O|o)(P|p)_"        # loop_
token GLOBAL: "(G|g)(L|l)(O|o)(B|b)(A|a)(L|l)_"
token STOP: "(S|s)(T|t)(O|o)(P|p)_"
token save_heading: "(S|s)(A|a)(V|v)(E|e)_[][!%&\(\)*+,./:<=>?@0-9A-Za-z\\\\^`{}\|~\"#$';_-]+"
token save_end: "(S|s)(A|a)(V|v)(E|e)_"
token data_name: "_[][!%&\(\)*+,./:<=>?@0-9A-Za-z\\\\^`{}\|~\"#$';_-]+" #_followed by stuff
token data_heading: "(D|d)(A|a)(T|t)(A|a)_[][!%&\(\)*+,./:<=>?@0-9A-Za-z\\\\^`{}\|~\"#$';_-]+"
token start_sc_line: "(\n|\r\n);([^\n\r])*(\r\n|\r|\n)+"
token sc_line_of_text: "[^;\r\n]([^\r\n])*(\r\n|\r|\n)+"
token end_sc_line: ";"
token data_value_1: "((?!(((S|s)(A|a)(V|v)(E|e)_[^\s]*)|((G|g)(L|l)(O|o)(B|b)(A|a)(L|l)_[^\s]*)|((S|s)(T|t)(O|o)(P|p)_[^\s]*)|((D|d)(A|a)(T|t)(A|a)_[^\s]*)))[^\s\"#$'_][^\s]*)|'(('(?=\S))|([^\n\r\f']))*'+|\"((\"(?=\S))|([^\n\r\"]))*\"+"
token END: '$'

The final returned value is a StarFile, with each key a datablock name. The value attached to each key is an entire dictionary for that block. We bypass the standard __setitem__ methods to gain precision in checking for duplicate blocknames and skipping name checks.

Note in the following grammar that we have adjusted for some yapps idiosyncracies: in particular, a nested bracket expression needs to be distinguished from the top-level nested bracket expression otherwise the context-sensitive parser will search for all those items which could follow the top level bracket expression in nested expressions. The current version produces slightly incorrect error messages in that any type of close bracket is supposedly OK, even though only a particular type will be accepted.

We also have to deal with Dimension-type lists, where there may be square brackets as part of the value (e.g. [4[5]]). This requires catching internal square brackets as well. The current grammar specification catches only this case i.e. the first element of the array can be of the form xxx[yyy]. No other elements can have this form, and there can be no trailing characters. This form can be allowed for other elements by trivial expansion of the current description but, until further notice, I do not think it is useful to allow square brackets in list values.

<Grammar specification STAR2>= (<-U)
<CIF2-STAR2 common grammar>
<STAR2-specific grammar>

<Grammar specification CIF2>= (<-U)
<CIF2-STAR2 common grammar>
<CIF2-specific grammar>

CIF2 and STAR2 are almost identical in grammar

<CIF2-STAR2 common grammar>= (<-U <-U)
# now the rules

rule input<<prepared>>: ( ((
            dblock<<prepared>>         {{allblocks = prepared; allblocks.merge_fast(dblock)}}
            (
            dblock<<prepared>>         {{allblocks.merge_fast(dblock)}} #
            )*
            END
            )
            |
            (
            END                 {{allblocks = prepared}}
            )))                   {{allblocks.unlock(); return allblocks}}

     rule dblock<<prepared>>: ( data_heading {{heading = data_heading[5:];thisbc=StarFile(characterset='unicode',standard=prepared.standard);act_heading = thisbc.NewBlock(heading,prepared.blocktype(overwrite=False));stored_block = thisbc[act_heading]}}# a data heading
                  (
                   dataseq<<stored_block>>   #because merging may have changed the heading  
                  |
                  save_frame<<prepared>>     {{thisbc.merge_fast(save_frame,parent=stored_block)}}
                  )*
                   )                      {{stored_block.setmaxnamelength(stored_block.maxnamelength);return (monitor('dblock',thisbc))}} # but may be empty

     rule dataseq<<starblock>>:  data<<starblock>>
                       (
                       data<<starblock>>
                       )*

     rule data<<currentblock>>:        top_loop      {{makeloop(currentblock,top_loop)}}
                                        |
                                        datakvpair    {{currentblock.AddItem(datakvpair[0],datakvpair[1],precheck=False)}} #kv pair

     rule datakvpair: data_name data_value {{return [data_name,data_value]}} # name-value

     rule data_value: (data_value_1          {{thisval = data_value_1}}
                      |
                      delimited_data_value  {{thisval = delimited_data_value}}
                      |
                      sc_lines_of_text      {{thisval = stripextras(sc_lines_of_text)}}
                      |
                      bracket_expression    {{thisval = bracket_expression}}
                      )                     {{return monitor('data_value',thisval)}}

     rule delimited_data_value: (triple_quote_data_value      {{thisval = striptriple(triple_quote_data_value)}}
                                |
                                single_quote_data_value       {{thisval = stripstring(single_quote_data_value)}}
                                )                             {{return thisval}}

     rule sc_lines_of_text: start_sc_line   {{lines = StringIO();lines.write(start_sc_line)}}
                            (
                            sc_line_of_text {{lines.write(sc_line_of_text)}}
                            )*
                            end_sc_line     {{lines.write(end_sc_line);return monitor('sc_line_of_text',lines.getvalue())}}

     rule bracket_expression:  square_bracket_expr   {{return square_bracket_expr}}
                            |
                              curly_bracket_expr    {{return curly_bracket_expr}}


# due to the inability of the parser to backtrack, we contruct our loops in helper functions,
# and simply collect data during parsing proper.

     rule top_loop: LBLOCK loopfield loopvalues {{return loopfield,loopvalues}}

# OK: a loopfield is either a sequence of datanames

     rule loopfield: (            {{loop_seq=[] }}
                     (
                                  ( data_name  )  {{loop_seq.append(data_name)}}
                      )*
                      )                        {{return loop_seq}} # sequence of data names


     rule loopvalues: (
                       (data_value   ) {{dataloop=[data_value]}}
                       (
                       (data_value  ) {{dataloop.append(monitor('loopval',data_value))}}
                       )*
                       )              {{return dataloop}}

     rule save_frame<<prepared>>: save_heading   {{savehead = save_heading[5:];savebc = StarFile();newname = savebc.NewBlock(savehead,prepared.blocktype(overwrite=False));stored_block = savebc[newname] }} 
                      (
                      dataseq<<savebc[savehead]>>
                      |
                      save_frame<<prepared>>     {{savebc.merge_fast(save_frame,parent=stored_block)}}
                      )*
                      save_end           {{return monitor('save_frame',savebc)}}

STAR2 specifies nested save frames and comma-separated list and table elements, whereas CIF2 has space-separated elements.

<STAR2-specific grammar>= (<-U)
     rule save_frame<<prepared>>: save_heading   {{savehead = save_heading[5:];savebc = StarFile();newname = savebc.NewBlock(savehead,prepared.blocktype(overwrite=False));stored_block = savebc[newname] }} 
                      (
                      dataseq<<savebc[savehead]>>
                      |
                      save_frame<<prepared>>     {{savebc.merge_fast(save_frame,parent=stored_block)}}
                      )*
                      save_end           {{return monitor('save_frame',savebc)}}

     rule square_bracket_expr: o_s_b            {{this_list = []}}
                            (  data_value       {{this_list.append(data_value)}}
                              ( ","
                                data_value       {{this_list.append(data_value)}}
                              ) *
                            ) *
                               c_s_b                     {{return StarList(this_list)}}

     rule curly_bracket_expr: ( o_c_b                     {{table_as_list = []}}
                             ( delimited_data_value       {{table_as_list = [delimited_data_value]}}
                              ":"
                              data_value                 {{table_as_list.append(data_value)}}
                            ( ","
                              delimited_data_value       {{table_as_list.append(delimited_data_value)}}
                              ":"
                              data_value                 {{table_as_list.append(data_value)}}
                            ) *
                               ) *
                              c_c_b )                    {{return StarDict(pairwise(table_as_list))}}



<CIF2-specific grammar>= (<-U)
     rule save_frame<<prepared>>: save_heading   {{savehead = save_heading[5:];savebc = StarFile();newname = savebc.NewBlock(savehead,prepared.blocktype(overwrite=False));stored_block = savebc[newname] }} 
                      (
                      dataseq<<savebc[savehead]>>
                      )*
                      save_end           {{return monitor('save_frame',savebc)}}


      rule square_bracket_expr: o_s_b            {{this_list = []}}
                            (  data_value       {{this_list.append(data_value)}}
                              (
                                data_value       {{this_list.append(data_value)}}
                              ) *
                            ) *
                               c_s_b                     {{return StarList(this_list)}}

     rule curly_bracket_expr: ( o_c_b                    {{table_as_list = []}}
                            (  delimited_data_value      {{table_as_list = [delimited_data_value]}}
                              ":"
                              data_value                 {{table_as_list.append(data_value)}}
                            (
                              delimited_data_value       {{table_as_list.append(delimited_data_value)}}
                              ":"
                              data_value                 {{table_as_list.append(data_value)}}
                            ) *
                            ) *
                              c_c_b )                    {{return StarDict(pairwise(table_as_list))}}



The CIF 1.1 grammar specification does not include bracket expressions, but does exclude brackets from beginning unquoted data values. We pass through the argument prepared so we can deal with non-standard dictionary files that contain duplicate datablocks.

<Grammar specification 1.1>= (<-U <-U)
# now the rules

rule input<<prepared>>: ( ((
            dblock<<prepared>>         {{allblocks = prepared;allblocks.merge_fast(dblock)}}
            (
            dblock<<prepared>>         {{allblocks.merge_fast(dblock)}} #
            )*
            END
            )
            |
            (
            END                 {{allblocks = prepared}}
            )))                   {{allblocks.unlock();return allblocks}}

    rule dblock<<prepared>>: ( data_heading {{heading = data_heading[5:];thisbc=StarFile(characterset='unicode',standard=prepared.standard);newname = thisbc.NewBlock(heading,prepared.blocktype(overwrite=False));act_block=thisbc[newname]}}# a data heading
                  (
                   dataseq<<thisbc[heading]>>
                  |
                  save_frame<<prepared>>     {{thisbc.merge_fast(save_frame,parent=act_block)}}
                  )*
# A trick to force rechecking of all datanames, which was skipped by the precheck = True option below
                   )                      {{thisbc[heading].setmaxnamelength(thisbc[heading].maxnamelength);return (monitor('dblock',thisbc))}} # but may be empty

     rule dataseq<<starblock>>:  data<<starblock>>
                       (
                       data<<starblock>>
                       )*

     rule data<<currentblock>>:        top_loop      {{makeloop(currentblock,top_loop)}}
                                        |
                                        datakvpair    {{currentblock.AddItem(datakvpair[0],datakvpair[1],precheck=True)}} #kv pair

     rule datakvpair: data_name data_value {{return [data_name,data_value]}} # name-value

     rule data_value: (data_value_1          {{thisval = stripstring(data_value_1)}}
                      |
                      sc_lines_of_text      {{thisval = stripextras(sc_lines_of_text)}}
                      )                     {{return monitor('data_value',thisval)}}

     rule sc_lines_of_text: start_sc_line   {{lines = StringIO();lines.write(start_sc_line)}}
                            (
                            sc_line_of_text {{lines.write(sc_line_of_text)}}
                            )*
                            end_sc_line     {{lines.write(end_sc_line);return monitor('sc_line_of_text',lines.getvalue())}}

# due to the inability of the parser to backtrack, we contruct our loops in helper functions,
# and simply collect data during parsing proper.

     rule top_loop: LBLOCK loopfield loopvalues {{return loopfield,loopvalues}}

# OK: a loopfield is either a sequence of dataname*,loopfield with stop
# or else dataname,loopfield without stop

     rule loopfield: (            {{toploop=[]}}
                     (
                                  ( data_name  )  {{toploop.append(data_name)}}
                      )*
                      )                        {{return toploop}} # sequence of data names


     rule loopvalues: (
                       (data_value   ) {{dataloop=[data_value]}}
                       (
                       (data_value  ) {{dataloop.append(monitor('loopval',data_value))}}
                       )*
                       )              {{return dataloop}}

     rule save_frame<<prepared>>: save_heading   {{savehead = save_heading[5:];savebc = StarFile();newname=savebc.NewBlock(savehead,prepared.blocktype(overwrite=False));act_block=savebc[newname] }} 
                      (
                      dataseq<<savebc[savehead]>>
                      |
                      save_frame<<prepared>>     {{savebc.merge_fast(save_frame,parent=act_block)}}
                      )*
                      save_end           {{return monitor('save_frame',savebc)}}


Python 2/3 compatibility. We try to keep the code as portable across the 2-3 divide as we can.

<Python2-3 compatibility>= (<-U <-U <-U <-U)
# To maximize python3/python2 compatibility
from __future__ import print_function
from __future__ import unicode_literals
from __future__ import division
from __future__ import absolute_import

from .StarFile import StarBlock,StarFile,StarList,StarDict
from io import StringIO