making latex fully indexed etc.
This initial commit just has print and copy tools
and tests to verify correct operation of those tools.
--- /dev/null
+# Makefile which tests the python internals for sanity.
+
+clean:
+	-rm -f *.pyc
+	-rm -f *.tmp
+	-rm -f *.out
+
+test:
+	python printtokens.py test1.in >test1.tmp
+	diff test1.tmp test1.base
+	python copyfile.py test1.in
+	diff test1.in test1.in.out
--- /dev/null
+These tools are intended to help update the latex
+source of a DWARF document so that its references
+are complete and correct.
+
+Our fundamental approach is to tokenize the input line-by-line
+and then use trivial pattern matching to determine which
+tokens need updating on which lines. We always try to ensure
+that any line we do not intend to change is emitted
+byte-for-byte unchanged. We change lines (in most cases)
+by simply inserting new tokens on the line (or possibly
+inserting characters into a token).
+
+Because latex names are non-traditional (compared to
+other languages) we adopt an inefficient but
+simple scanning and lexing approach.
+
+Every latex source file is read completely into a dwfile object
+which contains a
+List of lines
+ each line composed of a list of tokens
+ each token described below.
+
+If we are writing out updated latex source we want all the
+unchanged text in the output to match the input: no spacing changes
+and no changes other than those the task at hand requires.
+The tokens are designed to serve that goal.
+
+For example, one task at hand might be to find every DW_*
+reference and ensure it is either a link target (livetarg) or a
+link (livelink), and rewrite any that are neither as a link.
+(see our latex commands livelink and livetarg in latex source).
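The rewrite step for a bare name could look roughly like the sketch below. It is only an illustration: the function name as_livelink and the "chap:" label prefix are assumptions (the prefix is taken from the sample lines in test1.in), and it is written for Python 3 rather than the Python 2 used by the tools in this commit.

```python
def as_livelink(tex_name, prefix="chap:"):
    """Wrap a DW_* name (in its TeX spelling) in a livelink command.

    The label is the name with every \\-, \\_ and _ removed.
    The default prefix is an assumption based on test1.in.
    """
    label = tex_name.replace("\\-", "").replace("\\_", "").replace("_", "")
    return "\\livelink{%s%s}{%s}" % (prefix, label, tex_name)


print(as_livelink("DW\\_TAG\\_access\\_declaration"))
```

A real pass would first have to check that the name is not already inside a livelink or livetarg before wrapping it.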
+
+
+Another task at hand might be to take every DW_ and rewrite
+things like DW\_AT\_foo as DW\-\_AT\-\_foo so that latex
+can hyphenate.
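A minimal sketch of that hyphenation rewrite, applied to the TeX spelling of one identifier. The helper name add_hyphen_hints is hypothetical, and the code targets Python 3 rather than the Python 2 of this commit.

```python
import re

# Match \_ only when it is not already preceded by \-
_UNHINTED_UNDERBAR = re.compile(r"(?<!\\-)\\_")


def add_hyphen_hints(tex_name):
    """Insert \\- before every \\_ that does not have one yet."""
    return _UNHINTED_UNDERBAR.sub(lambda m: "\\-\\_", tex_name)


print(add_hyphen_hints("DW\\_AT\\_foo"))
```

Because names that already carry the hint are left alone, running the pass twice should give the same result as running it once.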
+
+Sometimes we'll want to read a single latex file, sometimes several of them.
+
+If we're reading all the files at once (for some reason) we construct
+an overall
+ List of dwfile (dwfile described above).
+
+
+TOKENS:
+This part is idiosyncratic to reflect our other goals.
+
+INDIVIDUAL TOKEN characters:
+{
+}
+[
+]
+space-character
+tab-character
+are individual tokens.
+All 4 forms (tex, underbar, std, label) are identical.
+
+We swallow the linefeed (or CR-LF) on input;
+it does not appear in the tokens.
+
+IDENTIFIER:
+The characters in \-_A-Za-z0-9 are allowed in an identifier
+(fileio.py also accepts a colon so that our tags stay in one token).
+An identifier begins with one of _ \ letter and has
+at least one letter in it.
+Identifiers are held in multiple strings in a token
+ tex: (meaning with \_ and possibly \-)
+ underbar: (meaning with \_, no \-)
+ std: (meaning with _ no \, as in the DWARF std)
+ label: (meaning with no _ or \ the form used as part of labels )
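The four forms can be derived from the tex form with plain string replacement. The sketch below is an assumption-laden illustration, not the code in fileio.py (which works on character lists): the name identifier_forms is made up here and the code is Python 3.

```python
def identifier_forms(tex):
    """Return (tex, underbar, std, label) for one identifier."""
    underbar = tex.replace("\\-", "")     # drop the \- hyphenation hints
    std = underbar.replace("\\_", "_")    # turn \_ into a plain _
    label = std.replace("_", "")          # drop the underbars
    return tex, underbar, std, label


for form in identifier_forms("DW\\-\\_TAG\\-\\_access\\-\\_declaration"):
    print(form)
```

Note that a backslash that is not part of \- or \_ survives in every form, which matches the \endlastfoot line in test1.base.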
+
+OTHER:
+All other characters are treated as text that is simply
+to be reproduced on output. A run of such characters is
+a single non-identifier token.
+All 4 forms (tex, underbar, std, label) are identical.
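The three token classes above can be approximated with a single regular expression. This is a sketch under assumptions: it is not how fileio.py scans (that code walks the line one character at a time), the names TOKEN_RE and tokenize are chosen here for illustration, and the code is Python 3.

```python
import re

TOKEN_RE = re.compile(
    r"[\[\]{} \t]"                      # individual token characters
    r"|[A-Za-z\\_][A-Za-z0-9:\\_-]*"    # identifier (colon allowed after the start)
    r"|[^\[\]{} \tA-Za-z\\_]+"          # run of any other characters
)


def tokenize(line):
    """Split one input line into token strings, dropping the line ending."""
    return TOKEN_RE.findall(line.rstrip("\r\n"))


line = "\\livelink{chap:DWTAGaccessdeclaration}{DW\\-\\_TAG\\-\\_access\\-\\_declaration}\n"
tokens = tokenize(line)
print(len(tokens))
# Joining the tokens gives back the line without its line ending.
print("".join(tokens) == line.rstrip("\r\n"))
```

Every character falls into exactly one of the three alternatives, so concatenating the tokens reproduces the input text, which is the property the copy test relies on.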
+
+
+Performance:
+We simply don't care about performance as long as a task takes
+less than a few minutes. There are only about 70,000 words in a complete
+document so we ignore efficiency issues.
--- /dev/null
+# Copyright 2012 DWARF Debugging Information Format Committee
+# This reads the input tokens and then
+# writes them out in new files.
+# Used to verify that the output matches the input byte-for-byte
+
+import sys
+import fileio
+
+def read_args():
+    cur = 1
+    filelist = []
+    while len(sys.argv) > cur:
+        print "argv[",cur,"] = ", sys.argv[cur]
+        v = sys.argv[cur]
+        filelist += [v]
+        cur = int(cur) + 1
+
+    dwf = fileio.readFilelist(filelist)
+
+    dwf.dwwrite()
+
+if __name__ == '__main__':
+    read_args()
+
+
+
--- /dev/null
+
+# All the little classes used in storing latex source data.
+# Copyright 2012 DWARF Debugging Information Format Committee
+
+import sys
+
+def isIdStart(c):
+    if isIndivid(c) == "y":
+        return "n"
+    if ord(c) >= ord('a') and ord(c) <= ord('z'):
+        return "y"
+    if ord(c) >= ord('A') and ord(c) <= ord('Z'):
+        return "y"
+    # It is tex/latex, so backslash starts a word.
+    if c == "\\":
+        return "y"
+    if c == "_":
+        return "y"
+    return "n"
+
+def isIdNext(c):
+    if isIndivid(c) == "y":
+        return "n"
+    if ord(c) >= ord('a') and ord(c) <= ord('z'):
+        return "y"
+    if ord(c) >= ord('A') and ord(c) <= ord('Z'):
+        return "y"
+    if ord(c) >= ord('0') and ord(c) <= ord('9'):
+        return "y"
+    # This is so we allow the colon in our tags
+    if c == ":":
+        return "y"
+    if c == "\\":
+        return "y"
+    if c == "-":
+        return "y"
+    if c == "_":
+        return "y"
+    return "n"
+
+def isIndivid(c):
+    if c == "[":
+        return "y"
+    if c == "]":
+        return "y"
+    if c == "{":
+        return "y"
+    if c == "}":
+        return "y"
+    if c == " ":
+        return "y"
+    if c == "\t":
+        return "y"
+    return "n"
+
+class dwtoken:
+    def __init__(self):
+        self._tex = []
+        self._underbar = []
+        self._std = []
+        self._label = []
+        # Class is "id", "ind","other","none"
+        self._class = "none"
+    def setIndivid(self,c):
+        self._tex = [c]
+        self._underbar = [c]
+        self._std = [c]
+        self._label = [c]
+        self._class = "ind"
+    def setInitialIdChar(self,c):
+        self._tex = [c]
+        self._class = "id"
+    def setNextIdChar(self,c):
+        self._tex += [c]
+
+    def setInitialOther(self,c):
+        self._tex = [c]
+        self._underbar = [c]
+        self._std = [c]
+        self._label = [c]
+        self._class = "other"
+    def setNextOther(self,c):
+        self._tex += [c]
+        self._underbar += [c]
+        self._std += [c]
+        self._label += [c]
+        self._class = "other"
+    def finishUpId(self):
+        """ This transforms the strings from the input form into
+        the internal forms we want.
+        """
+        self._underbar = []
+        self._std = []
+        self._label = []
+        n = 0
+        # Drop \-
+        while int(n) < len(self._tex):
+            c = self._tex[n]
+            if n < (len (self._tex) - 1) and c == "\\" and self._tex[n+1] == "-":
+                n = n +2
+                continue
+            self._underbar += [c]
+            n = n +1
+        # Drop \ from \_
+        n = 0
+        while int(n) < len(self._underbar):
+            c = self._underbar[n]
+            if n < (len (self._underbar) - 1) and c == "\\" and self._underbar[n+1] == "_":
+                n = n +1
+                continue
+            self._std += [c]
+            n = n +1
+        # Drop underbar
+        n = 0
+        while int(n) < len(self._std):
+            c = self._std[n]
+            if c == "_":
+                n = n +1
+                continue
+            self._label += [c]
+            n = n +1
+
+    def dwprintquotedshortform(self,d):
+        print "'",self.shortform(d),"'",
+    def shortform(self,d):
+        return ''.join(d)
+    def dwprint(self):
+        if self._class == "ind":
+            print self._class,
+            self.dwprintquotedshortform(self._tex)
+            print ""
+        else:
+            # This prints the token with end-line oddly.
+            print self._class,
+            self.dwprintquotedshortform(self._tex)
+            self.dwprintquotedshortform(self._underbar)
+            self.dwprintquotedshortform(self._std)
+            self.dwprintquotedshortform(self._label)
+            print ""
+    def dwwrite(self,outfile):
+        for x in self._tex:
+            outfile.write(x)
+
+class dwline:
+    """using an input line, create a list of tokens for the line.
+    Legal class transitions in tokenize() are:
+    none->other
+    none->id
+    none->ind
+    other->ind
+    other->id
+    id->ind
+    id->other
+    """
+    def __init__(self):
+        # list of dwtoken.
+        self._toks = []
+
+
+    def tokenize(self,rec):
+        """using an input line, create a list of tokens for the line.
+        Legal class transitions in tokenize() are:
+        none->other
+        none->id
+        none->ind
+        other->ind
+        other->id
+        id->ind
+        id->other
+        """
+        dwclass = "none"
+        combotok = dwtoken()
+        for c in rec:
+            if c == "\n" or c == "\r":
+                # Just drop these for now. Allowing them
+                # would not be harmful.
+                continue
+            elif dwclass == "none" or dwclass == "ind":
+                if isIndivid(c) == "y":
+                    a = dwtoken()
+                    a.setIndivid(c)
+                    self._toks += [a]
+                    continue
+                if isIdStart(c) == "y":
+                    combotok.setInitialIdChar(c)
+                    dwclass = "id"
+                    continue
+                # is "other"
+                combotok.setInitialOther(c)
+                dwclass = "other"
+                continue
+            elif dwclass == "id":
+                if isIdNext(c) == "y":
+                    combotok.setNextIdChar(c)
+                    continue
+                if isIndivid(c) == "y":
+                    combotok.finishUpId()
+                    self._toks += [combotok]
+                    combotok = dwtoken()
+                    a = dwtoken()
+                    a.setIndivid(c)
+                    dwclass = "ind"
+                    self._toks += [a]
+                    continue
+                # Other class input, other starts here.
+                combotok.finishUpId()
+                self._toks += [combotok]
+                combotok = dwtoken()
+                combotok.setInitialOther(c)
+                dwclass = "other"
+                continue
+            elif dwclass == "other":
+                if isIndivid(c) == "y":
+                    self._toks += [combotok]
+                    combotok = dwtoken()
+                    a = dwtoken()
+                    a.setIndivid(c)
+                    dwclass = "ind"
+                    self._toks += [a]
+                    continue
+                if isIdStart(c) == "y":
+                    self._toks += [combotok]
+                    combotok = dwtoken()
+                    combotok.setInitialIdChar(c)
+                    dwclass = "id"
+                    continue
+                combotok.setNextOther(c)
+                continue
+            # Else case impossible.
+
+        #Finish up final non-empty other or id token
+        if dwclass == "id":
+            combotok.finishUpId()
+            self._toks += [combotok]
+            dwclass = "none"
+        if dwclass == "other":
+            self._toks += [combotok]
+            dwclass = "none"
+    def dwprint(self,linenum):
+        print "Number of tokens in line ",linenum," : ",len(self._toks)
+        if len(self._toks) == 0:
+            #Just print an empty line.
+            print ""
+        else:
+            for t in self._toks:
+                t.dwprint()
+    def dwwrite(self, outfile, linenum):
+        for t in self._toks:
+            t.dwwrite(outfile)
+        outfile.write("\n")
+
+
+class dwfile:
+    def __init__(self,name):
+        # Name of the file.
+        self._name = name
+        # list of dwline.
+        self._lines = []
+        try:
+            file = open(name,"r")
+        except IOError, message:
+            print >> sys.stderr , "File could not be opened: ", name
+            sys.exit(1)
+        while 1:
+            try:
+                rec = file.readline()
+            except EOFError:
+                break
+            if len(rec) < 1:
+                # eof
+                break
+
+            aline = dwline()
+            aline.tokenize(rec)
+            self._lines += [aline]
+        file.close()
+
+    def dwprint(self):
+        print "Number of lines in ", self._name, ": ",len(self._lines)
+        lnum = 1
+        for l in self._lines:
+            l.dwprint(lnum)
+            lnum = lnum + 1
+    def dwwrite(self):
+        # The lnum is just for debugging messages.
+
+        outname = self._name + ".out"
+        print outname
+        try:
+            outfile = open(outname,"w")
+        except IOError, message:
+            # Report the output name; the bare name "name" is not defined here.
+            print >> sys.stderr , "Output File could not be opened: ", outname
+            sys.exit(1)
+        lnum = 1
+        for l in self._lines:
+            l.dwwrite(outfile,lnum)
+            lnum = lnum + 1
+        # Close so the output is flushed before it is compared.
+        outfile.close()
+
+
+
+class dwfiles:
+    def __init__(self):
+        # list of dwfile.
+        self._files = []
+
+    def addFile(self,name):
+        f = dwfile(name)
+        self._files += [f]
+
+    def dwprint(self):
+        print "Number of files: ",len(self._files)
+        for f in self._files:
+            f.dwprint()
+    def dwwrite(self):
+        for f in self._files:
+            f.dwwrite()
+
+
+
+def readFilelist(filelist):
+    dwf = dwfiles()
+    for f in filelist:
+        dwf.addFile(f)
+    return dwf
+
--- /dev/null
+# Copyright 2012 DWARF Debugging Information Format Committee
+# This simply prints the input tokens.
+# Used to verify basic sanity of the reading code.
+
+import sys
+import fileio
+
+def print_args():
+    cur = 1
+    filelist = []
+    while len(sys.argv) > cur:
+        print "argv[",cur,"] = ", sys.argv[cur]
+        v = sys.argv[cur]
+        filelist += [v]
+        cur = int(cur) + 1
+
+    dwf = fileio.readFilelist(filelist)
+    dwf.dwprint()
+
+if __name__ == '__main__':
+    print_args()
+
+
+
--- /dev/null
+argv[ 1 ] = test1.in
+Number of files: 1
+Number of lines in test1.in : 16
+Number of tokens in line 1 : 0
+
+Number of tokens in line 2 : 14
+id ' appropriate ' ' appropriate ' ' appropriate ' ' appropriate '
+ind ' '
+ind ' '
+ind ' '
+ind ' '
+id ' prefix ' ' prefix ' ' prefix ' ' prefix '
+ind ' '
+other ' ( ' ' ( ' ' ( ' ' ( '
+id ' DW\_TAG_foo ' ' DW\_TAG_foo ' ' DW_TAG_foo ' ' DWTAGfoo '
+other ' , ' ' , ' ' , ' ' , '
+ind ' '
+id ' DW\_AT_foo ' ' DW\_AT_foo ' ' DW_AT_foo ' ' DWATfoo '
+other ' , ' ' , ' ' , ' ' , '
+ind ' '
+Number of tokens in line 3 : 8
+id ' DW\_END_line ' ' DW\_END_line ' ' DW_END_line ' ' DWENDline '
+other ' , ' ' , ' ' , ' ' , '
+ind ' '
+id ' DW\_ATE_x ' ' DW\_ATE_x ' ' DW_ATE_x ' ' DWATEx '
+other ' , ' ' , ' ' , ' ' , '
+ind ' '
+id ' DW\_OP_foo ' ' DW\_OP_foo ' ' DW_OP_foo ' ' DWOPfoo '
+other ' , ' ' , ' ' , ' ' , '
+Number of tokens in line 4 : 18
+id ' DW\_LANG ' ' DW\_LANG ' ' DW_LANG ' ' DWLANG '
+other ' , ' ' , ' ' , ' ' , '
+ind ' '
+id ' DW\_LNE ' ' DW\_LNE ' ' DW_LNE ' ' DWLNE '
+other ' , ' ' , ' ' , ' ' , '
+ind ' '
+id ' DW\_CC ' ' DW\_CC ' ' DW_CC ' ' DWCC '
+ind ' '
+id ' or ' ' or ' ' or ' ' or '
+ind ' '
+id ' DW\_CFA ' ' DW\_CFA ' ' DW_CFA ' ' DWCFA '
+ind ' '
+id ' respectively ' ' respectively ' ' respectively ' ' respectively '
+other ' ) ' ' ) ' ' ) ' ' ) '
+ind ' '
+id ' followed ' ' followed ' ' followed ' ' followed '
+ind ' '
+id ' by ' ' by ' ' by ' ' by '
+Number of tokens in line 5 : 6
+id ' \_lo\_user ' ' \_lo\_user ' ' _lo_user ' ' louser '
+ind ' '
+id ' or ' ' or ' ' or ' ' or '
+ind ' '
+id ' \_hi\_user ' ' \_hi\_user ' ' _hi_user ' ' hiuser '
+other ' . ' ' . ' ' . ' ' . '
+Number of tokens in line 6 : 0
+
+Number of tokens in line 7 : 1
+id ' \endlastfoot ' ' \endlastfoot ' ' \endlastfoot ' ' \endlastfoot '
+Number of tokens in line 8 : 7
+id ' \livelink ' ' \livelink ' ' \livelink ' ' \livelink '
+ind ' { '
+id ' chap:DWTAGaccessdeclaration ' ' chap:DWTAGaccessdeclaration ' ' chap:DWTAGaccessdeclaration ' ' chap:DWTAGaccessdeclaration '
+ind ' } '
+ind ' { '
+id ' DW\-\_TAG\-\_access\-\_declaration ' ' DW\_TAG\_access\_declaration ' ' DW_TAG_access_declaration ' ' DWTAGaccessdeclaration '
+ind ' } '
+Number of tokens in line 9 : 5
+other ' & ' ' & ' ' & ' ' & '
+ind ' '
+id ' DECL ' ' DECL ' ' DECL ' ' DECL '
+ind ' '
+id ' \\ ' ' \\ ' ' \\ ' ' \\ '
+Number of tokens in line 10 : 0
+
+Number of tokens in line 11 : 0
+
+Number of tokens in line 12 : 9
+id ' information ' ' information ' ' information ' ' information '
+ind ' '
+id ' entry ' ' entry ' ' entry ' ' entry '
+ind ' '
+id ' with ' ' with ' ' with ' ' with '
+ind ' '
+id ' the ' ' the ' ' the ' ' the '
+ind ' '
+id ' tag ' ' tag ' ' tag ' ' tag '
+Number of tokens in line 13 : 8
+id ' \livetarg ' ' \livetarg ' ' \livetarg ' ' \livetarg '
+ind ' { '
+id ' chap:DWTAGaccessdeclaration ' ' chap:DWTAGaccessdeclaration ' ' chap:DWTAGaccessdeclaration ' ' chap:DWTAGaccessdeclaration '
+ind ' } '
+ind ' { '
+id ' DW\_TAG\_access\_declaration ' ' DW\_TAG\_access\_declaration ' ' DW_TAG_access_declaration ' ' DWTAGaccessdeclaration '
+ind ' } '
+other ' . ' ' . ' ' . ' ' . '
+Number of tokens in line 14 : 1
+id ' Each ' ' Each ' ' Each ' ' Each '
+Number of tokens in line 15 : 24
+id ' such ' ' such ' ' such ' ' such '
+ind ' '
+id ' entry ' ' entry ' ' entry ' ' entry '
+ind ' '
+id ' is ' ' is ' ' is ' ' is '
+ind ' '
+id ' a ' ' a ' ' a ' ' a '
+ind ' '
+id ' child ' ' child ' ' child ' ' child '
+ind ' '
+id ' of ' ' of ' ' of ' ' of '
+ind ' '
+id ' the ' ' the ' ' the ' ' the '
+ind ' '
+id ' class ' ' class ' ' class ' ' class '
+ind ' '
+id ' or ' ' or ' ' or ' ' or '
+ind ' '
+id ' structure ' ' structure ' ' structure ' ' structure '
+ind ' '
+id ' type ' ' type ' ' type ' ' type '
+ind ' '
+id ' entry ' ' entry ' ' entry ' ' entry '
+other ' . ' ' . ' ' . ' ' . '
+Number of tokens in line 16 : 0
+
--- /dev/null
+
+appropriate prefix (DW\_TAG_foo, DW\_AT_foo,
+DW\_END_line, DW\_ATE_x, DW\_OP_foo,
+DW\_LANG, DW\_LNE, DW\_CC or DW\_CFA respectively) followed by
+\_lo\_user or \_hi\_user.
+
+\endlastfoot
+\livelink{chap:DWTAGaccessdeclaration}{DW\-\_TAG\-\_access\-\_declaration}
+& DECL \\
+
+
+information entry with the tag
+\livetarg{chap:DWTAGaccessdeclaration}{DW\_TAG\_access\_declaration}.
+Each
+such entry is a child of the class or structure type entry.
+