These tools are intended to help updating the latex source of a DWARF document to get its references complete and correct. A list of the python source files with the purpose of each is near the end of this FILE. CAUTION: The tools don't really do parsing and the lexical processing is barely adequate for the task. They don't understand the % means 'comment', so avoid comments containing latex code or anything like latex. The tools are not necessarily equivalent in what details of input they accept. For example, Some will handle \livelink{chap:DWTAGfoo}{ DW\_TAG\_foo } (notice the spaces around the DW\_TAG\_foo) and some won't. So use \livelink{chap:DWTAGfoo}{DW\_TAG\_foo} instead with no pointless spaces (or newlines). Then you won't be disappointed. The tools assume you use \livelink and other local commands. If you use the native latex equivalent the tools won't understand. The requirement of the \ before _ in latex is annoying. You can avoid work typing that cruft by just typeing DW_TAG_foo and running tohyphen.py on the text, which will turn it into DW\-\_TAG\-\_foo which is what you want. The hyphens result in nicely formatted lines with line breaks in the DW_* strings, at least in some contexts (but not always). Various character (% - and more) have special meaning to latex. So just typing them and expecting them to appear in the generated document is going to result in disappointment. See the use of \dash in the document. Our fundamental approach is to tokenize the input line-by-line and then use trivial pattern matching to determine what tokens need updating on what lines. Always trying to ensure that unless we intend to change a line that it is emitted byte-for-byte unchanged. We change lines (in most cases) by simply inserting new tokens on the line (or possibly inserting characters into a token). Because latex names are non-traditional (compared to other languages) we adopt an inefficient but simple scanning and lexing approach. BORING DETAILS (you can ignore what follows): Every latex source file is read completely into an dwfile object which contains a List of lines each line composed of a list of tokens each token described below. If we are writing out updated latex source we want all the unchanged text output to match the input. No spacing changes and no changes except what the task at hand is to do. So the tokens serve that task. For example, one task at hand might be to find every DW_* reference and ensure it is either a link target (livetarg) or a link (livelink), and rewrite any that are neither as a link. (see our latex commands livelink and livetarg in latex source). Another task at hand might be to take every DW_ and rewrite things like DW\_AT\_foo as DW\-\_AT\-\_foo so that latex can hyphenate. Sometimes we'll want to read a single latex file, sometimes several of them. If we're reading all the files at once (for some reason) we construct an overall List of LFile (LFile mentioned above). TOKENS: This part is idiosyncratic to reflect our other goals. INDIVIDUAL TOKEN characters: { } [ ] space-character tab-character are individual tokens. All 4 forms (tex, underbar, std, label) are identical. The space or tab character is an individual token. All 4 forms (tex, underbar, std, label) are identical. We swallow the input linefeed (or CR-LF) on input, it does not appear in the tokens. IDENTIFIER: The letters in \-_A-Za-z0-9 allowed in an identifier. An identifier begins with one of _ \ letter and has at least one letter in it. Identifers are held in multiple strings in a token tex: (meaning with \_ and possibly \-) underbar: (meaning with \_, no \-) std: (meaning with _ no \, as in the DWARF std) label: (meaning with no _ or \ the form used as part of labels ) OTHER: All other characters are considered letters which are to be reproduced on output. A glob of such are simply considered a non-identifier single token. All 4 forms (tex, underbar, std, label) are identical. Performance: We simply don't care about performance as long as a task takes less than a few minutes. There are only about 70,000 words in a complete document so we ignore efficiency issues. SOURCE FILES: The change to internal use of DWTAGfoo etc in the document instead of DW\_AT\_foo (see dwarfnamecmds.tex) means many of these commands are not as useful as they were originally. But even so they may form useful examples. anylink.py: Looks for designated prefixes like DW_ADDR etc. Used by other code, this was never that useful alone. attrlink.py: Uses anylink.py to turn DW_AT_ into \livelink copyfile.py: Uses fileio.py to parse and output a .tex file. So we can use diff to verify the result is byte-for-byte identical to the input. dellivelink.py: This uses fileio.py and replaces \livelink and \livetarg with \DWXXXyyy per Ron Brender email of Oct 4, 2013. Strives to be idempotent so rerunning produces no further changes. A few cases not handled perfectly (where a } is at end of line?) so if needed again could use a bit of fixing. Files to process must be on command line. fileio.py: Given a list of file (.tex) names, it reads in and tokenizes each file. Functions here let code eventually write stuff back out (changed or not) but the output file always has a ".out" appended, it won't overwrite the input. formlink.py: Using anylink.py, this transforms DW_FORM_ into \livelink and \livetarg. printnameswithinteger.py: Identical to dellivelink.py, but this one has a precanned list of files to process built in. printstandard.py: Print the DW_* entries (and only them) in the files named on the command line one per line. With any \- or \_ removed. use example: python printstandard.py ../latexdoc/*.tex |sort|uniq printtokens.py: Reads and tokenizes files named on the command line . Prints the tokenized data. Solely for debugging the tokenizing. refclassfixup.py: Fixes up certain strings (specified in this file) to use \livelink removehyphen.py: Turns \-\_ into \_ . This was once a needed task. But run long ago, so This is now a useless bit of code. removeutf8.py: The pdf we started from had various utf-8 characters. These got in the way of our text processing, so this app deleted those. This is now a useless bit of code. taglink.py: Finds all the instances of DW_TAG_ in .tex files named on the command line. tohyphen.py: The opposite of removehyphen. This is now a useless bit of code. uses.py: Looks for duplicate uses and definitions of latex tags. This is based on the original approach to naming and linking in .tex, not the latest use using dwarfnamecmds.tex