Anabstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring in the text. It is sometimes called just a syntax tree.
The syntax is "abstract" in the sense that it does not represent every detail appearing in the real syntax, but rather just the structural or content-related details. For instance, grouping parentheses are implicit in the tree structure, so these do not have to be represented as separate nodes. Likewise, a syntactic construct like an if-condition-then statement may be denoted by means of a single node with three branches.
This distinguishes abstract syntax trees from concrete syntax trees, traditionally designated parse trees. Parse trees are typically built by a parser during the source code translation and compiling process. Once built, additional information is added to the AST by means of subsequent processing, e.g., contextual analysis.
Abstract syntax trees are data structures widely used in compilers to represent the structure of program code. An AST is usually the result of the syntax analysis phase of a compiler. It often serves as an intermediate representation of the program through several stages that the compiler requires, and has a strong impact on the final output of the compiler.
ASTs are needed because of the inherent nature of programming languages and their documentation. Languages are often ambiguous by nature. In order to avoid this ambiguity, programming languages are often specified as a context-free grammar (CFG). However, there are often aspects of programming languages that a CFG can't express, but are part of the language and are documented in its specification. These are details that require a context to determine their validity and behaviour. For example, if a language allows new types to be declared, a CFG cannot predict the names of such types nor the way in which they should be used. Even if a language has a predefined set of types, enforcing proper usage usually requires some context. Another example is duck typing, where the type of an element can change depending on context. Operator overloading is yet another case where correct usage and final function are context-dependent.
Some operations will always require two elements, such as the two terms for addition. However, some language constructs require an arbitrarily large number of children, such as argument lists passed to programs from the command shell. As a result, an AST used to represent code written in such a language has to also be flexible enough to allow for quick addition of an unknown quantity of children.
To support compiler verification it should be possible to unparse an AST into source code form. The source code produced should be sufficiently similar to the original in appearance and identical in execution, upon recompilation.The AST is used intensively during semantic analysis, where the compiler checks for correct usage of the elements of the program and the language. The compiler also generates symbol tables based on the AST during semantic analysis. A complete traversal of the tree allows verification of the correctness of the program.
After verifying correctness, the AST serves as the base for code generation. The AST is often used to generate an intermediate representation (IR), sometimes called an intermediate language, for the code generation.
AST differencing, or for short tree differencing, consists of computing the list of differences between two ASTs.[1] This list of differences is typically called an edit script. The edit script directly refers to the AST of the code. For instance, an edit action may result in the addition of a new AST node representing a function.
The ast module helps Python applications to process trees of the Pythonabstract syntax grammar. The abstract syntax itself might change with eachPython release; this module helps to find out programmatically what the currentgrammar looks like.
An abstract syntax tree can be generated by passing ast.PyCF_ONLY_AST asa flag to the compile() built-in function, or using the parse()helper provided in this module. The result will be a tree of objects whoseclasses all inherit from ast.AST. An abstract syntax tree can becompiled into a Python code object using the built-in compile() function.
If these attributes are marked as optional in the grammar (using aquestion mark), the value might be None. If the attributes can havezero-or-more values (marked with an asterisk), the values are representedas Python lists. All possible attributes must be present and have validvalues when compiling an AST with compile().
Instances of ast.expr and ast.stmt subclasses havelineno, col_offset, end_lineno, andend_col_offset attributes. The lineno and end_linenoare the first and last line numbers of source text span (1-indexed so thefirst line is line 1) and the col_offset and end_col_offsetare the corresponding UTF-8 byte offsets of the first and last tokens thatgenerated the node. The UTF-8 offset is recorded because the parser usesUTF-8 internally.
Note that the end positions are not required by the compiler and aretherefore optional. The end offset is after the last symbol, for exampleone can get the source segment of a one-line expression node usingsource_line[node.col_offset : node.end_col_offset].
Deprecated since version 3.8: Old classes ast.Num, ast.Str, ast.Bytes,ast.NameConstant and ast.Ellipsis are still available,but they will be removed in future Python releases. In the meantime,instantiating them will return an instance of a different class.
Deprecated since version 3.9: Old classes ast.Index and ast.ExtSlice are stillavailable, but they will be removed in future Python releases.In the meantime, instantiating them will return an instance ofa different class.
A constant value. The value attribute of the Constant literal contains thePython object it represents. The values represented can be simple typessuch as a number, string or None, but also immutable container types(tuples and frozensets) if all of their elements are constant.
When an expression, such as a function call, appears as a statement by itselfwith its return value not used or stored, it is wrapped in this container.value holds one of the other nodes in this section, a Constant, aName, a Lambda, a Yield or YieldFrom node.
A named expression. This AST node is produced by the assignment expressionsoperator (also known as the walrus operator). As opposed to the Assignnode in which the first argument can be multiple nodes, in this case bothtarget and value must be single nodes.
A subscript, such as l[1]. value is the subscripted object(usually sequence or mapping). slice is an index, slice or key.It can be a Tuple and contain a Slice.ctx is Load, Store or Delaccording to the action performed with the subscript.
One for clause in a comprehension. target is the reference to use foreach element - typically a Name or Tuple node. iteris the object to iterate over. ifs is a list of test expressions: eachfor clause can have multiple ifs.
An assignment with a type annotation. target is a single node and canbe a Name, an Attribute or a Subscript.annotation is the annotation, such as a Constant or Namenode. value is a single optional node.
A for loop. target holds the variable(s) the loop assigns to, as asingle Name, Tuple, List, Attribute orSubscript node. iter holds the item to be looped over, againas a single node. body and orelse contain lists of nodes to execute.Those in orelse are executed if the loop finishes normally, rather thanvia a break statement.
A single case pattern in a match statement. pattern contains thematch pattern that the subject will be matched against. Note that theAST nodes produced for patterns differ from those produced forexpressions, even when they share the same syntax.
A match literal or value pattern that compares by equality. value isan expression node. Permitted value nodes are restricted as described inthe match statement documentation. This pattern succeeds if the matchsubject is equal to the evaluated value.
A match sequence pattern. patterns contains the patterns to be matchedagainst the subject elements if the subject is a sequence. Matches a variablelength sequence if one of the subpatterns is a MatchStar node, otherwisematches a fixed length sequence.
Matches the rest of the sequence in a variable length match sequence pattern.If name is not None, a list containing the remaining sequenceelements is bound to that name if the overall sequence pattern is successful.
A match mapping pattern. keys is a sequence of expression nodes.patterns is a corresponding sequence of pattern nodes. rest is anoptional name that can be specified to capture the remaining mapping elements.Permitted key expressions are restricted as described in the match statementdocumentation.
This pattern succeeds if the subject is a mapping, all evaluated keyexpressions are present in the mapping, and the value corresponding to eachkey matches the corresponding subpattern. If rest is not None, a dictcontaining the remaining mapping elements is bound to that name if the overallmapping pattern is successful.
A match class pattern. cls is an expression giving the nominal class tobe matched. patterns is a sequence of pattern nodes to be matched againstthe class defined sequence of pattern matching attributes. kwd_attrs is asequence of additional attributes to be matched (specified as keyword argumentsin the class pattern), kwd_patterns are the corresponding patterns(specified as keyword values in the class pattern).
This pattern succeeds if the subject is an instance of the nominated class,all positional patterns match the corresponding class-defined attributes, andany specified keyword attributes match their corresponding pattern.
Note: classes may define a property that returns self in order to match apattern node against the instance being matched. Several builtin types arealso matched that way, as described in the match statement documentation.
3a8082e126