ANTLR v4: Lexical Scoping

Initially, when writing the TMI language, I was lazy.  I was content to have a single global scope for everything.  However, when I started to program in the language, name collisions soon became cumbersome.

I don’t know if I had made correct choices, but this blog will document my initial direction on how I am addressing Lexical Scoping.

An Example:

i=1000;
s = "Global scope";
foreach i (1..2)
{
  println "I: " + i + " S=" + s;
  {
    s = "Innermost scope";
    println "S: " + s;
  }
}
println "I is now equal to " + i + “ and S=” + s;

If all we have in TMI is one global scope then this program would result in:

I: 1 S=Global scope
S: Innermost scope
I: 2 S=Innermost scope
S: Innermost scope
I is now equal to 2 and S=Innermost scope

which is probably not what we want.

With TMI’s lexical scoping we look first in the local scope, then move upward through the parent scope looking for a variable.  We stop until we find one or when we hit the outermost (global scope), returning null if not found.

With lexical scoping this program yields the more intuitive output of:

I: 1 S=Global scope S: Innermost scope I: 2 S=Global scope S: Innermost scope I is now equal to 1000 and S=Global scope

Variables such as s and i which would have collided in a global scoping model are instead shadowed and forgotten once the we exit the local scope in which they are defined.

This reduces side effects.

Implementation in TMI

Grammar Snippet

The following snippet shows the relationship between a TMI program and a handful of statements.  For this example we’ll focus on the blockStatement.

From a scoping perspective, it is interesting in that all it does is create a local scope and execute the statements within it.

program : statement* EOF; statement : foreachLoop | blockStatement | ifStatement | forLoop | whileLoop | expression (('if'|'unless') expression)? ';'; blockStatement : '{' statement* '}';

TMI Symbol Table

I will not cover all of what this class has going on, but will instead hit the high points.  Our symbol table will know it’s parent.  A local scope will be able to return it’s enclosing parent scope.  The global scope will have none.  The symbol table will be able to return values for objects defined within it given various identifier types.

If we are supplied a raw object, it will simply return that so that our client can call it blindly without type checking to see if the object is an identifier.

The symbol table will allow us to assign values and create child scopes as well as report whether or not a variable is defined locally or anywhere relative to the current scope.

Construction

The underlying structure of the symbol table is a map which is indexed by Strings containing the names of the symbol pointing to an object containing it’s value.

public class TMISymbolTable
{
  private Map<String, Object> sym    = new HashMap<String, Object>();
  private TMISymbolTable      parent = null;

  public TMISymbolTable(Map<String, Object> sym)
  {
    this.sym = sym;
  }

  public TMISymbolTable()
  {
  }

Variable Assignment

TMI currently supports assignment a column identifier, regular identifier or simply as a string/value object pair containing the name and value of the variable to assign.

  public Object assign(TMIColumnIdentifier colId, Object value)
  {
    return sym.put(colId.getColumnAlias().getSource(), value);
  }

  public Object assign(TMIIdentifier id, Object value)
  {
    return sym.put(id.getId(), value);
  }

  public Object assign(String name, Object value)
  {
    return sym.put(name, value);
  }

Variable Existence

We can check whether a variable exists locally via definedLocally or whether the variable exists at all within the local scope through it’s parent scopes in this code.

  public boolean definedLocally(String name)
  {
    return sym.containsKey(name);
  }

  public boolean defined(String name)
  {
    if (sym.containsKey(name))
    {
      return true;
    }
    else if (hasParent())
    {
      return getParent().defined(name);
    }
    else
    {
      return false;
    }
  }

Undefine

Conversely we can remove variables locally or throughout the scope via:

  public Object undefineLocally(String name)
  {
    return sym.remove(name);
  }
  
  public Object undefine(String name)
  {
    Object returnObj = null;
    if (sym.containsKey(name))
    {
      returnObj = sym.remove(name);
    }
    
    if (hasParent())
    {
      Object parentReturnObj = getParent().undefine(name);
      if (returnObj == null)
      {
        returnObj = parentReturnObj;
      }
    }
    
    return returnObj;
  }

Variable Resolution

Lastly, we can resolve the values of the variables in a number of different ways.

Given an object or object key this routine will resolve the variable.  We don’t know, at this time, whether we have been supplied with a variable identifier or raw object.  Rather than have to check every usage within the ANTLR listener or visitor, we handle it centrally here.

  public Object resolve(Object objKey)
  {
    return resolve(objKey, null);
  }

  public Object resolve(Object objKey, Object defaultObj)
  {
    if (objKey == null)
    {
      return defaultObj;
    }
    else if (objKey.getClass() == TMIColumnIdentifier.class)
    {
      return resolve((TMIColumnIdentifier) objKey, defaultObj);
    }
    else if (objKey.getClass() == TMIIdentifier.class)
    {
      return resolve((TMIIdentifier) objKey, defaultObj);
    }
    // Just a plain object.
    else
    {
      return objKey;
    }
  }

resolve with only the object simply delegates to resolve(objKey, defaultObj) using null as the value of the object default value.  This value is returned when the supplied identifier does not exist or is null itself.

resolve(Object, Object) detects for identifiers and returns symbol lookups in these cases, or simply returns the value of the object if the supplied key is not an identifier.

Identifier resolution code follows below:

  public Object resolve(TMIIdentifier id)
  {
    return resolve(id, null);
  }

  public Object resolve(TMIIdentifier id, Object defaultObj)
  {
    Object returnObj = null;
    if (id == null)
    {
      return defaultObj;
    }
    else
    {
      try
      {
        returnObj = resolveString(id.getId(), defaultObj);
        if (id.getAsType() == null || id.getAsType() == "")
        {
          return returnObj;
        }
        else if (id.getAsType() == "String")
        {
          return (String) returnObj;
        }
        else if (id.getAsType() == "Integer")
        {
          return (Integer) returnObj;
        }
        else if (id.getAsType() == "Float")
        {
          return (Float) returnObj;
        }
        else if (id.getAsType() == "Boolean")
        {
          return (Boolean) returnObj;
        }
        else if (id.getAsType() == "Character")
        {
          return (Character) returnObj;
        }
      }
      catch(Exception ex)
      {
        if (returnObj == null)
        {
          return defaultObj;
        }
        else
        {
          return returnObj;
        }
      }
    }
    return returnObj;
  }

This code will resolve the identifier based upon it’s name or id.  Identifiers support preferred types indicated via the getAsType method.  The object will try to convert the symbol table lookup to that value.  If some invalidly typed object is returned, it returns this object anyhow.  If the symbol does not exist, then defaultObj is returned instead.

This delegates to resolveString for resolution.  The code for this is as follows:

  public Object resolveString(String key)
  {
    return resolveString(key, null);
  }

  public Object resolveString(String key, Object defaultObj)
  {
    if (key == null)
    {
      return defaultObj;
    }
    else
    {
      if (sym.containsKey(key))
      {
        return sym.get(key);
      }
      else
      {
        if (hasParent())
        {
          return getParent().resolveString(key, defaultObj);
        }
        else
        {
          return defaultObj;
        }
      }
    }
  }

In the resolveString method, we check the local scope for the variable.  If found, we return it.  If not, we search for a parent.  If found, we recursively call resolveString on the parent scope.  If we do not find a parent, we return the defaultObj instead.

Scope Creation

Scope creation is relatively simple.  We create a new symbol table and set it’s parent to the current symbol table and return it.

  public TMISymbolTable createScope()
  {
    TMISymbolTable scope = new TMISymbolTable();
    scope.setParent(this);
    return scope;
  }

The ANTLR Visitor

The ANTLR visitor code is straightforward.  The BlockStatement simply creates a scope and evaluates the children within this scope.

def sym = new TMISymbolTable();

 

 

public Object visitBlockStatement(BlockStatementContext ctx) { def oldSym = sym sym = oldSym.createScope() def returnObj = visitChildren(ctx) sym = oldSym return returnObj }

This code is in groovy.  All this code does is create a local scope via createScope, evaluate the children then return the visitor scope back to it’s original state and return the visitor result back up the chain.

Conclusion

That’s it for now.  I am still learning, but thought I would share what I think is working pretty nicely for me.  I am always open to new ideas and any suggestions.

Advertisements

About patmartin

I am a coder and Data Visualization/Machine Learning enthusiast.
This entry was posted in ANTLR, Dex, General, Java. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s