Interpreted: A Python interpreter, written in Python

This is an interpreter created as a teaching exercise, explaining how the Python language works internally.

While it works as it is now, it is missing a lot of Python features, and is pretty far from being “feature complete”. Adding these features is an extremely hands-on way to learn how Python, and programming languages in general, are built and designed.

About the mentor

Tushar Sadhwani is a Language Engineer at DeepSource.

Tushar is a developer, open source contributor, author and speaker.

Project tasks

Level: Easy

UnboundLocalError not implemented

Issue: https://github.com/tusharsadhwani/interpreted/issues/1

The interpreter currently doesn't check if a variable is being read before being assigned to in a scope.

It leads to bugs like this:

x = 10
 
def f():
    x = x + 1  # reading the variable from global, but writing in local
    print(x)
 
f()
print(x)

$ interpreted asd.py
11
10

This should throw UnboundLocalError instead.
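One possible way to find which names are local to a function is sketched below. It uses the standard-library `ast` module as a stand-in for the project's own parser, and the helper name `local_names` is hypothetical. The idea is that any name assigned anywhere in a function body is local for the whole function, unless it is declared `global`.

```python
import ast

def local_names(func):
    """Return the names a function assigns to (its local variables).

    Simplified sketch: it also walks into nested functions and does not
    handle `nonlocal`.
    """
    declared_global = set()
    assigned = {arg.arg for arg in func.args.args}
    for node in ast.walk(func):
        if isinstance(node, ast.Global):
            declared_global.update(node.names)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            assigned.add(node.id)
    return assigned - declared_global

source = """
x = 10

def f():
    x = x + 1
    print(x)
"""
func = ast.parse(source).body[1]  # the FunctionDef node for f
print(local_names(func))  # {'x'}
```

With this set available, the interpreter could raise UnboundLocalError when a local name is read before it has a value, instead of falling back to the global scope.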

Language feature: bytes type

Issue: https://github.com/tusharsadhwani/interpreted/issues/14

Currently, strings are supported but bytes are not. Adding a bytes type would mean:

  • Tokenizing strings with a b prefix
  • Implementing operators (+, * and []) for bytes

a = b'abc'
print(a)         # b'abc'
print(a[0])      # 97
print(a * 2)     # b'abcabc'
print(a + b'd')  # b'abcd'

Supporting unicode escapes

Issue: https://github.com/tusharsadhwani/interpreted/issues/8

Currently, Unicode escapes like \u1234 and \U0001F643 don't work. They should print ሴ and 🙃 respectively.

print('Hello \U0001F643, this is a unicode character: \u1234')
# Hello 🙃, this is a unicode character: ሴ
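A minimal sketch of the decoding step follows. It assumes the tokenizer hands over the raw text between the quotes; the function name `decode_unicode_escapes` is hypothetical. It handles only \uXXXX (4 hex digits) and \UXXXXXXXX (8 hex digits), and it does not account for escaped backslashes such as `\\u1234`, which a full implementation would need to leave alone.

```python
import re

# \u takes exactly 4 hex digits, \U takes exactly 8.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")

def decode_unicode_escapes(raw):
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        # chr() raises ValueError for code points above 0x10FFFF.
        return chr(int(hex_digits, 16))
    return _UNICODE_ESCAPE.sub(replace, raw)

print(decode_unicode_escapes(r"Hello \U0001F643, this is a unicode character: \u1234"))
```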

Detecting syntax errors due to return outside a function

Issue: https://github.com/tusharsadhwani/interpreted/issues/11

This will require ✨Semantic Analysis✨

Essentially, the parsed AST will have to be visited by a semantic analyzer, before it is passed to the interpreter.

This semantic analyzer should do 2 things:

  • Detect any presence of return statements outside of a function
  • Detect any presence of break or continue outside of a loop

In both cases, we should raise a SyntaxError.
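The two checks above can be sketched as a visitor that tracks how deeply it is nested inside functions and loops. This sketch uses the standard-library `ast` module and `ast.NodeVisitor` in place of the project's own AST classes, so the class and method names are assumptions. It is also simplified: it treats a loop's `else` block as part of the loop, and it ignores class bodies.

```python
import ast

class SemanticAnalyzer(ast.NodeVisitor):
    """Reject return outside functions and break/continue outside loops."""

    def __init__(self):
        self.function_depth = 0
        self.loop_depth = 0

    def visit_FunctionDef(self, node):
        # A function body starts a fresh loop context: a loop around the
        # definition does not make a break inside the function legal.
        saved_loop_depth = self.loop_depth
        self.loop_depth = 0
        self.function_depth += 1
        self.generic_visit(node)
        self.function_depth -= 1
        self.loop_depth = saved_loop_depth

    def visit_loop(self, node):
        self.loop_depth += 1
        self.generic_visit(node)
        self.loop_depth -= 1

    visit_For = visit_While = visit_loop

    def visit_Return(self, node):
        if self.function_depth == 0:
            raise SyntaxError("'return' outside function")
        self.generic_visit(node)

    def visit_Break(self, node):
        if self.loop_depth == 0:
            raise SyntaxError("'break' outside loop")

    def visit_Continue(self, node):
        if self.loop_depth == 0:
            raise SyntaxError("'continue' outside loop")

def check(source):
    """Parse the source and run the analyzer before any interpretation."""
    SemanticAnalyzer().visit(ast.parse(source))

check("def f():\n    return 1\n")  # fine, no exception
```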

Language feature: global keyword

Issue: https://github.com/tusharsadhwani/interpreted/issues/12

The global keyword declares that a name used inside a function refers to the variable in the global scope, so assigning to it changes the global instead of creating a new local variable.

For example:

x = 0
 
def foo():
    x = 1
 
foo()
print(x)  # still 0

But, using global:

x = 0
 
def foo():
    global x  # now, we know to always get/set x from global scope.
    x = 1
 
foo()
print(x)  # 1
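One way to model this in an interpreter is sketched below. The class `FunctionScope` and its methods are hypothetical and are not taken from the project: each function call gets its own local variables plus a set of names declared `global`, and reads and writes of those names go to the global dictionary.

```python
class FunctionScope:
    """Hypothetical per-call scope: locals plus names declared global."""

    def __init__(self, global_vars):
        self.global_vars = global_vars
        self.local_vars = {}
        self.declared_global = set()

    def declare_global(self, name):
        # Called when the interpreter sees `global name`.
        self.declared_global.add(name)

    def set(self, name, value):
        if name in self.declared_global:
            self.global_vars[name] = value
        else:
            self.local_vars[name] = value

    def get(self, name):
        if name not in self.declared_global and name in self.local_vars:
            return self.local_vars[name]
        return self.global_vars[name]

global_vars = {"x": 0}

without_global = FunctionScope(global_vars)
without_global.set("x", 1)
print(global_vars["x"])  # 0

with_global = FunctionScope(global_vars)
with_global.declare_global("x")
with_global.set("x", 1)
print(global_vars["x"])  # 1
```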

Comments at the end of file don't work

Issue: https://github.com/tusharsadhwani/interpreted/issues/5

Currently, if a file ends in a comment, the tokenizer crashes.

print("Hi!")
# this doesn't work
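The issue text here does not say what causes the crash. Assuming it comes from scanning for a newline that never arrives, a comment-skipping helper needs to treat end of input as a valid end of comment. The function name `skip_comment` below is hypothetical.

```python
def skip_comment(source, index):
    """Return the index just past a comment that starts at source[index].

    The length check comes first, so a comment on the last line
    (with no trailing newline) ends cleanly at end of input.
    """
    while index < len(source) and source[index] != "\n":
        index += 1
    return index

print(skip_comment("# this doesn't work", 0))  # 19, the end of input
```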

Level: Medium

Language feature: Decorators

Issue: https://github.com/tusharsadhwani/interpreted/issues/2

Implementing decorators would be pretty simple.

Adding syntax sugar for:

@foo
def function():
    ...

To mean:

def function():
    ...
 
function = foo(function)

There are some caveats (the variable function is not supposed to be defined when the decorator foo is running), but essentially that is the feature.
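When a function has several decorators, Python applies them from the bottom up: the one closest to `def` wraps the function first. A minimal sketch of that step, assuming the interpreter has already evaluated the decorator expressions into callables (the helper name `apply_decorators` is hypothetical):

```python
def apply_decorators(function, decorators):
    """Apply decorators in source order, innermost (last listed) first."""
    for decorator in reversed(decorators):
        function = decorator(function)
    return function

def shout(func):
    return lambda: func().upper()

def exclaim(func):
    return lambda: func() + "!"

def greet():
    return "hi"

# Equivalent to stacking @shout above @exclaim on greet.
greet = apply_decorators(greet, [shout, exclaim])
print(greet())  # HI!
```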

Language feature: list, set, dict comprehensions

Issue: https://github.com/tusharsadhwani/interpreted/issues/6

The current implementation supports lists, sets and dicts, but not their comprehensions.

Code like the following should work:

my_list = [i*2 for i in range(10)]
my_set = {i*j for i in range(10) for j in range(10)}
my_dict = {i: i*2 for i in range(10) if i % 2 == 0}
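As a rough guide to the expected behaviour, each comprehension corresponds to nested loops and conditions that fill a new container. The loop forms below illustrate that, though the interpreter need not be implemented this way. One difference to keep in mind: in real Python the comprehension's loop variables do not leak into the surrounding scope, while in these loops they do.

```python
# Loop form of: {i: i*2 for i in range(10) if i % 2 == 0}
my_dict = {}
for i in range(10):
    if i % 2 == 0:
        my_dict[i] = i * 2

# Loop form of: {i*j for i in range(10) for j in range(10)}
my_set = set()
for i in range(10):
    for j in range(10):
        my_set.add(i * j)

print(my_dict)  # {0: 0, 2: 4, 4: 8, 6: 12, 8: 16}
```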

Language feature: closures

Issue: https://github.com/tusharsadhwani/interpreted/issues/3

Python supports closures.

Closures are a language feature that lets a function access variables from enclosing scopes, outside its own local scope.

For example:

def pattern():
    i = 0
 
    def print_stars():
        print('*' * i)
 
    while i <= 5:
        print_stars()
        i += 1
 
pattern()

This outputs the following (the first iteration, with i = 0, prints an empty line before these):

*
**
***
****
*****

print_stars() is able to access i from the local variables defined inside pattern().

This currently doesn't work.
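A common way to support this is to give every scope a link to the scope where its function was defined, and to walk that chain during lookup. The sketch below assumes this design. The class name `ChainedScope` is hypothetical and is not taken from the project.

```python
class ChainedScope:
    """Hypothetical scope that falls back to its enclosing scope."""

    def __init__(self, parent=None):
        self.variables = {}
        self.parent = parent

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.variables:
                return scope.variables[name]
            scope = scope.parent
        raise NameError(f"name {name!r} is not defined")

global_scope = ChainedScope()
pattern_scope = ChainedScope(parent=global_scope)
pattern_scope.variables["i"] = 0

# print_stars was defined inside pattern, so its scope's parent is pattern's scope.
print_stars_scope = ChainedScope(parent=pattern_scope)
print(print_stars_scope.lookup("i"))  # 0

pattern_scope.variables["i"] = 3  # pattern() updates i
print(print_stars_scope.lookup("i"))  # 3
```

Because the inner scope holds a reference to the outer one and not a copy of its variables, print_stars() sees each new value of i.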

Better stack traces

Issue: https://github.com/tusharsadhwani/interpreted/issues/4

Currently, a crash leads to a stack trace that contains the interpreter code.

Instead, the interpreter should keep track of an emulated Python call stack and print a traceback built from it.
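A minimal sketch of what that could look like, assuming the interpreter pushes a frame record on every function call and pops it on return. The `Frame` class and `format_traceback` function are hypothetical names, and the output mimics only the basic shape of a CPython traceback.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    filename: str
    lineno: int  # line currently executing in this frame
    name: str    # function name, or "<module>" at the top level

def format_traceback(call_stack, error):
    """Render the emulated stack, oldest call first, like CPython does."""
    lines = ["Traceback (most recent call last):"]
    for frame in call_stack:
        lines.append(f'  File "{frame.filename}", line {frame.lineno}, in {frame.name}')
    lines.append(f"{type(error).__name__}: {error}")
    return "\n".join(lines)

stack = [Frame("example.py", 7, "<module>"), Frame("example.py", 3, "f")]
print(format_traceback(stack, NameError("name 'y' is not defined")))
```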

Language feature: file I/O with open()

Issue: https://github.com/tusharsadhwani/interpreted/issues/10

The current interpreter can't interact with the file system, but implementing open will solve that.

Support for reading, writing and appending to files will be needed.

file = open('foo.txt')
contents = file.read()
print(contents)
file.close()
 
file = open('bar.txt', 'w')
chars = file.write(contents)
print("Wrote", chars, "chars")
file.close()

Language feature: imports

Issue: https://github.com/tusharsadhwani/interpreted/issues/13

Imports essentially just run a Python file, while keeping all of its variables in a fresh scope.

import foo  # should create a `foo` object containing all items in `foo.py`
from foo import bar  # should just import the `bar` object from `foo.py`

When an import statement is seen, it should:

  • Change the self.globals dictionary of the interpreter to a new one
  • Run that file's code
  • Store this self.globals in an object, and assign that to the imported name.
  • In the case of a from import, only the requested name should be assigned

One further change may be required for this to be fully functional: every function may have to hold a reference to its own global scope, under __globals__. This should be used to look up variable names, instead of self.globals, which only holds the main module's global state.

But this change is not critical to the feature and can be added separately.
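The import steps listed above can be sketched as follows. `FakeInterpreter` is a stand-in that uses Python's own `exec()` to run code; only the `self.globals` attribute comes from the description above. The `run` method, the `import_module` helper, and the use of `types.SimpleNamespace` as the module object are assumptions. Reading `foo.py` from disk is left out, so the source is passed in as a string.

```python
from types import SimpleNamespace

class FakeInterpreter:
    """Stand-in for the real interpreter; exec() plays the role of running code."""

    def __init__(self):
        self.globals = {}

    def run(self, source):
        exec(source, self.globals)

def import_module(interpreter, source):
    """Run source in a fresh global scope and return it as a module object."""
    saved_globals = interpreter.globals
    interpreter.globals = {}  # fresh scope for the imported file
    try:
        interpreter.run(source)
        module = SimpleNamespace(**interpreter.globals)
    finally:
        interpreter.globals = saved_globals  # restore the importer's scope
    return module

interpreter = FakeInterpreter()

# import foo
interpreter.globals["foo"] = import_module(interpreter, "bar = 42")
print(interpreter.globals["foo"].bar)  # 42

# from foo import bar
interpreter.globals["bar"] = import_module(interpreter, "bar = 42").bar
```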

Level: Hard

Language feature: Classes

Issue: https://github.com/tusharsadhwani/interpreted/issues/7

The interpreter currently doesn't support classes.

Classes have a lot of nuance to them, but this specific issue will focus on three main things:

  • Object creation, and handling their closures
  • Method calls, and bound methods
  • Various dunder methods like __init__, __add__ and __call__.

Just this much will help support a much larger subset of Python.

Naturally, this issue will depend on the addition of closures, via #3.
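To illustrate the bound-method part: looking up a method on an instance can return a small object that remembers the instance and passes it as the first argument when called. The sketch below assumes this design. `BoundMethod` is a hypothetical name, and a plain dict stands in for an interpreter-level object.

```python
class BoundMethod:
    """Hypothetical wrapper pairing a function with the instance it was looked up on."""

    def __init__(self, instance, function):
        self.instance = instance
        self.function = function

    def __call__(self, *args):
        # The instance is supplied automatically as `self`.
        return self.function(self.instance, *args)

def greet(self, greeting):
    return f"{greeting}, {self['name']}"

instance = {"name": "Tushar"}  # stand-in for an interpreter-level object
method = BoundMethod(instance, greet)
print(method("Hello"))  # Hello, Tushar
```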

Language feature: generators and yield keyword

Issue: https://github.com/tusharsadhwani/interpreted/issues/9

Generator functions are functions that contain the yield keyword.

Instead of returning a value, a generator function returns a generator object. Each time that object is passed to the built-in next() function, the function resumes and runs until it yields a value, then pauses again.

For example:

def generator():
    print("This runs first")
    yield 1
    print("This runs in between")
    yield 2
    print("This runs last")
 
gen = generator()
print(next(gen))
print(next(gen))
print(next(gen, 3))

Produces this output:

This runs first
1
This runs in between
2
This runs last
3