Interpreted: A Python interpreter, written in Python
This is an interpreter created as a teaching exercise, explaining how the Python language works internally.
While it works as it is now, it is missing a lot of Python features, and is pretty far from being “feature complete”. Adding these features is an extremely hands-on way to learn how Python, and programming languages in general, are built and designed.
About the mentor
Tushar Sadhwani is a Language Engineer at DeepSource.
Tushar is a developer, open source contributor, author and speaker.
Project tasks
Level: Easy
UnboundLocalError not implemented
Issue: https://github.com/tusharsadhwani/interpreted/issues/1
The interpreter currently doesn't check if a variable is being read before being assigned to in a scope.
It leads to bugs like this:
x = 10
def f():
    x = x + 1  # reading the variable from global, but writing in local
    print(x)
f()
print(x)
$ interpreted asd.py
11
10
This should throw UnboundLocalError instead.
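One possible way to approach this is to find, before a function runs, every name that its body assigns to. Such names are local for the whole function, so reading one before it has a value can raise UnboundLocalError. The sketch below is only an illustration: it uses the standard library `ast` module as a stand-in for the project's own AST, and `find_local_names` is a hypothetical helper, not part of the interpreter. It also ignores lambdas and comprehensions.

```python
import ast


def find_local_names(func: ast.FunctionDef) -> set:
    """Return the names that are local to `func` because its body assigns to them."""
    local_names = {arg.arg for arg in func.args.args}
    declared_global = set()
    pending = list(func.body)
    while pending:
        node = pending.pop()
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            # A nested def/class binds its own name here, but its body is a separate scope.
            local_names.add(node.name)
            continue
        if isinstance(node, ast.Global):
            declared_global.update(node.names)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            local_names.add(node.id)
        pending.extend(ast.iter_child_nodes(node))
    return local_names - declared_global


source = """
x = 10
def f():
    x = x + 1
    print(x)
"""
function_node = ast.parse(source).body[1]
print(find_local_names(function_node))  # {'x'}
```

With this set available, a read of `x` inside `f` that finds no value in the local scope would raise UnboundLocalError instead of falling back to the global `x`.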
Language feature: bytes type
Issue: https://github.com/tusharsadhwani/interpreted/issues/14
Currently, strings are supported but bytes are not. Adding a bytes type would mean:
- Tokenizing strings with a b prefix
- Implementing operators (+, * and []) for bytes.
a = b'abc'
print(a) # b'abc'
print(a[0]) # 97
print(a * 2) # b'abcabc'
print(a + b'd') # b'abcd'
Supporting unicode escapes
Issue: https://github.com/tusharsadhwani/interpreted/issues/8
Currently, unicode escapes like \u1234 and \U0001F643 don't work. They should print ሴ and 🙃 respectively.
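A minimal sketch of how the escapes could be decoded, assuming the tokenizer hands over the raw text between the quotes. The function name `decode_unicode_escapes` is hypothetical, and the sketch deliberately ignores other escapes (including an escaped backslash before the `u`).

```python
import string


def decode_unicode_escapes(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name."""
    result = []
    index = 0
    while index < len(text):
        is_escape = (
            text[index] == "\\"
            and index + 1 < len(text)
            and text[index + 1] in "uU"
        )
        if is_escape:
            # \u takes exactly 4 hex digits, \U takes exactly 8.
            width = 4 if text[index + 1] == "u" else 8
            digits = text[index + 2 : index + 2 + width]
            if len(digits) != width or any(c not in string.hexdigits for c in digits):
                raise SyntaxError(f"truncated \\{text[index + 1]} escape")
            # chr() raises ValueError for code points above 0x10FFFF.
            result.append(chr(int(digits, 16)))
            index += 2 + width
        else:
            result.append(text[index])
            index += 1
    return "".join(result)


raw = r"Hello \U0001F643, this is a unicode character: \u1234"
print(decode_unicode_escapes(raw))
# Hello 🙃, this is a unicode character: ሴ
```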
print('Hello \U0001F643, this is a unicode character: \u1234')
# Hello 🙃, this is a unicode character: ሴ
Detecting syntax errors due to return outside a function
Issue: https://github.com/tusharsadhwani/interpreted/issues/11
This will require ✨Semantic Analysis✨
Essentially, the parsed AST will have to be visited by a semantic analyzer, before it is passed to the interpreter.
This semantic analyzer should do 2 things:
- Detect any presence of return statements outside of a function
- Detect any presence of break or continue outside of a loop
In both cases, we should raise a SyntaxError.
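The two checks can be sketched as a visitor that tracks whether it is currently inside a function or a loop. This is only an illustration under an assumption: it walks the standard library `ast` tree, while the real analyzer would walk the project's own AST nodes. The names `SemanticAnalyzer` and `analyze` are hypothetical.

```python
import ast


class SemanticAnalyzer(ast.NodeVisitor):
    """Rejects return outside a function and break/continue outside a loop."""

    def __init__(self):
        self.in_function = False
        self.loop_depth = 0

    def _visit_new_context(self, node, in_function):
        # Function and class bodies start fresh: an enclosing loop or
        # function does not make `break` or `return` legal inside them.
        saved = (self.in_function, self.loop_depth)
        self.in_function, self.loop_depth = in_function, 0
        self.generic_visit(node)
        self.in_function, self.loop_depth = saved

    def visit_FunctionDef(self, node):
        self._visit_new_context(node, in_function=True)

    def visit_ClassDef(self, node):
        self._visit_new_context(node, in_function=False)

    def _visit_loop(self, node):
        self.loop_depth += 1
        for statement in node.body:
            self.visit(statement)
        self.loop_depth -= 1
        # The `else` block of a loop is not part of the loop body.
        for statement in node.orelse:
            self.visit(statement)

    visit_For = visit_While = _visit_loop

    def visit_Return(self, node):
        if not self.in_function:
            raise SyntaxError("'return' outside function")

    def visit_Break(self, node):
        if self.loop_depth == 0:
            raise SyntaxError("'break' outside loop")

    def visit_Continue(self, node):
        if self.loop_depth == 0:
            raise SyntaxError("'continue' not properly in loop")


def analyze(source: str) -> None:
    SemanticAnalyzer().visit(ast.parse(source))


analyze("def f():\n    return 1\n")  # passes silently
try:
    analyze("return 1\n")
except SyntaxError as error:
    print(error)  # 'return' outside function
```

Running the analyzer as a separate pass keeps the interpreter itself free of these checks, since invalid programs are rejected before execution starts.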
Language feature: global keyword
Issue: https://github.com/tusharsadhwani/interpreted/issues/12
The global keyword declares that a name used inside a function refers to the variable in the global scope, instead of a new local variable.
For example:
x = 0
def foo():
    x = 1
foo()
print(x) # still 0
But, using global:
x = 0
def foo():
    global x # now, we know to always get/set x from global scope.
    x = 1
foo()
print(x) # 1
Comments at the end of file don't work
Issue: https://github.com/tusharsadhwani/interpreted/issues/5
Currently, if a file ends in a comment, the tokenizer crashes.
print("Hi!")
# this doesn't work
Level: Medium
Language feature: Decorators
Issue: https://github.com/tusharsadhwani/interpreted/issues/2
Implementing decorators should be fairly simple.
It amounts to adding syntactic sugar for:
@foo
def function():
    ...
To mean:
def function():
    ...
function = foo(function)
There are some caveats (the name function is not supposed to be bound yet while the decorator foo is running), but essentially that is the feature.
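A rough sketch of how the interpreter could handle that caveat: apply the decorators to the freshly created function first, and only then bind the name. The helper `define_decorated` and the plain `dict` scope are assumptions made for illustration, and plain Python callables stand in for interpreted functions.

```python
def define_decorated(scope: dict, name: str, function, decorators: list) -> None:
    """Bind `name` in `scope` only after every decorator has been applied."""
    # Decorators are written top to bottom but applied bottom to top.
    for decorator in reversed(decorators):
        function = decorator(function)
    scope[name] = function


def shout(func):
    return lambda: func().upper()


def exclaim(func):
    return lambda: func() + "!"


# Equivalent to:
#   @shout
#   @exclaim
#   def greet(): return "hi"
scope = {}
define_decorated(scope, "greet", lambda: "hi", [shout, exclaim])
print(scope["greet"]())  # HI!
```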
Language feature: list, set, dict comprehensions
Issue: https://github.com/tusharsadhwani/interpreted/issues/6
The current implementation supports lists, sets and dicts, but it doesn't support their comprehensions.
Code like this should work:
my_list = [i*2 for i in range(10)]
my_set = {i*j for i in range(10) for j in range(10)}
my_dict = {i: i*2 for i in range(10) if i % 2 == 0}
Language feature: closures
Issue: https://github.com/tusharsadhwani/interpreted/issues/3
Python supports closures.
Closures are a language feature where a function is able to access variables from enclosing scopes that are outside its own local scope.
For example:
def pattern():
    i = 0
    def print_stars():
        print('*' * i)
    while i <= 5:
        print_stars()
        i += 1
pattern()
This outputs the following (the first line, for i = 0, is empty):

*
**
***
****
*****
print_stars() is able to access i from the local variables defined inside pattern().
This currently doesn't work.
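One common way to support this is to give every scope a reference to its parent, and to have each function remember the scope it was defined in. The sketch below assumes hypothetical `Scope` and `Function` classes; a Python callable stands in for the interpreted function body.

```python
class Scope:
    def __init__(self, parent=None):
        self.variables = {}
        self.parent = parent

    def get(self, name):
        # Walk outwards through the enclosing scopes until the name is found.
        scope = self
        while scope is not None:
            if name in scope.variables:
                return scope.variables[name]
            scope = scope.parent
        raise NameError(f"name {name!r} is not defined")

    def set(self, name, value):
        self.variables[name] = value


class Function:
    def __init__(self, body, defining_scope):
        self.body = body  # a callable taking the local scope (stand-in for an AST body)
        self.defining_scope = defining_scope

    def call(self):
        # Each call gets a fresh local scope whose parent is the defining scope.
        local_scope = Scope(parent=self.defining_scope)
        return self.body(local_scope)


# Mirrors pattern() and print_stars() from the example above.
pattern_scope = Scope()
pattern_scope.set("i", 0)
print_stars = Function(lambda scope: "*" * scope.get("i"), pattern_scope)

lines = []
while pattern_scope.get("i") <= 5:
    lines.append(print_stars.call())
    pattern_scope.set("i", pattern_scope.get("i") + 1)
print(lines)  # ['', '*', '**', '***', '****', '*****']
```

Because the lookup happens at call time, print_stars sees the latest value of i on every iteration.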
Better stack traces
Issue: https://github.com/tusharsadhwani/interpreted/issues/4
Currently, a crash leads to a stack trace of the interpreter's own code, not of the program being interpreted.
Instead, the interpreter should emulate a Python call stack and print a traceback built from that.
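As a rough illustration, the interpreter could push a frame whenever it enters an interpreted function, pop it on exit, and format the remaining frames when an error escapes. `Frame` and `CallStack` are hypothetical names, and the file name and line numbers below are made up for the demo.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    function_name: str
    filename: str
    line_number: int


class CallStack:
    def __init__(self):
        self.frames = []

    def push(self, frame):
        self.frames.append(frame)

    def pop(self):
        return self.frames.pop()

    def format_traceback(self, error):
        # Outermost frame first, like CPython's "most recent call last".
        lines = ["Traceback (most recent call last):"]
        for frame in self.frames:
            lines.append(
                f'  File "{frame.filename}", line {frame.line_number}, in {frame.function_name}'
            )
        lines.append(f"{type(error).__name__}: {error}")
        return "\n".join(lines)


stack = CallStack()
stack.push(Frame("<module>", "example.py", 5))
stack.push(Frame("f", "example.py", 2))
print(stack.format_traceback(ZeroDivisionError("division by zero")))
```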
Language feature: file I/O with open()
Issue: https://github.com/tusharsadhwani/interpreted/issues/10
The current interpreter can't interact with the file system, but implementing open will solve that.
Support for reading, writing and appending to files will be needed.
file = open('foo.txt')
contents = file.read()
print(contents)
file.close()
file = open('bar.txt', 'w')
chars = file.write(contents)
print("Wrote", chars, "chars")
file.close()
Language feature: imports
Issue: https://github.com/tusharsadhwani/interpreted/issues/13
Imports essentially just run a Python file, while keeping all their variables in a fresh scope.
import foo # should create a `foo` object containing all items in `foo.py`
from foo import bar # should just import the `bar` object from `foo.py`
When an import statement is seen, it should:
- Change the self.globals dictionary of the interpreter to a new one
- Run that file's code
- Store this self.globals in an object, and assign that to the imported name.
- In case of a from import, only the requested variable should be assigned to its name
There is one change that may be required for this to be fully functional: every function may have to hold a reference to its own global scope, under __globals__. This should be used to look up variable names, instead of self.globals, which only holds the main module's global state.
But this change is not critical to the feature and can be added separately.
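The import steps could look roughly like the sketch below. Several things here are assumptions: the `Interpreter` class is a toy stand-in whose `run` only understands `name = <literal>` statements, module sources come from a dictionary instead of the file system, and restoring the previous self.globals after the import is a step the description above does not spell out.

```python
import ast
from types import SimpleNamespace


class Interpreter:
    def __init__(self, sources):
        self.sources = sources  # module name -> source text (stands in for reading foo.py)
        self.globals = {}

    def run(self, source):
        # Toy evaluator: only handles `name = <literal>` statements.
        for statement in ast.parse(source).body:
            if isinstance(statement, ast.Assign):
                target = statement.targets[0]
                self.globals[target.id] = ast.literal_eval(statement.value)

    def import_module(self, name):
        saved_globals = self.globals
        self.globals = {}  # fresh scope for the imported file
        try:
            self.run(self.sources[name])
            module = SimpleNamespace(**self.globals)
        finally:
            self.globals = saved_globals  # assumption: switch back to the importer's scope
        return module

    def execute_import(self, name):
        # import foo
        self.globals[name] = self.import_module(name)

    def execute_from_import(self, module_name, variable):
        # from foo import bar
        module = self.import_module(module_name)
        self.globals[variable] = getattr(module, variable)


interpreter = Interpreter({"foo": "bar = 42\nbaz = 'hi'\n"})
interpreter.execute_import("foo")
interpreter.execute_from_import("foo", "baz")
print(interpreter.globals["foo"].bar)  # 42
print(interpreter.globals["baz"])  # hi
```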
Level: Hard
Language feature: Classes
Issue: https://github.com/tusharsadhwani/interpreted/issues/7
The interpreter currently doesn't support classes.
Classes have a lot of nuance to them, but this specific issue will focus on three main things:
- Object creation, and handling their closures
- Method calls, and bound methods
- Various dunder methods like __init__, __add__ and __call__.
Just this much will help support a much larger subset of Python.
Naturally, this issue will depend on the addition of closures, via #3.
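To make the idea of bound methods concrete, here is a small sketch. It assumes a class is represented by a dictionary of its functions, and the names `BoundMethod`, `Instance` and `instantiate` are hypothetical. Plain Python functions stand in for interpreted ones.

```python
class BoundMethod:
    """A function paired with the instance it was looked up on."""

    def __init__(self, instance, function):
        self.instance = instance
        self.function = function

    def __call__(self, *args):
        # The instance is passed implicitly as the first argument (self).
        return self.function(self.instance, *args)


class Instance:
    def __init__(self, class_namespace):
        self.class_namespace = class_namespace
        self.attributes = {}

    def get_attribute(self, name):
        if name in self.attributes:
            return self.attributes[name]
        if name in self.class_namespace:
            value = self.class_namespace[name]
            return BoundMethod(self, value) if callable(value) else value
        raise AttributeError(name)


def instantiate(class_namespace, *args):
    instance = Instance(class_namespace)
    if "__init__" in class_namespace:
        class_namespace["__init__"](instance, *args)
    return instance


def point_init(self, x):
    self.attributes["x"] = x


def point_add(self, other):
    return self.attributes["x"] + other.attributes["x"]


Point = {"__init__": point_init, "__add__": point_add}
a = instantiate(Point, 20)
b = instantiate(Point, 22)
# `a + b` would be evaluated by looking up __add__ on the left operand.
print(a.get_attribute("__add__")(b))  # 42
```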
Language feature: generators and yield keyword
Issue: https://github.com/tusharsadhwani/interpreted/issues/9
Generator functions are functions that contain the yield keyword.
Instead of returning a value, calling a generator function returns a generator object. When this object is passed to the built-in next() function, the function body resumes and keeps running until it yields a value, and then pauses its execution.
For example:
def generator():
    print("This runs first")
    yield 1
    print("This runs in between")
    yield 2
    print("This runs last")
gen = generator()
print(next(gen))
print(next(gen))
print(next(gen, 3))
Produces this output:
This runs first
1
This runs in between
2
This runs last
3
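The pause and resume behaviour can be sketched with a generator object that remembers how far into the body it has run. This is a heavily simplified illustration: the body is assumed to be a flat list of hypothetical ("print", text) and ("yield", value) statements, so a single index is enough. Real function bodies nest loops and conditionals, and would need more saved state than this.

```python
_MISSING = object()


class GeneratorObject:
    def __init__(self, statements):
        self.statements = statements  # flat list of (kind, value) pairs
        self.position = 0  # index of the next statement to run

    def resume(self):
        """Run until the next yield; raise StopIteration when the body ends."""
        while self.position < len(self.statements):
            kind, value = self.statements[self.position]
            self.position += 1
            if kind == "yield":
                return value  # pause here; position remembers where to continue
            print(value)
        raise StopIteration


def interpreted_next(generator, default=_MISSING):
    try:
        return generator.resume()
    except StopIteration:
        if default is _MISSING:
            raise
        return default


body = [
    ("print", "This runs first"),
    ("yield", 1),
    ("print", "This runs in between"),
    ("yield", 2),
    ("print", "This runs last"),
]
gen = GeneratorObject(body)
print(interpreted_next(gen))
print(interpreted_next(gen))
print(interpreted_next(gen, 3))
```

Run as is, this prints the same six lines as the example above.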