This is a translator from the Java programming language to the EOLANG programming language.
- Java 11+ (make sure that `java -version` reports version 11 or later in your terminal if you have multiple Java versions installed).
- Gradle 7.4+ to build the project.
- Maven 3.8+ to run tests (be aware of possible conflicts of the latest versions of Maven and Java on some OSs).
- ANTLR4 4.9.2 to build parser.
You can refer to ACCEPTANCE.md file for instructions on installing these components.
To build the `j2eo` transpiler from sources, you need to:
- Clone the repo, using either HTTPS or SSH, whichever suits you:
  - HTTPS:
    git clone https://github.com/polystat/j2eo.git
  - SSH:
    git clone git@github.com:polystat/j2eo.git
- Open the project root folder:
cd j2eo
- Build the project (see the troubleshooting section in case of problems):
./build.sh
After the build, the `j2eo.jar` file will appear in the project root folder (`./j2eo`). With this file, it is possible to translate `.java` files or packages into `.eo` packages. To translate Java sources into EO sources, run the following command:
java -jar j2eo.jar <source of the .java file or the entire directory with Java source files> -o <output directory>
For example, the following command will translate the `SimpleTest.java` file into the `output_eo` directory:
java -jar j2eo.jar src/test/resources/SimpleTest.java -o output_eo
You can also translate an entire folder. For example, the following command will translate the `test1` directory into the `output_eo` directory:
java -jar j2eo.jar src/test/resources/polystat_tests/test1 -o output_eo
You can also use yegor256/j2eo image for Docker:
$ docker run -v $(pwd):/eo yegor256/j2eo hello.java --target output
This command will translate `hello.java` in the current directory, saving the output to the `output/` subdirectory.
Built-in unit tests may be executed using:
gradle test
J2EO comes with 1000+ bundled tests. There are two testing scenarios:
- Java source code is translated to EO using the J2EO project.
- The obtained EO code is compared with the saved one. If they match, the test passes; otherwise, it fails.
All saved EO programs are located in translated_test directory.
This scenario can be executed by the following command:
./gradlew test --tests "common.TestJ2EOStaticCheck"
- The original Java source code of the test is compiled with the Java compiler and executed. The stdout output is saved.
- The Java source code is translated to EO using the J2EO project, then compiled with the EO compiler and executed. The stdout output is saved.
- The stdout outputs are compared. If they match, the test passes; otherwise, it fails.
This scenario may be executed using the `./test_candidates.sh` script.
Test suite follows the Java Language Specification structure, covering applicable chapters and sections of the Java specifications.
Hadoop is a large Java project (about 1.8M lines of code at the time of writing). We included it as a benchmark for the translator.
The repository contains a script that builds J2EO, downloads the Hadoop repo and runs J2EO on it.
Usage:
./test-hadoop.sh
It will download the zipped `hadoop` sources and unpack them (in a separate folder) into `../j2eo-data` relative to the project's root. Next, it will put the
If you no longer need that folder, run
rm -rf ../j2eo-data
This project is a part of the Polystat project, whose goal is to statically analyze different languages using EOLANG, the implementation of phi-calculus. The first step towards that is to convert the source language into EO. This particular repository contains a translator from Java to EO.
Q: Why do we implement yet another Java parser?
A: Publicly available parsers only support older versions of Java, while we aim to support the latest version (currently 16). Thus, we had to create our own parser.
In recent versions, an external Java grammar implemented in ANTLR was also added as an alternative. It claims to support Java 17, and according to our testing on big projects, it does.
Q: Why do we implement EO AST?
A: Working with an AST instead of raw strings lets us rely on the Java compiler's type checking to minimize the number of bugs in our code. It is also much easier to work with an abstraction layer than with strings.
- First, the Java source code files are parsed recursively.
- Then, for each file, the translator converts the Java AST to an EO AST.
- Then, the EO AST is printed out as source code to the output directory, preserving the directory structure.
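The last step above, keeping the directory structure in the output directory, can be illustrated with a small path-mapping sketch. It is not taken from the J2EO sources: the class `OutputPathSketch`, the method `eoPathFor`, and the assumption that `Name.java` becomes `Name.eo` are hypothetical and serve only as an illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of J2EO: mirrors an input tree in the output directory. */
final class OutputPathSketch {

    /**
     * Maps inputRoot/a/b/C.java to outputRoot/a/b/C.eo.
     * Assumption: the output file keeps the base name and only the extension changes.
     */
    static Path eoPathFor(Path inputRoot, Path javaFile, Path outputRoot) {
        // Path of the source file relative to the translated directory.
        Path relative = inputRoot.relativize(javaFile);
        String name = relative.getFileName().toString();
        String eoName = name.substring(0, name.length() - ".java".length()) + ".eo";
        // Same relative location under the output root, with the new file name.
        return outputRoot.resolve(relative).resolveSibling(eoName);
    }

    public static void main(String[] args) {
        Path out = eoPathFor(
            Paths.get("src"), Paths.get("src", "pkg", "Main.java"), Paths.get("output_eo"));
        System.out.println(out);
    }
}
```

The sketch handles only the path arithmetic; parsing and AST conversion are left out on purpose.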
Make sure you have these in sync (mentioning, not pointing to, the same `jdk` directory):
which java
which javac
- configure alternatives in case of mismatch (link)
echo $JAVA_HOME
.java files:
Translations:
You can find all the `.java` tests translated to `.eo` files here.
To translate the `.java` tests into `.eo` files manually, perform the following steps:
- Build the project
- Locate the `J2EO-xxx.jar` file in the `./build/libs/` folder
- Copy this `J2EO-xxx.jar` file into the `./src/test/resources/test_candidates/` folder
- Run the `generate_eo_from_tests.py` script in that folder
The script takes some time to perform the translation. In the end, you will get updated translated files in the `./src/test/resources/translated_tests/` folder.
We use the Java language specification document as the foundation for the Java feature hierarchy.
Java 16 language specification: see .pdf file
Ch. 4 - Types, Values, and Variables
Ch. 5 - Conversions and Contexts
Ch. 7 - Packages and Modules WIP
- Method class member: Java to EO
- Field initialization: Java to EO
- Method declaration: Java to EO
- Inner class: Java to EO
Ch. 14 - Blocks, Statements, and Patterns
- Left-hand operands are evaluated first: Java to EO
- Integer literal: Java to EO
- Complex parenthesized expression: Java to EO
- Creation of a simple integer array: Java to EO
- Postfix increment: Java to EO
- Unary plus operator: Java to EO
- Multiplication operator: Java to EO
- Variable right shift: Java to EO
- Greater operator: Java to EO
- Assignment operator: Java to EO
Ch. 16 - Definite Assignments WIP
Below are all the mappings designed so far. If you cannot find a construct in the list below, it is probably unsupported.
This list follows Java SE 16. Some chapters are omitted because they relate only to the internal structure of Java. Others are omitted because their translation is not implemented yet.
All primitive types are supported. To handle them we use a `primitives` package, which provides `memory` wrappers for every primitive type.
Wrappers are a more convenient way to simulate primitive types. For example, `memory.write` returns a `bool` object instead of the `memory` object itself, so to handle the `=` operator we would have to do something like this:
[a b] > write
  seq > @
    a.write b
    a
This is more complex than just `a.write b`, where `a` and `b` are wrappers.
Moreover, pure `memory` does not support in-place operations and conversions.
Therefore, we decided to generate more readable EO code and use wrappers instead of generating unreadable code with pure `memory`.
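The wrapper idea can be mirrored in plain Java as a rough analogy: a cell whose `write` returns the cell itself, so an assignment can be used as a value, just as Java's `=` can. The class `IntCell` and its methods are hypothetical names chosen for this sketch; they are not the EO `primitives` package.

```java
/** Hypothetical Java analogy of a primitive wrapper; not the EO primitives package. */
final class IntCell {
    private int value;

    /** Stores the value and returns this cell, so the write can be used as an expression. */
    IntCell write(int newValue) {
        this.value = newValue;
        return this;
    }

    int read() {
        return value;
    }

    public static void main(String[] args) {
        IntCell a = new IntCell();
        IntCell b = new IntCell();
        // Mirrors the Java expression: b = (a = 5) + 1;
        b.write(a.write(5).read() + 1);
        System.out.println(a.read() + " " + b.read());
    }
}
```

Returning the cell from `write` is what removes the need for the extra `seq` object shown above.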
The Java programming language provides a number of operators that act on integral values. Supported ones:
- The comparison operators, which result in a value of type boolean:
  - The numerical comparison operators <, <=, >, and >=
  - The numerical equality operators == and !=
- The numerical operators, which result in a value of type int or long:
  - The unary plus and minus operators + and -
  - The multiplicative operators *, /, and %
  - The additive operators + and -
  - The increment operator ++, both prefix and postfix
  - The decrement operator --, both prefix and postfix
  - The signed and unsigned shift operators <<, >>, and >>>
- The cast operator, which can convert from an integral value to a value of any specified numeric type
Common translation scheme:
expr_1 op expr_2
-->
[] > binary
  expr_1.translated_op > @
    expr_2
Unary case:
expr op
// OR
op expr
-->
[] > unary
  expr.translated_op > @
Cast case:
(primitive_type)expr
-->
[] > cast
  translated_primitive_type.from expr
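As a possible translator input, here is a small Java class that exercises the supported integral operators. The class and method names are made up for this example, and the EO output that J2EO would produce for it is deliberately not shown.

```java
/** Hypothetical sample input exercising the supported integral operators. */
final class IntegralOperatorsSample {

    /** Multiplicative and additive operators; integer division truncates toward zero. */
    static int arithmetic(int a, int b) {
        return (a * b) + (a / b) - (a % b);
    }

    /** Signed shift keeps the sign bit. */
    static int signedShift(int x, int n) {
        return x >> n;
    }

    /** Unsigned shift fills the vacated high bits with zeros. */
    static int unsignedShift(int x, int n) {
        return x >>> n;
    }

    /** Numerical comparison resulting in a boolean. */
    static boolean less(int a, int b) {
        return a < b;
    }

    /** Cast from an integral value to another numeric type. */
    static long toLong(int x) {
        return (long) x;
    }

    public static void main(String[] args) {
        System.out.println(arithmetic(-7, 2));    // -14 + (-3) - (-1) = -16
        System.out.println(signedShift(-8, 1));   // -4
        System.out.println(unsignedShift(-8, 28)); // 15
    }
}
```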
Currently, there is runtime support only for `double`. The translator can still handle `float`, but the output EO code will not be equivalent to the original at runtime.
The Java programming language provides a number of operators that act on floating-point values. Supported operators:
- The comparison operators, which result in a value of type boolean:
  - The numerical comparison operators <, <=, >, and >=
  - The numerical equality operators == and !=
- The numerical operators, which result in a value of type float or double:
  - The unary plus and minus operators + and -
  - The multiplicative operators *, /, and %
  - The additive operators + and -
  - The increment operator ++, both prefix and postfix
  - The decrement operator --, both prefix and postfix
- The cast operator, which can convert from a floating-point value to a value of any specified numeric type
Scheme of translation is the same as in 4.2.2
Currently, only classes are supported as reference types. The class identifier is prepended with `class__` during translation.
Type variables are omitted due to lack of types in EO.
The same situation as 4.4
The same situation as 4.4
The same situation as 4.4
A variable of any primitive type is stored in a special handwritten object (`primitives`). For example, an `int` value will be stored in a `prim__int` object, a `long` in `prim__long`, and so on.
Example:
float a;
-->
prim__float.constructor_1 > a
  prim__float.new
A variable of any reference type is stored in a `cage`.
Example:
Ref a;
-->
cage > a
This conversion is omitted by the translator. E.g., `(ClassA) class_a_instance` is just `class_a_instance` from the translator's perspective.
19 specific conversions on primitive types are called the widening primitive conversions:
- `byte` to `short`, `int`, `long`, `float` or `double`
- `short` to `int`, `long`, `float` or `double`
- `char` to `int`, `long`, `float` or `double`
- `int` to `long`, `float` or `double`
- `long` to `float` or `double`
- `float` to `double` (runtime support is not precise)
All of them have runtime support.
Example:
(primitive_type)expr
-->
[] > cast
  translated_type.from > @
    expr
`translated_type` is obtained according to [4.12.1](#4.12.1-Variables of Primitive Type)
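For reference, the Java-side behaviour of a few widening conversions can be checked with a short sample like the one below. It only shows what the Java code does; the class name is hypothetical, and whether the EO runtime reproduces each result exactly is not claimed here.

```java
/** Hypothetical sample: Java semantics of some widening primitive conversions. */
final class WideningSample {

    /** byte to int: the sign is preserved. */
    static int byteToInt(byte b) {
        return b;
    }

    /** char to int: yields the UTF-16 code unit value. */
    static int charToInt(char c) {
        return c;
    }

    /** int to float: allowed implicitly, but may lose precision for large values. */
    static float intToFloat(int i) {
        return i;
    }

    /** float to double: the float value is kept exactly, including its rounding error. */
    static double floatToDouble(float f) {
        return f;
    }

    public static void main(String[] args) {
        System.out.println(byteToInt((byte) -1));         // -1
        System.out.println(charToInt('A'));               // 65
        System.out.println((int) intToFloat(16777217));   // 16777216
        System.out.println(floatToDouble(0.1f) == 0.1);   // false
    }
}
```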
22 specific conversions on primitive types are called the narrowing primitive conversions:
- `short` to `byte` or `char`
- `char` to `byte` or `short`
- `int` to `byte`, `short` or `char`
- `long` to `byte`, `short`, `char` or `int`
- `float` to `byte`, `short`, `char`, `int` or `long`
- `double` to `byte`, `short`, `char`, `int`, `long` or `float` (runtime support is not precise)
All of them have runtime support.
Translation scheme is the same as 5.1.2
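The Java-side results of typical narrowing conversions are easy to get wrong, so a short sample may help when comparing translated output against the original program. The class name is hypothetical and the sample only documents Java behaviour.

```java
/** Hypothetical sample: Java semantics of some narrowing primitive conversions. */
final class NarrowingSample {

    /** int to byte: keeps the low 8 bits. */
    static byte intToByte(int i) {
        return (byte) i;
    }

    /** int to short: keeps the low 16 bits. */
    static short intToShort(int i) {
        return (short) i;
    }

    /** int to char: keeps the low 16 bits as an unsigned value. */
    static char intToChar(int i) {
        return (char) i;
    }

    /** double to int: rounds toward zero, maps NaN to 0, and saturates on overflow. */
    static int doubleToInt(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(intToByte(200));      // -56
        System.out.println(intToShort(70000));   // 4464
        System.out.println(doubleToInt(-3.99));  // -3
    }
}
```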
The same situation as 4.4
Currently, there is no support for this type of conversion. The user should resolve such conversions manually. For example:
"1"+1
should be manually rewritten to:
"1"+String.valueOf(1)
In this case the translator would convert it to:
[] > binary_1
  literal_1.add > @
    methodInvocation_1
[] > literal_1
  class__String.constructor_2 > @
    class__String.new
    "1"
[] > methodInvocation_1
  class__String.valueOf > @
    literal_2
[] > literal_2
  prim__int.constructor_2 > @
    prim__int.new
    1
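The manual rewrite does not change the Java result, which can be confirmed with a minimal check. The class name is hypothetical.

```java
/** Hypothetical sample: implicit and explicit string conversion give the same Java result. */
final class StringConcatSample {

    /** Relies on implicit string conversion, which the translator does not support. */
    static String implicit() {
        return "1" + 1;
    }

    /** The manually rewritten form with an explicit conversion. */
    static String explicit() {
        return "1" + String.valueOf(1);
    }

    public static void main(String[] args) {
        System.out.println(implicit().equals(explicit())); // true
    }
}
```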
A declaration introduces an entity into a program and includes an identifier. Supported declared entity is one of the following:
- An imported class or interface, declared in a single-type-import declaration or a type-import-on-demand declaration
- An imported static member, declared in a single-static-import declaration or a static-import-on-demand declaration
- A class, declared by a normal class declaration
- A member of a reference type, one of the following:
  - A member class
  - A field, one of the following:
    - A field declared in a class
    - The field length, which is implicitly a member of every array type
  - A method, one of the following:
    - A method (abstract or otherwise) declared in a class
- A formal parameter, one of the following:
  - A formal parameter of a method of a class
  - A formal parameter of a constructor of a class
- A local variable, one of the following:
  - A local variable declared by a local variable declaration statement in a block
- A local class, one of the following:
  - A local class declared by a normal class declaration
Every declaration is translated into an EO object or an EO copy of a specific object. Example:
class A {
  // body
}
->
[] > class__A
  ...
  body
  ...
Or,
int a;
-->
prim__int.constructor_1 > a
  prim__int.new
A name is used to refer to an entity declared in a program.
A simple name is a single identifier. Each simple identifier keeps its name, except for classes: their names are prepended with `class__`. There is no name mangling for variables.
A qualified name consists of a name, a "." token, and an identifier. Qualified names are split into several objects during translation. Example:
a.b.c;
-->
[] > fieldAccess_1
  fieldAccess_2.c > @
[] > fieldAccess_2
  simpleReference_1.b > @
[] > simpleReference_1
  a > @
Of course, this could be optimized to a single EO object, but at the moment the translator does not perform such an optimization, in order to keep the translation of dot-separated entities more general.
For now there is no handling of shadowing and obscuring.
EO does not support access modifiers. All objects in EO are public by default. Therefore, this information is lost during translation.
Currently, the translator supports only single-type-import and single-static-import declarations.
Example:
import a.b.c;
import static d.e.f;
-->
+alias a.b.class__c
+alias d.class__e.f
Any identifier in an import declaration is prepended with `class__` if it is known to be a class.
The identifier `java` is replaced with `stdlib`.
Example:
import java.lang.Random;
-->
+alias stdlib.lang.class__Random
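The import rules above can be sketched as a small string transformation. This is an illustration only, not the translator's code: the class `ImportAliasSketch` and the method `toAlias` are hypothetical, and the sketch assumes that the last identifier of a single-type import and the second-to-last identifier of a single-static import are classes, whereas the translator applies the `class__` prefix only when it knows the identifier is a class.

```java
/** Hypothetical sketch of the import-to-alias rules; not the J2EO implementation. */
final class ImportAliasSketch {

    /**
     * Converts "import a.b.C;" or "import static a.B.m;" into a "+alias ..." line.
     * Assumption: the input is a well-formed single-type or single-static import.
     */
    static String toAlias(String importDecl) {
        String body = importDecl.trim();
        if (body.endsWith(";")) {
            body = body.substring(0, body.length() - 1).trim();
        }
        body = body.substring("import".length()).trim();
        boolean isStatic = body.startsWith("static ");
        if (isStatic) {
            body = body.substring("static".length()).trim();
        }
        String[] parts = body.split("\\.");
        // The leading identifier "java" is replaced with "stdlib".
        if (parts[0].equals("java")) {
            parts[0] = "stdlib";
        }
        // Assumed position of the class name (see the lead-in above).
        int classIndex = isStatic ? parts.length - 2 : parts.length - 1;
        if (classIndex >= 0) {
            parts[classIndex] = "class__" + parts[classIndex];
        }
        return "+alias " + String.join(".", parts);
    }

    public static void main(String[] args) {
        System.out.println(toAlias("import java.lang.Random;"));
        System.out.println(toAlias("import static d.e.f;"));
    }
}
```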
All modifiers except `static` are omitted during translation. `static` is needed to distinguish an inner class from a nested one.
Currently, only non-static fields are supported.
Every method is translated into an EO object. The method name is preserved.
Example:
static String m(int p_int,String p_str){
  return p_int+p_str;
}
-->
[p_int p_str] > m
  seq > @
    return_1
  [] > return_1
    binary_1 > @
  [] > binary_1
    simpleReference_1.add > @
      simpleReference_2
  [] > simpleReference_1
    p_int > @
  [] > simpleReference_2
    p_str > @
Every non-static method has an additional parameter `this` that refers to the callee itself. It is necessary for implementing method overriding in EO correctly.
Currently, only static nested classes are supported. Nested interfaces are unsupported. Example:
class Outer {
  class Inner {
  }
}
-->
[] > class__Outer
  ...
  [] > class__Inner
    ...
Only a single constructor declaration with an explicit `super` call is supported.
Example:
public A(){
  super();
  //...
}
-->
[this] > constructor
  seq > @
    initialization
    statementExpression_1
    ...
    this
  [] > initialization
    this.init > @
      this
  [] > statementExpression_1
    this.super.constructor > @
      this.super
- `initialization` is responsible for initializing default values.
- `statementExpression_1` is an emulation of the super call.
- `this` is the created object itself.
If no constructor is provided, the translator generates a default constructor.
[] > class__<Name of class>
  class__<Parent name> > super                # Inheritance simulation
  super > @
  [] > new                                    # new is a representation
                                              # of the object itself
    class__<Parent name>.new > super
    super > @                                 # Inheritance simulation
    "class__<Name of class>" > className      # Name of class is being saved
    1 > address                               # Identifies that it
                                              # isn't a null object
    [this] > init                             # Initializes class members
      ...                                     # default values
    ...                                       # Class methods and variables
  ...                                         # Static methods and variables
  [this] > constructor                        # Constructor
    ...
The translator supports both types of arrays: primitive and reference.
Examples: `int[]` and `String[]`
See section 15.10.1.
To access array elements the translator uses `get`, provided by the EO `array` object. However, it uses primitive wrappers as indexes.
An example is provided in section 15.10.3.
An array initializer is written as a comma-separated list of expressions, enclosed by braces { and }. Example: `{ 1 + 1, 2 }`
Currently, it is the only way to store something in an array. Other types of initializers (e.g. `new Type[num]`) are unsupported.
Example:
{1+1,2}
-->
[] > initializerArray_1
  * > @
    initializerSimple_1
    initializerSimple_2
[] > initializerSimple_1
  binary_1 > @
[] > binary_1
  literal_1.add > @
    literal_2
[] > literal_1
  prim__int.constructor_2 > @
    prim__int.new
    1
[] > literal_2
  prim__int.constructor_2 > @
    prim__int.new
    1
[] > initializerSimple_2
  literal_3 > @
[] > literal_3
  prim__int.constructor_2 > @
    prim__int.new
    2
For now only the `length` attribute is supported. During translation, it remains unchanged.
According to the Java grammar, a block is a sequence of declarations and statements enclosed in curly braces. The translator creates a new EO object for each block. Example:
{
  declaration;
  statement;
}
-->
[] > block_1
  seq > @
    declaration_1
    statement_1
The number after the object name is needed only to avoid name collisions. `declaration_1` and `statement_1` are EO objects that describe their own internal structure. Inside the `seq` object they are just dataized.
Now let's look at real Java code:
void foo(){
  int a=1;
  println(a);
}
->
 1 [this] > foo
 2   seq > @
 3     variableDeclaration_1
 4     statementExpression_1
 5   prim__int.constructor_1 > a
 6     prim__int.new
 7   [] > variableDeclaration_1
 8     a.write > @
 9       initializerSimple_1
10   [] > initializerSimple_1
11     literal_1 > @
12   [] > literal_1
13     prim__int.constructor_2 > @
14       prim__int.new
15       1
16   [] > statementExpression_1
17     this.println > @
18       this
19       simpleReference_1
20   [] > simpleReference_1
21     a > @
Any variables in blocks are declared separately from the `seq` object. In this case `int a` is declared at lines 5-6. It also has an initializer, `1`, so the translator assigns the initializer value to `a` at lines 7-9. This initializer is a simple one: it is just a literal. The translator mentions it on lines 10-11. Literals are translated to EO objects themselves (lines 12-15).
Any statement in a block is a statement expression by default. The behaviour of declarations is described separately. In this case the statement `println(a)` is declared on lines 16-19. By default, any method is considered a class method, so access to it is performed via `this` (line 17). Moreover, it is necessary to pass `this` as an argument during method invocation (line 18). `println(a)` is a call with a single argument `a`. It is a simple reference that is mentioned at line 19. A simple reference is itself a distinct object, which the translator declares on lines 20-21.
A local variable declaration declares and optionally initializes one or more local variables. The translator keeps the location of a declaration unchanged. E.g. class member declarations stay inside the class body, local method variables stay inside the method body, etc. Depending on its type, a declared entity can be stored in a `cage`, in a primitive wrapper or in a separate EO object.
Example of a class member declaration:
class A {
  int member;
}
-->
[] > class__A
  ...
  [] > new
    ...
    prim__int.constructor_1 > member
      prim__int.new
    ...
Example of a class member with an initializer:
class A {
  int member = 0;
}
-->
[] > class__A
  ...
  [] > new
    ...
    [this] > init
      seq > @
        variableDeclaration_1
      [] > variableDeclaration_1
        this.member.write > @
          initializerSimple_1
      [] > initializerSimple_1
        literal_1 > @
      [] > literal_1
        prim__int.constructor_2 > @
          prim__int.new
          0
    prim__int.constructor_1 > member
      prim__int.new
    ...
In this case, when an instance of class `A` is created, the `init` object is called to initialize the variables.
Example of a local method variable with an initializer:
void m(){
  int a=0;
}
-->
[this] > m
  seq > @
    variableDeclaration_1
  prim__int.constructor_1 > a
    prim__int.new
  [] > variableDeclaration_1
    a.write > @
      initializerSimple_1
  [] > initializerSimple_1
    literal_1 > @
  [] > literal_1
    prim__int.constructor_2 > @
      prim__int.new
      0
The same logic applies here as in the previous example. The variable itself is stored outside of the `seq` object, while `seq` dataizes its initialization.
An example with a nested class is located in [8.5](#8.5 Member Class and Interface Declarations).
All statements of a block are stored inside the `seq` object after translation. Each of them is represented by a unique name, which simulates the behaviour of the original statement during dataization.
Example:
{
  int a=1;
  method();
}
-->
[] > block_1
  seq > @
    variableDeclaration_1          # int a = 1;
    statementExpression_1          # method();
  prim__int.constructor_1 > a      # int a
    prim__int.new
  [] > variableDeclaration_1       # assignment
    a.write > @
      initializerSimple_1
  [] > initializerSimple_1
    literal_1 > @
  [] > literal_1                   # literal: 1
    prim__int.constructor_2 > @
      prim__int.new
      1
  [] > statementExpression_1       # method()
    this.method > @                # call
      this
Translator ignores it.
Example:
if(cond)
  then_block;
else
  else_block;
-->
[] > ifThenElse_1
  translated_cond.if > @
    block_1
    block_2
  ... # translation of cond
  [] > block_1
    ... # then_block translation
  [] > block_2
    ... # else_block translation
If no else part is provided, the translator generates an empty block (`empty`):
[] > empty_1
  TRUE > @
Example:
assert cond:expr;
-->
[] > assert_1
  translated_cond.if > @
    TRUE
    []
      translated_expr > msg
  ... # translation of cond
  ... # translation of expr
Example:
while(cond)
  block;
-->
[] > while_1
  translated_cond.while > @
    [while_i]
      block_1 > @
  ... # translation of cond
  [] > block_1
    ... # block translation
Example:
do
  block;
while(cond);
-->
[] > do_1
  translated_cond.do > @
    [do_i]
      block_1 > @
  ... # translation of cond
  [] > block_1
    ... # block translation
Note: currently there is no runtime support for the `do` object.
Currently only integer, floating-point and string literals are supported. The translator uses wrappers to simulate the behaviour of Java primitives.
Consider the assignment operator: in Java it writes a value into a variable and returns that value. `memory` in EO does not provide such behaviour. Therefore, we need to use a wrapper.
Examples:
1
->
[] > literal_1
  prim__int.constructor_2 > @
    prim__int.new
    1
1.0
->
[] > literal_1
  prim__float.constructor_2 > @
    prim__float.new
    1.0
"string"
->
[] > literal_1
  class__String.constructor_2 > @
    class__String.new
    "string"
It remains unchanged.
(expression)
->
[] > parenthesized_1
  expression > @
It could be simplified, but we keep this translation in order to handle more complex cases.
new A(arg)
->
[] > statementExpression_1
  class__A.constructor > @
    class__A.new
    simpleReference_1
[] > simpleReference_1
  arg > @
`simpleReference_1` is used for referencing variables. It could be simplified, but it is kept in order to handle complex cases.
Currently, only creation with array initializers is supported. Example of an array initializer: `{1, 2, 3}`.
To store an array the translator uses the `cage` object. To simulate array initializers the translator uses the `array` (aka `*`) object from EO.
int[] array = {1}
->
[] > variableDeclaration_1
  array.write > @
    initializerArray_1
[] > initializerArray_1
  * > @
    initializerSimple_1
[] > initializerSimple_1
  literal_1 > @
[] > literal_1
  prim__int.constructor_2 > @
    prim__int.new
    1
array[idx]
->
[] > statementExpression_1
  simpleReference_1.get > @
    simpleReference_2.v
[] > simpleReference_1
  array > @
[] > simpleReference_2
  idx > @
`simpleReference_2.v` gets the int value from the wrapper.
a.b
->
[] > statementExpression_1
  simpleReference_1.b > @
[] > simpleReference_1
  a > @
It can be simplified, but we keep such translation for generalization.
a.super.b
->
[] > statementExpression_1
  simpleReference_1.super.b > @
[] > simpleReference_1
  a > @
a.b(arg)
->
[] > statementExpression_1
  a.b > @                 # call of b
    a                     # a should be passed
    simpleReference_2     # args
[] > simpleReference_2
  arg > @
expr++
->
[] > statementExpression_1
  simpleReference_1.inc_post > @ # increment itself
[] > simpleReference_1
  expr > @
expr--
->
[] > statementExpression_1
  simpleReference_1.dec_post > @ # decrement itself
[] > simpleReference_1
  expr > @
++expr
->
[] > statementExpression_1
  simpleReference_1.inc_pre > @ # increment itself
[] > simpleReference_1
  expr > @
--expr
->
[] > statementExpression_1
  simpleReference_1.dec_pre > @ # decrement itself
[] > simpleReference_1
  expr > @
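The four increment and decrement forms differ only in the value they yield, which is what separate objects such as `inc_post` and `inc_pre` have to reproduce. A small Java sample (hypothetical class name) that records the yielded values may be handy as a translator input:

```java
/** Hypothetical sample: values yielded by prefix and postfix increment and decrement. */
final class IncDecSample {

    /** Returns {a, b, c, d, x} after applying the four operator forms in sequence. */
    static int[] run() {
        int x = 5;
        int a = x++; // yields the old value: a = 5, x = 6
        int b = ++x; // yields the new value: b = 7, x = 7
        int c = x--; // yields the old value: c = 7, x = 6
        int d = --x; // yields the new value: d = 5, x = 5
        return new int[] {a, b, c, d, x};
    }

    public static void main(String[] args) {
        for (int v : run()) {
            System.out.println(v);
        }
    }
}
```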
+expr
->
[] > statementExpression_1
  simpleReference_1 > @
[] > simpleReference_1
  expr > @
It can be simplified, but we keep such translation for generalization.
-expr
->
[] > statementExpression_1
  simpleReference_1.neg > @ # negation itself
[] > simpleReference_1
  expr > @
(int) 1.0
->
[] > statementExpression_1
  prim__int.from > @ # cast itself
    literal_1
[] > literal_1
  prim__float.constructor_2 > @
    prim__float.new
    1.0
left operand right
->
[] > statementExpression_1
  simpleReference_1.t_operand > @
    simpleReference_2
[] > simpleReference_1
  left > @
[] > simpleReference_2
  right > @
`t_operand` is the translated operator.
Runtime support currently exists only for the following operators: `+`, `-`, `*`, `%`, `/`, `&&`, `||`, `>`, `<`, `>=`, `<=`, `==`, `!=`, `<<`, `>>` and `>>>`