Getting started
Introduction
JCCD is an API which enables you to build your own code clone detector. It has a pipeline architecture consisting of five phases: parsing, preprocessing, pooling, comparing and filtering. For further information on the architecture and the implementation of custom detectors, see the corresponding chapters (coming soon).
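The five phases can be illustrated with a deliberately tiny, line-based sketch in plain Java. It is not JCCD's implementation and uses none of its classes: ToyPipeline and Fragment are hypothetical names, and a real detector such as the ASTDetector works on syntax trees rather than on lines. The sketch assumes Java 16 or later (for records).

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical toy model of a five-phase clone detection pipeline.
// NOT JCCD's implementation: it compares single lines, not syntax trees.
class ToyPipeline {

    // A code fragment: file name, 1-based line number and raw text.
    record Fragment(String file, int line, String text) {}

    // 1. Parsing: split every file into one fragment per line.
    static List<Fragment> parse(Map<String, String> files) {
        List<Fragment> out = new ArrayList<>();
        for (Map.Entry<String, String> e : files.entrySet()) {
            String[] lines = e.getValue().split("\n", -1);
            for (int i = 0; i < lines.length; i++) {
                out.add(new Fragment(e.getKey(), i + 1, lines[i]));
            }
        }
        return out;
    }

    // 2. Preprocessing: normalize whitespace so pure formatting differences vanish.
    static String preprocess(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // 3. Pooling: bucket fragments by normalized length, so that only
    //    fragments of equal length are compared with each other.
    static Map<Integer, List<Fragment>> pool(List<Fragment> fragments) {
        Map<Integer, List<Fragment>> pools = new TreeMap<>();
        for (Fragment f : fragments) {
            pools.computeIfAbsent(preprocess(f.text()).length(), k -> new ArrayList<>()).add(f);
        }
        return pools;
    }

    // 4. Comparing: inside each pool, group fragments with equal normalized text.
    static List<List<Fragment>> compare(Map<Integer, List<Fragment>> pools) {
        List<List<Fragment>> groups = new ArrayList<>();
        for (List<Fragment> pool : pools.values()) {
            Map<String, List<Fragment>> byText = new LinkedHashMap<>();
            for (Fragment f : pool) {
                byText.computeIfAbsent(preprocess(f.text()), k -> new ArrayList<>()).add(f);
            }
            groups.addAll(byText.values());
        }
        return groups;
    }

    // 5. Filtering: keep groups that span at least two files and are not trivial.
    static List<List<Fragment>> filter(List<List<Fragment>> groups, int minLength) {
        return groups.stream()
                .filter(g -> g.stream().map(Fragment::file).distinct().count() >= 2)
                .filter(g -> preprocess(g.get(0).text()).length() >= minLength)
                .collect(Collectors.toList());
    }

    static List<List<Fragment>> run(Map<String, String> files, int minLength) {
        return filter(compare(pool(parse(files))), minLength);
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("A.java", "int x = 1;\n  return x * 2;\n}");
        files.put("B.java", "int y = 1;\nreturn   x * 2;\n}");
        // Only the return statement survives: "int x"/"int y" differ and "}" is too short.
        run(files, 3).forEach(System.out::println);
    }
}
```

Keeping each phase a separate function mirrors the idea of a pipeline: one phase, for example the preprocessing, can be replaced without touching the others.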
In this tutorial we use a specific, ready-made implementation of the pipeline, the ASTDetector, with a basic configuration to show how a complete detector can be used. Step by step, we will extend its configuration to find more matches between the two files "TestFileOne" and "TestFileTwo".
First code clone detection
Our first step will be to match the "factorial"-method in lines 5 to 11. This method is identical in both files.
To use JCCD, we first create an "ASTDetector". The next step is to configure it. For this example we only need to define which files to work on: we create a "JCCDFile"-array with our files and feed it into our detector. Then we start the detector by calling its "process"-method. To see some output, we pass the result of the "process"-method to the "printSimilarityGroups"-method.
    APipeline detector = new ASTDetector();
    // ...
The output should be:
    Similarity Group 9
    ================================================================
    test/TestFileTwo.java(5.1-11.1)
    test/TestFileOne.java(5.1-11.1)
    ================================================================
The similarity group is a uniquely numbered group of similar code fragments in which our match was found. Each match is written as package/filename(start_linenumber.start_column-end_linenumber.end_column).
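Because match locations follow a fixed pattern, they are easy to process further. The following sketch parses one location into its parts. MatchLocation is a hypothetical helper, not part of JCCD; it assumes exactly the format described above, with a plain hyphen between start and end, and needs Java 16 or later (for records).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JCCD: one parsed match location from the
// printed output, e.g. "test/TestFileOne.java(5.1-11.1)".
record MatchLocation(String path, int startLine, int startColumn, int endLine, int endColumn) {

    // path "(" startLine "." startColumn "-" endLine "." endColumn ")"
    private static final Pattern FORMAT =
            Pattern.compile("(.+)\\((\\d+)\\.(\\d+)-(\\d+)\\.(\\d+)\\)");

    static MatchLocation parse(String text) {
        Matcher m = FORMAT.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a match location: " + text);
        }
        return new MatchLocation(m.group(1),
                Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)), Integer.parseInt(m.group(5)));
    }

    // The part after the last '/', e.g. "TestFileOne.java".
    String fileName() {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        MatchLocation loc = MatchLocation.parse("test/TestFileOne.java(5.1-11.1)");
        // prints: TestFileOne.java lines 5-11
        System.out.println(loc.fileName() + " lines " + loc.startLine() + "-" + loc.endLine());
    }
}
```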
Match clones with different method and variable names
Now, in addition to our previous match, we will match the "gcdOne"- and "gcdTwo"-methods in lines 13 to 22. These two methods are not identical: they have different method and variable names.
To match them, we must generalize the method and variable names. For this we use operators (see the list of available operators). The operators "GeneralizeMethodDeclarationNames" and "GeneralizeVariableNames" provide the necessary functionality and are predefined in JCCD. We put them in an "APreprocessor"-array and feed them into our detector.
    APipeline detector = new ASTDetector();
    // ...
    APipeline.printSimilarityGroups(detector.process());
The output should be:
    Similarity Group 21
    ================================================================
    test/TestFileOne.java(5.1-11.1)
    test/TestFileTwo.java(5.1-11.1)
    ================================================================
    Similarity Group 15
    ================================================================
    test/TestFileTwo.java(13.1-22.1)
    test/TestFileOne.java(13.1-22.1)
    ================================================================
In addition to the previous match, the detector now also matches the "gcd"-methods. As you can see, our matches now belong to different similarity groups than before. This is caused by the detector's internal operations.
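To see why generalizing names lets two methods like "gcdOne" and "gcdTwo" match, consider this text-level toy. It is an analogy under assumptions, not how JCCD's operators work: NameGeneralizer is a hypothetical name, it renames every non-keyword identifier (method names included) consistently in order of first appearance, and the two method bodies are made up, since the contents of TestFileOne and TestFileTwo are not shown in this tutorial.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical text-level analogy of name generalization; JCCD works on the AST.
class NameGeneralizer {

    // A small, incomplete keyword list; enough for the example below.
    private static final Set<String> KEYWORDS = Set.of(
            "int", "long", "double", "float", "boolean", "void", "class", "public",
            "private", "static", "return", "if", "else", "while", "for", "new");

    private static final Pattern IDENTIFIER = Pattern.compile("\\b[A-Za-z_][A-Za-z0-9_]*");

    // Renames every non-keyword identifier consistently in order of first
    // appearance (v1, v2, ...), so code that differs only in names becomes equal.
    static String generalize(String code) {
        Map<String, String> names = new HashMap<>();
        Matcher m = IDENTIFIER.matcher(code);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String id = m.group();
            String replacement = KEYWORDS.contains(id)
                    ? id
                    : names.computeIfAbsent(id, k -> "v" + (names.size() + 1));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up bodies in the spirit of gcdOne/gcdTwo (not the tutorial's files).
        String one = "int gcdOne(int a, int b) { while (b != 0) { int t = b; b = a % b; a = t; } return a; }";
        String two = "int gcdTwo(int x, int y) { while (y != 0) { int tmp = y; y = x % y; x = tmp; } return x; }";
        System.out.println(generalize(one));
        System.out.println(generalize(one).equals(generalize(two))); // true
    }
}
```

Renaming consistently (the same name always gets the same placeholder) rather than replacing every name with a single token keeps "a = a;" distinct from "a = b;", which avoids some false matches. Whether JCCD's operators behave this way is not stated in this tutorial.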
Matching different number types and missing blocks
Next we want to match the "mul"-methods in lines 24 to 30. In this case, the methods use different types and different number literals. Additionally, the body of the for-loop in the second file is not enclosed in a block.
In order to match these two methods, we must insert the missing block in the for-loop, generalize the types and unify the number literals. To do this, we add the operators "CompleteToBlock", "GeneralizeMethodArgumentTypes", "GeneralizeMethodReturnTypes", "GeneralizeVariableDeclarationTypes" and "NumberLiteralToDouble" to our detector.
    APipeline detector = new ASTDetector();
    // ...
    detector.addOperator(new CompleteToBlock());
    // ...
    APipeline.printSimilarityGroups(detector.process());
The output should be:
    Similarity Group 24
    ================================================================
    test/TestFileTwo.java(3.25-31.0)
    test/TestFileOne.java(3.25-31.0)
    ================================================================
Because all methods now match, we get only a single match: the bodies of both classes.
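Two of the normalizations used in this step, unifying number literals and generalizing types, can be illustrated with a toy token-level normalizer. This is a sketch under assumptions: TypeAndLiteralNormalizer is a hypothetical name, it only rewrites text with a regular expression, it generalizes just the numeric primitive types, and it leaves out block completion, which needs a syntax tree. JCCD's own operators work on the AST and may behave differently.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical token-level analogy of "NumberLiteralToDouble" and the
// type-generalizing operators; not JCCD code.
class TypeAndLiteralNormalizer {

    private static final Set<String> NUMERIC_TYPES =
            Set.of("byte", "short", "int", "long", "float", "double");

    // Either an identifier, or a decimal literal with an optional type suffix.
    private static final Pattern TOKEN = Pattern.compile(
            "\\b[A-Za-z_][A-Za-z0-9_]*|\\b\\d+(\\.\\d+)?[lLfFdD]?\\b");

    static String normalize(String code) {
        Matcher m = TOKEN.matcher(code);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String tok = m.group();
            String replacement;
            if (Character.isDigit(tok.charAt(0))) {
                // Drop a type suffix, then print the value as a double: 1, 1L, 1.0f -> 1.0
                String digits = tok.replaceAll("[lLfFdD]$", "");
                replacement = Double.toString(Double.parseDouble(digits));
            } else if (NUMERIC_TYPES.contains(tok)) {
                replacement = "TYPE"; // generalize numeric primitive types
            } else {
                replacement = tok;    // other identifiers stay unchanged
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("long total = 1L;"));    // TYPE total = 1.0;
        System.out.println(normalize("double total = 1.0;")); // TYPE total = 1.0;
    }
}
```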
Last but not least
The only thing left to do is to get a match for the two files as a whole. They differ in their class and file names, so we need to accept different file names and generalize the class names. As you might have guessed, we need to add two more operators: "GeneralizeClassDeclarationNames" and "AcceptFileNames".
    APipeline detector = new ASTDetector();
    // ...
    detector.addOperator(new GeneralizeClassDeclarationNames());
    detector.addOperator(new NumberLiteralToDouble());
    detector.addOperator(new AcceptFileNames());
    // ...
The output should be:
    Similarity Group 25
    ================================================================
    test/TestFileTwo.java(1.0-31.0)
    test/TestFileOne.java(1.0-31.0)
    ================================================================