Discuss the Reverse Engineering Methodologies.
Reverse engineering is the process of disassembling a particular instrument, material, or computer program to extract and analyze its information, and to reuse those details to recreate the item. It is used mainly for analyzing the original design, re-documenting old systems, and sometimes for analyzing and removing malicious programs. For example, if the lead software engineer leaves in the middle of a project, it is difficult for a new programmer to take over the program; reverse engineering is needed to analyze and document it and handle this type of situation. The process should not be used to produce a pirated version of a program, which defeats the originality of the product and is considered a copyright violation. Therefore, certified programs should not permit reverse engineering.
Overview of Reverse Engineering
The techniques used to retrieve information from machine code fall into three categories: browsing the data without assistance, relying on corporate knowledge and skill, and computer-assisted techniques such as reverse engineering (Tilley, 2000). Of the three, reverse engineering is the most effective and most advanced way to recover information. A software reverse engineer can handle the complications of the program environment and recover source-level structure from binary code. Normally, the source code contains high-level information and the program logic, whereas binary code is a compiled form of that source which is only machine readable. Reverse engineering reduces an engineer's workload by supporting code scanning, reading, and searching. The working procedure is to scan the program and then reconstruct it; the steps are demonstrated below.
First phase: The product is first analyzed into its modules, processes, sub-processes, and data elements, and the components that will be reverse engineered are identified.
Second phase: This phase involves most of the work. The modules and processes identified in the first phase are decompiled to generate source code, and each logical component is broken into a separate, readable programming unit.
Third phase: At this stage the decompiled units are used to recreate the new product or document the original legacy system. The newly created product is then tested for system integrity, logical correctness, and desired output.
Fourth phase: The final stage involves deploying the new product which has inherited features from the parent application.
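The four phases above can be sketched as a pipeline. This is a minimal illustration only; all function and field names here are hypothetical, and real reverse engineering tools expose nothing this tidy.

```python
# Illustrative sketch of the four-phase workflow (names are invented for the example).
def identify_components(product):
    """Phase 1: analyze the product and pick the modules to reverse engineer."""
    return [m for m in product["modules"] if m["legacy"]]

def decompile(components):
    """Phase 2: turn each selected module into a readable unit (stubbed here)."""
    return [{"name": m["name"], "source": f"// recovered source of {m['name']}"}
            for m in components]

def rebuild_and_test(units):
    """Phase 3: recreate the product and check integrity and output."""
    assert all(u["source"] for u in units), "decompilation produced an empty unit"
    return {"units": units}

def deploy(product):
    """Phase 4: ship the new product that inherits the parent's features."""
    return {"deployed": True, **product}

legacy_app = {"modules": [{"name": "billing", "legacy": True},
                          {"name": "ui", "legacy": False}]}
new_product = deploy(rebuild_and_test(decompile(identify_components(legacy_app))))
```

The point of the sketch is the ordering: identification feeds decompilation, whose output is validated before anything is deployed.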
Some people use reverse engineering in a negative way; they are called hackers (Rich, 2004). They decode data in order to steal it and to inject malicious products. In response, a concept called obfuscation emerged: the process of making a program more confusing, deliberately ambiguous, or harder to understand (Welz, 2008), whether the obscuring is intentional or unintentional. The main intention is to deter hackers by preventing them from extracting the original program. Obfuscation can be supported by different security tools and helps control the privacy of a piece of software. Code obfuscation is a technique that can protect software against attackers, and it can also be used to hide malicious content. The general aspects of an assembly source code can be obfuscated in several different ways:
- Identifier renaming
It is the most common technique used. It is achieved by renaming each and every identifier within a project. It cannot be applied to all modules, so the whole source code must be analyzed and modified accordingly before using it.
- String encryption
Sometimes unprintable characters are placed in the machine code so that string literals cannot be read directly. This technique, called string encryption, embeds the encryption key and the decryption function within the code itself. Using tools, a very large number of techniques can be combined to produce heavily obfuscated code. In recent years, code obfuscation has become available at a very reasonable price, which strengthens software security.
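The string-encryption idea can be sketched in a few lines. This is a toy illustration: a single-byte XOR key stands in for whatever cipher a real obfuscator would use, and all names are invented for the example.

```python
# Minimal sketch of string-encryption obfuscation: string literals are stored
# encrypted, and the key plus the decryption routine ship inside the code.
# (Single-byte XOR is illustrative only; real obfuscators use stronger schemes.)
KEY = 0x5A  # embedded key, recoverable by anyone who reverses the binary

def _encrypt(s: str) -> bytes:  # used once, at obfuscation time
    return bytes(b ^ KEY for b in s.encode())

def _decrypt(blob: bytes) -> str:
    return bytes(b ^ KEY for b in blob).decode()

# What ships in the obfuscated program: an unreadable byte string...
SECRET = _encrypt("connect to license server")

# ...decrypted only at the moment of use.
message = _decrypt(SECRET)
```

Note that because the key and the decryption function must ship with the code, this raises the cost of casual inspection rather than providing real secrecy.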
Obfuscating transformations are divided into two broad classes.
- Surface obfuscation transformations, which obfuscate the syntax of a program.
- Structural obfuscation transformations, which obfuscate the arrangement of the program at the structural level; we can change either its control-flow graph or its behavior (Cho, 2010).
There are three variants of control-flow obfuscation techniques:
- Basic Control Flow Flattening
- Inter-procedural Data Flow
- Artificial Blocks and Pointers
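Basic control flow flattening, the first variant above, can be illustrated with Euclid's gcd algorithm. In this sketch the structured loop is replaced by a dispatcher that jumps between numbered blocks, hiding the original control-flow graph from a casual static reading.

```python
# Control-flow flattening sketch: each original basic block becomes a case
# in a dispatch loop, selected by a state variable instead of direct jumps.
def gcd_flattened(a: int, b: int) -> int:
    state = 0  # dispatcher variable; each value is one original basic block
    while True:
        if state == 0:        # original loop test
            state = 1 if b != 0 else 2
        elif state == 1:      # original loop body
            a, b = b, a % b
            state = 0
        else:                 # original exit block
            return a

# e.g. gcd_flattened(48, 18) == 6
```

A decompiler sees one flat loop with a switch-like dispatcher; recovering the original loop structure requires tracking the possible values of `state`.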
A number of transformation processes are required to reverse code obfuscation. They are:
- Static path feasibility: This addresses the limitations of basic analysis. It checks whether a newly discovered path is feasible or not.
- Cloning: Most obfuscation techniques rely on hiding the actual execution paths through the program to prevent static analysis (Babic, 2010). Some of these paths can never be taken at run time, yet they cause false information to be produced along them during program analysis; this reduces the accuracy of the results and makes the program logic harder to understand.
- Combined static and dynamic analysis: The set of edges actually taken at run time is a subset of what static deobfuscation reports. Dynamic analysis can trace the running program and build a complementary model. Often neither technique alone can complete the process, so both must be combined for better results.
Role of Static & Dynamic Analysis
The process of identifying unknown subjects inside a program is called malware analysis; viruses, worms, and Trojan horses are examples of such subjects. It is therefore essential to build techniques that can easily identify malware, and important to develop malware analysis tools that make the reverse engineering process more effective. In the past, virus analysis was done through a manual procedure that was tiresome and time-intensive, but the number of cases has grown over time. For all of this, we need automation in malware analysis.
Security work used to happen only after the software development process, but that was not sufficient. Now security testing is done throughout the entire software development life cycle (SDLC). This life cycle is divided into four separate phases: design, development, production, and maintenance (Clements, 2003). After each phase is completed, the product goes through testing, which makes it more secure.
Automated inspection of an application can be done in two ways: dynamic analysis and static analysis.
Dynamic analysis examines a program at run time. A vast number of techniques are used for extracting data. Dynamic analysis also establishes the code coverage and the different dynamic paths the program follows; paths not exercised during testing are a source of bugs. It works by instrumenting the program, or integrating with the provided code, as it actually executes (Bus, 2004). Some tools, grouped by how they hook into the run time, are given below.
- Compile-time instrumentation: gcov, gprof, dmalloc
- Dynamic translation / VM: Valgrind, DynamoRIO, Pin
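The core idea behind these tools can be shown in miniature with Python's standard `sys.settrace` hook, which records which lines of a function actually execute for a given input. The function names here are invented for the example; real tools such as gcov or Valgrind do this at far lower levels.

```python
import sys

# Toy dynamic analysis: a coverage probe built on the interpreter's trace hook.
def run_with_line_trace(func, *args):
    executed = set()
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            # record line offsets relative to the function definition
            executed.add(frame.f_lineno - func.__code__.co_firstlineno)
        return tracer
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, executed

def classify(x):
    if x > 0:              # offset 1
        return "positive"  # offset 2
    return "non-positive"  # offset 3

result, lines = run_with_line_trace(classify, 5)
# for this input, the "non-positive" branch (offset 3) never ran
```

This is exactly the dynamic-analysis limitation the text describes: the trace only reveals the paths taken by the inputs that were actually run.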
Static analysis examines the source code without running the program. Techniques such as data-flow analysis are used to uncover issues. A static analyzer scans each source file and prepares a report for analysis and review (Linn, 2003). Some languages and their tools are given below.
- C or C++: VisualCodeGrepper, Splint
- Java™ technology: LAPSE+, FindBugs, VisualCodeGrepper
- Python: PyChecker, pylint
- Ruby on Rails: Brakeman, codesake_dawn
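The essence of these checkers can be sketched with Python's standard `ast` module: walk the syntax tree of a source file and flag a risky construct without ever running the code. Flagging `eval` is chosen here purely as an example rule.

```python
import ast

# Toy static analyzer: report the line numbers of calls to eval(),
# found purely by inspecting the syntax tree (the code never runs).
def find_eval_calls(source: str) -> list[int]:
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    )

sample = "x = 1\ny = eval(user_input)\nprint(x)\n"
warnings = find_eval_calls(sample)  # flags line 2
```

Tools like pylint and FindBugs apply the same tree-walking idea at much larger scale, with hundreds of rules instead of one.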
Automation in Reverse Engineering
Technology is evolving, and automation is now frequently used in reverse engineering. Some examples follow. If we consider Java, there are different Java decompilers, and Java byte-code is almost completely reversible thanks to its architecture independence. The trickiest work is decompiling C. Hex-Rays does provide a C decompiler, but C is a complex language and context plays a huge role: what is the difference between a pointer and a character? Nothing, except the context in which it is used. Roughly three processes are followed to produce a linear regression model of a nonlinear system: automation in probing, partitioning, and snipping.
Automation in probing: The extracted algorithms are first converted into a model; this passive model is then converted into an active model through automation and tested at various stages.
Partitioning: The division of the algorithm into different equations. For partitioning, the algorithm normally follows a random optimization process.
Snipping: The simplification step during automation. It simplifies the algorithm, tries to find errors, suggests how to handle them, and then redesigns a bug-free model.
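Returning to the byte-code reversibility point above: Python's bytecode is architecture-independent in the same way as Java's, and the standard `dis` module shows how much structure survives compilation, which is why such formats decompile so cleanly.

```python
import dis

# Byte-code keeps names and constants, which is why architecture-independent
# formats such as Java class files (or Python .pyc files) decompile so well.
def area(radius):
    return 3.14159 * radius * radius

instructions = list(dis.get_instructions(area))
opnames = [ins.opname for ins in instructions]
constants = [ins.argval for ins in instructions if ins.opname == "LOAD_CONST"]
```

The variable name `radius` and the constant `3.14159` are both recoverable from the compiled function, in contrast with optimized C machine code, where names and types are stripped away.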
In dynamic analysis, compiler optimizations might use the fastcall convention rather than stdcall. Because static and dynamic analysis are complementary, merging them is valuable, and it can be done in two different ways. One workflow starts with dynamic analysis, which prepares a control-flow and edge model; static analysis then tries to shortcut the flow diagram (Kazman, 2003).
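A toy version of that merge can be built from the standard library alone: a static pass over the syntax tree lists every line that could execute, a dynamic pass traces the lines one concrete run actually takes, and the difference marks paths the run never exercised. The source string and line arithmetic here are illustrative.

```python
import ast
import sys

SRC = '''\
def check(x):
    if x > 10:
        return "big"
    return "small"
'''

# Static pass: every line the syntax tree says could execute.
static_lines = {n.lineno for n in ast.walk(ast.parse(SRC)) if hasattr(n, "lineno")}

# Dynamic pass: the lines one concrete run actually takes.
executed = set()
def tracer(frame, event, arg):
    if event == "line" and frame.f_code.co_name == "check":
        executed.add(frame.f_lineno)
    return tracer

namespace = {}
exec(compile(SRC, "<sample>", "exec"), namespace)
sys.settrace(tracer)
namespace["check"](3)
sys.settrace(None)

# Dynamic edges are a subset of the static ones; the difference marks
# paths that static analysis admits but this run never took.
never_taken = static_lines - executed
```

For the input 3, the `return "big"` line lands in `never_taken`: static analysis admits it, but this run never reached it, which is precisely the gap that combining the two analyses exposes.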
It is high time to adopt software reverse engineering; judging by the market, the adoption process is very slow. The effective way forward is to combine static and dynamic analysis, which leads to design recovery in software engineering (Baystate, 2011). Every company should focus on program maintenance as well as program protection, which also helps safeguard legacy code. The different phases of a program should be reviewed from time to time; this minimizes the number of bugs and increases confidence in the program, which is why some programmers repeat the testing cycle throughout the product life cycle. Penalties for copyright violation should be made more severe to discourage abuse, and more researchers should be encouraged to diversify the reverse engineering process. In the recent era, code obfuscation plays a vital role in delivering error-free and virus-free products, and it can reduce the workload of a technical team. Obfuscating transformations depend on the theoretical complexity properties of the programs involved.
References
Comparetti, P. M., & Wondracek, G. (2009). Prospex: Protocol specification extraction. In Proceedings of the 30th IEEE Symposium on Security and Privacy (pp. 110–125).
Welz, T. (2008). Smart cards as methods for payment. Bochum.
Rich, C., & Waters, R. C. (2004). The Programmer's Apprentice. ACM Press.
Cho, C. Y., & Babic, D. (2010). Inference and analysis of formal models of botnet command and control protocols. In ACM Conference on Computer and Communications Security.
Cross, J. (2001). Reverse engineering and design recovery: A taxonomy. IEEE Software, 7(1), 13–17.
Tilley, S. R. (2000). The canonical activities of reverse engineering. Baltzer Science Publishers.
Clements, P., & Kazman, R. (2003). Software Architecture in Practice. Addison-Wesley.
Bus, D. De, & Sutter, B. De. (2004). Link-time optimization of ARM binaries. In Proc. 2004 ACM Conf. on Languages, Compilers, and Tools for Embedded Systems (pp. 211–220).
Linn, C., & Debray, S. K. (2003). Obfuscation of executable code to improve resistance to static disassembly. In Proc. 10th ACM Conference on Computer and Communications Security (pp. 290–299).
Baystate v. Bowers. (2011). Discussion. Utsystem.