Grinding Java - Class files and the VM
Written by Shai Almog
Don't you hate it when someone beats you to it? I had a great idea for this month's article: teaching the structure of the class file and encrypting it so it cannot be decompiled. I had finished a great deal of the work when, on Sunday, I entered www.javaworld.com and found out that someone had written almost the same article. Since this is now August and you will not be reading this before November, it may seem as if I am simply copying from him. So instead I will simply post my code with some explanation of the class file structure, and will focus this month on security.
The class file
When I first heard that Java uses a separate file for each class it seemed odd, and when I had used it a bit I decided it was stupid. Today I consider the class file idea to be brilliant, one of Java's strongest features.
What's so good about class files?
- They are very simple: you can put a class into your database table and have your methods stored with your data. A true OODBMS!
- On my current C++ project which is not too big, VC++ takes 15 minutes to perform a link (without compiling). In Java this is not a problem.
- They contain lots of information which makes exceptions easy to debug even when no source code is provided.
- They are very compact in size yet easy to compress. The perfect Internet file structure.
The references I used to analyse the structure of the class file were Sun's VM spec and the JDK 1.1.1 source code. I recommend that everyone who is serious about Java get the source code for the JDK from Sun. You only need to sign an agreement, and the code is very clear and simple.
The class file has the following structure:
- magic number - 4 bytes
- the magic number is the hex value 0xCAFEBABE which is the id of a class file.
- minor version - 2 bytes
- major version - 2 bytes
- size of the constant pool - 2 bytes
- constant pool - (size of the constant pool - 1) entries of varying size
- The Java class file does not store strings or numbers inline at the places where they are used. The reason for that is to reduce size and thus increase speed. Every bit of information needed by the class file is stored once in the constant pool and later referenced using a short (a 2 byte number) which gives the offset of that information, counted in entries, in the constant pool.
- access flags - 2 bytes
- The access flags for the class. This "short" contains flags which specify the access to the class: public/final/abstract, etc.
- class name - 2 bytes
- This is the offset in the constant pool containing the name of this class.
- super class name - 2 bytes
- This is the offset in the constant pool containing the name of this class's super class.
- number of interfaces supported by the class - 2 bytes
- interface list - (number of interfaces supported by the class) * 2 bytes
- This list is made up of offsets in the constant pool which represent the names of the interfaces.
- Number of fields in this class - 2 bytes
- list of fields - (Number of fields in this class) * (variable sized structure)
- number of methods in this class - 2 bytes
- list of methods - (Number of methods in this class) * (variable sized structure)
- number of attributes for this class - 2 bytes
- list of attributes - (Number of attributes for this class) * (variable sized structure)
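The fixed-size fields at the start of this layout can be read with a few lines of Java. The following is only a minimal sketch and is not the code attached to this article: the class name ClassFileHeader and its parse method are made up for illustration, and it uses java.nio.ByteBuffer, which assumes a JDK much newer than 1.1. It stops at the constant pool size, because the pool entries that follow have varying sizes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the article's attached source.
class ClassFileHeader {
    final int magic;             // 4 bytes, expected to be 0xCAFEBABE
    final int minorVersion;      // 2 bytes
    final int majorVersion;      // 2 bytes
    final int constantPoolCount; // 2 bytes, the pool holds (count - 1) entries

    ClassFileHeader(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer in = ByteBuffer.wrap(classBytes);
        int magic = in.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file: bad magic number");
        }
        // The 2 byte fields are unsigned, so mask away the sign extension.
        int minor = in.getShort() & 0xFFFF;
        int major = in.getShort() & 0xFFFF;
        int poolCount = in.getShort() & 0xFFFF;
        return new ClassFileHeader(magic, minor, major, poolCount);
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the path of any .class file as the first argument.
        ClassFileHeader h = parse(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("version " + h.majorVersion + "." + h.minorVersion
                + ", constant pool size " + h.constantPoolCount);
    }
}
```

Reading anything past this point (access flags, class name and so on) requires walking the constant pool entry by entry first, since each entry type has its own length.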
Both the fields and the methods have a similar structure:
- access flags - 2 bytes
- The access flags for the field/method. This "short" contains flags which specify the access to the method/field: public/private/abstract, etc.
- field/method name - 2 bytes
- The offset in the constant pool which contains the name of the method or field.
- field/method descriptor - 2 bytes
- The offset in the constant pool containing the type of the field or method.
- number of attributes - 2 bytes
- list of attributes - Varying size.
Attributes have the following structure:
- attribute name - 2 bytes
- The offset in the constant pool of the attribute's name.
- attribute length - 4 bytes
- The length of the attribute data.
- attribute data - attribute length
- The actual data related to the attribute.
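Because every attribute carries its own length, a reader can pick up or skip an attribute without understanding what it means. The sketch below illustrates this under the same assumptions as before: AttributeInfo and read are hypothetical names chosen for this example, not part of the attached source, and ByteBuffer requires a newer JDK than 1.1.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of the article's attached source.
class AttributeInfo {
    final int nameIndex; // offset of the attribute's name in the constant pool
    final byte[] data;   // raw attribute data, exactly "attribute length" bytes

    AttributeInfo(int nameIndex, byte[] data) {
        this.nameIndex = nameIndex;
        this.data = data;
    }

    // Reads one attribute starting at the buffer's current position.
    static AttributeInfo read(ByteBuffer in) {
        int nameIndex = in.getShort() & 0xFFFF; // 2 bytes, unsigned
        int length = in.getInt();               // 4 bytes
        // A negative value means the unsigned length exceeds what an int can hold.
        if (length < 0 || length > in.remaining()) {
            throw new IllegalArgumentException("Attribute length exceeds available data");
        }
        byte[] data = new byte[length];
        in.get(data); // advances the buffer past this attribute
        return new AttributeInfo(nameIndex, data);
    }
}
```

After read returns, the buffer is positioned at the next attribute, so calling it "number of attributes" times consumes a whole attribute list.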
This structure is illustrated in Sun's documentation so I will not go into further detail. The full source code and JavaDoc documentation are attached here.
Java covers the security concepts for program distribution very well, yet it does not cover the security issues involved in class file decompilation, which allows hackers to modify your code and even steal technology. Using the class file format, building a class which scrambles the code so that it cannot be decompiled is relatively simple, and there are already a couple of products on the market which do just that. There are, however, excellent Java decompilers as well.
When writing about Java it is hard not to mention security, yet I was able to mostly avoid the subject since we have not developed any applets yet and without them Java's security is not quite as critical.
Java has several types and layers of security:
- The language level security consists of several language level restrictions which increase Java's security. For example: when writing code in Java you cannot access memory directly. One of the slogans around Java was that Java has no pointers. This is required by the security model of the language, so that a malicious programmer cannot access the area of memory where the OS's API is located. Imagine if Java had C++ like pointers: you could simply traverse the memory, locate an area of code and replace it with x86 code for erasing the HD.
- The bytecode verifier. When executing Java bytecode a verifier goes over the class file and checks it to be valid. This is in fact the source of some of the early Java security bugs: in JDK 1.0 there was a bug where the verifier did not check class casting to be legal. This does not sound too bad, but a skilled hacker could write code which would cast a class of his own making into java.io.File and erase a file on your HD. The compiler would not compile this code, but if someone wrote it directly in Java bytecode the verifier would accept it. This bug is now fixed.
- The Java SecurityManager allows a class loader to place restrictions on classes it loads. An example of the security manager is called the applet sandbox:
Applets are forced to run under a security manager called the sandbox. The sandbox prevents applets from:
- Calling native methods.
- Accessing the HD.
- Connecting to other servers.
These restrictions can be lightened when trusted applets are involved, due to the dynamic nature of the security manager. I have heard many people question the security of Java vs. ActiveX by saying that applets are useless with the current restrictions, and that without the restrictions they are just as dangerous as ActiveX components. This is not true, since applet access can be fine tuned: you can allow an applet access to only one directory on your HD, which is something ActiveX will never be able to do.
- The java.security.* packages allow us to control security at a fine tuned level.
This package allows us to sign applets and to determine which code is trustworthy. The SecurityManager is the part which is most easily modifiable and helps us to build powerful environments. The security manager is mostly made up of methods such as checkPackageAccess(String), which throws a security exception if the given package name may not be accessed. The security manager has a long list of such methods, which cover almost every feature in Java and test it for access. The System class contains two methods, one to set and one to get the current security manager.
Public/private key security concepts
Public/private key security is already a well known subject, yet for those of you who don't know it here is the gist of it: with a regular encryption mechanism, side A, who mails a letter, encrypts the data using a key (password) and sends it. Side B, who receives the encrypted data, decodes it using the same password side A has. The problem: how does side A give the password to side B? This may cause a breach in security.
The public/private key scheme solves this whole problem: side A has two passwords and so does side B. Each side has a password of his own which he tells no one (the private key) and a password which is common knowledge and was actually derived from the hidden password (the public key). When side B, or anyone else for that matter, wants to send side A data which only side A will be able to read, side B uses side A's public key to encrypt the data. The only one who will be able to decrypt the data is side A! When side A sends data to side B and wants side B to know that side A is the actual sender, side A encodes the data using his private key. Anyone who has side A's public key can open the data, but only side A could have signed it!
This allows fully secure transactions between individuals on the Internet. You may perform transactions with people knowing exactly who they are, and you may be sure no one is reading your mail. However, there are countries which consider encryption technology to be a weapon, and it may be illegal to use it there.
The java.security.* packages
Currently Java supports the private/public key paradigm only for signatures. JavaSoft will probably add support for encryption in JDK 1.2. The reason for the delay is the US encryption export rules, which declare encryption to be a weapon.
The security package's most important classes are: Signature, Identity, Provider, MessageDigest.
Signature - In order to create a signature algorithm you must subclass this class and provide an implementation for the particular signing algorithm. If you wish to use a signing algorithm you need to call the static method getInstance with the name of the algorithm, which returns an instance of the signing algorithm. The Signature class contains several methods which must be implemented by the subclass. These methods are called by the standard class API and provide the necessary abstraction. The Signature class receives feeds of data and a private key to use for signing, and may receive a public key and an encoded stream for verification.
Identity - An identity is a virtual person or organization. It may have Certificates verifying that the identity is legitimate.
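The getInstance, feed and sign/verify flow described for Signature can be sketched roughly as follows. This is not the supplied program: the class SigningSketch and its method names are invented for this example, and the algorithm name "SHA256withDSA" together with the 2048 bit key size assume a current JDK rather than JDK 1.1, where the plain name "DSA" would be used.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical example class; names and algorithm choice are assumptions.
class SigningSketch {
    static final String ALGORITHM = "SHA256withDSA";

    // Generates a fresh DSA private/public key pair.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("DSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Feeds the data to the algorithm and signs it with the private key.
    static byte[] sign(PrivateKey privateKey, byte[] data) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(privateKey);
            signer.update(data);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Feeds the same data and checks the signature against the public key.
    static boolean verify(PublicKey publicKey, byte[] data, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(publicKey);
            verifier.update(data);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] message = "signed by side A".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(pair.getPrivate(), message);
        System.out.println("verified: " + verify(pair.getPublic(), message, signature));
    }
}
```

Only the holder of the private key can produce the signature, while anyone holding the public key can check it, which is the signing half of the public/private key scheme.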
Certificate - A certificate is the signature of an entity which verifies that another entity is who it claims to be. If you trust the signing entity you should then be able to trust the certificate owner.
Provider - The provider class is the class you must subclass to provide the algorithm for the encryption or any similar algorithm. This is in fact the engine class for all encryption algorithms.
MessageDigest - The digest algorithm is an algorithm which takes a stream of data of unspecified length and converts it into a block of data of fixed length with the following properties:
- It should be mathematically infeasible to find another stream which will generate the same fixed data.
- The fixed data should not reveal anything about the stream which generated it.
The digest is used to sign data. Rather than signing the entire data (a time consuming process, which generates large files), the digest of that data is signed, and that results in the same effect.
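The fixed-length property of a digest is easy to see in code. The sketch below is an illustration only: DigestSketch and its helper methods are hypothetical names, and the "SHA-256" algorithm assumes a JDK newer than the 1.1 release discussed in this column.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class; the algorithm choice is an assumption.
class DigestSketch {
    // Converts data of any length into a fixed 32 byte SHA-256 digest.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Renders bytes as lowercase hex for printing.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] small = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] large = new byte[100000];
        // Both digests have the same length regardless of the input size.
        System.out.println(sha256(small).length + " bytes: " + toHex(sha256(small)));
        System.out.println(sha256(large).length + " bytes: " + toHex(sha256(large)));
    }
}
```

A signing algorithm then only has to process these few bytes instead of the whole stream.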
The supplied program is very slow. It did not work under VAJ and only worked under the JDK 1.1. I used the parameters DSA and c:\startup.cmd to test it.
This program demonstrates a very simple signing and can be altered to fit most security needs and it can be expanded in JDK 1.2's security API.
It's been a shorter column than usual, mostly due to the fact that I had another one in mind and had to change the subject in the middle of its writing. Next month I will deal with distributed objects: CORBA and RMI (mostly RMI).