Simple hash function

Inspiration and Design

The inspiration for my hash function comes from the SHA-1 hashing algorithm. Like SHA-1, my function uses fixed-size blocks, a predetermined initial block, and multiple rounds of processing, with the output of each round used as the input to the next.

Message Processing

The original message is split into blocks of 8 characters, and each character is converted to its ASCII integer value. If the last block is shorter than the rest, it is padded to the full block size. The padding consists of a single \* symbol followed by numbers, rather than a run of \* symbols. This avoids collisions between a padded message and an original message that already ends with one or more \* symbols.
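
The splitting and padding steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `pad_message` and `to_blocks` are hypothetical, and the choice of padding digits (1, 2, 3, ...) is an assumption, since the text only says that numbers follow the `*` symbol.

```python
BLOCK_SIZE = 8  # characters per block, as stated in the text


def pad_message(message: str, block_size: int = BLOCK_SIZE) -> str:
    """Pad to a multiple of block_size with '*' followed by counting digits.

    The digit sequence 1, 2, 3, ... is an assumed scheme for illustration.
    """
    remainder = len(message) % block_size
    if remainder == 0:
        return message  # already a whole number of blocks
    missing = block_size - remainder
    # One '*' marks the start of the padding; digits fill the rest.
    filler = "*" + "".join(str(i % 10) for i in range(1, missing))
    return message + filler


def to_blocks(message: str, block_size: int = BLOCK_SIZE) -> list[list[int]]:
    """Split the padded message into blocks of ASCII integer values."""
    padded = pad_message(message, block_size)
    return [
        [ord(ch) for ch in padded[i:i + block_size]]
        for i in range(0, len(padded), block_size)
    ]
```

Under these assumptions, `to_blocks("hello")` pads the message to `hello*12` and returns a single block of eight integers.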

Initial Block and Security

Simple values are used for the initial block because the hashing function's complexity ensures that these initial values have little bearing on the final message digest. Even if the initial block were all zeros, the algorithm would still be secure.

Hashing Algorithm Mechanics

Within the hashing algorithm, the words are mixed and transformed, but their length remains the same. No values are discarded; they are only moved or operated on, so every value keeps its influence on the result. Counters, which start at zero and increment with each loop iteration, ensure that every iteration is unique. The loop counts (3, 5, 10, and 30) are chosen because none of them is evenly divisible by 4, the number of words used.
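
One way to read the loop-count choice: if each iteration shifts the four words by one position, a count that is a multiple of 4 would put every word back where it started. The sketch below illustrates that idea together with a per-iteration counter. The rotation step, the integer words, the `MODULUS` value, and the function names are all assumptions made for illustration and are not taken from the actual implementation.

```python
MODULUS = 256  # assumed value range; the real function may use another modulus
NUM_WORDS = 4  # number of words, as stated in the text


def final_offset(iterations: int, num_words: int = NUM_WORDS) -> int:
    """Net shift of the words after rotating once per iteration."""
    return iterations % num_words


def mix(words: list[int], iterations: int, counter: int = 0) -> tuple[list[int], int]:
    """Toy mixing loop: update one word, rotate the words, bump the counter."""
    words = list(words)
    for _ in range(iterations):
        # Operate on a value instead of discarding it; adding the counter
        # makes each iteration differ from the previous one.
        words[0] = (words[0] + words[1] + counter) % MODULUS
        # Move every word one position to the left.
        words = words[1:] + words[:1]
        counter += 1
    return words, counter


for count in (3, 5, 10, 30):
    # None of these loop counts returns the words to their starting positions.
    assert final_offset(count) != 0
```

A count of 4 or 8 would give `final_offset(...) == 0`, meaning the words end up in their original order after the loop.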

Security Features

The hashing function produces fixed-size blocks, so the output length is constant regardless of the input length. Even similar inputs produce vastly different digests. Identical inputs always return the same digest, satisfying the requirement for consistency. From the digest alone, it is difficult to determine the input, which is the property known as pre-image resistance. Testing with symbols, word lists, and number enumeration has so far found no collisions.
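
A small harness along these lines could be used to repeat such checks. It is only a sketch: the project's own function is not shown here, so `hashlib.sha1` serves as a stand-in, and `find_collisions` and `differing_positions` are hypothetical helper names.

```python
import hashlib
import string
from itertools import product
from typing import Callable, Iterable


def stand_in_hash(message: str) -> str:
    """Placeholder digest; swap in the real hash function when testing it."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()


def find_collisions(
    hash_fn: Callable[[str], str], messages: Iterable[str]
) -> list[tuple[str, str]]:
    """Return pairs of distinct messages that share a digest."""
    seen: dict[str, str] = {}
    collisions = []
    for message in messages:
        digest = hash_fn(message)
        if digest in seen and seen[digest] != message:
            collisions.append((seen[digest], message))
        else:
            seen[digest] = message
    return collisions


def differing_positions(digest_a: str, digest_b: str) -> int:
    """Count positions where two equal-length digests differ."""
    return sum(a != b for a, b in zip(digest_a, digest_b))


# Enumerate every lowercase string of length 1 and 2 as a small test set.
short_inputs = [
    "".join(p) for n in (1, 2) for p in product(string.ascii_lowercase, repeat=n)
]
collisions = find_collisions(stand_in_hash, short_inputs)
```

Passing the real function as `hash_fn` would check it against the same inputs, and `differing_positions` gives a quick measure of how far apart the digests of two similar messages are.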

Early in the design, I realized that longer messages are easier to secure than shorter ones. Therefore, the hash function is designed to provide robust security for very short inputs. A key security feature is that the function runs twice on the first block, significantly reducing the likelihood of collisions.
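
The chaining with an extra pass over the first block might look roughly like this. The `compress` function here is a toy placeholder rather than the real round function, and `INITIAL_STATE`, `MODULUS`, and `digest_blocks` are assumed names and values chosen only to make the sketch runnable.

```python
INITIAL_STATE = [1, 2, 3, 4]  # assumed "simple values" for the initial block
MODULUS = 2 ** 32             # assumed modulus


def compress(state: list[int], block: list[int]) -> list[int]:
    """Toy round function: fold each block value into one of the state words."""
    new_state = list(state)
    for i, value in enumerate(block):
        j = i % len(new_state)
        new_state[j] = (new_state[j] * 31 + value + i) % MODULUS
    return new_state


def digest_blocks(blocks: list[list[int]]) -> list[int]:
    """Chain the blocks, processing the first block one extra time."""
    state = list(INITIAL_STATE)
    if blocks:
        # Extra pass: a one-block message still goes through two rounds.
        state = compress(state, blocks[0])
    for block in blocks:
        # The output of each round is the input state for the next.
        state = compress(state, block)
    return state
```

With this structure, even a message that fits in a single block is mixed twice before the digest is produced.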

Potential Enhancements

To further improve the security of the hash function, several enhancements could be made:

Increasing the output size or using a larger modulus.
Adding more padding and introducing additional constants within the hashing function.
Performing more rounds of hashing.
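
To give a rough sense of why a larger output helps: for an ideal hash with n output bits, the chance of at least one collision among k random inputs is approximately 1 - exp(-k(k-1) / 2^(n+1)), the birthday bound. The sketch below evaluates that approximation. It describes an idealized hash, not measured behavior of this function, and `collision_probability` is a name introduced here.

```python
import math


def collision_probability(num_inputs: int, output_bits: int) -> float:
    """Birthday-bound approximation for an ideal hash with the given output size."""
    space = 2 ** output_bits
    exponent = num_inputs * (num_inputs - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


for bits in (32, 64, 128):
    print(bits, collision_probability(2 ** 16, bits))
```

Under this model, 65,536 inputs give a collision chance of roughly 39% with a 32-bit output, while 64 bits or more makes it negligible.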

To make the function more broadly useful, it could be modified to generate a hash from other kinds of input, such as a file or a list of files in a specified repository.
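
A file-based wrapper could be sketched as below. Everything here is hypothetical: `toy_hash` is an insecure placeholder for the real function, and `hash_file` and `hash_files` are illustrative names. Each byte is mapped to one character via latin-1, which assumes the hash function can accept character values up to 255 rather than ASCII only.

```python
from pathlib import Path
from typing import Callable, Iterable


def toy_hash(message: str) -> str:
    """Placeholder for the real hash function (NOT secure, illustration only)."""
    total = 0
    for position, char in enumerate(message):
        total = (total * 31 + ord(char) + position) % 2 ** 32
    return f"{total:08x}"


def hash_file(path: str, hash_fn: Callable[[str], str] = toy_hash) -> str:
    """Hash a file's contents by mapping each byte to one character."""
    # latin-1 maps every byte value 0-255 to exactly one code point.
    text = Path(path).read_bytes().decode("latin-1")
    return hash_fn(text)


def hash_files(paths: Iterable[str], hash_fn: Callable[[str], str] = toy_hash) -> str:
    """Combine per-file digests, in sorted path order, into one digest."""
    lines = [f"{hash_file(p, hash_fn)} {p}" for p in sorted(paths)]
    return hash_fn("\n".join(lines))
```

Sorting the paths makes the combined digest independent of the order in which the files are listed.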

View the code on GitHub