Cryptographic hashes
Git makes great use of hashes because hashes make excellent unique identifiers for the contents of files.
For background on hashes, see the Wikipedia article on hash functions.
A hash is the result of running a hash function over a block of data. The hash is a fixed-length string that acts as the signature of that exact block of data. Let’s run this in Python:
>>> import hashlib
>>> sha1_hash_function = hashlib.sha1
>>> # Hash functions work on bytes, so the message is a bytes literal
>>> message = b"git is a rude word in UK English"
>>> hash_value = sha1_hash_function(message).hexdigest()
>>> hash_value
'fec41478c4f497c1d90fd28610f4272c78a6867e'
Not too exciting so far. However, the rather magical nature of this string is not yet apparent. Here’s the trick:
There is no practical way for you to find another message that will give the same hash_value.
The hash_value, then, is (very nearly) completely unique to that set of bytes.
For example, a tiny change in the string makes the hash completely different. Here I’ve just added a full stop at the end:
>>> sha1_hash_function(b"git is a rude word in UK English.").hexdigest()
'9e87add001f13aa79ed7b42a5effbfc60aa8584e'
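To put a rough number on "completely different", here is a small sketch that compares the two hashes character by character. The helper name count_differing_positions is made up for this illustration; only hashlib.sha1 and hexdigest come from the session above.

```python
import hashlib


def count_differing_positions(first, second):
    """Count the positions at which two equal-length strings differ."""
    if len(first) != len(second):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(first, second))


original = hashlib.sha1(b"git is a rude word in UK English").hexdigest()
with_full_stop = hashlib.sha1(b"git is a rude word in UK English.").hexdigest()

# Both hashes are 40 hex characters long, whatever the length of the input.
print(len(original), len(with_full_stop))
# The inputs differ by a single byte, yet most hex positions should differ.
print(count_differing_positions(original, with_full_stop))
```

A few positions may still agree by chance, because each position can only take one of 16 hex characters.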
So, if you give me some data, and I calculate the hash value, and it comes out as “fec41478c4f497c1d90fd28610f4272c78a6867e”, then I can be very sure that the data you gave me was exactly the string “git is a rude word in UK English”.
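That check can be sketched as a small function. This is only an illustration: the name data_matches_hash is hypothetical, and it assumes the data arrives as bytes and the expected hash as a hex string.

```python
import hashlib


def data_matches_hash(data, expected_hex):
    """Return True if the SHA-1 hex digest of `data` equals `expected_hex`."""
    return hashlib.sha1(data).hexdigest() == expected_hex.lower()


message = b"git is a rude word in UK English"
# Record the hash of the data we trust.
trusted_hash = hashlib.sha1(message).hexdigest()

print(data_matches_hash(message, trusted_hash))         # the same bytes match
print(data_matches_hash(message + b".", trusted_hash))  # one extra byte does not
```

The function never needs the original data, only its hash, which is what makes a hash useful as an identifier.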