Scoring functions for breaking cryptopals ciphers

The problem is to find a scoring function.

That is, a function that assigns a number to how likely a message is to be English.
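A minimal sketch of how such a function gets used, assuming the cipher is single-byte XOR (as in the early cryptopals challenges) and that a lower score means "more English-like". The names `xor_single_byte` and `best_key` are hypothetical, and `score` stands for any scoring function that follows the lower-is-better convention.

```python
from typing import Callable, Tuple


def xor_single_byte(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same one-byte key."""
    return bytes(b ^ key for b in data)


def best_key(ciphertext: bytes, score: Callable[[str], float]) -> Tuple[int, str]:
    """Try all 256 keys and keep the plaintext with the lowest score.

    Assumes a lower score means the text looks more like English.
    """
    candidates = []
    for key in range(256):
        # latin-1 maps every byte to a character, so decoding never fails
        text = xor_single_byte(ciphertext, key).decode("latin-1")
        candidates.append((score(text), key, text))
    # Tuples compare by score first; ties fall back to the smaller key
    _, key, text = min(candidates)
    return key, text
```

With a higher-is-better scorer, `min` would have to become `max`.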

What I tried for the score computing function

Most of them are based on ‘ETAOIN SHRDLU’, the approximate frequency order of the most common English letters (frequency analysis).

Position based weighting (higher score is better)

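      def scorecompute(msg):
          # Characters ordered from most to least frequent; earlier ones weigh more
          positions = 'ETAOIN SHRDLU'
          upper = msg.upper()
          score = 0

          for index, char in enumerate(positions):
              weight = len(positions) - index
              score += upper.count(char) * weight

          return score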

This doesn’t work so well.

From this point on, higher scores are worse.

This one is not based on frequency analysis

Ensure all characters are ASCII letters, digits, spaces or newlines

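      import re

      def scorecompute(msg, check_english=True):
          # check_english is assumed to be an on/off flag for this check.
          # Reject anything other than letters, digits, spaces and newlines.
          if check_english:
              if not re.fullmatch(r'[A-Z 0-9\n]+', msg.upper()):
                  return float('inf')
          return 1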

Chi-squared based on character frequency

Include non-alphabetic character frequencies (‘.’, ‘'’, ‘:’, etc.)

      def scorecompute(msg: str):
          """
          Use a chi-squared statistic to compute the score.
          A lower score means the message is more likely to be English
          (it matches the frequency table more closely).
          """
          if not msg:
              return float('inf')  # avoid dividing by zero on empty input

          # Expected frequencies in percent; includes space and
          # some non-alphabetic characters too
          freq_table = {"A": 8.55, "K": 0.81, "U": 2.68,
                        "B": 1.60, "L": 4.21, "V": 1.06,
                        "C": 3.16, "M": 2.53, "W": 1.83,
                        "D": 3.87, "N": 7.17, "X": 0.19,
                        "E": 12.10, "O": 7.47, "Y": 1.72,
                        "F": 2.18, "P": 2.07, "Z": 0.11,
                        "G": 2.09, "Q": 0.10,
                        "H": 4.96, "R": 6.33,
                        "I": 7.33, "S": 6.73,
                        "J": 0.22, "T": 8.94, ":": 7.40, "'": 7.40, " ": 12.10}
          upper = msg.upper()
          score = 0
          for char, expected in freq_table.items():
              # Observed frequency of this character, in percent
              observed = (upper.count(char) / len(msg)) * 100
              score += ((observed - expected) ** 2) / expected
          return score
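Written as a formula, the loop above computes the following sum. The symbols are notation introduced here for illustration: $T$ is the set of characters in `freq_table`, $O_c$ is the observed percentage of character $c$ in the message, and $E_c$ is its expected percentage from the table.

$$
\chi^2 = \sum_{c \in T} \frac{(O_c - E_c)^2}{E_c}
$$

A message whose character percentages match the table exactly scores 0, and every deviation adds a non-negative term, which is why lower scores are better.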

Chi-square test which penalizes non-ASCII characters

Add a fixed number to the score for every non-ASCII character found in the message.
```
for i in msg:
    if not i.isascii():
        score += 200
```
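The fragment above relies on a `score` variable that already exists. As a self-contained sketch of the same idea, the function below takes the score computed so far as an argument; the name `add_non_ascii_penalty` and the `base_score` parameter are hypothetical, and the default of 200 is just the fixed number used above.

```python
def add_non_ascii_penalty(msg: str, base_score: float, penalty: float = 200) -> float:
    """Return base_score plus a fixed penalty for each non-ASCII character."""
    # str.isascii() is available from Python 3.7 onwards
    non_ascii = sum(1 for ch in msg if not ch.isascii())
    return base_score + penalty * non_ascii
```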

Other things I could try

All of these methods are ‘1-gram’ (single-character) methods; one thing I didn’t try is using ‘n-gram’ frequency tables for English.
To make this a proper statistical test, a final step would be to compute the p-value of the chi-squared statistic for the degrees of freedom (the number of classes in the frequency table minus one) and compare it against a significance level (say 0.05).
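For the n-gram idea, here is a rough sketch with n = 2. It makes several assumptions that are not from the methods above: instead of a published frequency table, the bigram counts come from a reference English text that the caller supplies; the score is the average negative log-probability per bigram (lower is better); and add-one smoothing handles bigrams never seen in the reference. The names `bigram_counts` and `make_bigram_scorer` are hypothetical.

```python
import math
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count overlapping two-character sequences in the upper-cased text."""
    text = text.upper()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def make_bigram_scorer(reference: str):
    """Build a scoring function from a reference English text."""
    counts = bigram_counts(reference)
    total = sum(counts.values())
    # Rough vocabulary size for smoothing: distinct bigrams seen, plus one
    vocab = len(counts) + 1

    def score(msg: str) -> float:
        grams = bigram_counts(msg)
        n = sum(grams.values())
        if n == 0:
            return float("inf")  # fewer than two characters: nothing to score
        # Add-one smoothing keeps unseen bigrams from producing log(0)
        cost = -sum(
            k * math.log((counts[g] + 1) / (total + vocab))
            for g, k in grams.items()
        )
        return cost / n  # average, so long and short messages are comparable

    return score
```

The larger and more representative the reference text, the more meaningful the scores; a single sentence is only enough to see the mechanism work.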
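For the p-value step, a standard-library sketch under stated assumptions: the closed form used here holds only when the degrees of freedom are even (a table with 29 classes gives 28), and the statistic is assumed to be computed from character counts rather than percentages. A percentage-based score would first need multiplying by `len(msg) / 100`. The names `chi2_sf_even_df` and `looks_english` are hypothetical; for arbitrary degrees of freedom a library routine such as `scipy.stats.chi2.sf` is the usual choice.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-squared variable with an even number of degrees of freedom.

    Uses exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, valid only for even df.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # turns (x/2)^(i-1)/(i-1)! into (x/2)^i/i!
        total += term
    return math.exp(-half) * total


def looks_english(statistic: float, n_classes: int, alpha: float = 0.05) -> bool:
    """True when the p-value is at least alpha, i.e. the fit is not rejected."""
    return chi2_sf_even_df(statistic, n_classes - 1) >= alpha
```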
