Improving PixelMelt's Kindle Web Deobfuscator

12 points by mtlynch

knl

So someone took a pretty accurate method, reintroduced what the original author said was yielding errors, to introduce errors again, and is claiming it’s an improvement?

I liked the original article a lot, but this one is not bringing any insights.

mtlynch

> So someone took a pretty accurate method, reintroduced what the original author said was yielding errors, to introduce errors again, and is claiming it’s an improvement?

I'm confused by this criticism.

1. PixelMelt discovers a way to deobfuscate Kindle DRM with character-level OCR, but it's not fully accurate.
2. PixelMelt speculates that using page-level OCR would yield higher accuracy but doesn't attempt it.
3. Terence Eden applies page-level OCR and confirms it increases accuracy, giving full credit to PixelMelt for the original work.

This is exactly how science is supposed to work.

What do you feel that Terence Eden did wrong?

edent

Thank you for your constructive criticism on my blog post - I appreciate it.

The original author said:

> OCR probably need words and sentences to work well.

Which is what I did. It does produce better results for the majority of the text.

The original frequently confused `.` with `•` and `,` with `'`; my method doesn't. Of course, OCR will always have some edge cases.
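
For anyone who wants to put numbers on confusions like these, here is a minimal sketch that counts one-for-one character substitutions between a known-good transcription and an OCR result. It is not code from either blog post: the function name `confusion_counts` and the sample strings are made up for illustration, and it uses only Python's standard `difflib`.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_counts(expected: str, ocr_output: str) -> Counter:
    """Count (expected_char, ocr_char) pairs where OCR swapped one character for another."""
    counts = Counter()
    matcher = SequenceMatcher(None, expected, ocr_output, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length "replace" spans map cleanly onto one-for-one substitutions;
        # insertions, deletions, and uneven replacements are ignored in this sketch.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for want, got in zip(expected[i1:i2], ocr_output[j1:j2]):
                counts[(want, got)] += 1
    return counts


# Hypothetical sample strings, not taken from a real book or from either article.
expected = "Wait, she said. It's late."
ocr_output = "Wait' she said• It's late•"

for (want, got), n in confusion_counts(expected, ocr_output).most_common():
    print(f"{want!r} read as {got!r}: {n}")
```

Running the same count over character-level and page-level output for the same page would give a rough side-by-side view of which confusions each approach produces.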

Nevertheless, I'd be grateful for your insight and expertise into what I could do to improve the OCR process.
- knl
  
  Sorry, I was annoyed at something else, completely unrelated to this article, and let that spill into that comment. I apologize for the unnecessary negative criticism.
  
  The part that I missed about the original article is that it did character-level OCR, as opposed to the whole-page OCR that you did.
  - edent
    
    No worries - we all have bad days. Hope your week improves :-)