Scanning

A couple of years ago, right before I was born, my mother started making a binder full of pictures and memories. Starting with the year of my birth, it covers my early years with pictures and anecdotes.

And nobody knows where all the negatives are.

Additionally, the binder can only be in one place at a time, so it was practically crying out to be moved into the digital world (without destroying the original, of course).

I own a document scanner and have thrown my flatbed scanner away, since I scan far more documents than pictures. The occasional picture can be done with the document scanner as well.

I don’t want to risk putting a whole sheet, with the pictures glued to it, into the feeder and hoping for the best. So I’m removing every single picture from the sheets, scanning it and gluing it back.

We’re talking about 446 images here (my first estimate, before counting, was ~500). That is a lot of repetitive steps:

  • Remove picture,

  • Scan it,

  • Glue picture back to the sheet,

  • Crop the scan to fit the picture,

  • Save picture,

  • Name picture.

Obviously, I can’t automate removing the pictures from the paper and gluing them back. But I can automate the scanning and cropping. And I’m lazy.

Scanning

This part is the simplest: scanadf provides everything I need to get a picture saved. A small script scans the current picture from the scanner’s feeder and stores it on the file system under a timestamp.

I scan all images at 600 dpi in color and name each file after the time of the scan.

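#!/bin/bash
# Timestamp used in the file name, e.g. 20240101120000
TMSTMP=$(date +%Y%m%d%H%M%S)
OUTPUTDIR=~/scan
SCANRESOLUTION=600
# scanadf writes PNM data regardless of the extension, so name the file accordingly
TMPFILENAME="${TMSTMP}-scan.pnm"
# Create the output directory if it does not exist yet
mkdir -p "$OUTPUTDIR"

# Scan Image: one picture from the feeder, in color, at the given resolution
scanadf -v --mode color -o "$OUTPUTDIR/${TMPFILENAME}" --resolution "${SCANRESOLUTION}"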
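Since each picture still has to be put into the feeder by hand, the script runs once per picture. A small wrapper could turn that into a single key press per picture. This is a hypothetical sketch, not part of my setup: `scan_loop`, the `SCAN_CMD` variable and the default path `~/bin/scan-picture.sh` are all made up, so point `SCAN_CMD` at wherever the script above lives.

```bash
#!/bin/bash
# Hypothetical wrapper around the scanning script above.
# SCAN_CMD and its default path are assumptions; point it at your own script.
SCAN_CMD="${SCAN_CMD:-$HOME/bin/scan-picture.sh}"

# Scan one picture per Enter press; stop on "q" or at end of input (Ctrl-D).
scan_loop() {
    local answer count=0
    # read -p only shows the prompt when input comes from a terminal
    while read -r -p "Insert next picture, press Enter (q quits): " answer; do
        [ "$answer" = "q" ] && break
        # Only count scans whose command reported success
        if "$SCAN_CMD"; then
            count=$((count + 1))
        fi
    done
    echo "Scanned $count picture(s)."
}
```

After sourcing the file, call `scan_loop` and type `q` when the stack is done. Because the files are named by the second, two scans must not finish within the same second, which a 600 dpi scan is unlikely to do.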

Cropping

(Image: the result of the scan)

The problem with a document scanner is that it always tries to produce A4 documents unless you tell it otherwise.

With one picture per scan, that leaves me with lots of A4-sized images at 600 dpi, each about 100MB in size.
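The ~100MB follows directly from the page size. A quick back-of-the-envelope check, assuming A4 is 210 x 297 mm and the raw scan stores 3 bytes (8-bit RGB) per pixel without compression:

```bash
#!/bin/bash
# Size of one uncompressed A4 color scan.
# Assumptions: A4 = 210 x 297 mm, 600 dpi, 3 bytes per pixel (8-bit RGB),
# ignoring the few bytes of file header.
DPI=600
# pixels = mm / 25.4 * dpi, rounded to the nearest integer (integer math only)
WIDTH_PX=$(( (210 * DPI * 10 + 127) / 254 ))
HEIGHT_PX=$(( (297 * DPI * 10 + 127) / 254 ))
BYTES=$(( WIDTH_PX * HEIGHT_PX * 3 ))
echo "${WIDTH_PX}x${HEIGHT_PX} px"
echo "${BYTES} bytes (~$(( BYTES / 1000000 )) MB)"
```

It prints `4961x7016 px` and `104419128 bytes (~104 MB)`, which is just under 100 MiB per raw scan.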

Most of each sheet is plain white, and the picture itself takes up only a small part of it. Cropping every single scan by hand drives me insane; I did it three times before I realized that this approach had no future.

Luckily, this seems to be a common problem, and I found the ImageMagick script MultiCrop, which was written to address exactly this: it takes scans, searches them for separate images and saves each one as a separate file. Neat.

I had to play a bit with the fuzziness and grid parameters to get the results right, but now it works. I call MultiCrop (installed as jt-multicrop) from within the scanning script and added some commands to clean up afterwards.

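#!/bin/bash
TMSTMP=$(date +%Y%m%d%H%M%S)
OUTPUTDIR=~/scan
SCANRESOLUTION=600
# scanadf writes PNM data regardless of the extension
TMPFILENAME="${TMSTMP}-scan.pnm"
mkdir -p "$OUTPUTDIR"

# Scan Image
scanadf -v --mode color -o "$OUTPUTDIR/${TMPFILENAME}" --resolution "${SCANRESOLUTION}"

# Crop Image out of scan
# -f 20:    fuzz (percent) for colors that still count as background
# -g 30:    grid spacing (percent of the image) used to search for pictures
# -c 10,10: pixel coordinate from which the background color is sampled
# MultiCrop writes one numbered file per picture it finds, based on the output name
jt-multicrop -f 20 -g 30 -c 10,10 "$OUTPUTDIR/${TMPFILENAME}" "$OUTPUTDIR/${TMSTMP}.output.jpg"

# Cleanup: delete the ~100MB raw scan...
rm "$OUTPUTDIR/${TMPFILENAME}"
# ...and tiny crops (< 10k), which are only specks of dust or sheet edges
find "${OUTPUTDIR}/" -maxdepth 1 -iname '*output*.jpg' -size -10k -delete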
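For comparison: if every scan really held exactly one picture on a clean white sheet, ImageMagick’s built-in `-trim` could do a cruder job in a single command. A minimal sketch of that simpler alternative, not what I use; the `trim_scan` name and the 20% fuzz are assumptions.

```bash
#!/bin/bash
# Simpler alternative to MultiCrop: trim the white border around ONE picture.
# Assumes ImageMagick's "convert" is installed and the sheet is near-white.
trim_scan() {
    local in="$1" out="$2"
    # -fuzz 20%: colors within 20% of the corner color count as background
    # -trim:     cut away edges matching that background color
    # +repage:   reset the virtual canvas so the result has no leftover offset
    convert "$in" -fuzz 20% -trim +repage "$out"
}
```

The catch: `-trim` only removes edges that match the corner color, so a speck of dust or a shadow along the sheet edge stops it early. MultiCrop instead isolates each non-background region, which presumably is why dust ends up as tiny separate files that the `-size -10k` cleanup removes.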

Storing

The final results are color JPEGs of about 1MB each. I’m not sure whether I should drop the color and store them as plain black and white images. Although the photos had no color to begin with, time has done its part and tinted some of them.

To me, they seem to lose something when reduced to black and white.

(Image: color on the left, black and white on the right)
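One way to put off the decision is to keep both versions. A minimal sketch, assuming ImageMagick’s `convert` is installed; the `make_gray_copies` name and the `-bw.jpg` suffix are made up for illustration. It writes a grayscale copy next to every cropped color picture and leaves the originals untouched:

```bash
#!/bin/bash
# Create grayscale copies of the cropped pictures next to the color originals.
# Assumptions: ImageMagick's "convert" is installed, crops match *output*.jpg,
# and "-bw.jpg" is a made-up suffix for the grayscale copies.
make_gray_copies() {
    local dir="${1:-$HOME/scan}" img
    for img in "$dir"/*output*.jpg; do
        [ -e "$img" ] || continue                  # pattern matched nothing
        case "$img" in *-bw.jpg) continue ;; esac  # never convert a copy again
        convert "$img" -colorspace Gray "${img%.jpg}-bw.jpg"
    done
}
```

Running it twice is harmless: existing copies are skipped, and the color originals are simply converted again.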

None of this solves my naming problem, though: the right date and the comments written next to each picture on the paper still have to end up somewhere. But the scanning process itself has improved significantly.

What I actually do now is only scan the images and keep the 100MB temporary files.

Cropping takes a while, time I would rather spend scanning the next picture. So I run MultiCrop on all the images at once afterwards and give it some time, which I can spend doing other stuff.
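The deferred cropping step can be a simple loop over the raw scans. A sketch under these assumptions: the raw scans sit in `~/scan` and end in `-scan.pnm` or `-scan.jpg`, `jt-multicrop` takes the same options as in the scanning script, and `crop_all_scans` is a made-up name. Scans that already have cropped output are skipped, so the loop can be re-run after the next scanning session.

```bash
#!/bin/bash
# Batch-crop all raw scans in one go, after the scanning session.
# Assumptions: raw scans are named <timestamp>-scan.pnm or <timestamp>-scan.jpg,
# and jt-multicrop takes the same options as in the scanning script.
crop_all_scans() {
    local dir="${1:-$HOME/scan}" scan name stamp
    for scan in "$dir"/*-scan.pnm "$dir"/*-scan.jpg; do
        [ -e "$scan" ] || continue              # pattern matched nothing
        name=${scan##*/}                        # strip the directory
        stamp=${name%-scan.*}                   # strip "-scan.pnm" / "-scan.jpg"
        # Skip scans that were already cropped in an earlier run
        if compgen -G "$dir/${stamp}.output*.jpg" > /dev/null; then
            continue
        fi
        jt-multicrop -f 20 -g 30 -c 10,10 "$scan" "$dir/${stamp}.output.jpg"
    done
    # Same cleanup as before: drop tiny crops that are only dust or sheet edges
    find "$dir" -maxdepth 1 -iname '*output*.jpg' -size -10k -delete
}
```

After sourcing it, `crop_all_scans ~/scan` crops everything at once; the raw scans stay in place, matching the keep-the-temporary-files workflow.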