An interesting project popped up yesterday at the library where I work. Part of our inventory process involves downloading text files of raw barcode data from our barcode scanners, extracting each barcode from the junk data that pads it in the file, and then loading a freshly formatted list into Millennium, our library’s catalogue software.
I’m not typically involved with inventory or the particulars of maintaining the Millennium catalogue, but I was called in to help with writing some bash scripts to facilitate the process.
The data
Let me begin by showing some sample data from our barcode scanners. The scanners store the barcode in text files, one barcode per line, with some interesting pad characters that I don’t understand and we don’t really want for this project.
Data example A
TXT<font color="red">95053542</font>95012010:12
TXT<font color="red">95053534</font>95012010:12
TXT<font color="red">95053559</font>95012010:12
TXT<font color="red">95053567</font>95012010:12
TXT<font color="red">95053575</font>95012010:12
Data example B
0000030000000000<font color="red">8016039R</font>07051001:56
0000030000000000<font color="red">8110727Q</font>07051001:56
0000030000000000<font color="red">84220078</font>07051001:56
0000030000000000<font color="red">8122772T</font>07051001:56
In both examples here, I’ve made the actual barcode red. The rest of the line is garbage data.
From this data, let me make three observations.
- The characters preceding the barcode vary in length from file to file; call that length n.
- Our barcodes are always 8 characters long. (I already knew this, but needed to make it clear for this post.)
- The characters following the barcode appear to always be 11 characters long.
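Those three observations are enough to pull the barcode out of a single line. Here is a minimal sketch of the arithmetic in bash, assuming the raw scanner line is passed in without the red highlighting used above. The function name extract_barcode and the two constants are my own placeholders, not part of the script further down.

```shell
#!/bin/bash
# Hypothetical helper: pull the barcode out of one raw scanner line.
BARCODE_LEN=8    # observation 2: barcodes are always 8 characters
POSTFIX_LEN=11   # observation 3: trailing data appears to be 11 characters

extract_barcode() {
    local line=$1
    # Whatever is left after the barcode and postfix must be the prefix.
    local prefix_len=$(( ${#line} - BARCODE_LEN - POSTFIX_LEN ))
    if (( prefix_len < 0 )); then
        echo "line too short: $line" >&2
        return 1
    fi
    # Bash substring expansion uses a zero-based offset, so the
    # barcode starts at index prefix_len.
    printf '%s\n' "${line:prefix_len:BARCODE_LEN}"
}

extract_barcode 'TXT9505354295012010:12'
extract_barcode '00000300000000008016039R07051001:56'
```

Run against the first line of each data example, this should print 95053542 and 8016039R.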
The output
Our catalogue software likes to receive barcodes in a text file with one barcode per line, each prefixed with n:, like so:
Output example A
n:<font color="red">95053542</font>
n:<font color="red">95053534</font>
n:<font color="red">95053559</font>
n:<font color="red">95053567</font>
n:<font color="red">95053575</font>
Output example B
n:<font color="red">8016039R</font>
n:<font color="red">8110727Q</font>
n:<font color="red">84220078</font>
n:<font color="red">8122772T</font>
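Since the target format is this rigid, a finished file is easy to sanity-check before loading it into Millennium. The following is only a sketch: is_millennium_format is a made-up name, and it assumes nothing about the barcode beyond "8 characters after n:".

```shell
#!/bin/bash
# Hypothetical check: succeed only if every line on stdin is "n:" plus 8 characters.
is_millennium_format() {
    # grep -v selects lines that do NOT match the pattern;
    # -q only reports whether any such line exists.
    # The leading ! flips that, so "no bad lines" means success.
    ! grep -qvE '^n:.{8}$'
}

if printf 'n:95053542\nn:8016039R\n' | is_millennium_format; then
    echo "format looks right"
fi
```

To check a real file, redirect it into the function, e.g. is_millennium_format < output.txt (the filename is just an example).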
The problem
It seems the folks who do this regularly used a fiddly method: import the file into Excel, set fixed-width field delimiters to get the barcode into a column by itself, and then somehow output everything in the proper format for Millennium. All very tricky, manual, and not much fun…
The solution
The approach we took with this problem was proposed to me by a co-worker (kudos to Bryan Tyson!) who is better versed in Linux and bash scripting than I, but my limited experience with tools like awk and sed really made it seem like one of the easiest solutions to me too.
The following is the bash script we wrote to do the hard work for us. Since I’m not terribly versed in shell scripting with awk and sed, this took a bit of finding, and we tried to comment the script heavily to make it legible in the future. Although awk and sed are powerful, they surely don’t win points for preventing code obfuscation.
#!/bin/bash
if [ "$#" -eq 2 ]; then #If file names are entered as input params
    INFILE=$1 #store first param as input
    OUTFILE=$2 #store second param as output
else #prompt for filenames
    echo "******************************************"
    echo "Welcome to inventory at J.S. Mack Library!"
    echo "This script takes the barcode file from"
    echo "the scanner and formats it for the"
    echo "Millennium inventory program."
    echo "******************************************"
    echo ""
    #Ask user to enter filename to be processed
    echo "What file to process?"
    echo "Include the full path if the file is not"
    echo "in the same directory as this script."
    echo ""
    read -r INFILE #-r keeps backslashes in Windows-style paths intact
    echo ""
    echo "What file to save the reformatted results?"
    echo "Include the full path if the file is not"
    echo "in the same directory as this script."
    echo ""
    read -r OUTFILE
fi
#Input files from the scanner may have variable length lines in the following
#format:
# n chars prefix, 8 char barcode, 11 char postfix
# n is set by the scanner's "Major Division" setting
#We want to cut out all but the 8 char barcode.
#First, we must determine the length of the prefix.
#We do this with awk to find the length of every line and subtract the barcode
#and postfix from the total, then pipe to sed to get the prefix found for the
#first line. This assumes that every line in the file is the same length.
#(Can the major division change in the middle of a scanner file? Let's hope not!)
PREFIX=$(awk '{print length($0) - 8 - 11}' "$INFILE" | sed 1q)
#Following calculating the prefix length, we store two values to pass
#to the cut command later based on the prefix.
BEGIN=$((PREFIX + 1)) #begin on first char after prefix
END=$((PREFIX + 8)) #end last char of barcode
#The following executes the cut command and pushes output (1 barcode on each line)
#directly to sed.
#Millennium needs "n:" before each barcode. Using sed, we will insert this at
#beginning of each line and output to filename given by the user.
cut -c"$BEGIN-$END" "$INFILE" | sed -e 's/^/n:/' > "$OUTFILE"
echo ""
echo "Your barcode file, ${INFILE}, has been reformatted and saved to ${OUTFILE}."
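The script above leans on the assumption that every line in a file has the same prefix length. As a possible alternative (not what we actually used), the same result could come from a single awk pass that works out the barcode position for each line separately by counting back from the end. This is a sketch under the same "8 character barcode, 11 character postfix" assumptions; the function name reformat_barcodes and the awk variable names are placeholders of mine.

```shell
#!/bin/bash
# Hypothetical alternative: one awk pass, barcode position worked out per line.
reformat_barcodes() {
    awk -v barcode_len=8 -v postfix_len=11 '
        # Skip lines too short to hold a barcode plus postfix (e.g. blank lines).
        length($0) < barcode_len + postfix_len { next }
        {
            # awk strings are 1-based: the barcode starts right after the prefix.
            start = length($0) - postfix_len - barcode_len + 1
            print "n:" substr($0, start, barcode_len)
        }
    ' "$@"
}

# Demo on one line from each data example (raw, without the red highlighting).
printf '%s\n' 'TXT9505354295012010:12' '00000300000000008016039R07051001:56' | reformat_barcodes
```

Called with a filename, as in reformat_barcodes "$INFILE" > "$OUTFILE", it would stand in for the awk, cut, and sed steps. The trade-off is that a file with mixed prefix lengths is handled quietly instead of being a sign that something odd happened on the scanner.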
Conclusion
Perhaps the most unusual part of this project was that we needed to run it on Windows, so I had to find out how to run bash shell scripts on a Windows box. We used Cygwin with some success.
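One thing worth watching for on Windows: text files there often end each line with a carriage return plus a newline, and Unix tools generally count that carriage return as an extra character, which would throw off the length arithmetic by one. I have not confirmed whether our scanner files actually have such line endings, so treat the following as a precaution only. The name strip_cr is made up for this sketch.

```shell
#!/bin/bash
# Hypothetical pre-processing step: remove every carriage return from stdin.
strip_cr() {
    tr -d '\r'
}

# Demo: a 22-character scanner line with a Windows-style line ending.
# Without strip_cr, awk would typically report 23 because of the trailing \r.
printf 'TXT9505354295012010:12\r\n' | strip_cr | awk '{ print length($0) }'
```

In practice this would sit in front of the other tools, for example strip_cr < "$INFILE" piped into awk or cut.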
Whether this solution is the most efficient is up for debate. However, it works! If anyone has suggestions for improvements, please comment.

Do all barcode readers generate a text file? The reason I ask is that I would like to buy a cheap reader whose output I can parse using any programming language I wish.
However, most websites or stores do not state that you can generate a text file.
My background is in software development, not really hardware, and especially not barcode scanners. I can’t really speak from experience with different scanners at all; we use the Compsee Apex II at the library where I work. I don’t know how much it costs, how the scan data files actually get from the scanner to the PC, or what the barcode scanner industry standard really is.
The manufacturer produces some software for manipulating the scan data more flexibly, I think, but I don’t think we use it where I work.
Hope that helps. You’ll probably need to look for websites about barcode scanners to definitively answer your question.