Every tax is a pay cut.  Every tax cut is a pay raise.
Citizens for Limited Taxation

Commonwealth of Massachusetts Expenditures

How I did this life-sucking project on state finances
I can now tell you that while it does not require knowledge of rocket science or brain surgery, it does require a Master's degree in Computer Science to break through the sheer obfuscation in the state budgets for the Commonwealth of Massachusetts. Coincidentally, I have one of those.

I started this project back around December of 2007 and finished the bulk of it by May of 2008.

The primary source of financial information is obviously the Massachusetts State Legislature home page, where the budgets are automagically created from all those wheelbarrows of money we send in on April 15th each year. House Speaker DiMasi is the alleged brains behind the outfit.

I converted each of these budget HTML files into a text file and wrote a couple of Perl programs: sec.pl to extract the data, and plugin.pl to merge the data and create an ASCII file, delimited by "|", called mass-expenditures.  A sample entry follows:

0320-0001|Supreme Court: chief justices|897209|934978|952518|912413|||||||

The first field is a state-supplied expenditure code; I wrote the second field, a short description, by reading the lengthier text supplied in the actual budget. The next ten fields hold the values found for each year, left blank where no value was found. After plowing through ten years of state budgets, I had over 2,600 distinct expenditure lines.
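For anyone who wants to read the file without Perl, here is a minimal Python sketch of splitting one such line into its parts. It is not the original sec.pl or plugin.pl; the names Expenditure, parse_record, and N_YEARS are made up for illustration, and it assumes every line carries a code, a description, and up to ten value fields with a trailing "|", as in the sample entry.

```python
from typing import List, NamedTuple, Optional

N_YEARS = 10  # assumption: ten value columns, one per budget year


class Expenditure(NamedTuple):
    code: str
    description: str
    amounts: List[Optional[int]]  # None where the field is blank


def parse_record(line: str) -> Expenditure:
    """Split one '|'-delimited line into code, description and yearly amounts."""
    fields = line.rstrip("\n").split("|")
    code, description = fields[0], fields[1]
    # Keep exactly N_YEARS value fields; this also drops the empty field
    # produced by the trailing '|' at the end of the sample line.
    raw = fields[2:2 + N_YEARS]
    raw += [""] * (N_YEARS - len(raw))  # pad short lines with blanks
    amounts = [int(v) if v.strip() else None for v in raw]
    return Expenditure(code, description, amounts)


sample = "0320-0001|Supreme Court: chief justices|897209|934978|952518|912413|||||||"
record = parse_record(sample)
print(record.code, record.description, record.amounts)
```

Blank fields are kept as None rather than 0, so a missing value is not mistaken for a zero appropriation.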

After generating the data for ten years, I used a simple shell script, make-html.sh, to sort the data and generate the HTML for ten years of detailed expenditures. The script invokes the Perl program genhtml.pl to generate the HTML for any given year.
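The sort-then-generate step can be sketched in Python roughly as follows. This is not genhtml.pl itself: the function name year_table_html, the table layout, and the choice to sort by expenditure code and skip blank years are all assumptions made for illustration.

```python
from html import escape
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, str, List[Optional[int]]]  # (code, description, amounts)


def year_table_html(records: Iterable[Record], year_index: int, year_label: str) -> str:
    """Build a plain HTML table for one year's column of the data."""
    rows = []
    # Assumption: rows are ordered by expenditure code.
    for code, description, amounts in sorted(records, key=lambda r: r[0]):
        amount = amounts[year_index]
        if amount is None:
            continue  # nothing recorded for this year
        rows.append(
            f"<tr><td>{escape(code)}</td><td>{escape(description)}</td>"
            f'<td align="right">{amount:,}</td></tr>'
        )
    header = f"<table>\n<caption>Expenditures {escape(year_label)}</caption>"
    return "\n".join([header, *rows, "</table>"])
```

Calling it once per column index produces one page per year, which is the job the shell script hands off in a loop.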

The bulk of this data extraction was done through several Perl programs I wrote that scanned each web page, extracted the relevant numbers, and merged them into the resulting mass-expenditures file. The hard part was the stare-and-compare phase, where I had to cross-check my values against the web pages. This sucked untold hours of my miserable life stream.
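The merge step, folding one year's extracted values into the running table, might look like this in Python. It is a sketch and not the original plugin.pl: merge_year, format_record, and the dictionary layout are hypothetical, and it assumes each year owns a fixed column position.

```python
from typing import Dict, Optional, Tuple

N_YEARS = 10  # assumption: ten value columns, one per budget year


def merge_year(table: Dict[str, dict], year_index: int,
               extracted: Dict[str, Tuple[str, int]]) -> None:
    """Fold one year's {code: (description, amount)} into the running table."""
    for code, (description, amount) in extracted.items():
        if code not in table:
            # First sighting of this code: start with every year blank.
            table[code] = {"description": description, "amounts": [None] * N_YEARS}
        table[code]["amounts"][year_index] = amount


def format_record(code: str, entry: dict) -> str:
    """Write one entry back out in the '|'-delimited layout, trailing '|' included."""
    values = ["" if a is None else str(a) for a in entry["amounts"]]
    return "|".join([code, entry["description"], *values]) + "|"
```

Running merge_year once per budget year, then format_record for each code, yields lines in the same shape as the sample entry. The stare-and-compare phase still has to be done by eyeball.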

I then sorted this file by expenditure code and removed the expenditure values (leaving the expense code and descriptor) to create a file called mass-expense-codes, where I remapped the 2,600 expenditures and reduced them to about 80 general categories. I did this with comment lines: each comment names the general category for every entry below it, up to the next comment. The following entries:

# District Attorney
0340-0100|Suffolk District Attorney
# State Police
0340-0101|Suffolk District Attorney: overtime state police
# District Attorney
0340-0114|Suffolk District Attorney: Project Sentry
0340-0200|Middlesex District Attorney
# State Police
0340-0201|Middlesex District Attorney: state police overtime

would generate the following expense code mappings:

0340-0100 District Attorney
0340-0101 State Police
0340-0114 District Attorney
0340-0200 District Attorney
0340-0201 State Police
in the file specific-to-general-mapping.
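The comment-driven remapping is easy to sketch in Python. This is not the original Perl; build_mapping is a hypothetical name, and the rule it applies (a "#" line sets the category for all entries until the next "#" line) is inferred from the example above, where 0340-0200 has no comment of its own and inherits District Attorney.

```python
from typing import Dict, Iterable


def build_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Map each expense code to the category named by the nearest comment above it."""
    mapping: Dict[str, str] = {}
    category = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            category = line.lstrip("#").strip()  # new current category
            continue
        code = line.split("|", 1)[0]
        if category is None:
            raise ValueError(f"{code} appears before any category comment")
        mapping[code] = category
    return mapping


entries = """\
# District Attorney
0340-0100|Suffolk District Attorney
# State Police
0340-0101|Suffolk District Attorney: overtime state police
# District Attorney
0340-0114|Suffolk District Attorney: Project Sentry
0340-0200|Middlesex District Attorney
# State Police
0340-0201|Middlesex District Attorney: state police overtime
"""
for code, category in build_mapping(entries.splitlines()).items():
    print(code, category)
```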

The 80 or so general categories were selected mostly by the sheer number of entries in each category. Obviously, there were lots of Health and Education entries.

Another Perl program, mec.pl, would then read the expenditure values and the expenditure category remapping file, perform the remapping operation, add up all the assorted values for each year, and cough up an ASCII summary file, general-commonwealth-expenses.
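The remap-and-add-up operation boils down to a grouped sum per year. Here is a Python sketch of the idea, not the original mec.pl; the name summarize and the "Unmapped" bucket for codes missing from the mapping are assumptions, and the amounts in the usage lines are made up, not real budget figures.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def summarize(records: Iterable[Tuple[str, List[Optional[int]]]],
              mapping: Dict[str, str], n_years: int = 10) -> Dict[str, List[int]]:
    """Add up yearly amounts per general category; blank values count as nothing."""
    totals = defaultdict(lambda: [0] * n_years)
    for code, amounts in records:
        # Assumption: codes without a mapping land in a catch-all bucket.
        category = mapping.get(code, "Unmapped")
        for i, amount in enumerate(amounts[:n_years]):
            if amount is not None:
                totals[category][i] += amount
    return dict(totals)


# Made-up amounts for two years, purely to show the shape of the result.
demo_mapping = {"0340-0100": "District Attorney", "0340-0101": "State Police",
                "0340-0114": "District Attorney"}
demo_records = [("0340-0100", [100, None]), ("0340-0101", [50, 25]),
                ("0340-0114", [10, 5])]
print(summarize(demo_records, demo_mapping, n_years=2))
```

Keeping one running list per category means the summary file can be written straight from the result, one line per category.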

I then manually arranged the entries in this file by general categories.  For instance, I wanted to clump all the entries having to do with criminal justice together because there seem to be so many of them.

Then gen_mec_html.pl was used to generate the appropriate HTML, which is the final tally of state expenditures.

A recap of files and programs:

  • sec.pl

    This would extract the expenditure codes and values from the free-form text files generated from the state budget HTML files.

  • plugin.pl

    This would merge the values for each year so I could accumulate multiple years of data in the final ASCII data file called mass-expenditures.

  • mass-expenditures

    The accumulated data for state expenditures.

  • make-html.sh

    This Bourne shell script would simply sort the expenditures and invoke genhtml.pl to generate the HTML for each year.

  • genhtml.pl

    Generates the HTML for a specific sorted year.

  • mass-expense-codes

    This file contains the expense codes and a one-line descriptor of each state expenditure. By inserting appropriate comment lines in this file, I could remap the specific expenditures to general categories.

  • mec.pl

    This Perl program reads mass-expense-codes and mass-expenditures to do the summaries of state expenditures into 80 or so categories.

  • specific-to-general-mapping

    This file is a temporary remapping file that was generated by mec.pl.

  • general-commonwealth-expenses

    This is an ASCII file containing the summary of general state expenditures (80 or so categories).

  • gen_mec_html.pl

    This generates the final HTML for general state expenditures after I manually organized the general-commonwealth-expenses file.

Send comments to: hjw2001@gmail.com