Note: Since the original sasreadme_sf1.txt below was posted by Carol Rogers &
John Blodgett on June 15, there have been some additions to the list of files
contained in the package (and one deletion: delvals.txt), and all the files
for both phases are now stored in the single zip archive "sas_sf1.zip".
Information on the changes is in the file "sasrevisionnotes.txt" in the
archive. See also the notes in Zipdirectory.txt, Main.sas & GenProf.sas.
-- Roy Williams/MISER, Aug 9, 2001

-------------------------------------------------------------------------------

README.TXT (original June 15, 2001 version)

Summary File 1 Profiles - SAS Version

PROGRAM DESCRIPTION:
These programs were developed by a collaboration of State Data Centers
nationwide to produce a set of standard State Data Center program profiles
from Census 2000 Summary File 1. The files are made available to all members
of the SDC/BIDC Network via the SDC/BIDC Web Clearinghouse
(www.sdcbidc.iupui.edu).

UPDATES AND NOTIFICATIONS:
The SDC/BIDC (SDC-L) List Serve will be the primary means of communicating
revisions to these programs. All lead and coordinating members of the network
are automatically part of that list serve, and affiliate organizations can
request access by contacting their lead agency.

The SDC/BIDC Web Clearinghouse will be the download site for the programs.
All revisions will be noted in the "revision notes" file available on the
Profile page devoted to these programs: www.sdcbidc.iupui.edu - click on
"Profiles" on the navigation bar.

CREDITS:
John Blodgett, Missouri SDC, was the SAS Team Leader for this SAS version. He
can be contacted with questions at . The other team coders were: Phyllis
Smith, Arkansas; Jeff Wallace, Oklahoma; Don Larrick, Ohio; and Roy Williams,
Massachusetts. Many thanks go to these people for providing this product to
their colleagues in the SDC/BIDC Network!
And of note, John Blodgett created the conversion kit to create the SAS data
sets and was the glue for this team; Roy Williams also went above and beyond.
Thanks also to Jane Traynham of Maryland and Ed Ratledge of Delaware for
testing.

The content design team, which brought us the modularized approach that will
allow us to select specific parts of the profile appropriate for a given need,
included: Ilona Einowski and Rich Lovelady (who passed recently and will be
dearly missed) of California; Jane Traynham, Maryland; and Carol Rogers,
Indiana, and Bob Scardamalia, New York, both serving as co-chairs for the SDC
2k Products subcommittee of the SDC/BIDC Steering Committee.

SPECIFICS:
The programs are written in several pieces and are designed to be run in a
two-step process:

1. Conversion phase: Convert the 40 ascii files from the Bureau to SAS data
   files.
2. Profile generation phase: Generate profile reports, using the SAS data
   files output from step 1 as input.

The program modules for these two steps are stored in separate .zip archives
on this server. The code for the conversion phase is in sas_convertsf1.zip;
the code for the profile generation phase is in sas_sf1pros.zip. The files
within these are as follows:

sas_convertsf1.zip
==================
cnvtsf1.sas
cnvtsf1.windows.sas ............These are almost identical, alternative
    modules. Use the former for Unix, the latter for Windows. Read the
    extended Overview comments inside. They are a lot more detailed than
    this readme file.
fplace.sas......................This is the source code used to create the
    $fplace format code. This is used by the conversion program to change
    the value of AreaName for sumlev 155 (place-county) so that it will be
    the name of the place rather than the county. You can avoid using this
    module if you comment out that code (search for '$fplace.'). You could
    also edit this file so that it contains only codes for your state. Some
    new places may not be in this file.
phlabs.sas, pctlabs.sas,
pct12rlabs.sas..................These are separate SAS modules that get
    included by the main program to provide variable labels for the three
    SAS data sets created. They were generated by Roy Williams from the data
    dictionary, so they should be pretty accurate.
unzip.exe.......................This is a freeware module that can be used in
    a Windows/DOS environment to unzip all those files from the Bureau and
    pipe the results into the SAS program. See the discussion and code in
    the cnvtsf1.windows module. You may have another program (e.g. pkunzip)
    that will do this as well. But in case you don't, this module will work.

sas_sf1pros.zip
===============
main.sas........................This is the main program, the one you actually
    look at, edit and invoke to generate profiles. It references the other
    modules in this collection.
county.sas, sumlev.sas,
geocomp.sas.....................These files contain the VALUE statement code
    used to generate the $county, $sumlev and $geocomp value label formats
    used by the main routine.
genprofX.sas....................Where X is a digit from 1 to 6. These are the
    modules that do most of the work involved in generating the profile
    reports. As with all the other sas source code files, they need to be in
    the same directory as the program that invokes them and they cannot be
    renamed. As of 6-15-01 we have just 3 of these modules (1, 2 and 6).
    Three more are expected soon.
delvals.txt.....................This is a file we generated after converting
    the Delaware file. It just lists every variable on the generated SAS
    data sets along with their (very long) labels and the value of that
    variable on the Delaware data base (i.e. the state summary). A debugging
    tool. At least it was when we were working with Delaware data.
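For readers who have not written SAS format code before, the following is a
minimal sketch of the kind of VALUE statement that a module such as sumlev.sas
might contain. The codes and labels shown are standard Census summary levels
chosen for illustration; they are an assumption and are not copied from the
actual module.

```sas
/* Illustrative sketch only: not the contents of the real sumlev.sas. */
proc format;
  value $sumlev
    '040' = 'State'
    '050' = 'County'
    '060' = 'County Subdivision'
    '140' = 'Census Tract'
    '150' = 'Block Group'
    '160' = 'Place';
run;

/* Hypothetical use: write the label instead of the raw code to the log. */
data _null_;
  sumlev = '050';
  put sumlev= $sumlev.;
run;
```

Once a format like this is defined, any report step can attach it to the
sumlev variable to show readable labels in place of the three-digit codes.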
-------------------------------------------------------------------------------

You will need to download the two .zip files and unzip each, probably in a new
directory created for this project.

Directions for running the conversion phase are in the cnvtsf1.sas OR
cnvtsf1.windows.sas module. (Use the latter if you are running your conversion
in a Windows environment; use the former if you are running in a Unix
environment. There is very little difference between these 2 modules.) The
cnvtsf1 module is the only file for the conversion phase that needs to be
modified.

Directions for running phase 2 are contained in the comments of file main.sas.
This (main.sas) is the only file that needs to be modified for this phase.

The profile generation phase is handled in a single SAS data step. A filter
macro can be coded by the user and gets invoked as part of the step. Thus the
input data gets filtered (i.e. subset) as part of the report generating step
and does not need to be stored on disk. Coding the macro is usually just a
matter of coding a SAS "where" statement. For example:

    %macro filter;
      where sumlev in ('040','050','060');
    %mend;

This would cause profiles to be generated for the 3 specified geographic
summary levels. If you wanted to print profiles for a specific city (such as
Montpelier, VT) you could use:

    where sumlev='160' and areaname=:'Montpelier';

Note the colon (:) after the = . This says to test whether the name begins
with 'Montpelier'. The full areaname would contain the place type (it would be
'Montpelier city'), but using the "short compare" operator means you do not
have to worry about the exact value.

SYSTEM REQUIREMENTS:
You must have SAS Version 8.0 or above (testing was with version 8.2, but we
did not use anything that was not in 8.0). You will need enough disk space to
store all the zipped ascii files for the state(s) to be processed, plus the
converted SAS data sets.
We estimate that the output SAS files will be on the order of 5 to 6 times the
size of the 40 compressed ascii files, but less than 30% of the size of the
decompressed ascii files. The SAS data sets specify the SAS "compress" option
to save space.

The cnvtsf1 program reads all 40 of the ascii files in a single SAS data step
loop. If you take advantage of SAS's ability to use processes as pseudo files
(i.e. "pipes") then you will have 40 simultaneous instances of the unzip
utility running during the conversion step. This requires some system
resources. In a test on a moderate speed PC (about 250 MHz) the set of 40
files for Vermont was converted in about 20 minutes. This created over 28,000
observations. If you have a much slower machine or a much larger state, it
could take a while to do the conversion.

The profile-generating phase requires much less in system resources than the
conversion step. If you are able to save the output of the conversion step as
permanent SAS data files, then running the profiles step (main.sas) should not
require extraordinary resources or take too much time. It will depend, of
course, on how many profiles you are generating.

NOTES ON RUNNING STEP 1:

1. This conversion is designed to save you from having to unzip any of the 40
ascii files you get from the Census Bureau. This is handled by using a feature
known as a "pipe". Instead of reading directly from a file, SAS is instructed
to read the output directly from an "unzip" process. So the data inside the
.zip files gets decompressed and fed directly to the SAS data step, without
ever having to get stored on a disk. This saves both disk storage and
processing time. Many shops will already have an "unzip" program available
that can be used (e.g. pkunzip), but we have included a freeware unzip.exe
module in the packet that can be used if needed. For a more detailed
discussion of what the conversion process does, see the extensive comments
within the cnvtsf1.sas program file.

2. All sas code for the applications should be stored in the same directory.
Data files can be stored in a separate directory. By default, the input ascii
files and output SAS files go in the same directory, but this can be easily
changed (in the "libname sf12000 ..." statement).

3. The conversion process stores geographic header data in a separate data set
from the data table variables. The table data is stored in three "regular" SAS
data sets: one for the P and H tables, one for the PCT tables (except those in
the next set) and one just for the PCT12x tables, where x ranges from a to o.
Since there is no PCT data for block group and block summary levels, and since
these levels can comprise 80-90% of the observations for a state, this results
in a very significant savings of storage space.

SAS views are used to link the geographic headers data with the table cells.
This fact should be invisible to people wanting to access the data. When they
code a reference to sf1.vtph they will be getting access to all the variables
in sf1.vtgeoes joined with sf1.vtphng. They don't have to know this and they
certainly don't have to understand how it works. Similarly, sf1.vtpct and
sf1.vtpct12r will provide access to the pct tables along with the geographic
headers.

To combine these data sets, just use a MERGE statement, BY logrecno (or
geo_id), as shown in the main.sas program of the profiles packet. For example:

    data vermont;
      merge sf1.vtph sf1.vtpct sf1.vtpct12r;
      *--any combination of 2 of these will work as well;
      by logrecno;
    run;

Note that when you do this, SAS will create missing values for all pct table
items for observations at the block and block group level. If you store this
merged data set all those empty cells will take up space -- a LOT of space.
It may seem a lot simpler to do this so you do not have to worry about using a
merge statement, but it will carry a significant cost in storage and will also
make programs take longer to execute (because they have to deal with a lot
more physical data). We strongly suggest using the SAS compress data set
option if you decide to create other permanent SAS data sets.

-------------
cor/jgb June 15, 2001
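As an illustration of the storage advice in note 3, here is a minimal sketch
of creating a permanent merged data set that keeps only summary levels that
have PCT data and that uses the compress option. The libname path and the
output data set name vtcounty are hypothetical; the view names follow the
Vermont examples used in this readme.

```sas
/* Hypothetical location of the converted SAS data sets. */
libname sf1 'c:\sf1\data';

/* Keep state and county records only, so that no block or block group
   observations (which have no PCT data) are written to disk. */
data sf1.vtcounty (compress=yes);
  merge sf1.vtph sf1.vtpct sf1.vtpct12r;
  by logrecno;
  if sumlev in ('040','050');   /* subsetting IF, applied after the merge */
run;
```

A WHERE statement could be used in place of the subsetting IF, provided sumlev
is present in every data set named on the MERGE statement.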