NOTES ON THE USE OF EXTRACT WITH SELECTED CENSUS BUREAU CD-ROMS

=>1992 and 1987 Economic Census Discs, Volumes 1 and 2

Auxiliary files needed by EXTRACT are on each disc. Each successive disc within each volume supersedes the previous disc. For example, 1987 CD-ROM 1e includes all files from Discs 1a, 1b, 1c, and 1d plus many more; Disc 1e also corrects a few minor errors in Disc 1d.

Most data files contain no text labels, requiring the frequent use of the "Add labels" function.

Most data dictionaries and label files have definitions available that can be displayed from the main data display screen or during item or record selection.

Some of the Manufactures Industry Series files have separate footnote files for footnotes that are specific to particular data cells. To display them in a manner that actually mimics footnotes, select the footnote files with the "Add labels" command, use the Width option (W) in the Display or Preview mode to limit them to a single character in width, then press S to Show the entire footnote whenever the one-character column is anything other than blank.

Many of the auxiliary files on the Volume 1 discs are useful data files in their own right. For instance, GEOREF92.dbf (1992 discs) or STCOUNTY.dbf (1987 discs) shows not only area names and codes, but also population estimates for recent economic census years--1992, 1987, 1982, and 1977.

Record selection

Many of the files in Volume 1 feature data for multiple levels of geography, e.g., states, metropolitan areas, counties, and places. Selecting records for a particular county will bring up all records with that county code, which includes places within that county as well as the county total. Thus, to bring up only the county total record for a particular county, select on both GEOTYPE, checking off code "3" for counties, and COUNTY. To view data for all counties, select only on GEOTYPE.
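The difference between selecting on COUNTY alone and selecting on GEOTYPE and COUNTY together can be sketched in Python. This is only an illustration of the selection logic, not a description of how EXTRACT works internally. The sample records, the place code "5", the county codes, and the helper name select_records are all hypothetical; only the field names GEOTYPE and COUNTY and the GEOTYPE value "3" for counties come from the notes above.

```python
# Hypothetical sample records. Only the field names GEOTYPE and COUNTY and the
# GEOTYPE value "3" (counties) come from the notes; all other values are made up.
records = [
    {"GEOTYPE": "3", "COUNTY": "001", "NAME": "County total"},
    {"GEOTYPE": "5", "COUNTY": "001", "NAME": "Place A"},  # "5" is a placeholder code
    {"GEOTYPE": "5", "COUNTY": "001", "NAME": "Place B"},
    {"GEOTYPE": "3", "COUNTY": "003", "NAME": "Another county total"},
]


def select_records(rows, **criteria):
    """Keep rows whose fields match every criterion.

    Each criterion maps a field name to the set of values checked off for it.
    """
    return [
        row for row in rows
        if all(row[field] in allowed for field, allowed in criteria.items())
    ]


# COUNTY alone: the county total plus every place inside that county.
county_only = select_records(records, COUNTY={"001"})

# GEOTYPE and COUNTY together: only the county total record.
county_total = select_records(records, GEOTYPE={"3"}, COUNTY={"001"})

# GEOTYPE alone: the total records for all counties.
all_counties = select_records(records, GEOTYPE={"3"})

print(len(county_only), len(county_total), len(all_counties))
```

With this sample data the three selections return three, one, and two records respectively.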
After specifying record selection, EXTRACT may prompt you to enter a value within one or more additional variables in order to speed the search. Most commonly the request is for a single value for record type, since many of the indexes that can speed the search involve record type. When selecting records for a particular county within the RC87A3, WC87A3, and SC87A3 files on the 1987 CD-ROMs, you will be prompted to select single values within record type, state, and place, due to a quirk in the structure of the available index. Since you probably do not want to select only a single record type or place, bypass those prompts rather than checking off one of their values.

=>1987 Census of Agriculture--State and County Data

Auxiliary files needed by EXTRACT are on this disc. On the other hand, the 1987 Agriculture Specialty Publications CD-ROM consists primarily of files in Lotus 1-2-3(tm) and flat ASCII formats, and is not accessible with EXTRACT.

The "definition" function is not available on the Agriculture State/county disc, since narrative definitions were not built into the dictionary and label files.

Note that the ZIP code data from the census of agriculture are available on Economic Census CD-ROM 2b, along with ZIP code data from the censuses of manufactures, retail trade, and service industries.

Adding labels

There are no text labels in the county files. To describe a particular data record in these files you need to identify both the county and the item number. EXTRACT lets you add up to two sets of labels at a time, e.g., both county labels and item labels, but that will leave you without enough space for data in the columnar display unless text columns are truncated with the Width function. Two sets of labels also slow down all data retrieval operations. The most common inquiries, though, look for a number of items for a given county or all counties for a given item.
As long as one of the selection criteria has only one value, the description of that value will appear on screen displays and printouts as a second-level heading. Thus, you need only one set of labels to describe the data.

Selecting records

Because of the structure of the data files, selecting data items, like the number of acres of wheat harvested, is a matter of selecting records (not selecting "items", as with economic census files). This process is made more difficult because item codes in both the county files and the state files are unrelated to any coding scheme with which users may already be familiar (such as SIC codes or commodity codes).

The best aid to finding the data items you want is to find them first in one of the 1987 Census of Agriculture geographic area series reports, then note their table number and relative position within the table. The first two digits of each item code in state and county files are the table number. (Otherwise the item codes in state and county files are unrelated.)

Record selection in county files. For the county files, selecting records is aided by a menu of all items available. The Jump function allows you to specify a particular table number, e.g., 18, whereupon the menu will jump to the items whose 5-digit code begins with 18. From that point you browse through as many screens as are necessary to find, and select with X's, all of the items you want. Note that no more than 17 items can be checked off at any one time. If you need more, you will need to revert to selecting a range of codes.

If you don't know the table number, the Word search and Locate options can come in handy. Word search prompts for any string of characters, and, after a few minutes, the system will present you with a list of all of the labels in which that string of characters appears.
The system will find the desired string of characters anywhere on a line, upper or lower case, but it may also pick up that string as part of another word. For example, searching for Apples will also bring up lines for Pineapples. Select with X each of the codes you want. Somewhat faster in operation is the Locate option, which will also accept a string of characters, but will look for them only at the beginning of a line. Locate is faster than Word search because Locate stops looking once it has found the first occurrence and displays the full list starting at that point.

In any record selection that involves either the item number or the county code, the record selection process will automatically use that state's index file to find the first eligible record. That index is sequenced item code by county, so that, until that index is turned off, all displays will be sequenced item by county; for example, the count of farms would be shown for all counties, then land area in acres for all counties, and then average acres per farm for all counties. If you want those three items together for each county, you can type "N" for natural sequence when the data are displayed to screen sorted by item, or you can turn off the index before going into the data display screen, as discussed below.

Selecting multiple items for multiple (or all) counties can take a great deal of time. The preferred way to query the file in EXTRACT is either to select a single item to display for multiple counties, or to select multiple items to display for a single county (turning off the index before displaying). If you want to display a number of counties for two or three items, or a number of items for two or three counties, it is much faster to structure them as two or three separate queries, perhaps printing out the results for future reference, than to do it all at once.
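The two display sequences described above can be illustrated with a short Python sketch. It is not EXTRACT's own logic: the item codes, county codes, and values are invented, and it assumes, as the text implies, that the natural sequence keeps all items for one county together.

```python
# Hypothetical records: (item code, county code, value). All codes and values
# are invented for illustration only.
records = [
    ("01001", "003", 410),    # count of farms, county 003
    ("01002", "001", 98000),  # land area in acres, county 001
    ("01001", "001", 520),    # count of farms, county 001
    ("01003", "003", 205),    # average acres per farm, county 003
    ("01002", "003", 84000),  # land area in acres, county 003
    ("01003", "001", 188),    # average acres per farm, county 001
]

# Sequence produced while the item-by-county index is active:
# every county for the first item, then every county for the next item.
by_item = sorted(records, key=lambda r: (r[0], r[1]))

# Assumed natural sequence: every item for the first county,
# then every item for the next county.
by_county = sorted(records, key=lambda r: (r[1], r[0]))

print("Item by county:")
for item, county, value in by_item:
    print(f"  item {item}  county {county}  {value}")

print("County by item:")
for item, county, value in by_county:
    print(f"  county {county}  item {item}  {value}")
```

In the first listing the three items for a given county are scattered; in the second they appear together, which is the arrangement you get by typing "N" or by turning off the index.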
To turn off the item-by-county index prior to data display, select Manipulate Files (option 4) from the main menu, then Select an Existing Index (option 1) from the Manipulate Files menu, and finally press Esc to escape from the index selection screen.

Record selection in the state file. For the state file, selecting records is more awkward because there is no separate reference list of data items that can be brought up for record selection. Thus, selecting based on the ITEM_ST code can be done only by specifying a range. To look at all codes corresponding to table 18, enter 18 as the minimum value and 19 as the maximum value. Once you have displayed the data to the screen and scrolled through the many records to find the data you want, make note of the applicable ITEM_ST codes for future reference.

Another factor affecting the selection of data from the state file is that each data item is defined by not one but two 60-character labels. In general, TEXT1 defines the row stub in the corresponding printed table while TEXT2 defines what would have been the column header. There are two ways to see both sets of labels at once:

- Switch to rowwise display, which will show only one record at a time but will give both full 60-character labels.
- Use the "W" option in the normal columnar display to specify a narrower column Width. Simply highlight any value in the column to be narrowed, type "W", and the system will prompt for the number of characters of the column to display. By truncating the TEXT1 and TEXT2 fields to fewer than 34 characters each, both can be displayed on the screen at the same time along with at least one data value.

Note that if you really need to see the last part of a text field you have truncated, merely highlight the desired entry and press "S" for Show. That will display the entire text across subsequent lines, until you move the cursor.

=>U.S. Exports and Imports of Merchandise

The CD-ROMs in these two monthly series do not include EXTRACT-compatible auxiliary files, which must instead be obtained in compressed form on Economic Census CD-ROMs 2B or 1E, or on diskette from the Center for Electronic Data Analysis (see address under Getting Assistance, above). Two levels of installation are available: a compact installation occupying 130 kb on your hard disk, and a preferred installation, with somewhat more legible and useful commodity menus, requiring 3.4 megabytes more.

While the master catalog includes both export and import data, no one CD-ROM has both, and you must insert the appropriate disc for the query you make.

Each import and export database already includes text labels, or at least mnemonics, for the commodity, country, and district, as applicable. The "Add labels" function is still useful, particularly in attaching alternate codes to each commodity. (SIC, SITC, End Use and Agriculture codes are available as labels.)

Each disc contains summary files of moderate size, and an enormous detailed data set of 200 to 500 megabytes. Each database is indexed, but data retrievals from the largest data sets can be fairly time consuming. Record selection involving a single value (e.g., imports from Iraq or exports of a particular commodity) can be accomplished reasonably quickly. Record selection based on widely separated codes (e.g., imports from Venezuela (code 3070) and Nigeria (code 7530)) can be extraordinarily time consuming (hours!), but if the requests are made separately, one code value at a time, they can be accomplished in a few minutes.

=>County Business Patterns

The 1986-87 and 1987-88 CD-ROMs do not include EXTRACT-compatible auxiliary files, and the 1988-89 CD-ROM is missing one critical auxiliary file.
Fortunately, the 1988-89 CD includes auxiliary files in compressed form (\EXTRACT\CBPAUXIL.EXE) which, once installed on a hard disk (about 930 kb), can service all three of the CDs issued as of this writing. (A substitute master catalog for the 1988-89 disc, which requires less hard disk space if earlier CDs are not being used, can be obtained from the BBS.)

Narrative definitions are available for key data items, but descriptive information for SIC categories is limited to the 56-character title.

Note that the SICs recognized for the 1986 and 1987 CBP are on the 1972/77 basis. About 1/4 of all SICs were redefined for the 1987 Economic Censuses. The 1988 and 1989 CBP data are reported on the new basis, consistent with the 1987 Economic Censuses.

=>County and City Data Book, 1988

EXTRACT-compatible auxiliary files may be obtained in compressed form on Economic Census CD-ROMs 2b or 1e, on the Census Bureau electronic bulletin board, or on a diskette from the Center for Electronic Data Analysis. When installed on your hard disk, the auxiliary files require about 635 kilobytes.

Each file in this data set includes data for all states, all counties, or all cities nationwide, but only a limited number of items are included in each file. Thus, for example, age statistics might be in one file while race statistics are in another. This is quite unlike the structure of economic and agriculture census files, which more often segment the data by state rather than by subject matter.

Within the places file PLF01, the system cannot successfully select records based on county (the STCO variable), due to a quirk in index structure. It is, however, possible to select on state code.

=>USA Counties, 1992

EXTRACT-compatible auxiliary files may be obtained in compressed form from the Census Bureau electronic bulletin board, or on a diskette from the Center for Electronic Data Analysis. When installed on your hard disk, the auxiliary files require about 1.1 megabytes.
Each file in the data set includes data for all states and counties nationwide. The 2080 statistics and their flags are segmented across 35 separate files, and you may wish to consult the master list of data items in the documentation or in the file CO_DDF.dbf (in the AUXIL catalog) to determine which items are in which files. The Add Labels function is an easy way to link items from two or three files into a single display. The COSTATPF.dbf database consolidates 51 of the most frequently used items into a single source, and, in fact, is the only file that contains the 1990 census population count.

The variables most often used for record selection are ST, to select the state total and all counties in a state; METRO, to select all counties in a metro area (there are no MSA summaries); and SUMLEV, to select all state totals. When selecting a specific county or group of counties, select on both STate and COUnty. Initially the county menu will start with Alabama counties, even if another state has been selected, but you may Jump to the appropriate state if you recall the appropriate state code from the previous step. You may select up to 14 individual counties at this point, but they must all be in the same state. If you need counties from multiple states, select all counties within those states with the ST code.

=>1990 Census CD-ROMs: STFs 1A, 1B, 1C, 3A, 3B, and 3C; PL94-171; EEO Files

The auxiliary files that EXTRACT needs to work with each of these 1990 Census products are included in compressed form on 1987 Economic Census CD-ROM 1e and on 1992 CDs, or may be downloaded from the Census bulletin board. Auxiliary files for STF 1D, STF 3D, and CHAS will be available by downloading only.

Comparison of EXTRACT and GO

If you simply want to look up numbers, one area at a time, the GO software distributed on each of the 1990 census CD-ROMs is much easier and more efficient than EXTRACT.
The program prompts you to pick a summary level, then a specific area, and finally a particular table. One of the tables is a "general profile" presenting 89 key items selected or derived from the 982 population and housing data fields carried in the 10 STF 1A databases. If you want to see the more detailed data, you may look at any one table at a time. That table, or the general profile, may be printed to your printer or to a file, one area at a time.

Because this GO software was created specifically to work with the STF 1A discs, it insulates the user from much of the complexity of the STF 1A files--data stored in multiple files, with cryptic variable names, and with complex geographic hierarchies. As such, GO is highly appropriate for novice users.

EXTRACT, on the other hand, requires you to deal with more of the complexity of the data set, but gives you much more power and flexibility in working with the data, including the following capabilities:

- To deal effectively with classes of geographic areas, such as all places within a county or all tracts with 1 or more persons of Hispanic origin.
- To work with data from more than one table at a time.
- To view narrative concept definitions so you understand what you are working with.
- To create "user-defined items", such as population per square mile, percent white, or a count of persons under 15.
- To merge these data with data from other sources, such as the economic censuses.
- To copy to a file whatever subset of data you select, so that you can import the data into other applications, such as Lotus 1-2-3(tm) or Harvard Graphics(tm).
- To work with these data using a tool you have learned how to use with other Census Bureau data sets.

Dealing with the 10 STF 1 files and 34 STF 3 files

Since there are over a thousand geographic, population and housing fields in STF 1, and no dBase file can contain more than 128 fields, STF 1A and STF 1C are structured as a series of ten parallel files for each State.
Even larger, STF 3 files consist of 34 parallel files. When working with STF 1A, EXTRACT prompts you to select one of the ten as your starting point in the "Select a Catalog" screen:

File     Matrixes          Subjects covered
STF1A0   P1 - P10          Geographic Identifiers, Sex, Race, Hispanic Origin
STF1A1   P11 - P12(p2)     Age, Total and White
STF1A2   P12(p3-p5)        Age, Black and American Indian, Eskimo, Aleut (male)
STF1A3   P12(p6-p8)        Age, American Indian ... (female) and Asian or Pacific
STF1A4   P12(p9)-P13(p1)   Age, Other Race and Hispanic Origin (male)
STF1A5   P13(p2) - P19     Age, Hispanic Origin (female), Household Rel. & Type
STF1A6   P20 - P35         Household Relationship & Type (cont.), Imputations
STF1A7   P36, H1 - H20     Housing: Vacancy, Tenure, Age of Hhr, Rooms, Persons
STF1A8   H21 - H40         Persons Per Room, Value, Rent, Duration of Vacancy
STF1A9   H41 - H55         Units in Structure, Housing Imputations

Since this is rather limited as a subject locator, you will want to keep handy a copy of the subject locator (pages 3-1 to 3-6) or the table outlines (pages 5-1 to 5-11) in the printed STF 1 or STF 3 CD-ROM documentation, or to print out the corresponding ASCII text files SUB_LOC.ASC and TBL_OUT.ASC in the \DOCUMENT subdirectory of the CD-ROM.

If you expect to return to the file selection menu several times during a particular session, it will be worthwhile to type R to Restrict the session to data for a particular state before selecting the catalog.

You can bring data from one or two more of these subject areas into your display using "Add Labels", discussed below.

Displaying data to the screen

From the main menu, which you reach after specifying drives and selecting the file you want, your first step is likely to be option 6--Display to screen. If you selected STF1A0 or STF300, you are confronted with columns of codes and no data. The data are there, but you must cursor to the right past more than 60 codes to find them.
Thus, it is helpful to select only certain columns to display, which is accomplished through option 1--Select Items--at the main menu.

Selecting items

When using an STF1A0 file, the Select Items menu extends through 11 screens (move down one screen at a time). The first four screens are all codes, mostly in alphabetic, not logical or hierarchical, order. Area name (ANPSADPI) appears on the fifth screen, 10 lines before the first of the population statistics. The first of the subject items, P0010001, is on the sixth screen. The Jump feature allows you to enter "ANPSADPI" or "P001" to jump right to particular items if you know their mnemonics.

After selecting items it is generally useful to type P to Preview how the data will appear in columns across the screen. Initially, the area name (ANPSADPI) takes up 66 characters, leaving room for only one or two other items on the screen. Cursor to the right until you get to that column, then press W for Width, and specify a narrower column, e.g., 12 characters.

Selecting records

SUMLEV, or summary level, is the most important of the geographic codes for record selection on all of the 1990 census CD-ROMs, since it identifies the type of geography. To select records for all counties, for example, select SUMLEV and the value "050--Counties" on the summary level menu, even though your first inclination might be to select based on CNTY, the county code. The CNTY variable comes into play in selecting data for a single county. To display all tracts or BNAs within a particular county, select on both SUMLEV and CNTY (put 2 S's on the first menu) and specify a SUMLEV of "140" for tracts and BNAs.

Understanding summary levels and the sequencing of records is so important that you will want to keep handy the summary level sequence chart on page 6-1 of the STF 1 or STF 3 documentation (or print out the first part of \DOCUMENT\SUM_LEV.ASC on the CD-ROM).

You also need to keep straight which codes are available on which summary levels. For example, if you want to analyze tracts/BNAs by place, you need to specify a SUMLEV of "080" instead of "140", because SUMLEV 140 tract/BNA records do not include place codes, while SUMLEV 080 tract/BNA records do. Similarly, if you want to list places within a county, you need to work with SUMLEV 155, not SUMLEV 160, because the latter do not have county codes. To determine which codes are available on which summary levels, see STF 1 documentation pages 2-3 to 2-15 or print out \DOCUMENT\HOWTOUSE.ASC.

The best variables for selecting records are highlighted on the first record selection screen with an asterisk (*); these codes can be selected from a menu and also have indexes to speed your search.
Of these, SUMLEV, CNTY (county), and PLACEFP (place) are easiest to use. COUSUBFP (county subdivision) and TRACTBNA (census tracts/BNAs) lists are sequenced by county, so it is advisable to select records in two stages: first, using only CNTY, and then again from the main menu with COUSUBFP or TRACTBNA.

Any variable in the data set can be used as a selection variable, including population (e.g., all block groups with 1 or more Hispanic residents) or even latitude and longitude. Unfortunately, since the files are so large (from 8 to 286 megabytes per state), and it can take 5 minutes per megabyte to scan through unindexed records on a CD-ROM, it is advisable to make such selections only within a particular county, place, or tract/BNA.

Adding labels

For STFs 1A and 3A, the area name field (ANPSADPI) appears only on the first file in each series (STF1A0 or STF300). When working with other files, you may add labels associated with the county, county subdivision, place, or the lowest-level name (ANPSADPI). To add ANPSADPI, use the Add Labels option at the main menu, and select "--STF1A0" or "--STF300" at the first Add Labels menu, and ANPSADPI at the second menu. (This is not an issue for STF 1B, STF 1C, or PL94-171 CDs, where all files contain ANPSADPI.)

The Add Labels feature can be used to link data items from other files on the same CD-ROM as well as area names. At the second Add Labels menu, you may type A to select from All data items, not just labels, and M to select Multiple items from that screen. For example, you could select not only the area name from STF1A0, but also the total population (P0010001). Labels can be specified from up to two sources, allowing you to view data from as many as three files at once. Adding labels does, however, slow down data display.

Displaying definitions

One of EXTRACT's unique features is its ability to display definitions of geographic and subject concepts.
Typing D at the Select Items menu or in the columnar display screen will bring up a concept Definition keyed to the variable currently highlighted. You may also explore the definitions of related variables by typing I to bring up an Index of all of the topics for which narrative discussions are available. A table locator for STF 1A and a 3-part table locator for STF 3 are also available from this index. You also may access definitions from the main menu (type D).

Notes particular to P.L. 94-171 CD-ROMs

Most data displays for subcounty areas should include the voting district identifier entitled "Special Area Code 3" (SAC3), since voting districts split many place, tract/BNA, and block group summaries, and govern their sequence in any case.

When selecting records, the speediest access to data is provided by selecting on SUMLEV, PLACECE, CNTY, or CNTY and SAC3 together, and by specifying only one code at a time within the second-level record selection menus. Note that some codes are not available at all on the P.L. 94-171 files (noted in the menus) and others are available only within certain record types (see Figure 2 in the P.L. 94-171 documentation).

Notes particular to STF 3B (ZIP Code Areas)

Using EXTRACT with STF 3B for ZIP Code Areas is different from using EXTRACT with other STF CD-ROMs in several respects:

1. SUMLEV does not have the importance it has elsewhere, since summary level 800 (ZIP code total) is omitted whenever the ZIP code does not cross a county boundary (i.e., where it would have duplicated the data on a SUMLEV 820 record (ZIP-State-county)). If you want to extract only ZIP code totals, select records where the code SAC10 is 1.

2. Selecting records by ZIP code (SAC1) brings up a menu with multiple records for every ZIP code crossing county boundaries. Selecting from this menu selects by ZIP, not by county part, and there is no need to check more than one record for a given ZIP.

3. Indexes on the CD do not provide efficient access to ZIP codes by state, county, or metropolitan area, even though state and county codes are on all SUMLEV 820 records and metropolitan area codes are on STF300.dbf. A roundabout approach is to select a range of ZIP codes you know encompasses the particular states or counties of interest (since there are efficient indexes to ZIP codes); extract those areas to a .dbf file and use it; then select by state or county code from within that extract file.

4. LOGRECNU is of no use in merging files horizontally, since it is the record number for tape files that include the SUMLEV 800 records missing here. To merge data from other STF 3B files, use Add Labels for up to 10 items from up to 2 files.

Notes particular to STF 3C (United States Summary)

STF 3C CD-ROMs include only one generally useful index--by SUMLEV by state by county. You can remedy this if you are willing to give up another 815 kb on your hard disk. STFAU3C2.EXE may be installed into your \EXTRACT\1990AUX subdirectory, giving you three additional indexes: one to help you access particular MSAs quickly, another for AIANAs, and another for particular urbanized areas. STFAU3C2 also includes an expanded STF3C.CTI and .NDX that overwrite the versions in STFAUX3C and tell EXTRACT what the additional indexes are good for.

Whether you need the extra indexes depends on how you need to access MSAs, AIANAs, and UAs. If you want to select all MSAs within a state, the SUMLEV-by-state index on the CD will do fine. If you want to find one particular MSA, or all counties in a particular MSA, there is a real advantage to installing the optional indexes.
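A rough Python sketch may help show why an index sequenced by SUMLEV and then state serves the first kind of request but not the second. It does not model EXTRACT's actual .NDX files. The key values ("MSA", "UA", the state and area codes) are placeholders rather than real Census codes, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index entries sorted by (summary level, state, area code).
# All values are placeholders, not real Census codes.
index = sorted([
    ("MSA", "01", "1000"),
    ("MSA", "01", "3440"),
    ("MSA", "02", "0380"),
    ("MSA", "04", "6200"),
    ("UA", "01", "0500"),
    ("UA", "04", "1700"),
])


def lookup_by_prefix(entries, sumlev, state):
    """Binary search on the leading part of the key.

    This works because (sumlev, state) is a prefix of the sort order, so all
    matching entries sit next to each other.
    """
    lo = bisect_left(entries, (sumlev, state))
    hi = bisect_right(entries, (sumlev, state, "\uffff"))
    return entries[lo:hi]


def lookup_by_area(entries, area):
    """Linear scan over every entry.

    The area code is the last part of the key, so matching entries can be
    anywhere in the sorted list and the sort order gives no shortcut.
    """
    return [entry for entry in entries if entry[2] == area]


# All MSAs within one state: served directly by the sorted order.
print(lookup_by_prefix(index, "MSA", "01"))

# One particular MSA without knowing its state: every entry must be examined.
print(lookup_by_area(index, "6200"))
```

An additional index keyed on the area code itself, which is roughly the role the optional indexes play, would let the second lookup use the same binary search as the first.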