******************************************************************
PySAL: Example Data Sets 
******************************************************************

PySAL comes with a number of example data sets that are used in some of the
documentation strings in the source code. All the example data sets can be
found in the **examples** directory.

10740
=====

Albuquerque, New Mexico, Census 2000 Tract Data
-----------------------------------------------

* 10740.dbf: attribute data. (k=5) 
* 10740.shp: shapefile. (n=195)
* 10740.shx: spatial index.
* 10740_queen.gal: queen contiguity weights in GAL format.
* 10740_rook.gal: rook contiguity weights in GAL format.


Line
=====

Line Shapefile
---------------

* Line.dbf: attribute data. (k=3)
* Line.prj: ESRI projection file. 
* Line.shp: Line shapefile. (n=4)
* Line.shx: spatial index.

Used for testing.


Point
=====

Point Shapefile
---------------

* Point.dbf: attribute data. (k=3)
* Point.prj: ESRI projection file.
* Point.shp: Point shapefile. (n=9)
* Point.shx: spatial index.

Used for testing


Polygon
=======

Polygon Shapefile
-----------------

* Polygon.dbf: attribute data. (k=3)
* Polygon.prj: ESRI projection file.
* Polygon.shp: Polygon shapefile. (n=3)
* Polygon.shx: spatial index.

Used for testing.


arcgis
======

arcgis testing files
--------------------

* arcgis_ohio.dbf: spatial weights in ArcGIS DBF format.
* arcgis_txt.txt: spatial weights in ArcGIS TXT format.

Files used for internal testing.


baltim
======

Baltimore house sales prices and hedonics 1978
----------------------------------------------------------------

* baltim.dbf: attribute data. (k=17)
* baltim.shp: Point shapefile. (n=211)
* baltim.shx: spatial index.
* baltim.tri.k12.kwt: kernel weights using a triangular kernel with 12 nearest neighbors in KWT format.
* baltim_k4.gwt: nearest neighbor weights (4nn) in GWT format.
* baltim_q.gal: queen contiguity weights in GAL format.
* baltimore.geojson: spatial weights in geojson format.


book
====

Synthetic data to illustrate spatial weights 
--------------------------------------------

* book.gal: rook contiguity weights in GAL format for regular lattice.
* book.txt: attribute data for regular lattice.

Source: Anselin, L. and S.J. Rey (in progress) Spatial Econometrics: Foundations.


burkitt
=======

Burkitt's lymphoma in the Western Nile district of Uganda
---------------------------------------------------------

* burkitt.dbf: attribute data. (k=6)
* burkitt.shp: Point shapefile. (n=188)
* burkitt.shx: spatial index.


calemp
======

Employment density for California counties
------------------------------------------

* calempdensity.csv: data on employment and employment density in California
  counties. (n=58, k=11)

  Source: Anselin, L. and S.J. Rey (in progress) Spatial Econometrics: Foundations.


chicago
=======

Chicago neighborhoods
--------------------

* Chicago77.dbf: attribute data. (k=11)
* Chicago77.shp: Polygon shapefile. (n=77)
* Chicago77.shx: spatial index.


columbus
========

Columbus neighborhood crime data 1980
-------------------------------------

* columbus.dbf: attribute data. (k=20)
* columbus.gal: queen contiguity file in GAL format.
* columbus.html: metadata.
* columbus.json: shape and attribute data file in JSON format.
* columbus.shp: Polygon shapefile. (n=49)
* columbus.shx: spatial index.

Anselin, Luc (1988). Spatial Econometrics. Boston, Kluwer Academic, Table 12.1, p. 189.


desmith
=======

Small dataset to illustrate Moran's I statistic
-----------------------------------------------

* desmith.gal: spatial weights in GAL format.
* desmith.txt: attribute data. (n=10, k=2)

Figure 5-31 of de Smith, Goodchild and Longley (2015)

Source: <http://www.spatialanalysisonline.com/>

Used with permission.


geodanet
========

Datasets from geodanet for network analysis
-------------------------------------------

* crimes.dbf: attribute data for crime point set. (k=2)
* crimes.prj: ESRI projection file.
* crimes.sbn: spatial index.
* crimes.sbx: spatial index.
* crimes.shp: Point shapefile for crime data. (n=287)
* crimes.shp.xml: metadata.
* crimes.shx: spatial index.
* schools.dbf: attribute data for schools point set. (k=1)
* schools.prj: ESRI projection file.
* schools.sbn: spatial index.
* schools.sbx: spatial index.
* schools.shp: Point shapefile for schools data. (n=8)
* schools.shp.xml: metadata.
* schools.shx: spatial index.
* streets.dbf: attribute data for street polyline set. (k=2)
* streets.prj: ESRI projection file.
* streets.sbn: spatial index.
* streets.sbx: spatial index.
* streets.shp: Line shapefile data for street data. (n=293)
* streets.shx: spatial index.


juvenile
========

Residences of juvenile offenders in Cardiff, UK
-----------------------------------------------

* juvenile.dbf: attribute data. (k=3)
* juvenile.gwt: spatial weights in GWT format.
* juvenile.shp: Point shapefile. (n=168)
* juvenile.shx: spatial index.

Source: http://geodacenter.org/downloads/data-files/juvenile.zip


mexico
======

Mexican states regional income 1940-2000
----------------------------------------

* mexico.csv: attribute data. (n=32, k=13)
* mexico.gal: spatial weights in GAL format.

Data used in Rey, S.J. and M.L. Sastre Gutierrez. (2010) "Interregional inequality dynamics in Mexico." Spatial Economic Analysis, 5: 277-298.


nat
===

US county homicides 1960-1990
-----------------------------

* NAT.dbf: attribute data for US county homicides. (k=69)
* NAT.shp: Polygon shapefile for US counties. (n=3085)
* NAT.shx: spatial index.
* nat.geojson: shape and attribute data in GeoJSON format.
* nat_queen.gal: queen contiguity weights in GAL format.
* nat_queen_old.gal: queen contiguity weights in GAL format using old polygon index.
* nat_trian_k20.kwt: kernel weights in KWT format.
* natregimes.dbf: attribute data for US county homicides with regimes assigned. (k=73)
* natregimes.shp: Polygon shapefile for US counties. (n=3085)
* natregimes.shx: spatial index.


networks
========

Datasets used for network testing
---------------------------------

* eberly_net.dbf: attribute data. (k=3)
* eberly_net.shp: Line shapefile. (n=29)
* eberly_net.shx: spatial index.
* eberly_net_pts_offnetwork.dbf: attribute data for points off network. (k=2)
* eberly_net_pts_offnetwork.shp: Point shapefile. (n=100)
* eberly_net_pts_offnetwork.shx: spatial indexã
* eberly_net_pts_onnetwork.dbf: attribute data for points on network. (k=1)
* eberly_net_pts_onnetwork.shp: Point shapefile. (n=110)
* eberly_net_pts_onnetwork.shx: spatial index.
* nonplanarsegments.dbf: attribute data. (k=1)
* nonplanarsegments.prj: ESRI projection file.
* nonplanarsegments.qpj: QGIS projection file.
* nonplanarsegments.shp: Line shapefile. (n=2)
* nonplanarsegments.shx: spatial index.


newHaven
========

New Haven crime dataset Sept 2014
---------------------------------

* new_haven_merged.dbf: attribute data for New Haven crimes. (k=5)
* new_haven_merged.shp: Point shapefile. (n=3293)
* new_haven_merged.shx: spatial index.
* newhaven_nework.dbf: attribute data for New Haven street network. (k=6)
* newhaven_nework.prj: ESRI projection file.
* newhaven_nework.qpj: QGIS projection file.
* newhaven_nework.shp: Line shapefile. (n=1537)
* newhaven_nework.shx: spatial index.


sacramento2
===========

2000 Census Tract Data for Sacramento MSA
-----------------------------------------

* sacramentot2.dbf: attribute data. (k=31)
* sacramentot2.gal: queen contiguity weights in GAL format.
* sacramentot2.sbn: spatial index.
* sacramentot2.sbx: spatial index.
* sacramentot2.shp: Polygon shapefile. (n=403)
* sacramentot2.shx: spatial index.

Source: US Census Bureau, 2000 Census (Summary File 3). Extracted from http://factfinder.census.gov in April 2004.


sids2
=====

North Carolina county SIDS death counts and rates
-------------------------------------------------
 
* sids2.dbf: attribute data. (k=18)
* sids2.html: metadata.
* sids2.shp: Polygon shapefile. (n=100)
* sids2.shx: spatial index.
* sids2.gal: spatial weights in GAL format.

Source: http://geodacenter.org/downloads/data-files/sids2.zip


snow_maps
=========

Public water pumps and Cholera deaths in London 1954 (Snow's Map)
-----------------------------------------------------------------

* SohoPeople.dbf: attribute data for Cholera deaths. (k=2)
* SohoPeople.prj: ESRI projection file.
* SohoPeople.sbn: spatial index.
* SohoPeople.sbx: spatial index.
* SohoPeople.shp: Point shapefile for Cholera deaths. (n=324)
* SohoPeople.shx: spatial index.
* SohoWater.dbf: attribute data for public water pumps. (k=1)
* SohoWater.prj: ESRI projection file.
* SohoWater.sbn: spatial index.
* SohoWater.sbx: spatial index.
* SohoWater.shp: Point shapefile for public water pumps. (n=13)
* SohoWater.shx: spatial index.
* Soho_Network.dbf: attribute data for street network. (k=1)
* Soho_Network.prj: ESRI projection file.
* Soho_Network.sbn: spatial index.
* Soho_Network.sbx: spatial index.
* Soho_Network.shp: Line shapefile for street network. (n=118)
* Soho_Network.shx: spatial index.


south
=====

Homicides and selected socio-economic characteristics for Southern U.S. counties (subset of NCOVR national data set). Data for four decennial census years: 1960, 1970, 1980, 1990.
----------------------------------------------------------------------

* south.dbf: attribute data. (k=69)
* south.shp: Polygon shapefile. (n=1412)
* south.shx: spatial index.
* south_q.gal: queen contiguity weights in GAL format.
* south_queen.gal: queen contiguity weights in GAL format.


stl
===

Homicides and selected socio-economic characteristics for counties surrounding St Louis, MO. Data aggregated for three time periods: 1979-84 (steady decline in homicides), 1984-88 (stable period), and 1988-93 (steady increase in homicides).
---------------------------------------------------------------------------

* stl.gal: queen contiguity weights in GAL format.
* stl_hom.csv: attribute data and WKT geometry.
* stl_hom.dbf: attribute data. (k=22)
* stl_hom.html: metadata.
* stl_hom.shp: Polygon shapefile. (n=78)
* stl_hom.shx: spatial index.
* stl_hom.txt: selected attribute data.
* stl_hom.wkt: a Well-Known-Text representation of the geometry.
* stl_hom_rook.gal: rook contiguity weights in GAL format.

Source: S. Messner, L. Anselin, D. Hawkins, G. Deane, S. Tolnay, R. Baller (2000). An Atlas of the Spatial Patterning of County-Level Homicide, 1960-1990. Pittsburgh, PA, [National Consortium on Violence Research (NCOVR)](http://www.ncovr.heinz.cmu.edu).


street_net_pts
==============

Street network points
---------------------

* street_net_pts.dbf: attribute data. (k=1)
* street_net_pts.prj: ESRI projection file.
* street_net_pts.qpj: QGIS projection file.
* street_net_pts.shp: Point shapefile. (n=303)
* street_net_pts.shx: spatial index.


taz
===

Dataset used for regionalization
--------------------------------

* taz.dbf: attribute data. (k=14)
* taz.shp: Polygon shapefile. (n=4109)
* taz.shx: spatial index.


us_income
=========

Per-capita income for the lower 48 US states 1929-2009
------------------------------------------------------

* spi_download.csv: regional per capita income time series 1969-2008. (source:  Regional Economic Information System, Bureau of Economic Analysis, U.S. Department of Commerce) 
* states48.gal: contiguity weights in GAL format.
* us48.dbf: attribute data. (k=8)
* us48.shp: Polygon shapefile. (n=48) 
* us48.shx: spatial index.
* usjoin.csv: 48 US states per capita income time series 1929-2009.


virginia
========

Virginia counties shapefile
---------------------------

* virginia.dbf: attribute data. (k=7)
* virginia.gal: rook contiguity weights in GAL format.
* virginia.json: attribute and shape data in JSON format.
* virginia.prj: ESRI projection file.
* virginia.shp: Polygon shapefile. (n=136)
* virginia.shx: spatial index.
* virginia_queen.dat: queen contiguity weights in DAT format.
* virginia_queen.dbf: queen contiguity weights in DBF format.
* virginia_queen.gal: queen contiguity weights in GAL format.
* virginia_queen.mat: queen contiguity weights in MATLAB MAT format.
* virginia_queen.mtx: queen contiguity weights in Matrix Market MTX format.
* virginia_queen.swm: queen contiguity weight in ArcGIS SWM format.
* virginia_queen.txt: queen contiguity weights in TXT format.
* virginia_queen.wk1: queen contiguity weights in Lotus Wk1 format.
* virginia_rook.gal: rook contiguity weights in GAL format.


wmat
====

Datasets used for spatial weights testing
-----------------------------------------

* geobugs_scot: spatial weights in GeoBUGS text format.
* lattice10x10.shp: Polygon shapefile for 10 * 10 regular lattices. (n=100)
* lattice10x10.shx: spatial index.
* ohio.swm: spatial weights in ArcGIS SWM format.
* rook31.dbf: attribute data. (k=2)
* rook31.gal: rook contiguity weights in GAL format.
* rook31.shp: Polygon shapefile. (n=3)
* rook31.shx: spatial index.
* spat-sym-us.mat: spatial weights in MATLAB MAT format.
* spat-sym-us.wk1: spatial weights in Lotus Wk1 format.
* spdep_listw2WB_columbus: spatial weights in GeoBUGS text format.
* stata_full.txt: full spatial weights matrix.
* stata_sparse.txt: sparse spatial weights matrix.
* wmat.dat: spatial weights in DAT format.
* wmat.mtx: spatial weights in Matrix Market MTX format.
