Bayesian Networks and Geographical Information Systems (BN+GIS)

The objective of this project is to develop a Bayesian Network (BN) model that produces environmental risk maps for oil and gas site developments and to demonstrate the model’s scalability from a single point to a collection of points. To reach this objective, a benchmark BN model was formulated as a “proof of concept” using Aquifers, Ecoregions and Land Use / Land Cover maps as local and independent input variables. This model was then used to evaluate the probabilistic geographical distribution of the Environmental Sensibility of Oil and Gas (O&G) developments for a given study area. A Risk index associated with O&G operations, based on the spatial Environmental Sensibility, was also mapped. To facilitate the Risk assessment, the input variables (maps) were discretized into three hazard levels: high, moderate and low.
A Geographical Information System (GIS) platform, ESRI ArcMap 10, was used to gather, modify and display the data for the analysis. Once the variables were defined and the hazard data were stored in feature classes (layer shapefile format), Python 2.6 was used as the computational platform to calculate the probabilistic state of all the Bayesian Network’s variables. This made it possible to define Risk scenarios through both prognostic and diagnostic analysis and to measure the impact of changes or interventions in terms of uncertainty.
The resulting Python and ESRI ArcMap script was called “BN+GIS”. It populates maps describing the spatial variability of the states of the Environmental Sensibility and of the corresponding Risk index. The latter, in particular, gives decision makers a tool for choosing the most suitable location for a drilling rig, since it integrates three fundamental environmental variables. The results also show that it is possible to propagate the information backward from the Environmental Sensibility to define the inherent triggering scenarios (hazard variables).
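
As a rough illustration of the prognostic step, the following Python sketch computes the marginal distribution of the Environmental Sensibility at a single point from the marginals of the three input variables. The conditional probability table is hypothetical (a smoothed “vote” among the three parents) and is not the table used in the study; the function names and the example marginals are likewise assumptions made only for this sketch.

```python
from itertools import product

STATES = ("high", "moderate", "low")


def hypothetical_cpt():
    """Illustrative P(ES | aquifer, ecoregion, land_cover); NOT from the study.

    Each parent casts one vote for its own state. The child distribution is
    the share of votes per state, smoothed by 0.1 so that no entry is zero.
    """
    cpt = {}
    for combo in product(STATES, repeat=3):
        weights = {s: combo.count(s) + 0.1 for s in STATES}
        total = sum(weights.values())
        cpt[combo] = {s: weights[s] / total for s in STATES}
    return cpt


def prognosis(p_aquifer, p_ecoregion, p_land_cover, cpt):
    """Marginal P(ES), summing over all combinations of parent states.

    The parents are treated as independent, as stated for the benchmark model.
    """
    p_es = {s: 0.0 for s in STATES}
    for a, e, l in product(STATES, repeat=3):
        p_parents = p_aquifer[a] * p_ecoregion[e] * p_land_cover[l]
        for s in STATES:
            p_es[s] += cpt[(a, e, l)][s] * p_parents
    return p_es


# Example marginals for one point (made-up numbers).
cpt = hypothetical_cpt()
p_es = prognosis(
    {"high": 0.6, "moderate": 0.3, "low": 0.1},
    {"high": 0.2, "moderate": 0.5, "low": 0.3},
    {"high": 0.1, "moderate": 0.2, "low": 0.7},
    cpt,
)
```

Repeating this calculation at every point of a grid would yield one map per state of the Environmental Sensibility.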

A case study is presented to illustrate the applicability of the proposed methodology to a specific geographical setting. The Barnett Shale was chosen as the benchmark study area because sufficient information on this region was available and because of its importance to the latest developments of unconventional plays in the country. The main contribution of this work lies in combining Bayesian Networks and GIS to define environmental Risk scenarios that can facilitate decision-making for O&G stakeholders such as land owners, industry operators, regulators and Non-Governmental Organizations (NGOs), before and during the development of a given site.




Case Study - Barnett Shale

The BN model proposed in this study was implemented in the Barnett Shale area, located in north-central Texas. Intensive research on this play, driven by the increasing development of its gas resources, has produced a substantial amount of available information, and economic, societal, environmental and other technical data can be obtained from numerous sources. Additionally, the Barnett Shale was selected because of its importance to the local and national economy, since the growth of Barnett Shale production has been a key factor for unconventional plays all over the country (Montgomery et al., 2005). With the development of the technology to extract gas from this type of unconventional reservoir and the opening of a favorable market for natural gas, the Barnett Shale has become one of the most important growing economies in Texas, and probably in the nation, over the last decade (Jackson School of Geosciences, 2007).




Hazard Definition

The three hazard variables were discretized using criteria that provide a reasonable basis for assessing the environmental Risk. Each hazard variable responds to a triggering event that could represent an undesirable occurrence with potential negative environmental impact, such as a hydrocarbon spill, an increased footprint or pollutant emissions, among others. The hazard states for each input variable are defined as follows:


Aquifers

Operations such as O&G drilling or production could directly degrade or deplete water sources, so the presence of major or minor underground water bodies is an important consideration. O&G site development includes drilling operations that may directly affect the aquifers. One of the most important concerns is to avoid polluting the water body with drilling fluids by isolating the well, either by injecting cement or by another method. The spatial data were retrieved from the Texas Water Development Board (2006a, 2006b) and the Oklahoma Water Resources Board (2006a, 2006b) in polygon shapefile format (scale 1:250,000). These feature classes were modified to define the hazard levels for this variable as follows:

High: the outcrop zones of any major or minor aquifer. These zones contain most of the recharge areas, where the aquifer formation is exposed at the surface. The hazard is higher here because the water quality is sensitive to any triggering event at the surface or in the wellbore.
Moderate: the areas where the aquifer formation lies beneath the outcropping formation, meaning that a well drilled in that zone will reach the aquifer. These zones are called subcrop or downdip regions and represent a medium level of hazard because the aquifer could be reached by a drilling operation and because they can also be part of the aquifer’s recharge zone.
Low: the zones where no aquifer is found at the surface or beneath it when a well is drilled. If a triggering event occurs, the aquifer is not directly affected because it is not in contact with the wellbore. However, a minimum hazard level is assigned to this state, given that any contamination or disturbance could reach a stream or water body that contributes to the recharge zone of a major or minor aquifer.
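
The three aquifer hazard states above could be encoded as a simple lookup, as in this minimal sketch. The zone labels ("outcrop", "subcrop", "downdip", "none") and the function name are hypothetical; the actual attribute values in the source shapefiles are not given here.

```python
# Hypothetical zone labels mapped to the hazard states described above.
AQUIFER_HAZARD = {
    "outcrop": "high",      # aquifer formation exposed at the surface
    "subcrop": "moderate",  # aquifer reachable by a drilled well
    "downdip": "moderate",  # same hazard state as subcrop
    "none": "low",          # no aquifer at the surface or beneath it
}


def aquifer_hazard(zone_type):
    """Return the hazard state for a zone label; None means no aquifer."""
    if zone_type is None:
        return "low"
    key = zone_type.strip().lower()
    if key not in AQUIFER_HAZARD:
        raise ValueError("unknown aquifer zone type: %r" % zone_type)
    return AQUIFER_HAZARD[key]
```

Raising an error on an unrecognized label, rather than silently returning "low", keeps a data-entry mistake from being read as a low-hazard zone.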


Ecoregions

Ecoregions are defined as areas where the ecosystems are similar in the type, quality and quantity of their natural resources (US Environmental Protection Agency, 2011). This map was developed on the premise that ecological regions can be categorized by the combination of natural characteristics such as geology, physiography, vegetation, climate, soil type, land use, wildlife and hydrology. The regions are defined at four scale levels: the coarsest, level I, includes 15 ecoregions in North America; level II subdivides these into 105 regions; and so on, down to the finest scale, level IV, whose smaller polygons describe specific ecological features. The level IV ecoregions map (scale 1:250,000) was used for this study and was retrieved from the U.S. Environmental Protection Agency (EPA) website for the states of Texas and Oklahoma. The hazard levels recognized for this variable are:

High: areas with elevated precipitation levels. The soil has a high capacity to hold nutrients and the vegetation is abundant, including wooded forests. Placing a rig site in this zone requires clearing arboreal vegetation, which causes a significant footprint impact and also threatens the water quality in the area.
Moderate: isolated wooded areas on sandstone and shale beds with irregular topography and a history of oil and gas production. The soil does not hold many nutrients, but placing a well site in these areas might require logging and flattening the surface, which means a moderate footprint impact.
Low: dry badlands, flat prairies with a lower variety and quantity of vegetation, or sandy / acidic soils that are unable to support an extensive ecosystem. Under these conditions the threat is classified as low environmental impact if a well site is placed in these ecoregions, since the vegetation is sparse and few water bodies are endangered.

Land Use / Land Cover

This map describes the land surface in terms of the type of soil and vegetation present and the particular use given to it. Urban areas, wetlands and agricultural lands are all delineated in this classification system. These data were retrieved from the U.S. Department of Agriculture (2007) website at a scale of 1:24,000. The map shows the spatial distribution of the hazard for this variable, based on the presence of important features that could cause a larger footprint when the land is altered. The hazard levels are defined as follows:

High: agricultural lands with low sparse grasslands, conifer forests, wetlands, lakes, reservoirs and beaches. The hazard is considered higher when an area is currently less affected by human activities, e.g., wildlands.
Moderate: urban areas, croplands, savannas and rangelands. The impact of placing a rig site in these areas depends on the intervention on land that has already been modified, but it can still affect the air quality or the water bodies used by residents in the vicinity or by local wildlife.
Low: urban industrial lands, sandy areas other than beaches, strip mines, quarries and gravel pits. As these areas are already highly affected by anthropogenic activities, the impact on the environment due to O&G practices is considered lower than in natural wildlands or water bodies.



The Process

To achieve the proposed objectives, a script was developed to compute and display maps showing the geographical distribution of the state of the Environmental Sensibility, along with a Risk map showing how the impact varies when a rig site is placed in different locations. The tool thus provides a spatial distribution of the Risk based on a probabilistic analysis, helping decision makers pick the most suitable places to operate an O&G well with the lowest environmental disturbance. When the model performs diagnostic reasoning, the BN+GIS tool displays the spatial distribution of the three states of each hazard. These hazards are described as the probability of occurrence of a triggering event (e.g., low, moderate and high hazard for aquifers) for each node, resulting in nine maps. Therefore, using ArcMap 10, it is possible to display the initial conditions required for a chosen state of Risk.
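
One way to picture the diagnostic step is sketched below. It assumes the user-supplied Environmental Sensibility distribution is treated as soft evidence and applied with Jeffrey's rule; the source does not state which update rule the BN+GIS script uses, so this is an assumption, as are the hypothetical conditional probability table and all names. For one grid point it returns three probabilities for each of the three hazard nodes, i.e. the nine values that would feed the nine maps.

```python
from itertools import product

STATES = ("high", "moderate", "low")
PARENTS = ("aquifer", "ecoregion", "land_cover")


def vote_cpt():
    """Hypothetical P(ES | parents): smoothed share of parents in each state."""
    return {
        combo: {s: (combo.count(s) + 0.1) / 3.3 for s in STATES}
        for combo in product(STATES, repeat=3)
    }


def diagnosis(priors, cpt, es_target):
    """Posterior parent marginals given a target ES distribution.

    Jeffrey's rule (an assumption of this sketch):
    P'(parent = x) = sum over s of es_target[s] * P(parent = x | ES = s).
    """
    combos = list(product(STATES, repeat=3))
    # Joint P(parents, ES) under the prior marginals (independent parents).
    joint = {}
    for combo in combos:
        p_parents = 1.0
        for name, state in zip(PARENTS, combo):
            p_parents *= priors[name][state]
        for s in STATES:
            joint[(combo, s)] = p_parents * cpt[combo][s]
    p_es = {s: sum(joint[(c, s)] for c in combos) for s in STATES}

    posterior = {name: {s: 0.0 for s in STATES} for name in PARENTS}
    for (combo, s), p in joint.items():
        if p_es[s] == 0.0:
            continue  # an ES state with zero prior mass cannot be conditioned on
        weight = es_target[s] * p / p_es[s]
        for name, state in zip(PARENTS, combo):
            posterior[name][state] += weight
    return posterior


# Example: uniform priors at one point, target ES fully "high" (made-up input).
uniform = {s: 1.0 / 3.0 for s in STATES}
priors = {name: dict(uniform) for name in PARENTS}
posterior = diagnosis(priors, vote_cpt(), {"high": 1.0, "moderate": 0.0, "low": 0.0})
```

A useful sanity check on this rule is that supplying the prognostic ES distribution as the target returns the prior marginals unchanged.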

The methodology followed in this project consists of four phases. Initially, the data were gathered from different repositories and then manipulated in ArcMap 10 to assign, in the attribute table of each variable, the corresponding hazard level according to the features of the data. The BN+GIS tool was also prepared in this step by adding a new tool to ArcToolbox. The variables set by the user are: 1) the buffer radius and 2) the probability distribution of the Environmental Sensibility for the diagnostic analysis (ESHigh, ESMod and ESLow). Once this structure was in place, the script was written in Python 2.6 to generate the grid and discretize the spatial domain, generate the centroids, create buffers with the user-defined radius, run the model in prognosis and diagnosis by looping through each centroid, and finally display the output maps in ArcMap.
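
The grid and centroid generation and the loop over centroids could look roughly like the sketch below. It uses plain Python rather than ArcPy, and the function names, the square-cell layout and the `evaluate` callback are assumptions for illustration, not the script's actual structure.

```python
import math


def grid_centroids(xmin, ymin, xmax, ymax, cell_size):
    """Return (x, y) centroids of square cells tiling the extent, row by row."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    n_cols = int(math.ceil((xmax - xmin) / float(cell_size)))
    n_rows = int(math.ceil((ymax - ymin) / float(cell_size)))
    return [
        (xmin + (col + 0.5) * cell_size, ymin + (row + 0.5) * cell_size)
        for row in range(n_rows)
        for col in range(n_cols)
    ]


def run_over_grid(centroids, radius, evaluate):
    """Call evaluate(x, y, radius) at every centroid and collect the results.

    `evaluate` stands in for the per-point work (buffer, intersect, run the
    BN in prognosis and diagnosis); it is a placeholder in this sketch.
    """
    return [(x, y, evaluate(x, y, radius)) for x, y in centroids]


# Example: a 10 x 10 extent split into 5 x 5 cells, with a dummy evaluator.
centroids = grid_centroids(0.0, 0.0, 10.0, 10.0, 5.0)
results = run_over_grid(centroids, 3.0, lambda x, y, r: "placeholder")
```

Passing the per-point work in as a callback keeps the grid logic independent of the GIS calls, which makes the loop easy to check without ArcMap.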




The spatial analysis is performed by generating circular areas around the centroid of each cell of a polygon grid, with a radius that the user can specify through an interactive window. These areas are generated with the buffer tool of the ArcPy Python package. Each buffer area was intersected with the layer containing the spatial information of the three input variables, generating a new feature class. This layer is used to determine the proportion of the circular area at each hazard level. The figure below illustrates an example of the process carried out at each point of the centroid grid to compute the marginal probability distribution of the parent nodes.
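
A minimal sketch of how the intersected areas could be turned into a marginal distribution is given below. It assumes the intersection step yields (hazard level, area) pairs for one input variable inside one buffer and that the buffer is fully covered by that layer; the function name and input format are hypothetical.

```python
STATES = ("high", "moderate", "low")


def marginal_from_areas(pieces):
    """Return P(level) as the share of the covered area at each hazard level.

    `pieces` is an iterable of (hazard_level, area) pairs, one per polygon
    fragment produced by intersecting a buffer with an input layer.
    """
    totals = {s: 0.0 for s in STATES}
    for level, area in pieces:
        if level not in totals:
            raise ValueError("unknown hazard level: %r" % level)
        if area < 0:
            raise ValueError("area must be non-negative")
        totals[level] += area
    covered = sum(totals.values())
    if covered == 0:
        raise ValueError("buffer does not intersect the layer")
    return {s: totals[s] / covered for s in STATES}


# Example: three fragments inside one buffer (made-up areas).
p_aquifer = marginal_from_areas([("high", 2.0), ("low", 1.0), ("high", 1.0)])
```

In this example the buffer is 75% high hazard and 25% low hazard, so the aquifer node at that centroid would receive those values as its marginal distribution.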






Energy Information Administration EIS, 2011, Shale Gas Plays, Lower 48 States, <> Accessed August 12, 2012.

Grêt-Regamey, A. and D. Straub, 2006, Spatially explicit avalanche risk assessment linking Bayesian Networks to a GIS, Natural Hazards and Earth System Sciences, vol. 6, p. 911-926.

Houston Advanced Research Center, 2010, SCORECARD Reference Guide, Environmentally Friendly Drilling System, vol. 1, p. 1-174.

HPDI, 2008, Production Data. HPDI Production Data Application.

ISO 31000:2009, 2010, Setting a New Standard for Risk Management, Risk Analysis, vol. 30, no. 6, p. 881-886.

Jackson School of Geosciences, 2007, Barnett Boom Ignites Hunt For Unconventional Gas Resources, <> Accessed August 15, 2012.

Jones, J., 2005, An Introduction to Factor Analysis of Information Risk (FAIR), <> Accessed September 4, 2012.

Kocabas, V. and S. Dragicevic, 2007, Enhancing a GIS Cellular Automata Model of Land Use Change: Bayesian Networks, Influence Diagrams and Causality, Transactions in GIS, vol. 11, no. 5, p. 681-702.

Korb, K. and A. Nicholson, 2004, Bayesian Artificial Intelligence, London, U.K., CRC Press, 365 p.

Matthies, H., 2007, Quantifying Uncertainty: Modern Computational Representation of Probability and Applications, in Ibrahimbegovic, A. and I. Kozar, eds., Extreme Man-Made and Natural Hazards in Dynamics of Structures, Springer Netherlands, vol. Part II, p. 105-135.

Medina-Cetina, Z., P. Varela, J. Ryan, and O. Yu, 2012, System Engineering Design Methodology-Low Impact Well Design Optimization Final Report, REPSEA.

Montgomery, S., D. Jarvie, K. Bowker, and M. Pollastro, 2005, Mississippian Barnett Shale, Fort Worth basin, north-central Texas: Gas-shale play with multi-trillion cubic feet potential, AAPG Bulletin, vol. 89, no. 2, p. 155-175.

Navigant Consulting Inc., 2008, North American Natural Gas Supply Assessment. Executive Summary and Update, <> Accessed September 18, 2012.

Office of the United Nations Disaster Relief Co-ordinator (UNDRO) and Expert Group Meeting on Vulnerability Analysis, 1979, Natural Disasters; Risk, p. 1.

Oklahoma Water Resources Board, 2006a, OWRB Major Aquifers of Oklahoma, Oklahoma City, scale 1:500,000, 1 sheet.

Oklahoma Water Resources Board, 2006b, OWRB Minor Aquifers of Oklahoma, Oklahoma City, scale 1:500,000, 1 sheet.

Oreskes, N., K. Shrader-Frechette, and K. Belitz, 1994, Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences, Science, vol. 263, p. 641-646.

Pearl, J., 1990, Reasoning Under Uncertainty, Annu. Rev. Comput. Sci., vol. 4, p. 37-72.


Pollastro, R., D. Jarvie, R. Hill, and C. Adams, 2007, Geologic framework of the Mississippian Barnett Shale, Barnett-Paleozoic total petroleum system, Bend arch–Fort Worth Basin, Texas, AAPG Bulletin, vol. 91, no. 4, p. 405-436.

Pursell, D., D. Heikkinen, and M. Carver, 2006, Barnett Shale-Unfolded: Sedimentology, Sequence Stratigraphy, and Regional Mapping, Gulf Coast Association of Geological Societies Transactions, vol. 58, p. 777-795.

Railroad Commission of Texas, January, 2013, Barnett Shale Information, <> Accessed February 2, 2012.

Stassopoulou, A., M. Petrou, and J. Kittler, 1998, Application of a Bayesian network in a GIS based decision making system, Geographical Information Science, vol. 12, no. 1, p. 23-45.

Texas Water Development Board, 2006a, Major Aquifers, Austin, TX, USA, Texas Water Development Board, scale 1:250,000, 1 sheet.

Texas Water Development Board, 2006b, Minor Aquifers, Austin, TX, USA, Texas Water Development Board, scale 1:250,000, 1 sheet.

Tian, Y. and W. Ayers, 2010, Barnett Shale (Mississippian), Fort Worth Basin, Texas: Regional Variations in Gas and Oil Production and Reservoir Properties, CSUG/SPE, no. 137766.

UNDRO - Office of the United Nations Disaster Relief Co-ordinator, 1979, Natural Disasters and Vulnerability Analysis, Report of Expert Group Meeting.

US Environmental Protection Agency, 2011, US Level IV Ecoregions of the Continental United States, United States, EPA Office of Research and Development (ORD) - National Health and Environmental Effects Research Laboratory (NHEERL), scale 1:250,000, 1 sheet.

USDA NRCS - National Cartography & Geospatial Center, 2007, Land Use/Land Cover Map, USA, Web GIS, abilene, ardmore, austin, big_spring, brownwood, dallas, lawton, llano, sherman, tyler, waco, wichita_falls, scale 1:24,000, 12 sheets.


Varela, P., 2013, Bayesian Networks and Geographical Information Systems for Environmental Risk Assessment for Oil & Gas Site Development, Master's Thesis, Texas A&M University.

Yu, O. and Z. Medina-Cetina, 2013, Decision-Making Modeling of Environmentally Friendly Drilling Systems for Shale Gas via Bayesian Networks, Knowledge Based Systems.

Yu, O., Z. Medina-Cetina, and J. Briaud, 2011, Towards an Uncertainty-Based Design of Foundations for Onshore Oil and Gas Environmentally Friendly Drilling (EFD) Systems, Geo-Frontiers, vol. 1751.

Yu, O., 2010, Systems approach and quantitative decision tools for technology selection in Environmentally Friendly Drilling, Doctoral Dissertation, Texas A&M University.