3d: Gathering Baseline Data


The first step in any change initiative is deciding what you intend to change. As part of the EBDM initiative and designing the system-level logic model, policy teams should have already made some decisions about goals and objectives (i.e., what long-term changes they hope to achieve) and about the types of procedural and policy changes that need to occur to attain those goals and objectives. The next step is to understand more about these changes by gathering baseline data. Baseline data for the EBDM initiative can be defined in three broad categories:

  • case processing data;

  • data about the offender population; and

  • harm reduction data.

Case processing data and data about the offender population provide critical information about the primary issues addressed with the EBDM initiative. This data is particularly useful in shaping the initiative and refining the approach laid out during the initial planning. Gathering data about the intended harm reduction outcomes lays the foundation for evaluating the effectiveness of the EBDM initiative, allowing for an assessment of how much change has occurred since implementation.


This Starter Kit—which should be used following system mapping activities and the development of a system-level logic model—is designed to provide an overview of the types of data that a jurisdiction might want to collect to establish a baseline for the EBDM initiative. The focus is on case processing and offender population data.[1] Guidance on data collection strategies and analysis is provided, along with suggested next steps to prepare for the collection of harm reduction data. Using the information in this Starter Kit will help jurisdictions produce reports on current case processing policies and practices, what is known about the offender population, and what is known about the harms in your jurisdiction that are intended to be addressed through the EBDM initiative.


Partnering with Universities

The policy team in Grant County, Indiana, benefits from a longstanding relationship with a local professor at Indiana Wesleyan who conducts an ongoing evaluation of the drug and reentry courts.

Participants in the collection of baseline data should include representatives of the policy team to help articulate the actual data elements to be collected, as well as personnel in the various agencies/organizations from which the data will be drawn. Ideally, agency personnel would include those staff members responsible for designing or producing reports from data management systems. To the extent that new data will need to be collected, jurisdictions may wish to work with outside experts to help develop data collection instruments, codebooks, and data management systems. Local universities are a good resource for assistance with data collection, as are staff in county or agency IT departments.


The idea of gathering data may seem straightforward at first blush: identify the data reports that you want about case processing or about the offender population and analyze them. This type of approach, however, is likely to prove unmanageable at best and meaningless at worst. Too much data will overwhelm the process; people simply cannot interpret a lot of data with no analytic frame of reference. Too little data will not yield reliable information about trends and patterns and is likely to raise more questions than it answers. These instructions are intended to help policy teams frame the data-gathering activities in terms of what to collect and how.

The first step in gathering data is to decide what you want to look at and what questions need to be answered. Once this has been done, it is important to decide on a methodology for data collection and then to collect and analyze the data.

Deciding What Data to Gather

As part of the system mapping and development of the logic model, you have undoubtedly highlighted areas in which you would like to have more information. These areas form the basis for the questions that you are hoping to begin answering with your baseline data. For example: Why are there so many dismissals? Why does it take so many months to adjudicate a case? Why are so many defendants held pre-trial? Who makes up the pretrial jail population? How many people in the jail have mental health challenges?

In deciding what data to gather, it is important to define the exact data elements that you want to collect:

  • Be sure to define key terms. Some examples include the following:

    • What constitutes a case? Is a case counted by charge or by defendant? Is it counted consistently across the system?

    • How will recidivism be defined? Is it an arrest for a new offense, or can it include sanctions for technical violations of probation?

  • Specify the data parameters, such as the time period from which the data will be gathered:

    • Will you gather a year’s worth of data, or several months’?

    • Will you collect data related to offenders in the system on a given day, and if so what day?

    • Are you interested in new arrests of offenders 90 days, 6 months, 1 year, 3 years, etc. after they complete their sentence?
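The follow-up window question above can be illustrated with a minimal Python sketch. It assumes recidivism is defined as any new arrest after sentence completion (only one of the possible definitions discussed above), and every record, field name, and window length in it is hypothetical.

```python
from datetime import date

# Hypothetical records: sentence completion date and any later arrest dates.
# All names and values are illustrative, not drawn from a real system.
people = [
    {"id": "A", "completed": date(2020, 1, 15), "arrests": [date(2020, 3, 1)]},
    {"id": "B", "completed": date(2020, 2, 1), "arrests": [date(2021, 6, 30)]},
    {"id": "C", "completed": date(2020, 5, 10), "arrests": []},
    {"id": "D", "completed": date(2020, 7, 4), "arrests": [date(2020, 12, 1)]},
]

# Follow-up windows in days (6 months is approximated as 182 days).
WINDOWS = {"90 days": 90, "6 months": 182, "1 year": 365, "3 years": 1095}


def rearrest_rate(records, window_days):
    """Share of people with at least one new arrest within window_days."""
    hits = 0
    for r in records:
        if any(0 < (a - r["completed"]).days <= window_days for a in r["arrests"]):
            hits += 1
    return hits / len(records)


rates = {label: rearrest_rate(people, days) for label, days in WINDOWS.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.0%}")
```

The same people produce different rates depending on the window chosen, which is why the window should be fixed before data collection begins. The sketch also assumes everyone has been observed for the full window; people released too recently to have a complete follow-up period would need to be excluded.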

Potential Definition of a Case

An example of one way to define “case” for the purpose of data collection is by incident. In this instance, an individual who is involved in the criminal justice system is counted only once per incident, regardless of the number of criminal charges pending from a single arrest. A person involved in multiple incidents over time (e.g., three arrests in one year) would be counted as three “cases.”
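To show how much the choice of definition matters, the following Python sketch counts the same hypothetical charge-level records three ways: by charge, by incident, and by person. The field names and records are assumptions made for illustration only.

```python
# Hypothetical charge-level records; each row is one charge.
# Field names and values are illustrative only.
charges = [
    {"person": "P1", "incident": "I-100", "charge": "burglary"},
    {"person": "P1", "incident": "I-100", "charge": "theft"},
    {"person": "P1", "incident": "I-205", "charge": "trespass"},
    {"person": "P2", "incident": "I-150", "charge": "dui"},
    {"person": "P1", "incident": "I-310", "charge": "theft"},
]

# Counting by charge: every row is a "case".
by_charge = len(charges)

# Counting by incident: one "case" per person per incident,
# no matter how many charges arose from that incident.
by_incident = len({(c["person"], c["incident"]) for c in charges})

# Counting by person: each individual is counted once.
by_person = len({c["person"] for c in charges})

print(f"by charge: {by_charge}, by incident: {by_incident}, by person: {by_person}")
```

Here person P1 has three incidents and therefore three "cases" under the incident definition, even though four charges are on file for that person.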

Identify what the unit of analysis will be:

  • Will you count people, charges, cases, beds, offense types, dollars, etc.?

Case Processing Data: Information related to case processing can focus on a variety of factors related to the nature and type of cases in the system as well as the volume and flow of cases. Depending on your policy team’s goals and the capabilities of your data systems, you may want to collect case processing data throughout the “life” of a case, or as early as arrest, booking, and filing/charging. Types of information that might be sought include the following:

  • number of cases by case type;

  • number of pending cases;

  • age of pending cases;

  • number of cases at different stages in the case processing continuum;

  • number of cases that proceed or “fall out” by decision point;

  • number and type of dispositions by case type;

  • number and type of release decisions by case type;

  • average sentence length;

  • number of probation revocations for technical violations and for new offenses;

  • number of bench warrants issued;

  • number of continuances; and

  • length of time between initial appearance and disposition by case type.

A caution for collecting case processing data, however, is to ensure that a “case” is defined the same way by all agencies providing the information. For example, police may count arrests as “cases,” prosecutors may count charges as “cases,” and so on. To avoid confusion, the definition of a case should be consistent across agencies.

Offender Population Data: Because data gathering can be labor intensive, the policy team should keep the data collection focused on the types of offenders or activities that it plans to include as part of the EBDM initiative. So, if the focus is largely on pre-trial release, your data collection should focus on which offenders get released and by what means, and on which offenders do not get released. Conversely, if the intent is to focus on medium- and high-risk offenders for risk/needs assessment at sentencing, then your data collection should focus on medium- and high-risk offenders.

No matter where in the system your data gathering will focus or which types of offenders you might home in on, some common types of data should be collected about the offender population:

  • demographic characteristics of offenders;

  • criminal histories; and

  • previous sanctions and sentence lengths.

Establishing a Data Committee

Prior to engaging in the EBDM initiative, Ramsey County, Minnesota, had established a Data Committee, which was composed of data, research, and IT staff from key criminal justice agencies, including the courts, corrections, human services, District Attorney’s office, and others. Historically, the Data Committee would be “activated” to assist with various cross-agency projects, and it was activated again for the EBDM initiative.

Throughout Phase II of EBDM, the Data Committee met regularly and assisted in defining the baseline data, key measures, and other data elements necessary to measure the County’s performance under EBDM. The committee reported regularly to the EBDM policy team on progress in the collection of data, assisted in the system mapping process, and gathered data to respond to policy team members’ questions.

The Data Committee was instrumental in identifying and resolving potential roadblocks in data collection and developing common definitions for measures (which differed across individual agency information systems) across all agencies.

Harm Reduction Data: The types of harm reduction data will depend on the specific harm reduction goals defined and desired as a result of EBDM. However, some general guidance includes

  • incorporating data from other governmental systems, as appropriate, for example

    • the number of people engaged in mental health services outside of the criminal justice system; and

    • emergency room admissions.

  • conducting primary research on areas of interest, for example

    • victim satisfaction surveys;

    • analysis of cost-benefits; and

    • comparative analysis of justice spending versus non-justice spending.

Information Gathering and Analysis Methods

There are many different techniques and approaches to gathering information. Because the information being collected under the EBDM initiative will be largely quantitative (i.e., numeric), only those approaches that are appropriate for quantitative data are described. There are two major types of data collection methods:

  1. primary data collection: development of surveys, questionnaires, and data collection forms to collect information that does not already exist in another form

  2. secondary data collection: collection of information from pre-existing datasets and data sources, such as case management systems

For the most part, you will be relying on secondary data collection. However, if you decide that primary data collection is necessary, it is highly advisable to work with an outside expert—perhaps from a local university—to develop and test any data collection instruments to ensure that the information gathered is reliable and valid.

With secondary data collection, the method you select – and the amount of external assistance you may need – will largely depend on the data-generating capacity of your system and the areas on which you have decided you need more information. Certain types of data collection and analysis will be more appropriate for understanding baseline case processing information, whereas other methods may be more appropriate for understanding the offender population. Thus, being clear about what you want to gather information on before you determine how (and from where) you will gather it is critically important. With this in mind, there are several different approaches that may be used in your information gathering:[2]

  • Pipeline analysis, in which a specific cohort of arrestees is selected and data is collected on them through their passage into and out of the criminal justice system.[3] The unit of analysis is the individual; the analysis focuses on counting people. A pipeline approach is used to track the number of people in the system at every decision point and the attrition at each point where people exit the system. So, for example, a pipeline analysis starts with the number of people arrested and then counts how many received citations, how many were booked into jail, how many were released after booking, how many were charged with offenses, how many were not charged, how many were convicted, how many were sent to diversion or deferred prosecution programs, how many were sentenced to jail, how many received suspended sentences, how many were placed on probation, etc. One way to think about this pipeline analysis is as determining the quantities at some or all of the process steps and decision points on your system map for a given period of time.

  • Time analysis, in which the unit of analysis is either the individual defendant/offender or the case. The focus of a time analysis is to understand the amount of time associated with different aspects of the system or how long a particular process takes. So for example, a time analysis might look at the number of days individuals are detained in jail pre-trial, elapsed time from charging to disposition, amount of time in diversion or under supervision, length of time between preliminary hearing and trial/plea, length of time between violation and court action on the violation, etc. Time analysis data can also be noted on the system map, reflecting the lapse of time between one step (or decision point) and another.

  • Jail analysis, in which the focus is to develop a thorough understanding of persons who are booked into the jail, the length of time they are in jail, and their status during the stay (e.g., pretrial, sentenced, DOC transfer, probation violation, etc.).[4]

  • Comparative analysis, which seeks to understand the differences between offenders in your population. For example, if you are concerned about compliance with terms of probation, this type of analysis will allow you to determine if there are differences between offenders’ outcomes based on risk level, types of conditions, types of present offense, prior criminal history, etc. Other questions that can be answered include, but are not limited to, issues related to who is placed on pretrial detention, the types and lengths of sentences/conditions, case outcomes, etc.
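As a rough illustration of the first two approaches, the Python sketch below runs a simple pipeline count and a time analysis on one small cohort. The stage names, field names, and records are hypothetical, and the pipeline is treated as a single linear path; a real system map would also branch (citation, diversion, and so on).

```python
from datetime import date
from statistics import median

# Simplified, linear set of decision points (an assumption for illustration).
STAGES = ["arrested", "booked", "charged", "convicted"]

# Hypothetical cohort: furthest stage each person reached, plus key dates.
cohort = [
    {"id": 1, "reached": "convicted", "charged_on": date(2021, 1, 5), "disposed_on": date(2021, 4, 5)},
    {"id": 2, "reached": "charged", "charged_on": date(2021, 1, 10), "disposed_on": date(2021, 2, 9)},
    {"id": 3, "reached": "booked", "charged_on": None, "disposed_on": None},
    {"id": 4, "reached": "arrested", "charged_on": None, "disposed_on": None},
    {"id": 5, "reached": "convicted", "charged_on": date(2021, 2, 1), "disposed_on": date(2021, 8, 1)},
]


def pipeline_counts(people):
    """Pipeline analysis: number of people who reached each stage or beyond."""
    rank = {stage: i for i, stage in enumerate(STAGES)}
    return [
        (stage, sum(1 for p in people if rank[p["reached"]] >= rank[stage]))
        for stage in STAGES
    ]


def days_to_disposition(people):
    """Time analysis: elapsed days from charging to disposition."""
    return [
        (p["disposed_on"] - p["charged_on"]).days
        for p in people
        if p["charged_on"] and p["disposed_on"]
    ]


counts = pipeline_counts(cohort)
for (stage, n), (_, next_n) in zip(counts, counts[1:]):
    print(f"{stage}: {n} people, {n - next_n} exit before the next stage")
print(f"{counts[-1][0]}: {counts[-1][1]} people")
print("median days, charging to disposition:", median(days_to_disposition(cohort)))
```

The pipeline function counts people, while the time function measures intervals, so the two answer different questions about the same cohort and can both be noted on the system map.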


  • As you define what information you would like to gather, identify the source of the information, including specifics about which agency has the data, the name of the person who will generate the report, and the time frame for delivering the data to the policy team.

  • Be specific about the data to be collected. Identify if you want summary data, data aggregated into totals, or some other format; how you want the information broken down (e.g., by type of offense, by justice system status, etc.); the time frame from which the information is being drawn; etc. The more details in the information-gathering instructions, the better.

  • It is often useful to mock up an example of a report that shows how you want to see the information and to provide that information to whomever is gathering the data. Doing this will provide additional clarity about what the data should look like once it is generated or extracted from existing information systems.

Ramsey County, Minnesota, Baseline Data Collected on Warrants

[Figures not reproduced: baseline data; active warrants]

Key Findings from a Jail Analysis in One County

One EBDM site enlisted the help of an outside expert to conduct an analysis of the jail population. The consultant looked at a sample that consisted of 10% of the inmates released from the jail during 2010 (N = 533). As a result of the analysis, a number of observations and recommendations for reducing the jail population were presented, including the following:

  • 60% of all jail bed days were being used by those who had a probation violation associated with their booking. Officials should examine current policies and procedures for dealing with probation violation cases with the goal of trying to reduce, if possible, the time required to resolve them.

  • Defendants released on signature bonds had an average length of stay of 7 days and consumed 87% of the jail bed days for the pretrial release population. Jail officials should determine how they can ensure that signature bond decisions will be made on the first day in order to have a significant impact on the overall jail population.

  • While Native Americans released during the pretrial period represented 4% of all those released pre-trial, they consumed 20% of all jail bed days used by all inmates released pretrial. Officials should look into whether this population has higher Failure to Appear (FTA) rates; if so, they may benefit from implementing court date reminder procedures, which have been very effective in reducing failure-to-appear rates in other jurisdictions.
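Findings like these compare a group's share of releases with its share of jail bed days. A minimal Python sketch of that calculation follows; the release records are invented for illustration and do not reproduce the county's data, and the field names are assumptions.

```python
from collections import defaultdict

# Hypothetical release records: one row per released inmate,
# with length of stay (los_days) already computed in days.
releases = [
    {"release_type": "signature bond", "los_days": 6},
    {"release_type": "signature bond", "los_days": 8},
    {"release_type": "cash bond", "los_days": 1},
    {"release_type": "cash bond", "los_days": 1},
    {"release_type": "recognizance", "los_days": 4},
]


def shares(records, key):
    """For each group, return its share of releases and its share of bed days."""
    n = defaultdict(int)
    days = defaultdict(int)
    for r in records:
        n[r[key]] += 1
        days[r[key]] += r["los_days"]
    total_n = len(records)
    total_days = sum(days.values())
    return {
        group: {
            "share_of_releases": n[group] / total_n,
            "share_of_bed_days": days[group] / total_days,
            "avg_los_days": days[group] / n[group],
        }
        for group in n
    }


result = shares(releases, "release_type")
for group, s in result.items():
    print(
        f"{group}: {s['share_of_releases']:.0%} of releases, "
        f"{s['share_of_bed_days']:.0%} of bed days, "
        f"average stay {s['avg_los_days']:.1f} days"
    )
```

A group whose share of bed days is well above its share of releases is a candidate for closer review, which is the pattern the findings above describe.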

A key finding from this analysis led to a recommendation to improve the collection of baseline data in the jail. The county jail, like many across the country, overwrites the legal status at booking (e.g., pretrial, probation violation, sentenced) when the status changes. Therefore, important data is lost regarding the reason individuals are initially booked into jail. This finding led to a relatively easy fix: developing a field for “legal status at booking” and a separate field for “current legal status.” 
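The two-field fix can be sketched as a small record type. This is a minimal Python illustration, not the county's actual jail management system; the class, field, and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Booking:
    """Hypothetical booking record that keeps both legal statuses."""

    booking_id: str
    booked_on: date
    legal_status_at_booking: str  # recorded once at intake, never overwritten
    current_legal_status: str = ""  # updated whenever the status changes

    def __post_init__(self):
        # At intake, the current status is the same as the booking status.
        if not self.current_legal_status:
            self.current_legal_status = self.legal_status_at_booking

    def update_status(self, new_status: str) -> None:
        # Only the current status changes; the booking status is preserved.
        self.current_legal_status = new_status


booking = Booking("B-0001", date(2010, 3, 1), "pretrial")
booking.update_status("sentenced")
print(booking.legal_status_at_booking, "->", booking.current_legal_status)
```

Because the status at booking is never overwritten, later analyses can still group jail bed days by the reason people were initially booked.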

Example: Charlottesville-Albemarle County, Virginia, Steps to Calculate Baseline Data on Costs

Additional Resources/Readings

Carter, M., & Morris, L. (2007). Enhancing the management of adult and juvenile sex offenders: A handbook for policymakers and practitioners. Retrieved from

McGarry, P., & Ney, B. (2006). Getting it right: Collaborative problem solving for criminal justice (NIC Accession No. 019834). Retrieved from

Rossman, S. B., & Winterfield, L. (2009). Measuring the impact of reentry efforts. Retrieved from


[1] Harm reduction data is a third important component of information-gathering, specifically as it relates to outcome evaluations that are conducted; as such, harm reduction data is not discussed in detail in this document.

[2] The different approaches are not mutually exclusive; depending on what you are trying to understand, you may find that using two or more approaches provides a significantly clearer picture of the issues than using a single approach.

[3] McGarry & Ney, 2006.

[4] See McGarry & Ney (2006) for more detailed information about the analysis of the jail population.