Friday, July 7, 2017

Time series - part I


Time Series

Overview of Time Series Characteristics



Definition:
A univariate time series is a sequence of measurements of the same variable collected over time.  Most often, the measurements are made at regular time intervals.


Basic Objectives of the Analysis
The basic objective usually is to determine a model that describes the pattern of the time series.  Uses are:
  1. To understand the important features of the time series pattern.
  2. To explain how the past affects the future or how two time series can “interact”.
  3. To forecast future values of the series.
  4. To possibly serve as a control standard for a variable that measures the quality of product in some manufacturing situations.


Types of Models
There are two basic types of “time domain” models.
  1. Models that relate the present value of a series to past values and past prediction errors - these are called ARIMA models (for Autoregressive Integrated Moving Average).
  2. Ordinary regression models that use time indices as x-variables.  These can be helpful for an initial description of the data and form the basis of several simple forecasting methods.

General characteristics to look for when first examining a time series:

  • Is there a trend, meaning that, on average, the measurements tend to increase (or decrease) over time?

  • Is there seasonality, meaning that there is a regularly repeating pattern of highs and lows related to calendar time such as seasons, quarters, months, days of the week, and so on?
  • Is there a long-run cycle or period unrelated to seasonality factors?

  • Is there constant variance over time, or is the variance non-constant?


One of the simplest ARIMA-type models is a model in which we use a linear model to predict the value at the present time using the value at the previous time.  This is called an AR(1) model, standing for autoregressive model of order 1.  The order of the model indicates how many previous times we use to predict the present time.

A start in evaluating whether an AR(1) might work is to plot values of the series against lag 1 values of the series.  Let x_t denote the value of the series at any particular time t, so x_{t-1} denotes the value of the series one time before time t.  That is, x_{t-1} is the lag 1 value of x_t.  As a short example, here are the first five values in the earthquake series along with their lag 1 values:
t    x_t    x_{t-1} (lag 1 value)
1    13     *
2    14     13
3    8      14
4    10     8
5    16     10
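The lag 1 column in the table above can be built mechanically. The following is a minimal Python sketch, not part of the original analysis; the function name lag_pairs is a hypothetical helper, and None stands in for the "*" shown where no lag value exists.

```python
def lag_pairs(series, lag=1):
    """Return rows (t, x_t, x_{t-lag}); the lagged value is None where it is undefined."""
    rows = []
    for i, value in enumerate(series):
        lagged = series[i - lag] if i >= lag else None
        rows.append((i + 1, value, lagged))  # t is 1-based, as in the table
    return rows


# First five values of the series, copied from the table above.
quakes_head = [13, 14, 8, 10, 16]

for t, x_t, x_lag in lag_pairs(quakes_head):
    print(t, x_t, "*" if x_lag is None else x_lag)
```

Dropping the first row (the one with no lag value) leaves the (x_{t-1}, x_t) pairs that would be used for the lag plot and for the regression.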

If we plot x_t (Y-axis) against its lag 1 value x_{t-1} (X-axis), we see a positive linear association.

The AR(1) model
Theoretically, the AR(1) model is written
 x_t = Constant + A * x_{t-1} + w_t

The error terms w_t are assumed to be independent and normally distributed with mean 0 and constant variance.
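To get a feel for what this equation produces, here is a small simulation sketch in Python. It is only an illustration: the function name simulate_ar1 and the parameter values (constant 5.0, A = 0.5, error standard deviation 1.0) are assumptions chosen for the example, not estimates from any data set.

```python
import random


def simulate_ar1(constant, a, n, x0, sigma=1.0, seed=0):
    """Simulate n values of x_t = constant + a * x_{t-1} + w_t with w_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    series = [x0]
    for _ in range(n - 1):
        w_t = rng.gauss(0.0, sigma)  # independent normal error term
        series.append(constant + a * series[-1] + w_t)
    return series


# Illustrative parameter values only (assumed for this sketch).
sim = simulate_ar1(constant=5.0, a=0.5, n=200, x0=10.0, sigma=1.0)

# When |a| < 1 the simulated values fluctuate around constant / (1 - a).
long_run_mean = 5.0 / (1 - 0.5)
print(len(sim), long_run_mean, sum(sim) / len(sim))
```

Setting sigma to 0 removes the error term, and a series started at constant / (1 - a) then stays at that value, which is a quick way to check the recursion.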


The estimated regression equation is
quakes = 9.19 + 0.543 lag1

The p-value for the lag 1 coefficient is less than 0.05, so the lag 1 value is a helpful predictor.  However, the R-squared value is weak, so the model won’t give us great predictions.
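An equation like the one above comes from an ordinary least squares regression of each value on its lag 1 value. The sketch below shows one way to do that fit in plain Python; fit_lag1_regression is a hypothetical helper name, and it does not compute p-values. Because the full earthquake series is not reproduced in this post, the check at the bottom uses a noise-free series generated from the fitted equation itself, which is an assumption made purely to show that the fit recovers the coefficients.

```python
def fit_lag1_regression(series):
    """Least squares fit of x_t = intercept + slope * x_{t-1}.

    Returns (intercept, slope, r_squared).
    """
    x = series[:-1]  # lag 1 values x_{t-1}
    y = series[1:]   # present values x_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation between x_t and x_{t-1}
    return intercept, slope, r_squared


# Noise-free series built from quakes = 9.19 + 0.543 lag1 (synthetic, for checking only).
series = [13.0]
for _ in range(9):
    series.append(9.19 + 0.543 * series[-1])

intercept, slope, r_squared = fit_lag1_regression(series)
print(round(intercept, 3), round(slope, 3), round(r_squared, 3))
```

On real data the errors w_t are present, so R-squared falls below 1; a weak R-squared means the lag 1 value explains only part of the variation in the series.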


Residual Analysis
In traditional regression, a plot of residuals versus fits is a useful diagnostic tool.  The ideal for this plot is a horizontal band of points.  Following is a plot of residuals versus predicted values for our estimated model.  It doesn’t show any serious problems.
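The quantities behind a residuals versus fits plot are easy to compute by hand. The sketch below applies the estimated equation (quakes = 9.19 + 0.543 lag1) to the first five values of the series quoted earlier; fits_and_residuals is a hypothetical helper name, and four points are far too few to judge the pattern, so this only illustrates the calculation. A real diagnostic plot would use the full series.

```python
def fits_and_residuals(series, intercept, slope):
    """Return fitted values and residuals for x_t = intercept + slope * x_{t-1}."""
    fits = [intercept + slope * prev for prev in series[:-1]]
    residuals = [actual - fit for actual, fit in zip(series[1:], fits)]
    return fits, residuals


# First five values of the series; the first one has no lag value, so it gets no fit.
quakes_head = [13, 14, 8, 10, 16]
fits, residuals = fits_and_residuals(quakes_head, intercept=9.19, slope=0.543)

for fit, res in zip(fits, residuals):
    print(f"fit={fit:.3f}  residual={res:.3f}")
```

Plotting the residuals (Y-axis) against the fits (X-axis) for the whole series gives the diagnostic plot; a horizontal band with no curvature or funnel shape is the pattern to hope for.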

Example 2
The rough plot below shows a time series of coffee production.
Some important features are:
  • There is an upward trend, possibly a curved one.
  • There is seasonality – a regularly repeating pattern of highs and lows related to quarters of the year.
  • There are no obvious outliers.
  • There might be increasing variation as we move across time, although that’s uncertain.

[Figure: time series plot of coffee production]


There are ARIMA methods for dealing with series that exhibit both trend and seasonality, which will be discussed in the next post.

Part II continues below:

