UnixReview.com
September 2005

Shell Corner: Date-Related Shell Functions

Hosted by Ed Schaefer

This month, Julie Wang and Michael Wang weigh in with five date-related shell functions based on the Unix cal calendar command.

Date-Related Shell Functions
by Michael Wang and Julie Wang

Many shell programs need to compute dates: for example, to retrieve yesterday's backup, to create Oracle table partitions for next week, or to run a job on the first Saturday of every month. In this article, we present the following date-related functions written in shell:

  • pn_month — Previous and next x months relative to the given month
  • end_month — End of month of the given month
  • pn_day — Previous and next x days of the given day
  • cur_weekday — Day of week for the given day
  • pn_weekday — Previous and next xth occurrence of a weekday relative to the given day

    Review of Current Tools

    To begin, we will survey what tools are currently available; then we'll explain why we created our shell functions and how they work.

    Most languages have extensive built-in date functions. For example, to compute the date of the day before 20050801, we do this in Perl:

      use Time::Local;
      my $a=timelocal(0,0,12,01,07,105);
      my ($mday, $mon, $year) = (localtime($a-86400))[3,4,5];
      $mon++;
      $year += 1900;
      printf("%04d%02d%02d\n", $year, $mon, $mday);
    
    In Perl, months are represented by 0 to 11 with 0 indicating January, and years are represented as the number of years since 1900. Thus, August is represented by 07, and the year 2005 is represented by 105. The day is 1. Since we are not concerned with hours, minutes, and seconds, any time will do. We use 12:00:00.

    The code simply converts the date and time to Epoch seconds, subtracts the number of seconds in one day (86400), and converts the result back to a date and time.

    Date functions in other scripting languages that have their roots in C, such as PHP, work similarly to Perl. Databases such as Oracle and MySQL support their own date functions. For example, in MySQL, we use:

      select date_format(date_sub('20050801', interval 1 day), '%Y%m%d');
    
    Shell does not have built-in date/time handling. It does not have to, because all the Unix commands are at its disposal, though normally it does not call PHP, Oracle, or MySQL.

    The dedicated command for date manipulation is GNU date. It does the same job like this:

      $ date -d '20050801 1 day ago' +"%Y%m%d"
      20050731
    
    Unfortunately, GNU date is not universally available. While it is the default date command on Linux, it is generally not available on traditional Unix machines, such as Solaris, HP-UX, AIX, etc., unless the Unix admin installs it.

    When we first wrote pman, a utility to manage Oracle tables partitioned by date [1], we used GNU date. When we deployed the utility on a failover cluster, we had to ensure GNU date was available on all nodes of the cluster, as well as at the disaster-recovery site. This increased the cost of maintenance, and it is simply not possible in an environment that does not support open source software. That was when we started to look for more portable date-related functions.

    Perl is less of a problem in this regard. However, it is more than what we need for the tasks at hand. We chose to use the old, simple, omnipresent Unix tool "cal" and shell arithmetic to implement the functions. It is simply easier for us to write and for others to understand (especially for those who do not speak Perl).

    For these functions, we use a four-digit number YYYY to represent the year, a six-digit number YYYYMM to represent the year and month (with January as 01), and an eight-digit number YYYYMMDD representing the year, month, and day. The day of the week is represented by 0-6, with Sunday as 0.
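
    As a minimal sketch of this convention, the fields of a YYYYMMDD number can be separated with integer division and modulus alone. The helper name split_ymd is ours, for illustration only; it is not one of the five functions. The syntax is kept POSIX-style so that it should run under ksh, bash, or sh.

```shell
# split_ymd YYYYMMDD: print year, month, and day as plain integers.
# (Hypothetical helper, for illustration only.)
split_ymd() {
  echo "$(( $1 / 10000 )) $(( $1 / 100 % 100 )) $(( $1 % 100 ))"
}

split_ymd 20050801    # prints: 2005 8 1
```

    Working on the whole number means a string such as 08 is never handed to shell arithmetic on its own, where many shells would try to read the leading zero as octal.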

    pn_month YYYYMM (+|-)x

    The pn_month function calculates the previous or next x months from the given month.

    It takes two parameters. The first is the given month in YYYYMM format. The second, x, is the number of months to move: negative for previous months, positive for next months.

    Common-Sense Version

    The month after December is the January of the next year, and the month before January is December of the previous year. An implementation using this common-sense method is shown below:

      function pn_month {
        typeset ym=$1 pn=$2 y m   # declare y and m too, so they stay local
      
        (( m = ym % 100 ))
        (( y = ym / 100 ))
      
        while (( pn != 0 )); do
          if (( pn > 0 )); then
            if (( m == 12 ))
            then (( m = 1 )); (( y = y + 1 ))
            else (( m = m + 1 ))
            fi
            (( pn = pn - 1 ))
          else
            if (( m == 1 ))
            then (( m = 12 )); (( y = y - 1 ))
            else (( m = m - 1 ))
            fi
            (( pn = pn + 1 ))
          fi
        done
    
        printf "%s\n" $(( 100*y + m ))
      }
    
    Consider this example:

      $ for i in -9 -8 -7 0 3 4 5; do pn_month 200508 $i; done
      200411
      200412
      200501
      200508
      200511
      200512
      200601
    
    This function uses pure shell arithmetic. The modulus of YYYYMM over 100 calculates the MM portion. The integer division of YYYYMM over 100 delivers the YYYY portion. 100*YYYY+MM gives the YYYYMM representation.

    Formula Version

    Note that to jump from 200512 to the next month 200601, simply add 89. To jump from 200601 back to the previous month, subtract 89. This is true for any given year, proven with this equation:

      100*(YYYY+1)+01 - (100*YYYY + 12) = 89
    
    Thus, to compute the previous or next month, we must subtract or add (respectively) an additional 88, on top of the 1 for the month itself, each time the result crosses a year boundary.

    An implementation with this method is shown below:

      function pn_month {
        typeset ym=$1 pn=$2 x n
        (( x = ym % 100 + pn ))
      
        if (( x > 0 ))
        then (( n = (x-1) / 12 ))
        else (( n = - (12-x) / 12 ))
        fi
        
        printf "%s\n" $(( ym + pn + 88*n ))
      }
    
    First, we get the given month and add or subtract the month offset. If the resulting number x is between 1 and 12, obviously it did not go over the year boundary. However, if x is between 13 and 24, it goes over the year boundary once; if x is between 25 and 36, it goes over twice; and so on. The number of times it goes over the year boundary n is the integer division of x-1 over 12.

    Similarly, for x <= 0, if x is between 0 and -11, it goes over the year boundary once. If x is between -12 and -23, it goes over twice, and so on. The number of times it goes over the year boundary n is the integer division of 12-x over 12, in this case. We add a minus sign because it goes back in years.

    Add or subtract an additional 88 each time it goes over the year boundary to get the result.
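
    One way to gain confidence in the formula is to compare it with the month-by-month method over a range of offsets. The sketch below restates both methods in POSIX-style syntax under names of our own (pn_month_loop, pn_month_formula, check_pn_month); it is a cross-check under those assumptions, not a replacement for the functions above.

```shell
# Step one month at a time (the common-sense method).
pn_month_loop() (
  y=$(( $1 / 100 )) m=$(( $1 % 100 )) pn=$(( $2 ))
  while [ "$pn" -gt 0 ]; do
    if [ "$m" -eq 12 ]; then m=1; y=$((y + 1)); else m=$((m + 1)); fi
    pn=$((pn - 1))
  done
  while [ "$pn" -lt 0 ]; do
    if [ "$m" -eq 1 ]; then m=12; y=$((y - 1)); else m=$((m - 1)); fi
    pn=$((pn + 1))
  done
  echo $(( 100 * y + m ))
)

# Add the offset, then 88 for every year boundary crossed (the formula method).
pn_month_formula() (
  ym=$1 pn=$(( $2 ))
  x=$(( ym % 100 + pn ))
  if [ "$x" -gt 0 ]; then
    n=$(( (x - 1) / 12 ))
  else
    n=$(( -((12 - x) / 12) ))
  fi
  echo $(( ym + pn + 88 * n ))
)

# Count the offsets from -60 to 60 on which the two methods disagree.
check_pn_month() (
  bad=0 i=-60
  while [ "$i" -le 60 ]; do
    [ "$(pn_month_loop 200508 "$i")" = "$(pn_month_formula 200508 "$i")" ] ||
      bad=$((bad + 1))
    i=$((i + 1))
  done
  echo "$bad"
)

check_pn_month    # prints: 0
```

    The function bodies are written with parentheses rather than braces so that each call runs in a subshell and its variables cannot leak into the caller.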

    Let us verify the function against GNU date:

      $ pn_month 200508 -835
      193601
      $ date -d '20050801 835 month ago' +%Y%m
      193601
    
      $ pn_month 200508 -836
      193512
      $ date -d '20050801 836 month ago' +%Y%m
      203801
    
    pn_month and GNU date agree within a certain range. On our Linux box, GNU date suffers from the "Year 2038" problem: 2,147,483,647 seconds after the epoch, at 03:14:07 UTC on January 19, 2038, a signed 32-bit time value overflows, and systems that store time that way will fail. Our shell function does not have this limit.
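
    The time of day quoted for the overflow can be recovered with shell arithmetic. This is a small worked example, assuming only that the limit is the largest signed 32-bit value; the function name y2038_limit is ours.

```shell
# Split the largest signed 32-bit second count into whole days and a time of day.
y2038_limit() (
  max=2147483647
  days=$(( max / 86400 ))     # whole days since 1 January 1970
  rest=$(( max % 86400 ))     # seconds left over in the final day
  printf '%d days + %02d:%02d:%02d\n' \
    "$days" $(( rest / 3600 )) $(( rest % 3600 / 60 )) $(( rest % 60 ))
)

y2038_limit    # prints: 24855 days + 03:14:07
```

    Counting 24,855 days forward from January 1, 1970 lands on January 19, 2038.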

    end_month YYYYMM

    end_month takes one parameter, YYYYMM, and outputs the last day of that month in the format YYYYMMDD.

    Here is how it works:

      $ end_month 200501
      20050131
      $ end_month 200502
      20050228
    
    Here is the function:
      function end_month {
        typeset ym=$1 y m ld
        ((  y = ym  / 100 ))
        ((  m = ym  % 100 ))
        for ld in $(cal $m $y); do :; done
        printf "%s\n" $(( ym*100 + ld ))
      }
    
    Simply loop through every element in the "cal" output for the given month to get the last day, and add it to the year and month to produce the output.

    Instead of using the for loop, we could use this:

      set -- $(cal $m $y)
      eval ld=\${$#}
    
    or this:
      set -- $(cal $m $y)
      ld=${@:$#}
    
    The for loop is probably more portable.
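
    To see the two more portable variants side by side without depending on an installed cal, the sketch below runs them over a stored copy of the output for February 2005. The function names last_word_loop and last_word_eval are ours. The ${@:$#} form is left out because it is a ksh93/bash extension rather than POSIX syntax.

```shell
# Stored output of "cal 2 2005", so the sketch does not need cal itself.
feb_2005='   February 2005
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28'

# Variant 1: let a for loop run off the end; the variable keeps the last word.
last_word_loop() (
  for ld in $1; do :; done
  echo "$ld"
)

# Variant 2: load the words into $1, $2, ... and dereference the last one.
last_word_eval() (
  set -- $1
  eval "ld=\${$#}"
  echo "$ld"
)

last_word_loop "$feb_2005"    # prints: 28
last_word_eval "$feb_2005"    # prints: 28
```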

    pn_day YYYYMMDD (+|-)x

    pn_day computes the previous or next x days from the given date. It takes two parameters. The first one is the given date in YYYYMMDD format, and the second one is the previous x days for a negative number and the next x days for a positive number. It outputs the resulting date in the YYYYMMDD format.

    Here is how it works:

      $ pn_day 20050102 -1
      20050101
      $ pn_day 20050102 -2
      20041231
      $ pn_day 20050102 -3
      20041230
    
    Here is the function:
      function pn_day {
        typeset ymd=$1 pn=${2:-0} ym y m d x
      
        ((  d = ymd % 100 ))
        (( ym = ymd / 100 ))
        ((  y = ym  / 100 ))
        ((  m = ym  % 100 ))
      
        if (( pn < 0 )); then
          if (( d > 1 )); then
            (( x = ymd - 1 ))
            (( x > 17520902 && x < 17520914 )) && (( x = 17520902 ))
            pn_day $x $(( pn + 1 ))
          else
            pn_day $(end_month $(pn_month $ym -1)) $(( pn + 1 ))
          fi
        elif (( pn > 0 )); then
          if (( ymd < $(end_month $ym) )); then
            (( x = ymd + 1 ))
            (( x > 17520902 && x < 17520914 )) && (( x = 17520914 ))
            pn_day $x $(( pn - 1 ))
          else
            pn_day $(( 100*$(pn_month $ym +1) + 1 )) $(( pn - 1 ))
          fi
        else
          printf "%s\n" $ymd
          return 0
        fi
      }
    
    Consider the case x = -1, the previous day. If the given date is not the beginning of the month, the answer is simply the given date decremented by 1. If the date is the beginning of the month, the answer is the end_month of the previous month. Both end_month and pn_month are already available, so nothing new is needed.

    A similar analysis applies to the case x = 1, the next day. If the given date is not the end of the month, the answer is simply the given date incremented by 1. If the given date is the end of the month, the answer is the beginning of the next month. Again, pn_month is used, so nothing new is needed.

    If |x| is greater than 1, we call the function recursively, each time with |x| decremented by 1, until x becomes 0. Unlike pn_month, it is difficult to build a formula. This is where the calendar becomes handy.
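
    The same one-day-at-a-time stepping can also be written as a loop, which may be worth considering when |x| is large. The sketch below is ours, not the article's function: pn_day_iter and days_in_month are hypothetical names, and days_in_month is a stand-in for end_month that applies the Gregorian leap-year rule arithmetically instead of calling cal, so it does not reproduce the September 1752 gap discussed next.

```shell
# days_in_month Y M: stand-in for end_month (assumption: Gregorian rules only).
days_in_month() (
  y=$1 m=$2
  case $m in
    4|6|9|11) echo 30 ;;
    2)
      if [ $((y % 4)) -eq 0 ] && { [ $((y % 100)) -ne 0 ] || [ $((y % 400)) -eq 0 ]; }
      then echo 29
      else echo 28
      fi ;;
    *) echo 31 ;;
  esac
)

# pn_day_iter YYYYMMDD (+|-)x: move one day per loop pass until x reaches 0.
pn_day_iter() (
  ymd=$1 pn=$(( ${2:-0} ))
  y=$(( ymd / 10000 )) m=$(( ymd / 100 % 100 )) d=$(( ymd % 100 ))
  while [ "$pn" -ne 0 ]; do
    if [ "$pn" -gt 0 ]; then
      if [ "$d" -lt "$(days_in_month "$y" "$m")" ]; then
        d=$((d + 1))
      else                      # end of month: roll over to the 1st
        d=1
        if [ "$m" -eq 12 ]; then m=1; y=$((y + 1)); else m=$((m + 1)); fi
      fi
      pn=$((pn - 1))
    else
      if [ "$d" -gt 1 ]; then
        d=$((d - 1))
      else                      # start of month: back to the previous month's end
        if [ "$m" -eq 1 ]; then m=12; y=$((y - 1)); else m=$((m - 1)); fi
        d=$(days_in_month "$y" "$m")
      fi
      pn=$((pn + 1))
    fi
  done
  echo $(( y * 10000 + m * 100 + d ))
)

pn_day_iter 20050102 -2    # prints: 20041231
pn_day_iter 20040228 +1    # prints: 20040229
```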

    There is an abnormality in the calendar in September 1752:

      $ cal 9 1752
         September 1752
      Su Mo Tu We Th Fr Sa 
             1  2 14 15 16
      17 18 19 20 21 22 23
      24 25 26 27 28 29 30
    
    1752 is the year when England (and its American colonies) switched from the Julian calendar to the Gregorian calendar. The "cal" man page states: "The Gregorian Reformation is assumed to have occurred in 1752 on the 3rd of September... Ten days following that date were eliminated by the reformation, so the calendar for that month is a bit unusual." Counting the 3rd itself, 11 days (September 3 through 13) were eliminated from human history.

    We could have built an array of calendar dates indexed by position. To get the previous or next date within the given month, we would then decrement or increment the index instead of the value of the date. This is how we read the calendar. However, since this is the only exception, we simply added two lines to black out those 11 days.
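
    For comparison, the position-based alternative might look like the following sketch. It uses the positional parameters as the array and a stored list of the September 1752 dates rather than live cal output; the helper name step_by_position is hypothetical.

```shell
# Dates of September 1752 as cal lists them (header lines dropped).
sept_1752="1 2 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30"

# step_by_position DAY OFFSET LIST (hypothetical helper):
# find DAY in LIST, move OFFSET positions, and print the date found there.
step_by_position() (
  day=$1 off=$(( $2 ))
  set -- $3                     # the list becomes $1, $2, ...
  pos=0 i=0
  for v in "$@"; do
    i=$((i + 1))
    if [ "$v" -eq "$day" ]; then pos=$i; fi
  done
  target=$((pos + off))
  if [ "$pos" -gt 0 ] && [ "$target" -ge 1 ] && [ "$target" -le "$#" ]; then
    eval "echo \${$target}"
  else
    echo "outside this month" >&2
    false
  fi
)

step_by_position 2 +1 "$sept_1752"     # prints: 14
step_by_position 14 -1 "$sept_1752"    # prints: 2
```

    Stepping by position skips the missing dates without any special case, at the cost of building and searching the list on every call.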

    Here is how it works:

      $ pn_day 17520902 +1
      17520914
      $ pn_day 17520914 -1
      17520902
    

    cur_weekday YYYYMMDD

    This function takes YYYYMMDD as an input and outputs the day of week represented by 0-6 with 0 being Sunday.

    Here is how it works:

      $ cur_weekday 20050815
      1
    
    This is the function:
      function cur_weekday {
        typeset ymd=$1 ym y m d i
        (( ymd >= 17520914 && ymd <= 17520930 )) && (( ymd = ymd - 11 ))
        ((  d = ymd % 100 ))
        (( ym = ymd / 100 ))
        ((  y = ym  / 100 ))
        ((  m = ym  % 100 ))
        cal $m $y | while read i; do
          set -- $i
          [[ $1 == 1 ]] && { 
            printf "%s\n" $(( ( 6 + d - $# ) % 7 ))
            break
          }
        done
      }
    
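    The day of the week cycles through 0 to 6, so it is the date plus an adjusting number, modulo 7. The last date in the first row of the calendar is always a Saturday (6). Because that row starts at 1, this date also equals the number of dates in the row, which the function reads as $# (September 1752 aside, which is why the function first shifts those dates back by 11). The adjusting number can therefore be chosen as 6 - $#, as shown in the function code.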
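
    The arithmetic can be tried on its own, without cal, by supplying the first-row count by hand. In this sketch the helper name weekday_from_first_row is ours; its second argument is the number of dates in the first row of the month's calendar.

```shell
# weekday_from_first_row D N (hypothetical helper):
# D is the day of the month, N the number of dates in cal's first row,
# which is also the date of that row's Saturday.
weekday_from_first_row() {
  echo $(( (6 + $1 - $2) % 7 ))
}

weekday_from_first_row 15 6   # August 2005, first row 1..6; prints: 1 (Monday)
weekday_from_first_row 6 6    # the first Saturday itself; prints: 6
```

    Since D is at least 1 and N is at most 7, the value inside the parentheses is never negative, so the modulus always falls in the range 0-6.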

    We take care of the September 1752 issue by eliminating 11 days:

      $ cur_weekday 17520914
      4
    
    September 14, 1752 is a Thursday; it would have been a Monday had the 11 days been present.

    pn_weekday (+)YYYYMMDD W (+|-)x

    pn_weekday computes the previous or next xth occurrence of the specified weekday from the given date. It takes three parameters. The first is the given date, the second is the weekday to find (0-6), and the third, x, selects the xth occurrence, counting backward for a negative number and forward for a positive number. If the given date has a + prefix (+YYYYMMDD), the given date itself is included in the search.

    For example, to find the third Monday in August 2005, use this:

      $ pn_weekday +20050801 1 3
      20050815
    
    This is the pn_weekday function:
      function pn_weekday {
        typeset ymd=$1 weekday=$2 pn=${3:-0} i x sign found=0 IN=0
        [[ $ymd == +* ]] && IN=1
      
        if (( pn < 0 ))
        then (( sign = -1 ))
        elif (( pn > 0 ))
        then (( sign = +1 ))
        else (( sign =  0 ))
        fi
      
        (( i = pn*sign*7 ))
      
        while (( i > 0 )); do
          (( IN == 0 )) && ymd=$(pn_day $ymd $sign)
          x=$(cur_weekday $ymd)
          (( x == weekday )) && {
            (( found = ymd ))
          }
          (( IN == 1 )) && ymd=$(pn_day $ymd $sign)
          (( i = i - 1 ))
        done
        printf "%s\n" $found
      }
    
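    The function simply uses pn_day to walk through the calendar and cur_weekday to check each day's weekday. Any 7 consecutive days contain each weekday exactly once, so walking 7*|x| days and keeping the last match yields the xth occurrence.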
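
    Because weekdays repeat every 7 days, the number of days to move can also be computed directly from the starting weekday. The sketch below is an alternative of ours, not the method used above: weekday_offset is a hypothetical helper that takes the weekday of the given date (as cur_weekday would report it), the wanted weekday, x, and an optional 1 to include the given date. The result is a signed day count that could then be handed to pn_day.

```shell
# weekday_offset CUR W X [INCLUDE] (hypothetical helper):
# print the signed number of days from a date whose weekday is CUR
# to the Xth occurrence of weekday W.
weekday_offset() (
  cur=$1 w=$2 x=$(( $3 )) incl=${4:-0}
  if [ "$x" -eq 0 ]; then
    echo 0
  else
    if [ "$x" -gt 0 ]; then
      sign=1;  gap=$(( w - cur ))
    else
      sign=-1; gap=$(( cur - w ))
    fi
    # Days to the first match: 0..6 if the start date counts, 1..7 otherwise.
    first=$(( (gap + 6 + incl) % 7 + 1 - incl ))
    echo $(( sign * (first + 7 * (sign * x - 1)) ))
  fi
)

weekday_offset 1 1 3 1    # third Monday, counting Monday 20050801; prints: 14
weekday_offset 1 1 3      # same, but not counting the start date; prints: 21
weekday_offset 1 5 -1     # previous Friday before a Monday; prints: -3
```

    An offset of 14 from 20050801 gives 20050815, matching the example above.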

    Summary

    In this article, we introduced five functions: pn_month, end_month, pn_day, cur_weekday, and pn_weekday. pn_month, end_month, and cur_weekday are independent of the rest of the functions. pn_day is built on top of pn_month and end_month, and pn_weekday is built on top of pn_day and cur_weekday.

    There are three ways to use these functions:

    1. Include them in your program. None of them is terribly long; the total number of lines is 86.
    2. Source them into your program with the dot (.) operator.
    3. Use the FPATH variable available in the Korn shell.
    By the way, these functions can also be run under bash (version 2.05b is what we tested).
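
    As a rough illustration of the second option, the sketch below writes a small function file to a scratch directory and sources it with the dot operator. The file name date_functions.sh and the placeholder function year_of are assumptions of ours, standing in for a file that holds the five functions. The FPATH option is mentioned only in a comment because it applies to the Korn shell.

```shell
# Write a tiny function library to a scratch directory (hypothetical file name).
libdir=$(mktemp -d)
cat > "$libdir/date_functions.sh" <<'EOF'
# Placeholder standing in for the article's functions.
year_of() { echo $(( $1 / 10000 )); }
EOF

# Option 2: pull the definitions into the current shell with the dot operator.
. "$libdir/date_functions.sh"
year_of 20050815    # prints: 2005

# Option 3 (Korn shell): keep each function in a file named after the function
# and list that directory in FPATH, so the function is loaded on first use.

rm -rf "$libdir"    # the function stays defined after the file is removed
```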

    These functions demonstrate shell arithmetic. We hope you find the technique interesting and these functions useful. See you next time.

    This article and the five date-related functions introduced are also available from http://www.unixlabplus.com/unix-prog/date_function/. Error fixes and enhancements, if any, will be available at the URL.

    References

    [1] "pman - Oracle partition manager." Michael Wang. Retrieved 14 August 2005.

    Julie Wang works for Independence Air. She manages Oracle databases, Unix operating systems, and Lawson enterprise systems. She can be reached at: Julie.Wang@flyi.com.

    Michael Wang earned master's degrees in Physics (Peking, 1987) and Statistics (Columbia, 2001). Currently, he is studying Unix, Oracle, and corporation politics. He can be reached at: xw73@columbia.edu.

    Their past technical writings are listed here: http://www.unixlabplus.com/unix-prog/Publication.txt.

    Copyright © 2005 UnixReview.com