Git: how to list all files under version control along with their author date?

Given a git repo, I need to generate a dictionary that maps each version-controlled file's path to its last modified date as a unix timestamp. I need the last modified date as far as git is concerned – not the file system.

In order to do this, I’d like to get git to output a list of all files under version control along with each file’s author date. The output from git ls-files or git ls-tree -r master would be perfect if their output had timestamps included on each line.

Is there a way to get this output from git?

    Update for more context: I have a current implementation that consists of a python script that iterates through every file under source control and does a git log on each one, but I’m finding that that doesn’t scale well. The more files in the repo, the more git log calls I have to make. So that has led me to look for a way to gather this info from git with fewer calls (ideally just 1).


    a list of all files under version control along with each file’s author date

    Scaling isn’t a problem with this one:

    #!/bin/sh
    temp="${TMPDIR:-/tmp}/@@@commit-at@@@$$"
    trap 'rm -f "$temp"' 0
    trap 'exit 1' 1 2 3 15
    git log --pretty=format:"%H%x09%at" --topo-order --reverse "$@" >"$temp"
    cut -f1 "$temp" \
    | git diff-tree -r --root --name-status --stdin \
    | awk '
            BEGIN {FS="\t"; OFS="\t"}
            FNR==1{++f}
            f==1  {at[$1]=$2; next}
            NF==1 {commit=$1; next}
            $1=="D"{$1=""; delete last[$0]; next} # comment out this line to also list deleted files
                  {did=$1;$1=""; last[$0]=at[commit]"\t"did}
            END   {for (f in last) print last[f]f}
     ' "$temp" - \
    | sort -t"`printf '\t'`" -k3
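
    The awk stage above replays the commits oldest first, so for every path the last assignment wins, and a "D" status drops the path again. A minimal Python sketch of that same bookkeeping, assuming the timestamps from git log and the records from git diff-tree have already been read into Python; the function name last_change and the input shape are hypothetical, chosen only for illustration:

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical input shape: one (author_timestamp, changes) pair per commit,
# oldest first, where changes are (status, path) pairs as printed by
# `git diff-tree -r --root --name-status`.
Commit = Tuple[int, List[Tuple[str, str]]]


def last_change(commits: Iterable[Commit]) -> Dict[str, Tuple[int, str]]:
    """Replay commits oldest-first; keep the latest (timestamp, status) per path."""
    last: Dict[str, Tuple[int, str]] = {}
    for at, changes in commits:
        for status, path in changes:
            if status == "D":
                # Same as the awk `delete last[$0]`: the file is gone.
                last.pop(path, None)
            else:
                # A later commit overwrites an earlier one, as `last[$0]=...` does.
                last[path] = (at, status)
    return last


history = [
    (100, [("A", "README"), ("A", "old.txt")]),
    (200, [("M", "README"), ("D", "old.txt")]),
    (300, [("A", "src/main.c")]),
]
# Print sorted by path, like the script's final `sort -k3`.
for path, (at, status) in sorted(last_change(history).items()):
    print(f"{at}\t{status}\t{path}")
```

    Because the commits are processed in topological order with --reverse, no timestamp comparison is needed: the order alone decides which commit is the latest.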
    

    I wrote the following script to output, for each file, its path, short commit hash and date.

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    #
    # Author: R.F. Smith <rsmith@xs4all.nl>
    # $Date: 2013-03-23 01:09:59 +0100 $
    #
    # To the extent possible under law, Roland Smith has waived all
    # copyright and related or neighboring rights to gitdates.py. This
    # work is published from the Netherlands. See
    # http://creativecommons.org/publicdomain/zero/1.0/
    
    """For each file in a directory managed by git, get the short hash and
    date of the most recent commit of that file."""
    
    import os
    import sys
    import subprocess
    import time
    from multiprocessing import Pool
    
    # Suppress console windows on MS Windows.
    startupinfo = None
    if os.name == 'nt':
        startupinfo = subprocess.STARTUPINFO()
        startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
    
    def filecheck(fname):
        """Start a git process to get file info. Return a tuple
        containing the filename (without the leading './'), the
        abbreviated commit hash and the author date as a UTC
        time.struct_time.
    
        Arguments:
        fname -- Name of the file to check.
        """
        args = ['git', '--no-pager', 'log', '-1', '--format=%h|%at', fname]
        try:
            b = subprocess.check_output(args, startupinfo=startupinfo)
            data = b.decode()[:-1]
            h, t = data.split('|')
            out = (fname[2:], h, time.gmtime(float(t)))
        except (subprocess.CalledProcessError, ValueError):
            return (fname[2:], '', time.gmtime(0.0))
        return out
    
    def main():
        """Main program."""
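        if '.git' not in os.listdir('.'):
            print('This directory is not managed by git.')
            sys.exit(1)
        # Get the set of ignored files. git prints paths relative to the
        # repository root with '/' separators; decode them, because
        # check_output() returns bytes, which never compare equal to str.
        exargs = ['git', 'ls-files', '-i', '-o', '--exclude-standard']
        exc = set(subprocess.check_output(exargs).decode().splitlines())
        # Get a list of all files that are not ignored.
        allfiles = []
        for root, dirs, files in os.walk('.'):
            if '.git' in dirs:
                dirs.remove('.git')
            for f in files:
                path = os.path.join(root, f)
                # './sub/f' -> 'sub/f', to match git's output.
                if path[2:].replace(os.sep, '/') not in exc:
                    allfiles.append(path)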
        # Gather the files' data using a Pool.
        p = Pool()
        filedata = []
        for res in p.imap_unordered(filecheck, allfiles):
            filedata.append(res)
        p.close()
        # Sort the data (latest modified first) and print it
        filedata.sort(key=lambda a: a[2], reverse=True)
        dfmt = '%Y-%m-%d %H:%M:%S %Z'
        for name, tag, date in filedata:
            print('{}|{}|{}'.format(name, tag, time.strftime(dfmt, date)))
    
    
    if __name__ == '__main__':
        main()
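
    The question asks for a dictionary mapping each path to a unix timestamp, whereas filecheck() returns the author date as a time.struct_time built with time.gmtime(). calendar.timegm() is the inverse of time.gmtime(), so it recovers the original integer. A small sketch, where to_timestamps is a hypothetical helper applied to the (name, hash, date) tuples that main() collects in filedata:

```python
import calendar
import time
from typing import Dict, Iterable, Tuple


def to_timestamps(
    filedata: Iterable[Tuple[str, str, time.struct_time]]
) -> Dict[str, int]:
    """Map each file name to its author date as a unix timestamp."""
    # calendar.timegm() treats the struct_time as UTC, matching time.gmtime();
    # time.mktime() would wrongly interpret it as local time.
    return {name: calendar.timegm(date) for name, _hash, date in filedata}


sample = [
    ("README", "1a2b3c4", time.gmtime(1700000000)),
    ("untracked.tmp", "", time.gmtime(0.0)),
]
print(to_timestamps(sample))
```

    Files for which git reports no commit come out as 0, because filecheck() falls back to time.gmtime(0.0) for them.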
    

    What I would do is run git ls-files and put all of the paths into a set, then run git log $date_args --name-only, which lists the newest commit first by default. While parsing that output, for each path that is still in the set, record the current commit's date in a dictionary and remove the path from the set. Stop processing once the set is empty; that way a single git log call covers every file.
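
    A minimal Python sketch of that approach. The answer leaves $date_args open; this sketch assumes --format=%x00%at, so each commit header is a NUL byte followed by the author timestamp and can never be mistaken for a path. It also assumes git ls-files and git log quote unusual path names the same way. The function names first_seen_dates and git_dates are hypothetical:

```python
import subprocess
from typing import Dict, Iterable, Set


def first_seen_dates(log_lines: Iterable[str], tracked: Set[str]) -> Dict[str, int]:
    """Walk `git log --name-only` output, newest commit first.

    The first commit that mentions a still-pending tracked path is the most
    recent one to touch it, so its timestamp is recorded and the path is
    removed from the pending set.
    """
    pending = set(tracked)
    dates: Dict[str, int] = {}
    stamp = None
    for line in log_lines:
        if not pending:
            break  # every tracked file has a date; stop reading early
        if line.startswith("\0"):
            stamp = int(line[1:])  # commit header produced by %x00%at
        elif line and line in pending:
            dates[line] = stamp
            pending.discard(line)
    return dates


def git_dates(repo: str = ".") -> Dict[str, int]:
    """Run the two git commands and combine them (requires git on PATH)."""
    tracked = set(subprocess.check_output(
        ["git", "ls-files"], cwd=repo, text=True).splitlines())
    with subprocess.Popen(
            ["git", "log", "--format=%x00%at", "--name-only"],
            cwd=repo, stdout=subprocess.PIPE, text=True) as proc:
        dates = first_seen_dates(
            (line.rstrip("\n") for line in proc.stdout), tracked)
        proc.kill()  # stop git if we finished before reading all its output
    return dates
```

    Blank separator lines and paths that are no longer tracked are simply skipped, so the parser does not depend on exactly how git log spaces out the --name-only blocks.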

    Here you go:

    git ls-files -z | xargs -0 -I{} -- git log -1 --format='%at {}' -- {}
    

    This works in bash and probably in plain sh too, although it still starts one git log process per file.
