log_parse

parse_log_dir

A function which will gather all log files within a given folder and pass them along for visualization.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dir_path | str | The path to a directory containing log files (or the path to a specific single log file). | required |
| log_extension | str | The extension of the log files. | '.txt' |
| recursive_search | bool | Whether to recursively search sub-directories for log files. | False |
| smooth_factor | float | A non-negative float representing the magnitude of gaussian smoothing to apply (zero for none). | 0 |
| save | bool | Whether to save (True) or display (False) the generated graph. | False |
| save_path | Optional[str] | Where to save the image if save is true. Defaults to dir_path if not provided. | None |
| ignore_metrics | Optional[Set[str]] | Any metrics within the log files which will not be visualized. | None |
| include_metrics | Optional[Set[str]] | A whitelist of metric keys (None whitelists all keys). | None |
| pretty_names | bool | Whether to modify the metric names in graph titles (True) or leave them alone (False). | False |
| group_by | Optional[str] | Combine multiple log files by a regex to visualize their mean +/- stddev. For example, to group together files like [a_1.txt, a_2.txt] vs [b_1.txt, b_2.txt] you can use: r'(.*)_[\d]+\.txt'. | None |

Source code in fastestimator/fastestimator/summary/logs/log_parse.py
def parse_log_dir(dir_path: str,
                  log_extension: str = '.txt',
                  recursive_search: bool = False,
                  smooth_factor: float = 0,
                  save: bool = False,
                  save_path: Optional[str] = None,
                  ignore_metrics: Optional[Set[str]] = None,
                  include_metrics: Optional[Set[str]] = None,
                  pretty_names: bool = False,
                  group_by: Optional[str] = None) -> None:
    """A function which will gather all log files within a given folder and pass them along for visualization.

    Args:
        dir_path: The path to a directory containing log files (or the path to a specific single log file).
        log_extension: The extension of the log files.
        recursive_search: Whether to recursively search sub-directories for log files.
        smooth_factor: A non-negative float representing the magnitude of gaussian smoothing to apply (zero for none).
        save: Whether to save (True) or display (False) the generated graph.
        save_path: Where to save the image if save is true. Defaults to dir_path if not provided.
        ignore_metrics: Any metrics within the log files which will not be visualized.
        include_metrics: A whitelist of metric keys (None whitelists all keys).
        pretty_names: Whether to modify the metric names in graph titles (True) or leave them alone (False).
        group_by: Combine multiple log files by a regex to visualize their mean+-stddev. For example, to group together
            files like [a_1.txt, a_2.txt] vs [b_1.txt, b_2.txt] you can use: r'(.*)_[\d]+\.txt'.
    """
    if os.path.isdir(dir_path):
        file_paths = list_files(root_dir=dir_path, file_extension=log_extension, recursive_search=recursive_search)
    else:
        file_paths = [dir_path]
        log_extension = os.path.splitext(dir_path)[1]

    parse_log_files(file_paths,
                    log_extension,
                    smooth_factor,
                    save,
                    save_path,
                    ignore_metrics,
                    include_metrics,
                    pretty_names,
                    group_by)
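
As a usage sketch (the directory name here is hypothetical, and the import path is assumed from the source location shown above):

from fastestimator.summary.logs.log_parse import parse_log_dir

# Graph every .txt log under 'experiment_logs' (a hypothetical directory),
# averaging repeated runs such as a_1.txt and a_2.txt into one mean +/- stddev curve.
parse_log_dir("experiment_logs",
              smooth_factor=1,
              save=True,  # with no save_path, the graph is saved under the log directory
              group_by=r'(.*)_[\d]+\.txt')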

parse_log_file

A function which will parse a log file into a dictionary of metrics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_path | str | The path to a log file. | required |
| file_extension | str | The extension of the log file. | required |

Returns:

| Type | Description |
| --- | --- |
| Summary | An experiment summarizing the given log file. |

Source code in fastestimator/fastestimator/summary/logs/log_parse.py
def parse_log_file(file_path: str, file_extension: str) -> Summary:
    """A function which will parse log files into a dictionary of metrics.

    Args:
        file_path: The path to a log file.
        file_extension: The extension of the log file.
    Returns:
        An experiment summarizing the given log file.
    """
    # TODO: need to handle multi-line output like confusion matrix
    experiment = Summary(strip_suffix(os.path.split(file_path)[1].strip(), file_extension))
    with open(file_path) as file:
        parse_log_iter(source=file, sync=experiment)
    return experiment
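
A short sketch of direct usage (the file name is hypothetical, and the import path is assumed from the source location shown above):

from fastestimator.summary.logs.log_parse import parse_log_file

experiment = parse_log_file("train_log.txt", ".txt")  # 'train_log.txt' is a hypothetical file
print(experiment.name)                     # "train_log" - the file name minus its extension
print(experiment.history["train"].keys())  # metric names recorded in train mode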

parse_log_files

Parse one or more log files for graphing.

This function will iterate through the given log file paths, parse them to extract metrics, remove any metrics which are blacklisted, and then pass the necessary information on to the graphing function.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_paths | List[str] | A list of paths to various log files. | required |
| log_extension | Optional[str] | The extension of the log files. | '.txt' |
| smooth_factor | float | A non-negative float representing the magnitude of gaussian smoothing to apply (zero for none). | 0 |
| save | bool | Whether to save (True) or display (False) the generated graph. | False |
| save_path | Optional[str] | Where to save the image if save is true. Defaults to the directory of the first log file if not provided. | None |
| ignore_metrics | Optional[Set[str]] | Any metrics within the log files which will not be visualized. | None |
| include_metrics | Optional[Set[str]] | A whitelist of metric keys (None whitelists all keys). | None |
| pretty_names | bool | Whether to modify the metric names in graph titles (True) or leave them alone (False). | False |
| group_by | Optional[str] | Combine multiple log files by a regex to visualize their mean +/- stddev. For example, to group together files like [a_1.txt, a_2.txt] vs [b_1.txt, b_2.txt] you can use: r'(.*)_[\d]+\.txt'. | None |

Raises:

| Type | Description |
| --- | --- |
| AssertionError | If no log files are provided. |
| ValueError | If a log file does not match the group_by regex pattern. |

Source code in fastestimator/fastestimator/summary/logs/log_parse.py
def parse_log_files(file_paths: List[str],
                    log_extension: Optional[str] = '.txt',
                    smooth_factor: float = 0,
                    save: bool = False,
                    save_path: Optional[str] = None,
                    ignore_metrics: Optional[Set[str]] = None,
                    include_metrics: Optional[Set[str]] = None,
                    pretty_names: bool = False,
                    group_by: Optional[str] = None) -> None:
    """Parse one or more log files for graphing.

    This function will iterate through the given log file paths, parse them to extract metrics, remove any
    metrics which are blacklisted, and then pass the necessary information on to the graphing function.

    Args:
        file_paths: A list of paths to various log files.
        log_extension: The extension of the log files.
        smooth_factor: A non-negative float representing the magnitude of gaussian smoothing to apply (zero for none).
        save: Whether to save (True) or display (False) the generated graph.
        save_path: Where to save the image if save is true. Defaults to the directory of the first log file if not
            provided.
        ignore_metrics: Any metrics within the log files which will not be visualized.
        include_metrics: A whitelist of metric keys (None whitelists all keys).
        pretty_names: Whether to modify the metric names in graph titles (True) or leave them alone (False).
        group_by: Combine multiple log files by a regex to visualize their mean+-stddev. For example, to group together
            files like [a_1.txt, a_2.txt] vs [b_1.txt, b_2.txt] you can use: r'(.*)_[\d]+\.txt'.

    Raises:
        AssertionError: If no log files are provided.
        ValueError: If a log file does not match the `group_by` regex pattern.
    """
    if file_paths is None or len(file_paths) < 1:
        raise AssertionError("must provide at least one log file")
    if save and save_path is None:
        save_path = os.path.join(os.path.dirname(file_paths[0]), 'parse_logs.html')

    groups = defaultdict(list)  # {group_name: [experiment(s)]}
    for path in file_paths:
        experiment = parse_log_file(path, log_extension)
        try:
            key = (re.findall(group_by, os.path.split(path)[1]))[0] if group_by else experiment.name
        except IndexError:
            raise ValueError(f"The log {os.path.split(path)[1]} did not match the given regex pattern: {group_by}")
        groups[key].append(experiment)
    experiments = [average_summaries(name, exps) for name, exps in groups.items()]

    visualize_logs(experiments,
                   save_path=save_path,
                   smooth_factor=smooth_factor,
                   pretty_names=pretty_names,
                   ignore_metrics=ignore_metrics,
                   include_metrics=include_metrics)
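
For example, grouping two hypothetical runs per configuration (import path assumed from the source location shown above):

from fastestimator.summary.logs.log_parse import parse_log_files

# a_1/a_2 and b_1/b_2 are repeated runs; group_by collapses each pair into a
# single mean +/- stddev curve keyed by the regex's capture group ('a' or 'b').
parse_log_files(["a_1.txt", "a_2.txt", "b_1.txt", "b_2.txt"],
                group_by=r'(.*)_[\d]+\.txt',
                save=True)  # with no save_path, writes parse_logs.html next to a_1.txt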

parse_log_iter

A function which will parse lines into a dictionary of metrics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | Iterable[str] | A collection of lines to parse. | required |
| sync | Summary | The summary to append into. | required |

Returns:

| Type | Description |
| --- | --- |
| Summary | The updated summary object. |

Source code in fastestimator/fastestimator/summary/logs/log_parse.py
def parse_log_iter(source: Iterable[str], sync: Summary) -> Summary:
    """A function which will parse lines into a dictionary of metrics.

    Args:
        source: A collection of lines to parse.
        sync: The summary to append into.

    Returns:
        The updated summary object.
    """
    last_step = 0
    last_epoch = 0
    for line in source:
        mode = None
        if line.startswith("FastEstimator-Train: step") or line.startswith("FastEstimator-Finish"):
            mode = "train"
        elif line.startswith("FastEstimator-Eval: step"):
            mode = "eval"
        elif line.startswith("FastEstimator-Test: step"):
            mode = "test"
        if mode is None:
            continue
        num = r"([-]?[0-9]+[.]?[0-9]*(e[-]?[0-9]+[.]?[0-9]*)?)"
        parsed_line = re.findall(r"([^:;]+):[\s]*(" + num + r"|None|\(" + num + ", " + num + ", " + num + r"\));", line)
        step = parsed_line[0]
        assert step[0].strip() == "step", \
            "Log file (%s) seems to be missing step information, or step is not listed first" % sync.name
        step = step[1]
        adjust_epoch = False
        if step == 'None':
            # This might happen if someone runs the test mode from the cli
            step = last_step
            # If the test mode was just guessing its epoch, use the prior epoch instead
            adjust_epoch = mode == 'test'
        else:
            step = int(step)
            last_step = step
        for metric in parsed_line[1:]:
            if metric[4]:
                val = ValWithError(float(metric[4]), float(metric[6]), float(metric[8]))
            else:
                val = metric[1]
                if val == 'None':
                    continue
                val = float(val)
            key = metric[0].strip()
            if key == 'epoch':
                if adjust_epoch:
                    val = last_epoch
                else:
                    last_epoch = val
            sync.history[mode][key].update({step: val})
    return sync
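
From the parsing logic above, each usable line has the form "FastEstimator-<Mode>: step: <N>; key: value; ...", with step listed first. A minimal sketch feeding synthetic lines in place of a file (assuming Summary is importable from fastestimator.summary; the metric names are made up):

from fastestimator.summary import Summary
from fastestimator.summary.logs.log_parse import parse_log_iter

lines = [
    "FastEstimator-Train: step: 100; ce: 2.302;",
    "FastEstimator-Eval: step: 500; epoch: 1; ce: 1.98; accuracy: 0.42;",
]
summary = parse_log_iter(source=lines, sync=Summary("demo"))
print(summary.history["eval"]["accuracy"])  # {500: 0.42}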