swh.scanner.plot module

This module displays the scanner results stored in the model and lets the user interact with them.

The sunburst function generates a navigable sunburst chart from the directory information retrieved from the model. For each directory, the chart displays the total number of files and the percentage of known files.

The size of each directory is determined by its total number of contents, whereas the color gradient is derived from the percentage of known contents.
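As a minimal sketch of the two values described above, the snippet below derives a size and a color value for one directory from its counts. The helper name chart_values is hypothetical and is not part of swh.scanner.plot; it only illustrates the arithmetic.

```python
from typing import Tuple


def chart_values(contents: int, known: int) -> Tuple[int, float]:
    """Return (size, color_value) for one directory.

    Hypothetical helper: size is the number of contents, color_value is
    the percentage of known contents, rounded to two decimals.
    """
    percentage = 100 * known / contents if contents else 0.0
    return contents, round(percentage, 2)


print(chart_values(10, 4))  # (10, 40.0)
print(chart_values(35, 6))  # (35, 17.14)
```

An empty directory is given a color value of 0.0 here to avoid a division by zero; that choice is an assumption of the sketch.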

swh.scanner.plot.build_hierarchical_df(dirs_dataframe: pandas.core.frame.DataFrame, levels: List[str], metrics_columns: List[str], root_name: str) → pandas.core.frame.DataFrame

Build a hierarchy of levels for Sunburst or Treemap charts.

For each directory the new dataframe will have the following information:

id: the directory name
parent: the parent directory of id
contents: the total number of contents of the directory id and of its subdirectories
known: the percentage of known contents, relative to the computed ‘contents’

Example: Given the following dataframe:

lev0     lev1                contents  known
 ''       ''                 20        2     //root
kernel   kernel/subdirker    5         0
telnet   telnet/subdirtel    10        4

The output hierarchical dataframe will be like the following:

   id                parent    contents  known
                               20        10.00
kernel/subdirker     kernel    5         0.00
telnet/subdirtel     telnet    10        40.00
                     total     20        10.00
kernel               total     5         0.00
telnet               total     10        40.00
total                          35        17.14

To create the hierarchical dataframe, we iterate over the input dataframe once per level, starting from the deepest level.

Based on the previous example we have to do two iterations:

Iteration 1: the generated dataframe ‘df_tree’ will be:

id                parent   contents  known
                           20        10.0
kernel/subdirker  kernel   5         0.0
telnet/subdirtel  telnet   10        40.0

Iteration 2: the generated dataframe ‘df_tree’ will be:

id       parent   contents  known
         total    20        10.0
kernel   total    5         0.0
telnet   total    10        40.0

Note that, since we have reached the last level, the parent assigned to each directory id is the root (‘total’ in this example).

The ‘total’ row is computed by summing the contents of the input dataframe and expressing the known contents as a percentage of that sum (in the example, 6 known out of 35 contents, i.e. 17.14).
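The walk-through above can be sketched without pandas as follows. This is an illustration under stated assumptions, not the module's implementation: build_hierarchy and percentage are hypothetical names, rows are plain dictionaries, and the ‘known’ input column is assumed to hold raw counts that are converted to percentages on output.

```python
from typing import Dict, List


def percentage(known: int, contents: int) -> float:
    """Share of known contents, as a percentage rounded to two decimals."""
    return round(100 * known / contents, 2) if contents else 0.0


def build_hierarchy(rows: List[dict], levels: List[str], root_name: str) -> List[dict]:
    """Hypothetical pure-Python counterpart of the walk-through above."""
    tree = []
    # One iteration per level, from the deepest level to the shallowest.
    for depth in range(len(levels) - 1, -1, -1):
        level = levels[depth]
        groups: Dict[tuple, List[int]] = {}
        for row in rows:
            # At the shallowest level the parent is the root itself.
            parent = row[levels[depth - 1]] if depth > 0 else root_name
            counts = groups.setdefault((row[level], parent), [0, 0])
            counts[0] += row["contents"]
            counts[1] += row["known"]
        for (dir_id, parent), (contents, known) in groups.items():
            tree.append({"id": dir_id, "parent": parent, "contents": contents,
                         "known": percentage(known, contents)})
    # The root row aggregates every input row.
    total_contents = sum(row["contents"] for row in rows)
    total_known = sum(row["known"] for row in rows)
    tree.append({"id": root_name, "parent": "", "contents": total_contents,
                 "known": percentage(total_known, total_contents)})
    return tree


rows = [
    {"lev0": "", "lev1": "", "contents": 20, "known": 2},
    {"lev0": "kernel", "lev1": "kernel/subdirker", "contents": 5, "known": 0},
    {"lev0": "telnet", "lev1": "telnet/subdirtel", "contents": 10, "known": 4},
]
tree = build_hierarchy(rows, ["lev0", "lev1"], "total")
print(tree[-1])  # {'id': 'total', 'parent': '', 'contents': 35, 'known': 17.14}
```

With the example input, the seven returned rows match the hierarchical dataframe shown earlier, in the same order.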

swh.scanner.plot.compute_max_depth(dirs_path: List[pathlib.PosixPath], root: pathlib.PosixPath) → int

Compute the maximum depth level of the given directory paths.

Example: for var/log/kernel/ the depth level is 3
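One plausible way to obtain this value is to count the path components of each directory relative to the root and keep the maximum. The sketch below assumes that reading; max_depth is a hypothetical name, and PurePosixPath is used instead of PosixPath only so that the snippet runs on any platform.

```python
from pathlib import PurePosixPath
from typing import List


def max_depth(dirs_path: List[PurePosixPath], root: PurePosixPath) -> int:
    """Return the largest number of path components below root."""
    depth = 0
    for path in dirs_path:
        # relative_to() strips the root prefix; parts lists what remains.
        relative = path.relative_to(root)
        depth = max(depth, len(relative.parts))
    return depth


paths = [PurePosixPath("/var/log"), PurePosixPath("/var/log/kernel")]
print(max_depth(paths, PurePosixPath("/")))  # 3
```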

swh.scanner.plot.generate_df_from_dirs(dirs: Dict[pathlib.PosixPath, Tuple[int, int]], columns: List[str], root: pathlib.PosixPath, max_depth: int) → pandas.core.frame.DataFrame

Generate a dataframe from the directories given in input.

Example: given the following directories as input:

dirs = {
    '/var/log/': (23, 2),
    '/var/log/kernel': (5, 0),
    '/var/log/telnet': (10, 3)
}

The generated dataframe will be:

lev0   lev1       lev2              contents  known
'var'  'var/log'  ''                23        2
'var'  'var/log'  'var/log/kernel'  5         0
'var'  'var/log'  'var/log/telnet'  10        3
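
The table above can be reproduced with a short pandas-free sketch. It assumes that each level column holds the cumulative path up to that depth and that missing levels are left as empty strings, as the example suggests; rows_from_dirs is a hypothetical name and the rows are plain dictionaries rather than a dataframe.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def rows_from_dirs(dirs: Dict[str, Tuple[int, int]], root: PurePosixPath,
                   max_depth: int) -> List[dict]:
    """Build one row per directory with lev0..levN, contents and known."""
    rows = []
    for path, (contents, known) in dirs.items():
        parts = PurePosixPath(path).relative_to(root).parts
        row = {}
        for level in range(max_depth):
            # Cumulative path for existing levels, empty string otherwise.
            row[f"lev{level}"] = "/".join(parts[: level + 1]) if level < len(parts) else ""
        row["contents"] = contents
        row["known"] = known
        rows.append(row)
    return rows


dirs = {
    "/var/log/": (23, 2),
    "/var/log/kernel": (5, 0),
    "/var/log/telnet": (10, 3),
}
for row in rows_from_dirs(dirs, PurePosixPath("/"), max_depth=3):
    print(row)
```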

swh.scanner.plot.sunburst(directories: Dict[pathlib.PosixPath, Tuple[int, int]], root: pathlib.PosixPath) → None

Show the sunburst chart from the directories given in input.