jsontsv: JSON to TSV transformer

[ mit, program, text ] [ Propose Tags ]

Transforms JSON into tab-separated line-oriented output, for easier processing in Unix-style pipelines.


[Skip to Readme]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

  • No Candidates
Versions [RSS] 0.1.0.0, 0.1.1.0, 0.1.1.1, 0.1.2.0, 0.1.3.0, 0.1.3.1, 0.1.4.0, 0.1.4.1, 0.1.4.2, 0.1.4.3, 0.1.4.4, 0.1.4.5, 0.1.4.6, 0.1.5.0, 0.1.6.0, 0.1.6.1, 0.1.7.0
Dependencies aeson (>=0.8.0.0), attoparsec (>=0.12.0.0), base (>=4.6 && <4.9), bytestring, containers, csv (>=0.1.0), optparse-applicative, scientific, string-qq, text (>=1.0.0.0), unordered-containers, vector [details]
License MIT
Copyright (c) 2014 Daniel Choi
Author Daniel Choi
Maintainer dhchoi@gmail.com
Category Text
Home page https://github.com/danchoi/jsontsv
Uploaded by DanielChoi at 2016-05-19T14:08:25Z
Distributions
Reverse Dependencies 1 direct, 0 indirect [details]
Executables jsontsv
Downloads 10223 total (37 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs not available [build log]
Last success reported on 2016-05-19 [all 3 reports]

Readme for jsontsv-0.1.7.0

[back to package description]

jsontsv

Transforms JSON objects into delimiter-separated line-oriented output, which is more convenient for downstream processing with Unix tools like grep, awk, diff, etc. Also useful for converting JSON data into spreadsheet data: CSV-style output is supported.

Synopsis

input:

{
  "title": "Terminator 2: Judgment Day",
  "year": 1991,
  "stars": [
    {
      "name": "Arnold Schwarzenegger"
    },
    {
      "name": "Linda Hamilton"
    }
  ],
  "ratings": {
    "imdb": 8.5
  }
}
{
  "title": "Interstellar",
  "year": 2014,
  "stars": [
    {
      "name": "Matthew McConaughey"
    },
    {
      "name": "Anne Hathaway"
    }
  ],
  "ratings": {
    "imdb": 8.9
  }
}

Note that this input is not actually JSON at the top-level. It is a stream of JSON objects. It can be fed into jsontsv:

jsontsv 'title year stars.name ratings.imdb' < input

outputs this tab-separated text:

Terminator 2: Judgment Day	1991	Arnold Schwarzenegger,Linda Hamilton	8.5
Interstellar	2014	Matthew McConaughey,Anne Hathaway	8.9

You can pick off array elements using [i] syntax:

jsontsv 'title year stars[0].name' < input

outputs

Terminator 2: Judgment Day     1991    Arnold Schwarzenegger
Interstellar    2014    Matthew McConaughey

Use the -H flag to output the headers and the Unix tool column to align the columns for prettier output:

cat input | jsontsv -H 'title year stars.name ratings.imdb' | column -s $'\t' -t

outputs

title                       year  stars.name                            ratings.imdb
Terminator 2: Judgment Day  1991  Arnold Schwarzenegger,Linda Hamilton  8.5
Interstellar                2014  Matthew McConaughey,Anne Hathaway     8.9

Mapping an array index getter over nested arrays

As of version 0.1.5.0, you can map an index getter over arrays, such as this

{"id":2,"cast":[["Michael Caine",13473],["Demi Moore",65231]]}

with this expression

jsontsv -a '|' 'id cast._[0]' < input

This results in:

2   Michael Caine|Demi Moore

Extracting 2-tuple values

From version 0.1.5.0, you can can access lists of pairs, e.g.

{"menu":[["dinner","fish"],["dessert","pie"]]}

with the expression ["KEY"], e.g.:

jsontsv 'menu["dinner"]'
# => fish

jsontsv 'menu["dessert"]'
# => pie

jsontsv 'menu["drink"]'
# => null

This is useful because Data.Aeson emits Haskell 2-tuples in this format:

ghci> encode [("dinner", "fish"),("dessert","pie")]
"[[\"dinner\",\"fish\"],[\"dessert\",\"pie\"]]"

Moreover, encoding 2-tuples is often preferable to encoding Data.Map.Map because encoding 2-tuples preserves ordering, whereas encoding Map does not.

Installation

Assuming you have a recent version of the Haskell platform on your system,

cabal update
cabal install jsontsv

Alternatively,

git clone git@github.com:danchoi/jsontsv.git
cd jsontsv
cabal sandbox init
cabal install 
# Now copy .cabal-sandbox/bin/jsontsv to your PATH

Usage

jsontsv

Usage: jsontsv FIELDS [-a DELIM] ([-c|--csv] | [-d DELIM]) [-H]
               [-n|--null-string STRING] [-t|--true-string STRING]
               [-f|--false-string STRING] [-N|--newline STRING] [--debug]
  Transform JSON objects to TSV. On STDIN provide an input stream of
  whitespace-separated JSON objects.

Available options:
  -h,--help                Show this help text
  -a DELIM                 Concatentated array elem delimiter. Defaults to
                           comma.
  -c,--csv                 Output CSV
  -d DELIM                 Output field delimiter. Defaults to tab.
  -H                       Include headers
  -n,--null-string STRING  String to represent null value. Default: 'null'
  -t,--true-string STRING  String to represent boolean true. Default: 't'
  -f,--false-string STRING String to represent boolean false. Default: 'f'
  -N,--newline STRING      String to replace newlines in field text. Default: '
                           '
  --debug                  Debug keypaths

See https://github.com/danchoi/jsontsv for more information.

Input should be a stream of JSON objects of the same or mostly similar shape, separated by whitespace such as newlines. If the objects are wrapped in a JSON array at the top level or nested inside a top-level object, use the jq tool by Stephan Dolan to extract an object stream, e.g.:

curl -s "https://api.github.com/repos/rails/rails/issues" | 
jq -M '.[]' | 
jsontsv -H 'number title user.login state repository.name labels.name' 

outputs

number	title	user.login	state	repository.name	labels.name
17894	Add default value for `create_table_definition`	kamipo	open	null	
17893	Vendor/assets/images not being precompiled	runephilosof	open	null	
17891	Removed use of mocha in the info_controller tests	prathamesh-sonpatki	open	null	
17887	Wrong instance object passed to lambda on has_many :through	haruska	open	null	
17885	Update postgresql_database_tasks.rb	starbelly	open	null	
17884	Routes with {trailing_slash: true} do not match if referenced as non-named routes	dreyks	open	null	
17880	Fix humanize for already upcased acronyms	mintuhouse	open	null	activesupport
17879	humanize doesn't respect Infector acronyms	mintuhouse	open	null	activesupport
17864	eager loading a has_many through association ignores order of through association	jturkel	open	null	
17859	Includes HABTM returns correct size now	scambra	open	null	
17858	test preloading a HABTM association with hash conditions	scambra	open	null	
17854	Bug when using find_in_batches and reverse_order	robertjlooby	open	null	activerecord
17853	Remove deprecated `reset_changes` and `reset_attribute!` methods.	kaspth	open	null	
17851	Support for any type primary key	kamipo	open	null	
17845	Don't leak Object constants in core_ext/module/qualified_const	gsamokovarov	open	null	
17825	Fix Sidekiq ActiveJob integration setup	aripollak	open	null	activejob
17824	AR::RecordNotSaved & RecordNotDestroyed from save!/destroy! should include an error message	yuki24	open	null	
17822	Refactor `visit_ChangeColumnDefinition`	kamipo	open	null	
17820	Clear query cache on rollback	fw42	open	null	
17819	handle_positional_args does not work properly in route with format: false option	vevisystems	open	null	actionpack
17817	Hide potentially sensitive ActiveJob params from logs	aripollak	open	null	activejob
17815	Remove custom errors page section from the guides	yuki24	open	null	
17813	Changed welcome#index page overall look and feel	wazery	open	null	railties
17804	Null values will still be passed to custom serializers.	xaviershay	open	null	activerecord
17797	Don't remove mailer layouts files	y-yagi	open	null	
17795	ActiveRecord joins/includes bug	dgobaud	open	null	activerecord,regression
17793	Fix undesirable RangeError by Type::Integer. Add Type::UnsignedInteger.	kamipo	open	null	activerecord
17792	allow 'all' for :domain option in addition to :all	rockrep	open	null	
17788	Issue#17703 Test case for tempfile attribute	sivagollapalli	open	null	
17787	rails runner does not respect subdirectory / how to specify subdirectory?	doits	open	null	railties

JSON leaf values are printed as follows:

  • Strings and numbers are copied to output.
  • Boolean values are output as t or f. You can changes this with the -t and -f options.
  • null is printed as null. You can change this with the -n option.
  • If the leaf value is an array, it is concatenated into a single comma-separated string. This delimiter can be changed with the -a option.

Newlines in content

If a string field in the JSON contains a \n or \r character, these will be replaced by spaces by default. The replacement character can be changed with the -N option.

Column header aliases

If the default column headers using -H are too long, you can designate aliases with the pattern [keypath]:[alias]. E.g.,

curl -s "https://api.github.com/repos/rails/rails/issues" | 
jq -M '.[]' | 
jsontsv -H 'number title user.login:login state repository.name:repo_name' 

Known alternatives

  • jsawk Jsawk is like awk, but for JSON. (nodejs)
  • json2csv made with Go
  • jq a lightweight and flexible command-line JSON processor
  • jsoncsv a json to csv library in javascript/coffeescript
  • json A free, in-browser JSON to CSV converter
  • tsvutils utilities for processing tab-separated files

Further reading