feat(v2): add from and to pandas df for documentarray#1161
feat(v2): add from and to pandas df for documentarray#1161anna-charlotte merged 33 commits intofeat-rewrite-v2from
Conversation
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai>
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
…to-pandas Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
pyproject.toml
Outdated
| rich = ">=13.1.0" | ||
| lz4 = {version= ">=1.0.0", optional = true} | ||
| pydub = {version = "^0.25.1", optional = true } | ||
| pandas = ">=1.1.0" |
There was a problem hiding this comment.
it should be an optional dependecy
There was a problem hiding this comment.
add it to common or a as a separate extra pandas ?
There was a problem hiding this comment.
separate. we will reorginaze at some point anyway
samsja
left a comment
There was a problem hiding this comment.
look goods, I added some comments
docarray/array/array/io.py
Outdated
| ) | ||
|
|
||
| @staticmethod | ||
| def access_path_dict_to_nested_dict( |
There was a problem hiding this comment.
didn't you already implement something like this in your previous PR? or am i mixing things up?
There was a problem hiding this comment.
yes, the function _access_path_to_dict that is used within _access_path_dict_to_nested_dict that I implemented in the last PR.
_access_path_to_dict: transforms one access path to nested dict_ access_path_dict_to_nested_dict: transforms dict with (multiple) access path keys to a joint nested dict by calling the former func
I will move the latter to that other helper file, to keep those functions together yes?
There was a problem hiding this comment.
ok sounds good, yes keeping them in the same helper file would be ideal
docarray/array/array/io.py
Outdated
| return json.dumps([doc.json() for doc in self]) | ||
|
|
||
| @classmethod | ||
| def _check_for_valid_document_type(cls) -> None: |
There was a problem hiding this comment.
I would prefer returning a bool and rasing at the call site. That way it is clearer where the error actually occurs, and this method can be re-used in contexts where only the information is needed, without wanting to raise an exception
docarray/array/array/io.py
Outdated
| ) | ||
|
|
||
| @classmethod | ||
| def _check_for_valid_access_paths(cls, field_names: Optional[List[str]]) -> None: |
docarray/array/array/io.py
Outdated
|
|
||
| @classmethod | ||
| def _check_for_valid_document_type(cls) -> None: | ||
| if cls.document_type == AnyDocument: |
There was a problem hiding this comment.
is there are plan to make this check more sophisticated as part of this PR? otherwise this doesn't need to be a method imo
There was a problem hiding this comment.
No, just tried to extract all the duplicate code but maybe I went a bit overboard hehe, moved it back to from_csv and from_pandas again
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai>
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
samsja
left a comment
There was a problem hiding this comment.
we should make pandas install optional. Otherwise looks good
Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
|
📝 Docs are deployed on https://ft-feat-from-to-pandas--jina-docs.netlify.app 🎉 |
* feat: load from and to csv Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: from to csv Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * feat: add access path to dict Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: from to csv Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: clean up Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: add docstring and update tmpdir in test Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: merge nested dicts Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: clean up Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: clean up Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * test: update test Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply samis suggestion from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review wrt access paths Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply johannes suggestion Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> * fix: apply johannes suggestion Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: typos Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * refactor: move helper functions to helper file Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * test: fix fixture Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * feat: add to and from pandas df for documentarray Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * chore: add pandas to pyproject.toml Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * docs: update docstring Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: mypy Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: clean up Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestion from johannes Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> * fix: apply suggestion from johannes Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestion Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> * fix: apply suggestions Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> --------- Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> Signed-off-by: Charlotte Gerhaher <charlotte.gerhaher@jina.ai> Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Arnav Zutshi <arnzut1324@gmail.com>
Add from and to pandas to DocumentArray IOMixin
DocumentArray[Doc].from_pandas(df)da.to_pandas()